author     jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-02-05 11:40:49 -0500
committer  jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-02-05 11:40:49 -0500
commit     dccbac2e2b99e46f64470ab7c23fe8f0f50ce9aa (patch)
tree       8a0fe67849c7b894b4fecb88df79ec0889369e71
parent     3f3630af30845ab7f305638f6cc2a80abb8435f5 (diff)
download   cascades-dccbac2e2b99e46f64470ab7c23fe8f0f50ce9aa.tar.gz
small changes
-rw-r--r--  paper/sections/experiments.tex  22
-rw-r--r--  src/make_plots.py                2
2 files changed, 7 insertions, 17 deletions
diff --git a/paper/sections/experiments.tex b/paper/sections/experiments.tex
index ceb2159..600cac3 100644
--- a/paper/sections/experiments.tex
+++ b/paper/sections/experiments.tex
@@ -17,9 +17,9 @@ In this section, we validate empirically the results and assumptions of Section~
\paragraph{Experimental setup}
-We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges), a Barabasi-Albert graph ($300$ nodes, $16200$ edges), the Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by doubling the edges.
+We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges), a Barabasi-Albert graph ($300$ nodes, $16200$ edges), a Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by doubling the edges.
-For every reported data point, we generate $n$ cascades from the Independent Cascade model for $n \in \{100, 500, 1000, 2000, 5000\}$, and compute the estimated graph $\hat {\cal G}$ for each algorithm. The initial probability of a node being a source is fixed to $0.05$, i.e. an average of $15$ nodes source nodes per cascades for all experiments. All edge weights are chosen uniformly in the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see paragraph on robustness).
+For every reported data point, we generate $n$ cascades from the Independent Cascade model for $n \in \{100, 500, 1000, 2000, 5000\}$, and compare for each algorithm the estimated graph $\hat {\cal G}$ with ${\cal G}$. The initial probability of a node being a source is fixed to $0.05$, i.e. an average of $15$ source nodes per cascade for all experiments. All edge weights are chosen uniformly in the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see the paragraph on robustness). All data points are reported with a $\pm 1$ error margin.
From the set of nodes that are infected at each time step over the $n$ cascades, we attempt to recover the graph ${\cal G}$. In the case of the \textsc{lasso}, \textsc{mle} and \textsc{sparse mle} algorithms, we construct the edge set of $\hat {\cal G}$ by thresholding, \emph{i.e.} as $\cup_{j \in V} \{(i,j) : \Theta_{ij} > 0.1\}$. The parameter $\lambda$ is chosen to be of the order ${\cal O}(\sqrt{\log m / (\alpha n)})$. The true positives are the edges which appear in both ${\cal G}$ and $\hat {\cal G}$, and the true negatives are the node pairs which are edges in neither. Finally, we report the F1-score $= 2\,\text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$, which combines the fraction of returned edges that are true edges (\emph{precision}) with the fraction of true edges that are recovered (\emph{recall}).
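
As a rough illustration of the thresholding and F1-score computation described above (not part of this patch): the sketch below assumes an estimated parameter matrix Theta_hat and a 0/1 true adjacency matrix A_true, both of size m x m; the function name and the default threshold of 0.1 simply mirror the text.

import numpy as np

def f1_score_from_theta(Theta_hat, A_true, threshold=0.1):
    # Threshold the estimated parameters to obtain an edge set, then
    # compare it with the true adjacency matrix (self-loops excluded).
    A_hat = (Theta_hat > threshold).astype(int)
    np.fill_diagonal(A_hat, 0)

    true_pos = np.sum((A_hat == 1) & (A_true == 1))
    false_pos = np.sum((A_hat == 1) & (A_true == 0))
    false_neg = np.sum((A_hat == 0) & (A_true == 1))

    precision = true_pos / max(true_pos + false_pos, 1)
    recall = true_pos / max(true_pos + false_neg, 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
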
@@ -29,21 +29,11 @@ Over all experiments, \textsc{sparse mle} achieves higher rates of precision, re
\paragraph{The \textsc{lasso} algorithm}
-To achieve faster computation time, we can consider the following program:
-$\hat \theta \in \arg \min \|\|_2$. This algorithm has the merit of being easier and faster to optimize numerically than the other algorithms.
-
-\begin{itemize}
-\item Continuous time + single source -> no comparison with NETRATE...
-\item This is a modeling decision: in practice, we do not have access to the cascade's ``patient 0'', which makes learning the graph harder.
-\item Lasso
-\end{itemize}
+To achieve faster computation, we consider the following program:
+$$\hat \theta_i \in \arg \min_{\theta_i} \sum_{t \in {\cal T}} |f(\theta_i\cdot x^t) - x_i^{t+1}|^2 + \lambda \|\theta_i\|_1$$
+This algorithm, which we name \textsc{Lasso}, has the merit of being both easier and faster to optimize numerically than the other convex-optimization-based algorithms. It is also highly robust to approximately sparse graphs. It approximates the \textsc{sparse mle} algorithm by assuming that the observations $x_i^{t+1}$ are of the form $x_i^{t+1} = f(\theta_i\cdot x^t) + \epsilon$, where $\epsilon$ is white noise. This assumption does not hold in theory, since $\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$; the approximation is nevertheless validated in practice.
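
For intuition, here is a minimal proximal-gradient (ISTA) sketch of this program, not taken from the repository's convex_optimization module: the link function f, its derivative, the step size, and the data layout (row t of X holding x^t, entry t of y holding x_i^{t+1}) are all assumptions made for illustration.

import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (elementwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(X, y, f, f_prime, lam, step=1e-3, n_iter=5000):
    # Minimize sum_t |f(theta . x^t) - y_t|^2 + lam * ||theta||_1 by
    # iterative soft-thresholding.  X: (T, m) array of cascade states x^t,
    # y: (T,) array of next-step observations x_i^{t+1}.
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = X @ theta                                   # theta . x^t for every t
        grad = 2.0 * X.T @ ((f(z) - y) * f_prime(z))    # gradient of the smooth part
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Purely illustrative usage with a logistic link:
# sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
# theta_hat = lasso_ista(X, y, sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z)), lam=0.1)
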
\paragraph{Quantifying robustness}
-
-\begin{itemize}
-\item Compare edge recovery to First Edge and Greedy as number of cascades grows on different types of graphs...
-\item Plot upper-bound of parameters in graph as number of cascades grows
-\item Show robustness: plot recall and F1 score for graphs with different number of small edges + "corrected recall' etc...how many strong edges do we get, how many edges to we get in general?
-\end{itemize} \ No newline at end of file
+The previous experiments only considered graphs with strong edges. To test the algorithms in the approximately sparse case, we add weak edges to the previous graphs: each non-edge becomes an edge independently according to a Bernoulli variable of parameter $1/3$, with a weight drawn uniformly from $[0, 0.1]$. The results are reported in Figure XXX by plotting the convergence of the $\ell_2$-norm error, and show that the \textsc{lasso}, followed by \textsc{sparse mle}, is the most robust to noise. \ No newline at end of file
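
A minimal sketch of this perturbation (again, not part of the patch): only the Bernoulli parameter 1/3 and the [0, 0.1] weight range come from the text above; the weighted-adjacency-matrix representation and the function name are hypothetical.

import numpy as np

def add_weak_edges(W, p=1/3, low=0.0, high=0.1, seed=0):
    # Given a weighted adjacency matrix W (m x m), turn each non-edge
    # (off-diagonal zero entry) into a weak edge with probability p,
    # drawing its weight uniformly from [low, high].
    rng = np.random.default_rng(seed)
    W_noisy = np.array(W, dtype=float)
    m = W_noisy.shape[0]
    non_edges = (W_noisy == 0) & ~np.eye(m, dtype=bool)
    flips = rng.random(W_noisy.shape) < p
    weights = rng.uniform(low, high, size=W_noisy.shape)
    W_noisy[non_edges & flips] = weights[non_edges & flips]
    return W_noisy
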
diff --git a/src/make_plots.py b/src/make_plots.py
index 81d581c..d83eb2c 100644
--- a/src/make_plots.py
+++ b/src/make_plots.py
@@ -172,4 +172,4 @@ if __name__=="__main__":
passed_function=
#convex_optimization.sparse_recovery,
convex_optimization.type_lasso,
- sparse_edges=True) \ No newline at end of file
+ sparse_edges=False) \ No newline at end of file