diff options
| author | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-02-05 11:40:49 -0500 |
|---|---|---|
| committer | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-02-05 11:40:49 -0500 |
| commit | dccbac2e2b99e46f64470ab7c23fe8f0f50ce9aa (patch) | |
| tree | 8a0fe67849c7b894b4fecb88df79ec0889369e71 /paper/sections/experiments.tex | |
| parent | 3f3630af30845ab7f305638f6cc2a80abb8435f5 (diff) | |
| download | cascades-dccbac2e2b99e46f64470ab7c23fe8f0f50ce9aa.tar.gz | |
small changes
Diffstat (limited to 'paper/sections/experiments.tex')
| -rw-r--r-- | paper/sections/experiments.tex | 22 |
1 files changed, 6 insertions, 16 deletions
diff --git a/paper/sections/experiments.tex b/paper/sections/experiments.tex index ceb2159..600cac3 100644 --- a/paper/sections/experiments.tex +++ b/paper/sections/experiments.tex @@ -17,9 +17,9 @@ In this section, we validate empirically the results and assumptions of Section~ \paragraph{Experimental setup} -We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges), a Barabasi-Albert graph ($300$ nodes, $16200$ edges), the Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by doubling the edges. +We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges), a Barabasi-Albert graph ($300$ nodes, $16200$ edges), a Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by doubling the edges. -For every reported data point, we generate $n$ cascades from the Independent Cascade model for $n \in \{100, 500, 1000, 2000, 5000\}$, and compute the estimated graph $\hat {\cal G}$ for each algorithm. The initial probability of a node being a source is fixed to $0.05$, i.e. an average of $15$ nodes source nodes per cascades for all experiments. All edge weights are chosen uniformly in the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see paragraph on robustness). +For every reported data point, we generate $n$ cascades from the Independent Cascade model for $n \in \{100, 500, 1000, 2000, 5000\}$, and compare for each algorithm the estimated graph $\hat {\cal G}$ with ${\cal G}$. The initial probability of a node being a source is fixed to $0.05$, i.e. an average of $15$ nodes source nodes per cascades for all experiments. All edge weights are chosen uniformly in the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see paragraph on robustness). All data points are reported with a $\pm 1$ error margin. From the set of nodes that are infected at each time step over the $n$ cascades, we attempt to recover the graph ${\cal G}$. In the case of the \textsc{lasso}, \textsc{mle} and \textsc{sparse mle} algorithms, we construct the edges of $\hat {\cal G} : \cup_{j \in V} \{i : \Theta_{ij} > 0.1\}$, \emph{i.e} by thresholding. The parameter $\lambda$ is chosen to be of the order ${\cal O}(\sqrt{\log m / (\alpha n)})$. The true positives are the edges which appear both in ${\cal G}$ and $\hat {\cal G}$ and the true negatives are the edges which appear in neither. Finally, we report the F1-score$=2 \text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$, which considers the number of true edges recovered by the algorithm over the total number of edges returned by the algorithm (\emph{precision}) with the number of true edges recovered by the algorithm over the total number of edges it should have recovered (\emph{recall}). @@ -29,21 +29,11 @@ Over all experiments, \textsc{sparse mle} achieves higher rates of precision, re \paragraph{The \textsc{lasso} algorithm} -To achieve faster computation time, we can consider the following program: -$\hat \theta \in \arg \min \|\|_2$. This algorithm has the merit of being easier and faster to optimize numerically than the other algorithms. - -\begin{itemize} -\item Continuous time + single source -> no comparison with NETRATE... -\item This is a modeling decision: in practice, we do not have access to the cascade's ``patient 0'', which makes learning the graph harder. -\item Lasso -\end{itemize} +To achieve faster computation time, we considered the following program: +$$\hat \theta_i \in \arg \min_{\theta} \sum_{t \in {\cal T}} |f(\theta_i\cdot x^t) - x_i^{t+1}|^2 + \lambda \|\theta_i\|_1$$ +This algorithm, which we name \textsc{Lasso}, has the merit of being both easier and faster to optimize numerically than the other convex-optimization based algorithms. It is also highly robust to approximately-sparse graphs. It approximates the $\textsc{sparse mle}$ algorithm by making the assumption that the observations $x_i^{t+1}$ are of the form: $x_i^{t+1} = f(\theta_i\cdot x^t) + \epsilon$, where $\epsilon$ is random white noise. This is not valid in theory since $\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$, however the approximation is validated in practice. \paragraph{Quantifying robustness} - -\begin{itemize} -\item Compare edge recovery to First Edge and Greedy as number of cascades grows on different types of graphs... -\item Plot upper-bound of parameters in graph as number of cascades grows -\item Show robustness: plot recall and F1 score for graphs with different number of small edges + "corrected recall' etc...how many strong edges do we get, how many edges to we get in general? -\end{itemize}
\ No newline at end of file +The previous experiments only considered graphs with strong edges. To test the algorithms in the approximately sparse case, we add sparse edges to the previous graphs according to a bernoulli variable of parameter $1/3$ for every non-edge, and drawing a weight uniformly from $[0,0.1]$. The results are reported in Figure XXX by plotting the convergence of the $\ell2$-norm error, and show that both the \textsc{lasso}, followed by \textsc{sparse mle} are the most robust to noise.
\ No newline at end of file |
