2 files changed, 25 insertions, 10 deletions
diff --git a/paper/sections/experiments.tex b/paper/sections/experiments.tex
index 3891526..7cb852c 100644
--- a/paper/sections/experiments.tex
+++ b/paper/sections/experiments.tex
@@ -17,25 +17,25 @@
 \label{fig:four_figs}
 \end{table*}
 
-In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for different initializations of parameters ($n$, $m$, $\lambda$) and for varying levels of sparsity. We compare our algorithm to two different state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}. As an extra benchmark, we also introduce a new algorithm \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm. We find empirically that \textsc{lasso} is highly robust, and can be computed more efficiently than both \textsc{mle} and \textsc{sparse mle} without sacrificing for performance.
+In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for different initializations of parameters ($n$, $m$, $\lambda$) and for varying levels of sparsity. We compare our algorithm to two different state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}. As an extra benchmark, we also introduce a new algorithm \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm.
 
 \paragraph{Experimental setup}
 We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges) \cite{watts:1998}, a Barabasi-Albert graph ($300$ nodes, $16200$ edges) \cite{barabasi:2001}, a Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by doubling the edges.
 
-For every reported data point, we generate $n$ cascades from the Independent Cascade model for $n \in \{100, 500, 1000, 2000, 5000\}$, and compare for each algorithm the estimated graph $\hat {\cal G}$ with ${\cal G}$. The initial probability of a node being a source is fixed to $0.05$, i.e. an average of $15$ nodes source nodes per cascades for all experiments. All edge weights are chosen uniformly in the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see paragraph on robustness). All data points are reported with a $\pm 1$ error margin.
+For every reported data point, we sample edge weights and generate $n$ cascades from the Independent Cascade model for $n \in \{100, 500, 1000, 2000, 5000\}$. We compare for each algorithm the estimated graph $\hat {\cal G}$ with ${\cal G}$. The initial probability of a node being a source is fixed to $0.05$, i.e. an average of $15$ nodes source nodes per cascades for all experiments. All edge weights are chosen uniformly in the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see paragraph on robustness). Adjusting for the variance of our experiments, all data points are reported with at most a $\pm 1$ error margin. The parameter $\lambda$ is chosen to be of the order ${\cal O}(\sqrt{\log m / (\alpha n)})$.
 
-From the set of nodes that are infected at each time step over the $n$ cascades, we attempt to recover the graph ${\cal G}$. In the case of the \textsc{lasso}, \textsc{mle} and \textsc{sparse mle} algorithms, we construct the edges of $\hat {\cal G} : \cup_{j \in V} \{i : \Theta_{ij} > 0.1\}$, \emph{i.e} by thresholding. The parameter $\lambda$ is chosen to be of the order ${\cal O}(\sqrt{\log m / (\alpha n)})$. The true positives are the edges which appear both in ${\cal G}$ and $\hat {\cal G}$ and the true negatives are the edges which appear in neither. Finally, we report the F1-score$=2 \text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$, which considers the number of true edges recovered by the algorithm over the total number of edges returned by the algorithm (\emph{precision}) with the number of true edges recovered by the algorithm over the total number of edges it should have recovered (\emph{recall}).
+\paragraph{Benchmarks}
 
-\paragraph{Comparison with other algorithms}
-
-Over all experiments, \textsc{sparse mle} achieves higher rates of precision, recall, and f1-score. \textsc{sparse mle} is also robust to approximate sparsity, and displays a faster convergence of the $\ell2$-norm than any other benchmark. Interestingly, both \textsc{mle} and \textsc{sparse mle} perform exceptionally well on the Watts-Strogatz graph. The recovery rate converges at around $5000$ cascades, which is more than $15$ times the number of nodes. By contrast, \textsc{sparse mle} achieves a reasonable $f1$-score of $.75$ for roughly $500$ observed cascades. We did not benchmark against other known algorithms (\textsc{netrate} and \textsc{first edge}) due to the discrete time assumption. These algorithms also suppose a single-source model, whereas \textsc{sparse mle}, \textsc{mle}, and \textsc{greedy} do not. Learning the graph in the case of a multi-source cascade model is intuitively harder but more realistic, since we rarely have access to ``patient 0'' in practice.
+We compare our \textsc{sparse mle} algorithm to 3 benchmarks: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012} and \textsc{lasso}. The \textsc{mle} algorithm is a maximum-likelihood estimator without $\ell1$-norm penalization. \textsc{greedy} is an iterative algorithm. We introduced the \textsc{lasso} algorithm in our experiments to achieve faster computation time:
+$$\hat \theta_i \in \arg \min_{\theta} \sum_{t \in {\cal T}} |f(\theta_i\cdot x^t) - x_i^{t+1}|^2 + \lambda \|\theta_i\|_1$$
+\textsc{Lasso} has the merit of being both easier and faster to optimize numerically than the other convex-optimization based algorithms. It approximates the $\textsc{sparse mle}$ algorithm by making the assumption that the observations $x_i^{t+1}$ are of the form: $x_i^{t+1} = f(\theta_i\cdot x^t) + \epsilon$, where $\epsilon$ is random white noise. This is not valid in theory since $\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$, however the approximation is validated in practice.
 
-\paragraph{The \textsc{lasso} algorithm}
+We did not benchmark against other known algorithms (\textsc{netrate} \cite{gomezbalduzzi:2011} and \textsc{first edge} \cite{Abrahao:13}) due to the discrete time assumption. These algorithms also suppose a single-source model, whereas \textsc{sparse mle}, \textsc{mle}, and \textsc{greedy} do not. Learning the graph in the case of a multi-source cascade model is intuitively harder but more realistic, since we rarely have access to ``patient 0'' in practice.
 
-To achieve faster computation time, we considered the following program:
-$$\hat \theta_i \in \arg \min_{\theta} \sum_{t \in {\cal T}} |f(\theta_i\cdot x^t) - x_i^{t+1}|^2 + \lambda \|\theta_i\|_1$$
-This algorithm, which we name \textsc{Lasso}, has the merit of being both easier and faster to optimize numerically than the other convex-optimization based algorithms. It is also highly robust to approximately-sparse graphs. It approximates the $\textsc{sparse mle}$ algorithm by making the assumption that the observations $x_i^{t+1}$ are of the form: $x_i^{t+1} = f(\theta_i\cdot x^t) + \epsilon$, where $\epsilon$ is random white noise. This is not valid in theory since $\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$, however the approximation is validated in practice.
+\paragraph{Graph Estimation}
+In the case of the \textsc{lasso}, \textsc{mle} and \textsc{sparse mle} algorithms, we construct the edges of $\hat {\cal G} : \cup_{j \in V} \{i : \Theta_{ij} > 0.1\}$, \emph{i.e} by thresholding. The true positives are the edges which appear both in ${\cal G}$ and $\hat {\cal G}$ and the true negatives are the edges which appear in neither. Finally, we report the F1-score$=2 \text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$, which considers the number of true edges recovered by the algorithm over the total number of edges returned by the algorithm (\emph{precision}) with the number of true edges recovered by the algorithm over the total number of edges it should have recovered (\emph{recall}).
 
+Over all experiments, \textsc{sparse mle} achieves higher rates of precision, recall, and f1-score. \textsc{sparse mle} is also robust to approximate sparsity, and displays a faster convergence of the $\ell2$-norm than any other benchmark. Interestingly, both \textsc{mle} and \textsc{sparse mle} perform exceptionally well on the Watts-Strogatz graph. The recovery rate converges at around $5000$ cascades, which is more than $15$ times the number of nodes. By contrast, \textsc{sparse mle} achieves a reasonable $f1$-score of $.75$ for roughly $500$ observed cascades. 
 
 \paragraph{Quantifying robustness}
 
diff --git a/paper/sparse.bib b/paper/sparse.bib
index 6cebf03..1de78d6 100644
--- a/paper/sparse.bib
+++ b/paper/sparse.bib
@@ -400,3 +400,18 @@ year = "2009"
   biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/cond-mat-0106096},
   bibsource = {dblp computer science bibliography, http://dblp.org}
 }
+
+
+@article{gomezbalduzzi:2011,
+  author    = {Manuel Gomez{-}Rodriguez and
+               David Balduzzi and
+               Bernhard Sch{\"{o}}lkopf},
+  title     = {Uncovering the Temporal Dynamics of Diffusion Networks},
+  journal   = {CoRR},
+  volume    = {abs/1105.0697},
+  year      = {2011},
+  url       = {http://arxiv.org/abs/1105.0697},
+  timestamp = {Mon, 05 Dec 2011 18:05:23 +0100},
+  biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/abs-1105-0697},
+  bibsource = {dblp computer science bibliography, http://dblp.org}
+}