Diffstat (limited to 'paper/sections')
-rw-r--r--  paper/sections/experiments.tex  22
-rw-r--r--  paper/sections/results.tex       5
2 files changed, 20 insertions, 7 deletions
diff --git a/paper/sections/experiments.tex b/paper/sections/experiments.tex
index 4876927..ceb2159 100644
--- a/paper/sections/experiments.tex
+++ b/paper/sections/experiments.tex
@@ -8,15 +8,30 @@
\caption{Watts-Strogatz Model. 200 nodes, 20000 edges.}
\end{figure}
-In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for different initializations of parameters ($n$, $m$, $\lambda$) and for varying levels of sparsity. We compare our algorithm to two different state-of-the-art algorithms: \textsc{Greedy} and \textsc{MLE} from \cite{Netrapalli:2012}. We introduce an approximate \textsc{Lasso} algorithm and validate its effectiveness empirically.
+\begin{figure}
+\includegraphics[scale=.4]{figures/barabasi_albert.pdf}
+\caption{Barabasi-Albert Model.}
+\end{figure}
+
+In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for different initializations of the parameters ($n$, $m$, $\lambda$) and for varying levels of sparsity. We compare our algorithm to two state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}. As an additional benchmark, we also introduce a new algorithm, \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm. We find empirically that \textsc{lasso} is highly robust and can be computed more efficiently than both \textsc{mle} and \textsc{sparse mle} without sacrificing performance.
\paragraph{Experimental setup}
-We focus our analysis to synthetic graphs, which are known to exhibit the properties of real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges), a Barabasi-Albert graph ($300$ nodes, $16200$ edges), the Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. For every reported data point, we generate $n$ cascades from the Independent Cascade model for $n \in \{100, 500, 1000, 2000, 5000\}$, and compute the estimated graph $\hat {\cal G}$ for each algorithm.
+We evaluate the performance of the algorithms on synthetic graphs, chosen because they exhibit the structural properties of real social networks. We consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges), a Barabasi-Albert graph ($300$ nodes, $16200$ edges), the Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by replacing each undirected edge with two directed edges.
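+For reference, these synthetic graphs can be produced with standard network generators; the sketch below is purely illustrative, and the generator parameters shown are placeholders rather than the exact values used in our experiments (the Kronecker graph requires a separate generator, e.g. the one distributed with SNAP).
+\begin{verbatim}
+# Sketch: building the synthetic test graphs with networkx generators.
+# The parameters k, m, p below are illustrative placeholders.
+import networkx as nx
+
+watts_strogatz  = nx.watts_strogatz_graph(n=300, k=30, p=0.1)
+barabasi_albert = nx.barabasi_albert_graph(n=300, m=30)
+holme_kim       = nx.powerlaw_cluster_graph(n=200, m=25, p=0.3)  # Holme-Kim
+
+# Undirected graphs are made directed by replacing each undirected edge
+# {i, j} with the two directed edges (i, j) and (j, i).
+graphs = [g.to_directed() for g in
+          (watts_strogatz, barabasi_albert, holme_kim)]
+\end{verbatim}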
-In the case of the \textsc{Lasso}, \textsc{MLE} and \textsc{Sparse MLE} algorithms, we construct the edges of $\hat {\cal G} : \cup_{j \in V} \{i : \Theta_{ij} > 0.1\}$. We report the F1-score$=2\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$. For each graph, the initial probability of a node being a source is fixed to $0.05$, i.e. an average of $15$ nodes source nodes per cascades. Except when testing for approximately sparse graphs, all edge weights are chosen uniformly in the interval $[0.2, 0.7]$.
+For every reported data point, we generate $n$ cascades from the Independent Cascade model for $n \in \{100, 500, 1000, 2000, 5000\}$, and compute the estimated graph $\hat {\cal G}$ for each algorithm. In all experiments, the initial probability of a node being a source is fixed to $0.05$, i.e. an average of $15$ source nodes per cascade. All edge weights are drawn uniformly at random from the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see the paragraph on robustness).
+
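+As an illustration of this data-generation step, a minimal sketch of the discrete-time Independent Cascade simulation (a hypothetical helper, assuming edge probabilities are stored in a ``weight'' attribute) is given below.
+\begin{verbatim}
+# Sketch: simulating one cascade of the discrete-time Independent Cascade
+# model on a directed networkx graph. Each edge carries its infection
+# probability in the "weight" attribute; p_source is the probability that
+# a node is initially a source (0.05 in our experiments).
+import random
+
+def simulate_cascade(graph, p_source=0.05):
+    infected = {v for v in graph if random.random() < p_source}
+    frontier = set(infected)
+    history = [set(infected)]        # nodes newly infected at each time step
+    while frontier:
+        newly_infected = set()
+        for u in frontier:
+            for v in graph.successors(u):
+                if v not in infected and \
+                        random.random() < graph[u][v]["weight"]:
+                    newly_infected.add(v)
+        infected |= newly_infected
+        frontier = newly_infected
+        if newly_infected:
+            history.append(set(newly_infected))
+    return history
+\end{verbatim}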
+From the sets of nodes infected at each time step over the $n$ cascades, we attempt to recover the graph ${\cal G}$. In the case of the \textsc{lasso}, \textsc{mle} and \textsc{sparse mle} algorithms, we construct the edge set of $\hat {\cal G}$ as $\cup_{j \in V} \{(i, j) : \hat\Theta_{ij} > 0.1\}$, \emph{i.e.} by thresholding. The parameter $\lambda$ is chosen to be of the order ${\cal O}(\sqrt{\log m / (\alpha n)})$. The true positives are the edges which appear in both ${\cal G}$ and $\hat {\cal G}$, and the true negatives are the edges which appear in neither. Finally, we report the F1-score $= 2\,\text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$, which combines the fraction of edges returned by the algorithm that are true edges (\emph{precision}) with the fraction of true edges that the algorithm recovers (\emph{recall}).
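+For completeness, the thresholding and scoring step can be written as the following sketch (the variable names are hypothetical).
+\begin{verbatim}
+# Sketch: thresholding the estimated parameters and scoring edge recovery.
+# theta_hat maps a directed edge (i, j) to its estimated weight;
+# true_edges is the set of directed edges of the ground-truth graph G.
+def f1_score(theta_hat, true_edges, threshold=0.1):
+    recovered = {e for e, w in theta_hat.items() if w > threshold}
+    tp = len(recovered & true_edges)                  # true positives
+    precision = tp / len(recovered) if recovered else 0.0
+    recall = tp / len(true_edges) if true_edges else 0.0
+    if precision + recall == 0.0:
+        return 0.0
+    return 2 * precision * recall / (precision + recall)
+\end{verbatim}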
\paragraph{Comparison with other algorithms}
+
+Over all experiments, \textsc{sparse mle} achieves the highest precision, recall, and F1-score. \textsc{sparse mle} is also robust to approximate sparsity, and displays a faster convergence in $\ell_2$-norm than any other benchmark. Interestingly, both \textsc{mle} and \textsc{sparse mle} perform exceptionally well on the Watts-Strogatz graph. The recovery rate converges at around $5000$ cascades, which is more than $15$ times the number of nodes; by contrast, \textsc{sparse mle} already achieves a reasonable F1-score of $0.75$ with roughly $500$ observed cascades. We did not benchmark against other known algorithms (\textsc{netrate} and \textsc{first edge}), which rely on continuous-time observations and are therefore not directly comparable under our discrete-time model. These algorithms also suppose a single-source model, whereas \textsc{sparse mle}, \textsc{mle}, and \textsc{greedy} do not. Learning the graph under a multi-source cascade model is intuitively harder but more realistic, since we rarely have access to ``patient 0'' in practice.
+
+\paragraph{The \textsc{lasso} algorithm}
+
+To achieve faster computation times, we can consider the following program:
+$\hat \theta \in \arg\min_{\theta} \frac{1}{n}\|y - X\theta\|_2^2 + \lambda \|\theta\|_1$, where the matrix $X$ and vector $y$ collect the covariates and responses of the $n$ observations. This program has the merit of being easier and faster to optimize numerically than the other algorithms.
+
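+As a purely illustrative sketch of the \textsc{lasso} step for a single node (assuming the cascade observations have already been arranged into a design matrix and response vector, and using an off-the-shelf coordinate-descent solver), one could write:
+\begin{verbatim}
+# Sketch: estimating the incoming weights of one node with the Lasso.
+# X has one row per observation and y the corresponding responses for the
+# node under consideration; both are assumed to be precomputed.
+import numpy as np
+from sklearn.linear_model import Lasso
+
+def lasso_estimate(X, y, n, m, alpha):
+    # lambda of order sqrt(log m / (alpha * n)), as in our experiments
+    lam = np.sqrt(np.log(m) / (alpha * n))
+    model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
+    model.fit(X, y)
+    return model.coef_   # estimated column of Theta for this node
+\end{verbatim}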
\begin{itemize}
\item Continuous time + single source -> no comparison with NETRATE...
\item This is a modeling decision: in practice, we do not have access to the cascade's ``patient 0'', which makes learning the graph harder.
@@ -27,7 +42,6 @@ In the case of the \textsc{Lasso}, \textsc{MLE} and \textsc{Sparse MLE} algorith
\paragraph{Quantifying robustness}
-
\begin{itemize}
\item Compare edge recovery to First Edge and Greedy as number of cascades grows on different types of graphs...
\item Plot upper-bound of parameters in graph as number of cascades grows
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index 6fba6b2..cd0a5f8 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -20,8 +20,7 @@ a set $\mathcal{T}$ of observations. We will write $n\defeq |\mathcal{T}|$.
\label{sec:main_theorem}
In this section, we analyze the case where $\theta^*$ is exactly sparse. We
-write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Our main theorem will rely
-on the now standard \emph{restricted eigenvalue condition}.
+write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. In our context, $S$ is the set of all nodes that can influence node $i$. In other words, if $\theta^*$ is exactly $s$-sparse, then node $i$ has at most $s$ parents. Our main theorem will rely on the now standard \emph{restricted eigenvalue condition}.
\begin{definition}
Let $\Sigma\in\mathcal{S}_m(\reals)$ be a real symmetric matrix and $S$ be
@@ -55,7 +54,7 @@ In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{f}(1-z)
1-\alpha$ for all $(i,j)\in E$. Similarly, in the Independent Cascade Model,
$\frac{f'}{f}(z) = \frac{1}{1-e^z}$ and $\frac{f'}{1-f}(z) = -1$ and (LF) holds
if $p_{i, j}\leq 1-\alpha$ for all $(i, j)\in E$. Remember that in this case we
-have $\Theta_{i,j} = \log(1-p_{i,j})$.
+have $\Theta_{i,j} = \log(1-p_{i,j})$. These assumptions are reasonable: if an edge has a weight very close to $0$, the ``infection'' is unlikely to ever propagate along that edge in a finite set of observations, and we cannot hope to recover it.
\begin{theorem}
\label{thm:main}