\begin{figure}
\includegraphics[scale=.4]{figures/ROC_curve.pdf}
\caption{Precision-recall curve for the Holme-Kim model. 200 nodes, 16200 edges.}
\end{figure}
\begin{figure}
\includegraphics[scale=.4]{figures/watts_strogatz.pdf}
\caption{Watts-Strogatz model. 200 nodes, 20000 edges.}
\end{figure}
In this section, we empirically validate the results and assumptions of Section~\ref{sec:results} for different initializations of the parameters ($n$, $m$, $\lambda$) and for varying levels of sparsity. We compare our algorithm to two state-of-the-art algorithms, \textsc{Greedy} and \textsc{MLE}, from \cite{Netrapalli:2012}. We also introduce an approximate \textsc{Lasso} algorithm and validate its effectiveness empirically.
\paragraph{Experimental setup}
We focus our analysis on synthetic graphs, which are known to exhibit the properties of real social networks. We consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges), a Barabasi-Albert graph ($300$ nodes, $16200$ edges), the Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. For every reported data point, we generate $n$ cascades from the Independent Cascade model for $n \in \{100, 500, 1000, 2000, 5000\}$, and compute the estimated graph $\hat {\cal G}$ for each algorithm.
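The data-generation step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it uses \texttt{networkx} generators for the synthetic graphs, draws edge weights uniformly from $[0.2, 0.7]$, and simulates the Independent Cascade model where each node is independently a source with probability $0.05$. The function name \texttt{simulate\_ic\_cascade} and the specific Watts-Strogatz parameters ($k=30$, rewiring probability $0.1$, giving $300 \cdot 30 / 2 = 4500$ edges) are assumptions chosen to match the graph sizes reported in the text.

```python
import random
import networkx as nx

def simulate_ic_cascade(G, weights, source_prob=0.05, rng=random):
    """Simulate one Independent Cascade: each node is independently a
    source with probability source_prob; every newly activated node gets
    a single chance to activate each neighbor with the edge's weight."""
    active = {v for v in G if rng.random() < source_prob}
    frontier = set(active)
    while frontier:
        newly_active = set()
        for u in frontier:
            for v in G.neighbors(u):
                if v not in active and rng.random() < weights[(u, v)]:
                    newly_active.add(v)
        active |= newly_active
        frontier = newly_active
    return active

# Hypothetical instantiation matching the setup in the text:
# a Watts-Strogatz graph with 300 nodes and 4500 edges.
rng = random.Random(0)
G = nx.watts_strogatz_graph(300, 30, 0.1, seed=0)
weights = {}
for u, v in G.edges():
    w = rng.uniform(0.2, 0.7)          # edge weights uniform in [0.2, 0.7]
    weights[(u, v)] = weights[(v, u)] = w
cascades = [simulate_ic_cascade(G, weights, rng=rng) for _ in range(100)]
```

Each element of \texttt{cascades} is the final set of infected nodes for one cascade; in the actual experiments one would additionally record the time step at which each node became active.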
In the case of the \textsc{Lasso}, \textsc{MLE} and \textsc{Sparse MLE} algorithms, we construct the edge set of $\hat {\cal G}$ as $\bigcup_{j \in V} \{(i,j) : \Theta_{ij} > 0.1\}$. We report the F1-score $= 2\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$. For each graph, the initial probability of a node being a source is fixed to $0.05$, i.e.\ an average of $15$ source nodes per cascade. Except when testing for approximately sparse graphs, all edge weights are chosen uniformly in the interval $[0.2, 0.7]$.
\paragraph{Comparison with other algorithms}
\begin{itemize}
\item We do not compare against \textsc{NetRate}, since it assumes continuous-time cascades with a single known source node, whereas our setting allows multiple unknown sources.
\item This is a modeling decision: in practice, we do not have access to the cascade's ``patient 0'', which makes learning the graph harder.
\item We also compare against the approximate \textsc{Lasso} algorithm introduced above.
\end{itemize}
\paragraph{Quantifying robustness}
\begin{itemize}
\item We compare edge recovery against the \textsc{First Edge} and \textsc{Greedy} algorithms as the number of cascades grows, on different types of graphs.
\item We plot the upper bound on the graph parameters as the number of cascades grows.
\item To show robustness, we plot the recall and F1 score for graphs with varying numbers of weak edges, along with a ``corrected recall'': how many strong edges do we recover, and how many edges do we recover overall?
\end{itemize}