\begin{table*}[t]
\centering
\begin{tabular}{l l l}
\hspace{-0.5em}\includegraphics[scale=.28]{figures/barabasi_albert.pdf}
& \hspace{-1em}\includegraphics[scale=.28]{figures/watts_strogatz.pdf}
& \hspace{-3em}\includegraphics[scale=.28]{figures/ROC_curve.pdf} \\
\hspace{-0.5em} (a) Barabasi-Albert (F$1$ \emph{vs.} $n$)
&\hspace{-1em} (b) Watts-Strogatz (F$1$ \emph{vs.} $n$)
&\hspace{-3em} (c) Holme-Kim (Prec-Recall) \\
\hspace{-0.5em}\includegraphics[scale=.28]{figures/kronecker_l2_norm.pdf}
& \hspace{-1em}\includegraphics[scale=.28]{figures/kronecker_l2_norm_nonsparse.pdf}
& \hspace{-3em}\includegraphics[scale=.28]{figures/watts_strogatz_p_init.pdf} \\
(d) Sparse Kronecker ($\ell_2$-norm \emph{vs.} $n$) & (e) Non-sparse Kronecker
($\ell_2$-norm \emph{vs.} $n$) & (f) Watts-Strogatz (F$1$ \emph{vs.}
$p_{\text{init}}$)
\end{tabular}
\captionof{figure}{Figures (a) and (b) report the F$1$-score in $\log$ scale for
2 graphs as a function of the number of cascades $n$: (a) Barabasi-Albert
graph, $300$ nodes, $16200$ edges. (b) Watts-Strogatz graph, $300$ nodes,
$4500$ edges. Figure (c) plots the Precision-Recall curve for various values
of $\lambda$ for a Holme-Kim graph ($200$ nodes, $9772$ edges). Figures (d)
and (e) report the $\ell_2$-norm $\|\hat \Theta - \Theta\|_2$ for a Kronecker
graph which is: (d) exactly sparse (e) non-exactly sparse, as a function of the
number of cascades $n$. Figure (f) plots the F$1$-score for the Watts-Strogatz
graph as a function of $p_{\text{init}}$.}~\label{fig:four_figs}
\vspace{-2em}
\end{table*}
In this section, we empirically validate the results and assumptions of
Section~\ref{sec:results} for varying levels of sparsity and different
initializations of the parameters ($n$, $m$, $\lambda$, $p_{\text{init}}$),
where $p_{\text{init}}$ is the initial probability of a node being a source
node. We compare our algorithm to two state-of-the-art algorithms,
\textsc{greedy} and \textsc{mle} from~\cite{Netrapalli:2012}. As an extra
benchmark, we also introduce a new algorithm, \textsc{lasso}, which
approximates our \textsc{sparse mle} algorithm.
\paragraph{Experimental setup}
We evaluate the performance of the algorithms on synthetic graphs, chosen for
their similarity to real social networks. We therefore consider a Watts-Strogatz
graph ($300$ nodes, $4500$ edges)~\cite{watts:1998}, a Barabasi-Albert graph
($300$ nodes, $16200$ edges)~\cite{barabasi:2001}, a Holme-Kim power law graph
($200$ nodes, $9772$ edges)~\cite{Holme:2002}, and the recently introduced
Kronecker graph ($256$ nodes, $10000$ edges)~\cite{Leskovec:2010}. Undirected
graphs are converted to directed graphs by replacing each edge with two
directed edges.
For every reported data point, we sample edge weights and generate $n$ cascades
from the (IC) model for $n \in \{100, 500, 1000, 2000, 5000\}$. For each
algorithm, we compare the estimated graph $\hat {\cal G}$ to the true graph
${\cal G}$. The initial probability of a node being a source is fixed to
$0.05$, i.e.\ an average of $15$ source nodes per cascade, for all experiments
except Figure~\ref{fig:four_figs}~(f). All edge weights are chosen uniformly in
the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs
(see the paragraph on robustness). Adjusting for the variance of our
experiments, all data points are reported with at most a $\pm 1$ error margin.
The parameter $\lambda$ is chosen to be of the order
${\cal O}(\sqrt{\log m / (\alpha n)})$.
We report our results as a function of the number of \emph{cascades} and not
the number of \emph{measurements}: in practice, very few cascades have depth
greater than $3$, so the two quantities are roughly proportional.
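For reference, the snippet below is a minimal sketch of this cascade-generation
procedure under standard discrete-time (IC) dynamics; the random support used
in the example and all function names are illustrative, not our actual
experimental code.
\begin{verbatim}
import numpy as np

def generate_cascade(theta, p_init=0.05, rng=None):
    """Simulate one discrete-time IC cascade on a weighted adjacency matrix.

    theta[i, j] is the probability that node i, once newly active,
    infects node j at the next step. Returns the activation-indicator
    vectors x^0, x^1, ...
    """
    rng = rng or np.random.default_rng()
    m = theta.shape[0]
    newly_active = rng.random(m) < p_init      # each node is a source w.p. p_init
    ever_active = newly_active.copy()
    history = [newly_active.astype(int)]
    while newly_active.any():
        # node j stays inactive only if every newly active parent fails to infect it
        p_fail = np.prod(np.where(newly_active[:, None], 1.0 - theta, 1.0), axis=0)
        newly_active = (rng.random(m) < 1.0 - p_fail) & ~ever_active
        ever_active |= newly_active
        history.append(newly_active.astype(int))
    return history

# Example: sample a random weighted graph and generate n = 100 cascades
rng = np.random.default_rng(0)
m = 300
support = rng.random((m, m)) < 0.05                       # hypothetical random support
theta = np.where(support, rng.uniform(0.2, 0.7, (m, m)), 0.0)
np.fill_diagonal(theta, 0.0)
cascades = [generate_cascade(theta, p_init=0.05, rng=rng) for _ in range(100)]
\end{verbatim}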
\paragraph{Benchmarks}
We compare our \textsc{sparse mle} algorithm to 3 benchmarks: \textsc{greedy}
and \textsc{mle} from~\cite{Netrapalli:2012} and \textsc{lasso}. The
\textsc{mle} algorithm is a maximum-likelihood estimator without $\ell_1$-norm
penalization. \textsc{greedy} is an iterative algorithm. We introduce the
\textsc{lasso} algorithm in our experiments to achieve a faster computation
time:
$$\hat \theta_i \in \arg \min_{\theta_i} \sum_{t \in {\cal T}}
\left(f(\theta_i\cdot x^t) - x_i^{t+1}\right)^2 + \lambda \|\theta_i\|_1.$$
\textsc{Lasso} has the merit of being both easier and faster to optimize
numerically than the other convex-optimization-based algorithms. It
approximates the \textsc{sparse mle} algorithm by assuming that the
observations $x_i^{t+1}$ are of the form $x_i^{t+1} = f(\theta_i\cdot x^t) +
\epsilon$, where $\epsilon$ is random white noise. This assumption is not valid
in theory, since $\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$; however,
the approximation is validated in practice.
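For concreteness, the following per-node sketch solves the \textsc{lasso}
objective above with proximal gradient descent (ISTA). It is illustrative only:
the link function $f(z) = 1 - e^{-z}$, the step size, and the helper names are
assumptions made for the example, not the exact choices of our implementation.
\begin{verbatim}
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (coordinate-wise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_node(X, y, lam, f, f_prime, step=1e-3, n_iter=2000):
    """ISTA for  min_theta  sum_t (f(theta . x^t) - y_t)^2 + lam * ||theta||_1.

    X stacks the activation vectors x^t (one per row); y contains the
    next-step observations x_i^{t+1} for the target node i.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = X @ theta
        grad = 2.0 * X.T @ ((f(z) - y) * f_prime(z))   # gradient of the smooth part
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Illustrative link function (an assumption, see above)
f = lambda z: 1.0 - np.exp(-z)
f_prime = lambda z: np.exp(-z)
\end{verbatim}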
We did not benchmark against other known algorithms (\textsc{netrate}
\cite{gomezbalduzzi:2011} and \textsc{first edge}~\cite{Abrahao:13}) because
they do not fit our discrete-time assumption. These algorithms also assume a
single-source model, whereas \textsc{sparse mle}, \textsc{mle}, and
\textsc{greedy} do not. Learning the graph under a multi-source cascade model
is harder (see Figure~\ref{fig:four_figs}~(f)) but more realistic, since we
rarely have access to ``patient 0'' in practice.
\paragraph{Graph Estimation}
In the case of the \textsc{lasso}, \textsc{mle}, and \textsc{sparse mle}
algorithms, we construct the edges of $\hat {\cal G}$ as $\{(i,j) :
\hat\Theta_{ij} > 0.1\}$, \emph{i.e.}\ by thresholding the estimated weights.
Finally, we report the F$1$-score
$= 2\,\text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$,
which combines \emph{(1)} the number of true edges recovered by the algorithm
over the total number of edges returned by the algorithm (\emph{precision}) and
\emph{(2)} the number of true edges recovered by the algorithm over the total
number of edges it should have recovered (\emph{recall}). Over all experiments,
\textsc{sparse mle} achieves higher precision, recall, and F$1$-score than the
benchmarks. Interestingly, both \textsc{mle} and \textsc{sparse mle} perform
exceptionally well on the Watts-Strogatz graph.
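A short sketch of this thresholding and scoring step is given below, assuming
the true and estimated weight matrices are available as arrays (the $0.1$
threshold is the one used above; the helper name is illustrative).
\begin{verbatim}
import numpy as np

def f1_from_weights(theta_hat, theta_true, threshold=0.1):
    """Threshold the estimated weights and compare the resulting edge sets."""
    predicted = theta_hat > threshold
    actual = theta_true > 0
    true_pos = np.sum(predicted & actual)
    precision = true_pos / max(np.sum(predicted), 1)   # true edges recovered / edges returned
    recall = true_pos / max(np.sum(actual), 1)         # true edges recovered / true edges
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
\end{verbatim}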
\paragraph{Quantifying robustness}
The previous experiments only considered graphs with strong edges. To test the
algorithms in the approximately sparse case, we add weak edges to the previous
graphs: every non-edge is added independently according to a Bernoulli variable
of parameter $1/3$, with a weight drawn uniformly from $[0, 0.1]$. The
non-sparse case is compared to the sparse case in Figure~\ref{fig:four_figs}
(d)--(e) in terms of the $\ell_2$-norm, showing that \textsc{lasso}, followed
by \textsc{sparse mle}, is the most robust to noise.
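A sketch of this perturbation, under the same conventions and caveats as the
earlier snippets, follows.
\begin{verbatim}
import numpy as np

def add_weak_edges(theta, p=1/3, max_weight=0.1, rng=None):
    """Make an exactly sparse graph approximately sparse: each non-edge is
    added independently with probability p, with weight uniform in [0, max_weight]."""
    rng = rng or np.random.default_rng()
    m = theta.shape[0]
    non_edges = (theta == 0) & ~np.eye(m, dtype=bool)
    added = non_edges & (rng.random((m, m)) < p)
    noisy = theta.copy()
    noisy[added] = rng.uniform(0.0, max_weight, size=int(added.sum()))
    return noisy
\end{verbatim}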