Diffstat (limited to 'paper/sections/experiments.tex')
| -rw-r--r-- | paper/sections/experiments.tex | 105 |
1 files changed, 81 insertions, 24 deletions
diff --git a/paper/sections/experiments.tex b/paper/sections/experiments.tex
index e641992..b60fe44 100644
--- a/paper/sections/experiments.tex
+++ b/paper/sections/experiments.tex
@@ -5,6 +5,10 @@
 % & \includegraphics[scale=.23]{figures/kronecker_l2_norm.pdf}
 % & \includegraphics[scale=.23]{figures/kronecker_l2_norm_nonsparse.pdf}\\
+{\color{red} TODO:~add running time analysis, theoretical bounds on tested
+graphs, nbr of measurements vs.~number of cascades. One common metric for all
+types of graphs (possibly the least impressive improvement)}
+
 \begin{table*}[t]
 \centering
 \begin{tabular}{l l l}
@@ -12,46 +16,99 @@
 & \hspace{-1em}\includegraphics[scale=.28]{figures/watts_strogatz.pdf}
 & \hspace{-3em}\includegraphics[scale=.28]{figures/ROC_curve.pdf} \\
-\hspace{-0.5em}(a) Barabasi-Albert (F$1$ \emph{vs.} $n$)
-&\hspace{-1em} (b) Watts-Strogatz (F$1$ \emph{vs.} $n$)
+\hspace{-0.5em} (a) Barabasi-Albert (F$1$ \emph{vs.} $n$)
+&\hspace{-1em} (b) Watts-Strogatz (F$1$ \emph{vs.} $n$)
 &\hspace{-3em} (c) Holme-Kim (Prec-Recall) \\
 \hspace{-0.5em}\includegraphics[scale=.28]{figures/kronecker_l2_norm.pdf}
 & \hspace{-1em}\includegraphics[scale=.28]{figures/kronecker_l2_norm_nonsparse.pdf}
 & \hspace{-3em}\includegraphics[scale=.28]{figures/watts_strogatz_p_init.pdf} \\
-(d) Sparse Kronecker ($\ell_2$-norm \emph{vs.} $n$ ) & (e) Non-sparse Kronecker ($\ell_2$-norm \emph{vs.} $n$) & (f) Watts-Strogatz (F$1$ \emph{vs.} $p_{\text{init}}$)
+(d) Sparse Kronecker ($\ell_2$-norm \emph{vs.} $n$) & (e) Non-sparse Kronecker
+  ($\ell_2$-norm \emph{vs.} $n$) & (f) Watts-Strogatz (F$1$ \emph{vs.}
+  $p_{\text{init}}$)
 \end{tabular}
-\captionof{figure}{Figures (a) and (b) report the F$1$-score in $\log$ scale
-for 2 graphs as a function of the number of cascades $n$: (a) Barabasi-Albert graph, $300$ nodes, $16200$ edges. (b)
-Watts-Strogatz graph, $300$ nodes, $4500$ edges. Figure (c) plots the Precision-Recall curve for various values of $\lambda$ for a Holme-Kim graph ($200$ nodes, $9772$ edges). Figures (d) and (e) report the
-$\ell_2$-norm $\|\hat \Theta - \Theta\|_2$ for a Kronecker graph which is: (d) exactly sparse (e) non-exactly sparse, as a function of the number of cascades $n$. Figure (f) plots the F$1$-score for the Watts-Strogatz graph as a function of $p_{init}$.}
-\label{fig:four_figs}
+\captionof{figure}{Figures (a) and (b) report the F$1$-score in $\log$ scale for
+  2 graphs as a function of the number of cascades $n$: (a) Barabasi-Albert
+  graph, $300$ nodes, $16200$ edges. (b) Watts-Strogatz graph, $300$ nodes,
+  $4500$ edges. Figure (c) plots the Precision-Recall curve for various values
+  of $\lambda$ for a Holme-Kim graph ($200$ nodes, $9772$ edges). Figures (d)
+  and (e) report the $\ell_2$-norm $\|\hat \Theta - \Theta\|_2$ for a Kronecker
+graph which is: (d) exactly sparse (e) non-exactly sparse, as a function of the
+number of cascades $n$. Figure (f) plots the F$1$-score for the Watts-Strogatz
+graph as a function of $p_{\text{init}}$.}~\label{fig:four_figs}
 \end{table*}

-In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for varying levels of sparsity and different initializations of parameters ($n$, $m$, $\lambda$, $p_{\text{init}}$), where $p_{\text{init}}$ is the initial probability of a node being a source node. We compare our algorithm to two different state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}.
-As an extra benchmark, we also introduce a new algorithm \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm.
+In this section, we validate empirically the results and assumptions of
+Section~\ref{sec:results} for varying levels of sparsity and different
+initializations of the parameters ($n$, $m$, $\lambda$, $p_{\text{init}}$),
+where $p_{\text{init}}$ is the initial probability of a node being a source
+node. We compare our algorithm to two state-of-the-art algorithms,
+\textsc{greedy} and \textsc{mle} from~\cite{Netrapalli:2012}. As an extra
+benchmark, we also introduce a new algorithm, \textsc{lasso}, which
+approximates our \textsc{sparse mle} algorithm.

 \paragraph{Experimental setup}
-We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges) \cite{watts:1998}, a Barabasi-Albert graph ($300$ nodes, $16200$ edges) \cite{barabasi:2001}, a Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by doubling the edges.
+We evaluate the performance of the algorithms on synthetic graphs, chosen for
+their similarity to real social networks. We therefore consider a Watts-Strogatz
+graph ($300$ nodes, $4500$ edges)~\cite{watts:1998}, a Barabasi-Albert graph
+($300$ nodes, $16200$ edges)~\cite{barabasi:2001}, a Holme-Kim power-law graph
+($200$ nodes, $9772$ edges)~\cite{Holme:2002}, and the recently introduced
+Kronecker graph ($256$ nodes, $10000$ edges)~\cite{Leskovec:2010}. Undirected
+graphs are converted to directed graphs by doubling the edges.

 For every reported data point, we sample edge weights and generate $n$ cascades
-from the (IC) model for $n \in \{100, 500, 1000, 2000, 5000\}$. We compare for each algorithm the estimated graph $\hat {\cal G}$ with ${\cal G}$. The initial probability of a node being a source is fixed to $0.05$, i.e. an average of $15$ nodes source nodes per cascades for all experiments, except for Figure~\label{fig:four_figs} (f). All edge weights are chosen uniformly in the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see paragraph on robustness). Adjusting for the variance of our experiments, all data points are reported with at most a $\pm 1$ error margin. The parameter $\lambda$ is chosen to be of the order ${\cal O}(\sqrt{\log m / (\alpha n)})$.
+from the (IC) model for $n \in \{100, 500, 1000, 2000, 5000\}$. We compare for
+each algorithm the estimated graph $\hat {\cal G}$ with ${\cal G}$. The initial
+probability of a node being a source is fixed to $0.05$, i.e.\ an average of
+$15$ source nodes per cascade for all experiments, except for
+Figure~\ref{fig:four_figs} (f). All edge weights are chosen uniformly in the
+interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see
+the paragraph on robustness). Adjusting for the variance of our experiments, all
+data points are reported with at most a $\pm 1$ error margin. The parameter
+$\lambda$ is chosen to be of the order ${\cal O}(\sqrt{\log m / (\alpha n)})$.
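The cascade-generation procedure in the experimental setup above (multi-source discrete-time IC model, each node a seed independently with probability $p_{\text{init}} = 0.05$, edge weights uniform in $[0.2, 0.7]$) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the dense weight-matrix representation and all function names are assumptions.

import numpy as np

def sample_weights(adj, low=0.2, high=0.7, seed=0):
    """Assign a uniform random weight in [low, high] to every directed edge of a 0/1 adjacency matrix."""
    rng = np.random.default_rng(seed)
    return adj * rng.uniform(low, high, size=adj.shape)

def simulate_ic_cascade(theta, p_init=0.05, rng=None):
    """Run one discrete-time IC cascade; returns the list of activation vectors x^0, x^1, ..."""
    rng = rng or np.random.default_rng()
    m = theta.shape[0]
    active = rng.random(m) < p_init   # multi-source initialization: each node is a seed w.p. p_init
    newly = active.copy()
    history = [active.copy()]
    while newly.any():
        # each newly active node i gets one chance to infect each inactive node j, w.p. theta[i, j]
        attempts = (rng.random(theta.shape) < theta) & newly[:, None]
        newly = attempts.any(axis=0) & ~active
        active |= newly
        history.append(active.copy())
    return history

# e.g. generate n cascades for one data point:
# cascades = [simulate_ic_cascade(theta) for _ in range(n)]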
 \paragraph{Benchmarks}
-We compare our \textsc{sparse mle} algorithm to 3 benchmarks: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012} and \textsc{lasso}. The \textsc{mle} algorithm is a maximum-likelihood estimator without $\ell_1$-norm penalization.
-\textsc{greedy} is an iterative algorithm. We introduced the \textsc{lasso} algorithm in our experiments to achieve faster computation time:
-$$\hat \theta_i \in \arg \min_{\theta} \sum_{t \in {\cal T}} |f(\theta_i\cdot x^t) - x_i^{t+1}|^2 + \lambda \|\theta_i\|_1$$
-\textsc{Lasso} has the merit of being both easier and faster to optimize numerically than the other convex-optimization based algorithms. It approximates the $\textsc{sparse mle}$ algorithm by making the assumption that the observations $x_i^{t+1}$ are of the form: $x_i^{t+1} = f(\theta_i\cdot x^t) + \epsilon$, where $\epsilon$ is random white noise. This is not valid in theory since $\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$, however the approximation is validated in practice.
+We compare our \textsc{sparse mle} algorithm to three benchmarks:
+\textsc{greedy} and \textsc{mle} from~\cite{Netrapalli:2012}, and
+\textsc{lasso}. The \textsc{mle} algorithm is a maximum-likelihood estimator
+without $\ell_1$-norm penalization. \textsc{greedy} is an iterative algorithm.
+We introduced the \textsc{lasso} algorithm in our experiments to achieve faster
+computation time:
+$$\hat \theta_i \in \arg \min_{\theta} \sum_{t \in {\cal T}} |f(\theta \cdot
+x^t) - x_i^{t+1}|^2 + \lambda \|\theta\|_1$$
+\textsc{Lasso} has the merit of being both easier and faster to optimize
+numerically than the other convex-optimization-based algorithms. It
+approximates the \textsc{sparse mle} algorithm by assuming that the
+observations $x_i^{t+1}$ are of the form $x_i^{t+1} = f(\theta_i\cdot x^t) +
+\epsilon$, where $\epsilon$ is white noise. This is not valid in theory, since
+$\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$; however, the approximation
+works well in practice.

-We did not benchmark against other known algorithms (\textsc{netrate} \cite{gomezbalduzzi:2011} and \textsc{first edge} \cite{Abrahao:13}) due to the discrete-time assumption. These algorithms also suppose a single-source model, whereas \textsc{sparse mle}, \textsc{mle}, and \textsc{greedy} do not. Learning the graph in the case of a multi-source cascade model is harder (see Figure~\ref{fig:four_figs} (f)) but more realistic, since we rarely have access to ``patient 0'' in practice.
+We did not benchmark against other known algorithms (\textsc{netrate}
+\cite{gomezbalduzzi:2011} and \textsc{first edge}~\cite{Abrahao:13}) due to the
+discrete-time assumption of our model. These algorithms also suppose a
+single-source model, whereas \textsc{sparse mle}, \textsc{mle}, and
+\textsc{greedy} do not. Learning the graph in the case of a multi-source
+cascade model is harder (see Figure~\ref{fig:four_figs} (f)) but more
+realistic, since we rarely have access to ``patient 0'' in practice.

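The \textsc{lasso} surrogate above is an $\ell_1$-penalized least-squares problem with a nonlinear link $f$. Below is a minimal sketch of one standard solver, proximal gradient (ISTA), not the authors' implementation: the link $f$ and its derivative are passed in by the caller since the exact choice of $f$ is defined elsewhere in the paper, and the step size, iteration count, and nonnegativity projection are illustrative assumptions.

import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_surrogate(X, y, f, f_prime, lam, step=1e-2, iters=2000):
    """Minimize sum_t |f(theta . x^t) - x_i^{t+1}|^2 + lam * ||theta||_1 by proximal gradient.

    X: (T, m) matrix whose rows are the activation vectors x^t.
    y: (T,) vector of the observed next-step states x_i^{t+1} of node i.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        z = X @ theta
        grad = 2.0 * X.T @ ((f(z) - y) * f_prime(z))             # gradient of the squared loss
        theta = soft_threshold(theta - step * grad, step * lam)  # proximal step for lam * ||.||_1
        theta = np.maximum(theta, 0.0)  # assumption: keep weights nonnegative, as IC edge parameters are
    return theta

For instance, with the link $f(z) = 1 - e^{-z}$ (a common choice in this line of work, though whether it matches the paper's $f$ is an assumption here), one would pass f=lambda z: 1 - np.exp(-z) and f_prime=lambda z: np.exp(-z).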
 \paragraph{Graph Estimation}
-In the case of the \textsc{lasso}, \textsc{mle} and \textsc{sparse mle} algorithms, we construct the edges of $\hat {\cal G} : \cup_{j \in V} \{(i,j) : \Theta_{ij} > 0.1\}$, \emph{i.e} by thresholding. Finally, we report the F1-score$=2 \text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$, which considers \emph{(1)} the number of true edges recovered by the algorithm over the total number of edges returned by the algorithm (\emph{precision}) and \emph{(2)} the number of true edges recovered by the algorithm over the total number of edges it should have recovered (\emph{recall}).
-Over all experiments, \textsc{sparse mle} achieves higher rates of precision,
-recall, and F1-score.
-Interestingly, both \textsc{mle} and \textsc{sparse mle} perform exceptionally well on the Watts-Strogatz graph.
+In the case of the \textsc{lasso}, \textsc{mle}, and \textsc{sparse mle}
+algorithms, we construct the edge set of $\hat {\cal G}$ as $\cup_{j \in V}
+\{(i,j) : \hat \Theta_{ij} > 0.1\}$, \emph{i.e.}\ by thresholding. Finally, we
+report the F$1$-score $= 2\,\text{precision}\cdot\text{recall} /
+(\text{precision}+\text{recall})$, which considers \emph{(1)} the number of
+true edges recovered by the algorithm over the total number of edges returned
+by the algorithm (\emph{precision}) and \emph{(2)} the number of true edges
+recovered by the algorithm over the total number of edges it should have
+recovered (\emph{recall}). Over all experiments, \textsc{sparse mle} achieves
+higher rates of precision, recall, and F$1$-score. Interestingly, both
+\textsc{mle} and \textsc{sparse mle} perform exceptionally well on the
+Watts-Strogatz graph.
+
 \begin{comment}
- The recovery rate converges at
-around $5000$ cascades, which is more than $15$ times the number of nodes. By
-contrast, \textsc{sparse mle} achieves a reasonable F$1$-score of $.75$ for roughly $500$ observed cascades.
+ The recovery rate converges at around $5000$ cascades, which is more than
+ $15$ times the number of nodes. By contrast, \textsc{sparse mle} achieves a
+ reasonable F$1$-score of $.75$ for roughly $500$ observed cascades.
 \end{comment}

 \paragraph{Quantifying robustness}
@@ -60,6 +117,6 @@
 The previous experiments only considered graphs with strong edges. To test the
 algorithms in the approximately sparse case, we add sparse edges to the previous
 graphs according to a bernoulli variable of parameter $1/3$ for every non-edge,
 and drawing a weight uniformly from $[0,0.1]$. The non-sparse case is
-compared to the sparse case in Figure~\ref{fig:four_figs} (d)--(e) for the $\ell_2$
-norm showing that both the \textsc{lasso}, followed by \textsc{sparse mle} are
-the most robust to noise.
+compared to the sparse case in Figure~\ref{fig:four_figs} (d)--(e) for the
+$\ell_2$-norm, showing that \textsc{lasso}, followed by \textsc{sparse mle}, is
+the most robust to noise.
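The graph-estimation step described above amounts to thresholding the estimated weight matrix at $0.1$ and scoring the resulting edge set against the ground truth. A minimal sketch, assuming dense weight matrices for both the estimate and the true graph (representation and names are illustrative):

import numpy as np

def f1_from_weights(theta_hat, theta_true, thresh=0.1):
    """Precision, recall, and F1 of the thresholded edge set against the true edge set."""
    pred = theta_hat > thresh   # estimated edges: entries of theta_hat above the 0.1 threshold
    true = theta_true > 0       # ground-truth edges
    tp = np.logical_and(pred, true).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(true.sum(), 1)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1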
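The perturbation used in the robustness experiment (each non-edge becomes a weak edge with probability $1/3$, with weight drawn uniformly from $[0, 0.1]$) is equally short to sketch; excluding self-loops is an assumption not stated in the text.

import numpy as np

def add_weak_edges(theta, p=1/3, high=0.1, seed=0):
    """Turn each non-edge into a weak edge with probability p and weight ~ Uniform[0, high]."""
    rng = np.random.default_rng(seed)
    non_edges = (theta == 0)
    np.fill_diagonal(non_edges, False)   # assumption: keep the diagonal (self-loops) empty
    mask = non_edges & (rng.random(theta.shape) < p)
    return theta + mask * rng.uniform(0.0, high, size=theta.shape)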
