diff options
| author | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-03-15 13:51:05 -0400 |
|---|---|---|
| committer | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-03-15 13:51:05 -0400 |
| commit | 20cc5bb44d0ae6877d2b5bcd68ec3cde2003d79a (patch) | |
| tree | c43f647a5031b865c450c5752f7c51a5db399fb1 /poster_abstract | |
| parent | 4778a43d367685a952d5ea25d6ecfe29befce944 (diff) | |
| download | cascades-20cc5bb44d0ae6877d2b5bcd68ec3cde2003d79a.tar.gz | |
jpa's first pass
Diffstat (limited to 'poster_abstract')
| -rw-r--r-- | poster_abstract/main.tex | 106 |
1 file changed, 73 insertions, 33 deletions
diff --git a/poster_abstract/main.tex b/poster_abstract/main.tex
index f36423f..054e6f0 100644
--- a/poster_abstract/main.tex
+++ b/poster_abstract/main.tex
@@ -45,7 +45,7 @@ http://dx.doi.org/10.1145/2740908.2744107}
\alignauthor
Jean Pouget-Abadie\\
\affaddr{Harvard University}\\
- \email{pougetabadie@g.harvard.edu}
+ \email{jeanpougetabadie@g.harvard.edu}
\alignauthor
Thibaut Horel\\
\affaddr{Harvard University}\\
@@ -78,39 +78,42 @@ graphs.
\section{Introduction}
A recent line of research has focused on the Graph Inference Problem:
-recovering the edges of an unknown graph from the observations of a diffusion
-process propagating on this graph \cite{Netrapalli:2012} CITE. The Independent
-Cascade Model CITE is a famous example of such a diffusion process where the
-edges between a given vertex and its parents are weighted; the weights are
-parameters of the model and capture how much influence a parent vertex has over
-its child vertex.
+recovering the directed edges of an unknown graph from observations of a diffusion process propagating on this graph \cite{Abrahao:13, Daneshmand:2014, Netrapalli:2012}. For example, the Independent
+Cascade Model, formalized in \cite{Kempe:03}, is a famous diffusion process in which each ``infected'' node infects each of its neighbors independently with probability given by the weight of the connecting edge. If we only observe the time steps at which nodes are infected over several diffusion processes, can we recover the edges and the edge weights of the graph?
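As a concrete illustration of the observation model described above, here is a minimal sketch (not part of the paper) of one discrete-time Independent Cascade run on a toy weighted graph, recording only the infection time of each node; the graph, weights, and seed set are made up for illustration.

```python
# Hypothetical sketch of the observation model described above: simulate one
# discrete-time independent cascade and record the time step at which each
# node becomes infected. The graph, weights, and seed set are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ic_cascade(p, seeds, max_steps=20):
    """p[i, j]: probability that node i, once infected, infects node j."""
    m = p.shape[0]
    infection_time = np.full(m, -1)              # -1 means "never infected"
    infection_time[list(seeds)] = 0
    newly_infected = set(seeds)
    for t in range(1, max_steps + 1):
        next_wave = set()
        for j in np.flatnonzero(infection_time < 0):
            # each newly infected parent i gets one independent attempt on j
            prob = 1.0 - np.prod([1.0 - p[i, j] for i in newly_infected])
            if rng.random() < prob:
                next_wave.add(j)
        if not next_wave:
            break
        infection_time[list(next_wave)] = t
        newly_infected = next_wave
    return infection_time                        # the only data the observer sees

p = rng.uniform(0.0, 0.4, size=(6, 6))           # toy weighted adjacency matrix
np.fill_diagonal(p, 0.0)
print(simulate_ic_cascade(p, seeds={0}))
```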
Here, we propose a sparse recovery framework to not only solve the Graph
Inference Problem, but also recover the unknown weights of the diffusion
-process. The parallel with sparse recovery problems is as follows: for a given
-vertex, the signal to recover is a vector of weights, the “influence” of its
-(unknown) parents in the graph. This signal is observed indirectly through
-instances of the diffusion process: the only thing known to the observer are
+process. Recall that for each instance of the diffusion process, the only available observations are
the time steps at which the vertices in the graph become “infected” by the
-diffusion process. The two main challenges to apply sparse recovery tools to
+diffusion process. The parallel with sparse recovery problems is as follows: for a given
+vertex, the (unknown) “influence” of its
+parents in the graph is a \emph{signal} that we observe through a series of \emph{measurements}, namely the instances of the diffusion process. The two main challenges in applying sparse recovery tools to
this problem are: (1) contrary to a very common assumption, the measurements
given by a diffusion process are correlated (2) most diffusion processes lead
to non-linear sparse recovery problems.
In what follows, we first present a general class of discrete-time diffusion
-processes which encompasses the Influence Cascade Model and the Voter Model
-CITE. For this class of diffusion processes, despite the afore-mentioned
-challenges, we show how to recover the unknown parameters with a convergence
-rate $O(s\log m/\sqrt{n})$ where $s$ is the in-degree of a given vertex, $m$ is
-the number of vertices in the graph and $n$ is the number of observations. In
-particular, given $\Omega(s\log m)$ measurements, we are able to recover the
-edges of the graph. Finally, we validate our approach experimentally, by
-comparing its performance to state of the art algorithms on synthetic data.
+processes which encompasses the Independent Cascade Model and the Voter Model. For this class of diffusion processes, despite the aforementioned
+challenges, we show how to recover the unknown parameters with a convergence rate on par with those found in the sparse recovery literature.
+Finally, we validate our approach experimentally, by
+comparing its performance to prior algorithms on synthetic data.
+% a convergence
+% rate $O(s\log m/\sqrt{n})$ where $s$ is the in-degree of a given vertex, $m$ is
+% the number of vertices in the graph and $n$ is the number of observations. In
+% particular, given $\Omega(s\log m)$ measurements, we are able to recover the
+% edges of the graph.
+
\section{Model}
+\begin{figure}
+\includegraphics[scale=.4]{../presentation/images/drawing.pdf}
+\caption{Illustration of the sparse recovery framework. $\theta_j$ is the unknown weight vector. $b_j$ is the result of the inner product $\theta_j \cdot X^t$. We observe Bernoulli variables ${\cal B}(f(b_j))$.}
+\end{figure}
+
+
Let consider a graph $G = (V, E)$ with $|V|= m$ and where the set of edges $E$
-is unknown. For a given vertex $j$, the cascade model is parametrized by
+is unknown. For a given vertex $j$, the cascade model is parameterized by
a vector $\theta_j\in\reals^m$ where the $i$-th coordinate of $\theta_j$
captures the “influence” of vertex $i$ on $j$. This influence is 0 if $i$ is
not a parent of $j$.
@@ -128,9 +131,37 @@ be contagious at time $t+1$ is given by:
= f(\theta_j \cdot X^t)
\end{equation}
where $f: \reals \rightarrow [0,1]$. Conditioned on $X^t$, this
-evolution happens independently for each vertex at time $t+1$.
+evolution happens independently for each vertex at time $t+1$. We show below that both the Independent Cascade Model and the Voter Model can be cast in this framework. \newline
+
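To make the framework above concrete, here is a minimal sketch (mine, not the paper's code) of one synchronous update under the general model: each susceptible vertex $j$ becomes contagious with probability $f(\theta_j \cdot X^t)$, independently given $X^t$. `Theta[:, j]` stands in for $\theta_j$; the bookkeeping for already-contagious vertices is omitted.

```python
# Minimal sketch of the general framework: one synchronous update in which
# vertex j becomes contagious with probability f(theta_j . X^t), independently
# across vertices given X^t. Names are illustrative; handling of vertices that
# are already contagious is omitted.
import numpy as np

def diffusion_step(X_t, Theta, f, rng):
    """X_t: 0/1 vector of contagious vertices; Theta[i, j]: influence of i on j."""
    probs = f(X_t @ Theta)                       # coordinate j equals f(theta_j . X^t)
    return (rng.random(len(X_t)) < probs).astype(int)

# The two instantiations discussed next differ only in the link function f:
f_ic    = lambda z: 1.0 - np.exp(z)              # IC, with Theta[i, j] = log(1 - p_ij)
f_voter = lambda z: z                            # voter model, columns of Theta sum to 1
```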
+\noindent{\bf Independent Cascade Model} In the discrete-time IC model, the probability that a susceptible node $j$ becomes infected at the next time step is given by:
+
+\begin{displaymath}
+ \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
+ = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}.
+\end{displaymath}
+Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
+\begin{equation}\label{eq:ic}
+ \tag{IC}
+ \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
+ = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i}
+ = 1 - e^{\inprod{\theta_j}{X^t}}
+\end{equation}
+Therefore, the independent cascade model fits into the previous framework with $f : z \mapsto 1 - e^z$. \newline
+
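A quick numeric sanity check (not from the paper) of the rewriting above: with $\Theta_{i,j} = \log(1-p_{i,j})$, the product form and the exponential form of the infection probability coincide.

```python
# Sanity check of the rewriting above: with Theta_ij = log(1 - p_ij),
# 1 - prod_i (1 - p_ij)^{X^t_i} equals 1 - exp(<theta_j, X^t>). Toy numbers.
import numpy as np

rng = np.random.default_rng(1)
p_j = rng.uniform(0.0, 0.5, size=8)              # p_{i,j} for a single target node j
X_t = rng.integers(0, 2, size=8)                 # contagious indicators at time t

direct    = 1.0 - np.prod((1.0 - p_j) ** X_t)
theta_j   = np.log(1.0 - p_j)
rewritten = 1.0 - np.exp(theta_j @ X_t)

assert np.isclose(direct, rewritten)
```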
+\noindent {\bf Voter Model} Here, nodes are either \emph{red} or \emph{blue}. In each round, every node $j$ independently chooses
+one of its neighbors $i$ with probability $\Theta_{i,j}$ and adopts its color. Without loss of generality, we can suppose that the
+\emph{blue} nodes are contagious. The cascade stops at a fixed horizon time $T$ or when all nodes are of the same color.
+If we denote by $X^t$ the indicator vector of the set of blue nodes at time
+step $t$, then we have:
+\begin{equation}
+\P\big[X^{t+1}_j = 1\,|\, X^t\big] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \inprod{\theta_j}{X^t}
+\tag{V}
+\end{equation}
+
+Thus, the linear voter model fits into the previous framework with $f: z \mapsto z$.
-DO WE HAVE ENOUGH SPACE FOR EXAMPLES HERE?
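For comparison with the IC sketch, here is a minimal sketch of one voter-model round as described above (again my own illustration): node $j$ copies the color of a neighbor drawn with probabilities $\Theta_{\cdot,j}$, so that marginally $\P\big[X^{t+1}_j = 1\,|\,X^t\big] = \inprod{\theta_j}{X^t}$.

```python
# Sketch of one round of the voter model described above: node j picks a
# neighbor i with probability Theta[i, j] and copies its color (1 = blue).
# Columns of Theta are assumed to sum to 1; names are illustrative.
import numpy as np

def voter_step(X_t, Theta, rng):
    m = len(X_t)
    X_next = np.empty(m, dtype=int)
    for j in range(m):
        i = rng.choice(m, p=Theta[:, j])         # draw a neighbor of j
        X_next[j] = X_t[i]                       # adopt its color
    return X_next
# Marginally, P[X^{t+1}_j = 1 | X^t] = sum_i Theta[i, j] * X^t_i, i.e. f(z) = z.
```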
\section{Results}
@@ -142,14 +173,15 @@ $\theta_i$ via $\ell_1$-regularized maximum likelihood estimation:
  - \lambda\|\theta_i\|_1
\end{equation}
where $\mathcal{L}_i$ is the log-likelihood of $\theta_i$ given the
-observations:
-\begin{multline*}
- \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|}
- \sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\inprod{\theta_i}{x^{t}}) \\
- + (1 - x_i^{t+1})\log\big(1-f(\inprod{\theta_i}{x^t})\big)
-\end{multline*}
-
+observations.
+% :
+% \begin{multline*}
+% \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|}
+% \sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\inprod{\theta_i}{x^{t}}) \\
+% + (1 - x_i^{t+1})\log\big(1-f(\inprod{\theta_i}{x^t})\big)
+% \end{multline*}
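For readers who want the commented-out objective in executable form, here is a minimal sketch (my own, with illustrative names and an explicit clipping guard) of the $\ell_1$-penalized log-likelihood that the estimator maximizes.

```python
# Sketch of the l1-penalized objective: the average Bernoulli log-likelihood of
# node i's observed transitions minus lambda * ||theta_i||_1. Names are
# illustrative and the clipping guard is my own addition.
import numpy as np

def penalized_log_likelihood(theta_i, X_prev, x_next, f, lam):
    """X_prev: n x m matrix whose rows are the states x^t (t in T_i);
    x_next: length-n 0/1 vector giving node i's state one step later."""
    z = X_prev @ theta_i
    probs = np.clip(f(z), 1e-12, 1.0 - 1e-12)    # keep the logarithms finite
    log_lik = np.mean(x_next * np.log(probs)
                      + (1 - x_next) * np.log(1.0 - probs))
    return log_lik - lam * np.sum(np.abs(theta_i))
```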
We will need the following assumptions:
+
\begin{enumerate}
\item $\log f$ and $\log (1-f)$ are concave functions.
\item $\log f$ and $\log (1-f)$ have derivatives bounded in absolute value
@@ -165,6 +197,8 @@ We will need the following assumptions:
for some $\gamma>0$.
\end{enumerate}
+Adapting a result from \cite{Negahban:2009}, we obtain the following finite-sample guarantee:
+
\begin{theorem}
\label{thm:main}
Assume 1. to 3. above and define $n\defeq |\mathcal{T}_i|$. For any
@@ -183,13 +217,19 @@ Assumption 3. above can be replaced by the following data-independent
assumption:
\begin{enumerate}
\item[3.] $\log f$ and $\log (1-f)$ are $\varepsilon$-concave and the
- expected Hessian $H = \lim_{n\rightarrow\infty\frac{1}{n}X^TX}$ has
- a smallest eigenvalue bounded from below by $\gamma>0$, where $X$ is
+ expected Gram matrix $\lim_{n\rightarrow\infty}\frac{1}{n}X^TX$ has
+ a smallest ``restricted'' eigenvalue bounded from below by $\gamma>0$, where $X$ is
the $n\times m$ design matrix whose $k$-th row is $X^k$.
\end{enumerate}
-at the cost of slightly reduced convergence rate:
+provided that either $\Omega(s^2 \log m)$ cumulative time steps or $\Omega(s \log m \log^3 (s \log m))$ distinct instances of the diffusion process are observed.
+
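As a rough illustration of the modified assumption (not a faithful restricted-eigenvalue computation), one can form the empirical Gram matrix $\frac{1}{n}X^TX$ from the design matrix and inspect its smallest eigenvalue on a candidate support:

```python
# Rough illustration of the eigenvalue condition above: build the empirical
# Gram matrix (1/n) X^T X from the n x m design matrix X and take its smallest
# eigenvalue on a fixed support S. Restricting to a fixed support is a
# simplification of the restricted-eigenvalue notion.
import numpy as np

def min_eig_on_support(X, S):
    n = X.shape[0]
    G = (X.T @ X) / n                            # empirical Gram matrix
    G_S = G[np.ix_(S, S)]                        # restrict to coordinates in S
    return float(np.linalg.eigvalsh(G_S).min())
```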
\section{Experiments}
+\begin{figure}
+\centering
+\includegraphics[scale=.35]{../paper/figures/watts_strogatz.pdf}
+\caption{F1 score as a function of the number of observed cascades for a Watts-Strogatz graph, for the Greedy and MLE algorithms from \cite{Netrapalli:2012}, a Lasso algorithm which approximates \eqref{eq:pre-mle}, and the penalized log-likelihood program.}
+\end{figure}
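The F1 score reported in the figure can be computed per node from the estimated and true weight vectors; below is a minimal sketch under my own convention that an edge is declared present when its estimated weight exceeds a small threshold.

```python
# Sketch of the F1 score from the figure above: precision/recall of the
# recovered parent set of a node against the true one. The thresholding
# convention is my own choice, not taken from the paper.
import numpy as np

def edge_f1(theta_true, theta_hat, tol=1e-3):
    true_set = set(np.flatnonzero(np.abs(theta_true) > tol))
    pred_set = set(np.flatnonzero(np.abs(theta_hat) > tol))
    if not true_set or not pred_set:
        return 0.0
    tp = len(true_set & pred_set)
    precision, recall = tp / len(pred_set), tp / len(true_set)
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```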
%\section{Acknowledgments}
\bibliographystyle{abbrv}
