Diffstat (limited to 'paper')
-rw-r--r--  paper/sections/abstract.tex |  19
-rw-r--r--  paper/sections/intro.tex    | 118
-rw-r--r--  paper/sections/model.tex    |  65
-rw-r--r--  paper/sparse.bib            |  22
4 files changed, 123 insertions, 101 deletions
diff --git a/paper/sections/abstract.tex b/paper/sections/abstract.tex
index 72a9bf4..3b28765 100644
--- a/paper/sections/abstract.tex
+++ b/paper/sections/abstract.tex
@@ -1,11 +1,12 @@
-In the Graph Inference problem, one seeks to recover the edges of an unknown
-graph from the observations of cascades propagating over this graph.
-In this paper, we approach this problem from the sparse recovery perspective.
-We introduce a general model of cascades, including the voter model and the independent cascade model, for which we provide the first algorithm which recovers the graph's edges with high
-probability and ${\cal O}(s\log m)$ measurements where
-$s$ is the maximum degree of the graph and $m$ is the number of nodes.
-Furthermore, we show that our algorithm also recovers the edge weights (the
-parameters of the diffusion process) and is robust in the context of
-approximate sparsity. Finally we prove an almost matching lower bound of
+In the Network Inference problem, one seeks to recover the edges of an unknown
+graph from the observations of cascades propagating over this graph. In this
+paper, we approach this problem from the sparse recovery perspective. We
+introduce a general model of cascades, including the voter model and the
+independent cascade model, for which we provide the first algorithm which
+recovers the graph's edges with high probability and ${\cal O}(s\log m)$
+measurements where $s$ is the maximum degree of the graph and $m$ is the number
+of nodes. Furthermore, we show that our algorithm also recovers the edge
+weights (the parameters of the diffusion process) and is robust in the context
+of approximate sparsity. Finally we prove an almost matching lower bound of
 $\Omega(s\log\frac{m}{s})$ and validate our approach empirically on synthetic
 graphs.
diff --git a/paper/sections/intro.tex b/paper/sections/intro.tex
index 4d44395..264476b 100644
--- a/paper/sections/intro.tex
+++ b/paper/sections/intro.tex
@@ -1,77 +1,75 @@
-\begin{comment}
-A recent line of research has focused on applying advances in sparse recovery to
-graph analysis. A graph can be interpreted as a signal that one seeks to
-`compress' or `sketch' and then `recovered'. However, we could also consider the
-situation where the graph is unknown to us, and we dispose of few measurements
-to recover the signal. Which real-life processes allow us to `measure' the
-graph?
+%\begin{comment}
+%A recent line of research has focused on applying advances in sparse recovery to
+%graph analysis. A graph can be interpreted as a signal that one seeks to
+%`compress' or `sketch' and then `recovered'. However, we could also consider the
+%situation where the graph is unknown to us, and we dispose of few measurements
+%to recover the signal. Which real-life processes allow us to `measure' the
+%graph?
 
-A diffusion process taking place on a graph can provide valuable information
-about the existence of edges and their edge weights. By observing the sequence
-of nodes which become `infected' over time without knowledge of who has
-`infected' whom, can we recover the graph on which the process takes place? The
-spread of a particular behavior through a network is known as an {\it Influence
-Cascade}. In this context, the {\it Graph Inference}\ problem is the recovery of
-the graph's edges from the observation of few influence cascades. {\color{red}
-Cite references} We propose to show how sparse recovery can be applied to solve
-this recently introduced graph inference problem.
+%A diffusion process taking place on a graph can provide valuable information
+%about the existence of edges and their edge weights. By observing the sequence
+%of nodes which become `infected' over time without knowledge of who has
+%`infected' whom, can we recover the graph on which the process takes place? The
+%spread of a particular behavior through a network is known as an {\it Influence
+%Cascade}. In this context, the {\it Network Inference}\ problem is the recovery
+%of the graph's edges from the observation of few influence cascades.
+%{\color{red} Cite references} We propose to show how sparse recovery can be
+%applied to solve this recently introduced network inference problem.
 
-{\color{red} Graph inference to Network inference}
-Tackling the graph inference problem means constructing a polynomial-time
-algorithm which recovers with high probability a large majority of edges
-correctly from very few observations or {\it cascades}. Prior work shown that
-the graph inference problem can be solved in ${\cal O}(poly(s) \log m)$ number
-of observed cascades, where $s$ is the maximum degree and $m$ is the number of
-nodes in the graph. Almost miraculously, the number of cascades required to
-reconstruct the graph is logarithmic in the number of nodes of the graph.
-Results in the sparse recovery literature lead us to believe that $\Omega(s \log
-m/s)$ cascade observations should be sufficient to recover the graph. In fact,
-we prove this lower bound in a very general sense: any non-adaptive graph
-inference algorithm which succeeds in recovering the graph with constant
-probability requires $\Omega(s \log m/s)$ observations. We show an almost tight
-upper-bound by applying standard sparse recovery techniques and assumptions:
-${\cal O}(s \log m)$ are sufficient to recover the graph. We show that the edge
-weights themselves can also be recovered under the same assumptions.
+%{\color{red} Graph inference to Network inference}
+%Tackling the graph inference problem means constructing a polynomial-time
+%algorithm which recovers with high probability a large majority of edges
+%correctly from very few observations or {\it cascades}. Prior work shown that
+%the graph inference problem can be solved in ${\cal O}(poly(s) \log m)$ number
+%of observed cascades, where $s$ is the maximum degree and $m$ is the number of
+%nodes in the graph. Almost miraculously, the number of cascades required to
+%reconstruct the graph is logarithmic in the number of nodes of the graph.
+%Results in the sparse recovery literature lead us to believe that $\Omega(s \log
+%m/s)$ cascade observations should be sufficient to recover the graph. In fact,
+%we prove this lower bound in a very general sense: any non-adaptive graph
+%inference algorithm which succeeds in recovering the graph with constant
+%probability requires $\Omega(s \log m/s)$ observations. We show an almost tight
+%upper-bound by applying standard sparse recovery techniques and assumptions:
+%${\cal O}(s \log m)$ are sufficient to recover the graph. We show that the edge
+%weights themselves can also be recovered under the same assumptions.
 
-	Throughout this paper, we focus on the three main discrete-time diffusion
-	processes: the independent cascade model, the voter model, and the linear
-	threshold model\dots
-\end{comment}
+	%Throughout this paper, we focus on the three main discrete-time diffusion
+	%processes: the independent cascade model, the voter model, and the linear
+	%threshold model\dots
+	%\end{comment}
 
 Graphs have been extensively studied for their propagative abilities:
 connectivity, routing, gossip algorithms, etc.
-%One question is: can we
-%understand and predict a diffusion process from the graph? Conversely, can we
-%learn a graph from the diffusion process?
 A diffusion process taking place
 over a graph provides valuable information about the presence and weights of
 its edges. \emph{Influence cascades} are a specific type of diffusion processes
 in which a particular infectious behavior spreads over the nodes of the graph.
 By only observing the ``infection times'' of the nodes in the graph, one might
 hope to recover the underlying graph and the parameters of the cascade model.
 This problem is known in the literature as
-the \emph{Graph Inference problem}.
+the \emph{Network Inference problem}.
 
-More precisely, solving the Graph Inference problem involves designing
-an algorithm taking as input a set of observed cascades (realisations of the
+More precisely, solving the Network Inference problem involves designing an
+algorithm taking as input a set of observed cascades (realisations of the
 diffusion process) and recovers with high probability a large fraction of the
-graph's edges. The goal is then to understand the relationship between the number
-of observations, the probability of success, and the accuracy of the
+graph's edges. The goal is then to understand the relationship between the
+number of observations, the probability of success, and the accuracy of the
 reconstruction.
 
-The Graph Inference problem can be decomposed and analyzed ``node-by-node''.
+The Network Inference problem can be decomposed and analyzed ``node-by-node''.
 Thus, we will focus on a single node of degree $s$ and discuss how to identify
 its parents among the $m$ nodes of the graph. Prior work has shown that the
-required number of observed cascades is $\O(poly(s)\log m)$. {\color{red} Add
-reference}
+required number of observed cascades is $\O(poly(s)\log m)$
+\cite{Netrapalli:2012, Abrahao:13}.
 
-A more recent line of research has focused on applying advances in sparse
-recovery to the graph inference problem. Indeed, the graph can be interpreted
-as a ``sparse signal'' measured through influence cascades and then recovered.
-The challenge is that influence cascade models typically lead to non-linear
-inverse problems. The sparse recovery literature suggests that
-$\Omega(s\log\frac{m}{s})$ cascade observations should be sufficient to recover
-the graph.{\color{red} Add reference} However, the best known upper bound to
-this day is $\O(s^2\log m)$.{\color{red} Add reference}
+A more recent line of research~\cite{Daneshmand:2014} has focused on applying
+advances in sparse recovery to the network inference problem. Indeed, the graph
+can be interpreted as a ``sparse signal'' measured through influence cascades
+and then recovered. The challenge is that influence cascade models typically
+lead to non-linear inverse problems. The sparse recovery literature suggests
+that $\Omega(s\log\frac{m}{s})$ cascade observations should be sufficient to
+recover the graph~\cite{donoho2006compressed, candes2006near}. However, the
+best known upper bound to this day is $\O(s^2\log m)$~\cite{Netrapalli:2012,
+Daneshmand:2014}.
 
 The contributions of this paper are the following:
@@ -82,9 +80,8 @@ The contributions of this paper are the following:
   encompasses the well-studied Independent Cascade Model and Voter Model.
 \item we give an algorithm which recovers the graph's edges using $\O(s\log
   m)$ cascades. Furthermore, we show that our algorithm is also able to
-  recover the edge weights (the parameters of the influence model),
-  a problem which has been seemingly overlooked so far. {\color{red} NOT
-  TRUE}
+  efficiently recover the edge weights (the parameters of the influence
+  model) up to an additive error term,
 \item we show that our algorithm is robust in cases where the signal to
   recover is approximately $s$-sparse by proving guarantees in the
   \emph{stable recovery} setting.
@@ -123,11 +120,6 @@ support recovery algorithm, without the \emph{correlation decay} assumption.
 
 {\color{red} Du et.~al make a citation}
 
-\begin{comment}
-They assume a single-source model, where only one source is selected at random,
-which is less realistic in practice since `patient 0' is rarely unknown to us.
-\end{comment}
-
 {\color{red} say they follow the same model as Gomez and abrahao}
 Closest to this work is a recent paper by \citet{Daneshmand:2014}, wherein the
 authors consider a $\ell_1$-regularized objective function. They adapt standard
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index 43de785..b478e94 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -2,7 +2,7 @@ We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times
 |V|$ matrix of parameters describing the edge weights of $\mathcal{G}$.
 Intuitively, $\Theta_{i,j}$ captures the ``influence'' of node $i$ on node $j$.
 Let $m\defeq |V|$. For each node $j$, let $\theta_{j}$ be the $j^{th}$ column
-vector of $\Theta$. A {\color{red} discrete-time} \emph{Cascade model} is a
+vector of $\Theta$. A discrete-time \emph{Cascade model} is a
 Markov process over a finite state space ${\{0, 1, \dots, K-1\}}^V$ with the
 following properties:
 \begin{enumerate}
@@ -25,16 +25,19 @@ contagious nodes ``influence'' other nodes in the graph to become contagious. An
 the successive states of the nodes in graph ${\cal G}$. Note that both the
 ``single source'' assumption made in~\cite{Daneshmand:2014} and
 \cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption made
-in~\cite{Netrapalli:2012} verify condition 3. {\color{red} why is it less
-restrictive? explain}
+in~\cite{Netrapalli:2012} verify condition 3. Also note that the multiple-source
+node assumption does not reduce to the single-source assumption: even
+non-overlapping cascades started from two nodes that are far apart cannot in
+practice be attributed to either process without knowledge of the network
+topology.
 
-In the context of Graph Inference,~\cite{Netrapalli:2012} focus
+In the context of Network Inference,~\cite{Netrapalli:2012} focus
 on the well-known discrete-time independent cascade model recalled below, which
 \cite{Abrahao:13} and~\cite{Daneshmand:2014} generalize to continuous time. We
 extend the independent cascade model in a different direction by considering
 a more general class of transition probabilities while staying in the
 discrete-time setting. We observe that despite their obvious differences, both
-the independent cascade and the voter models make the graph inference problem
+the independent cascade and the voter models make the network inference problem
 similar to the standard generalized linear model inference problem. In fact,
 we define a class of diffusion processes for which this is true: the
 \emph{Generalized Linear Cascade Models}. The linear threshold model is
@@ -121,18 +124,15 @@ Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
 \tag{IC}
 \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
 = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i}
- = 1 - e^{\inprod{\theta_j}{X^t}}
+ = 1 - e^{\inprod{\Theta_j}{X^t}}
 \end{equation}
 Therefore, the independent cascade model is a Generalized Linear Cascade model
 with inverse link function $f : z \mapsto 1 - e^z$.
 
-Note that interpreting it in the case of the Independent Cascade Model requires
-one more step. Indeed, to cast it as a generalized linear cascade model, we had
-to perform the transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where $p_{i,j}$
-are the infection probabilities. Fortunately, if we estimate $p_j$ through
-$\hat{p}_{j} = \hat{\theta}_j$, a bound on $\|\hat\theta - \theta^*\|_2$
-directly implies a bound on $\|\hat p - p^*\|_2$:
+In Section~\ref{sec:results}, we show bounds on the parameter of the graph.
+Fortunately, a bound on $\|\hat\theta - \theta^*\|_2$ directly implies a bound
+on $\|\hat p - p^*\|_2$ as shown by the following lemma:
 
 \begin{lemma}
 $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
 \end{lemma}
@@ -166,39 +166,43 @@ with inverse link function $f: z \mapsto z$.
 
 \subsubsection{Discretization of Continuous Model}
 
-Anoter motivation for the Generalized Linear Cascade model is that it captures
+Another motivation for the Generalized Linear Cascade model is that it captures
 the time-discretized formulation of the well-studied continuous-time
 independent cascade model with exponential transmission function (CICE)
 of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}. Suppose that time
 is binned into intervals of length $\epsilon$ and let $X^k$ be the set of nodes
 `infected' before or during the $k^{th}$ time interval. Note that contrary to
-the discrete-time independent cascade model, $X^k_i = 1 \implies X^{k+1}_i =
-1$. Let $\exp(p)$ be an exponentially-distributed random variable of parameter
-$p$. By the memoryless property of the exponential, if $X^k_j \neq 1$:
+the discrete-time independent cascade model, $X^k_j = 1 \implies X^{k+1}_j =
+1$.
+
+Let $\exp(p)$ be an exponentially-distributed random variable of parameter $p$
+and let $p_{i,j}$ be the rate of transmission along directed edge $(i,j)$ in the
+CICE model. By the memoryless property of the exponential, if $X^k_j \neq 1$:
 \begin{align*}
 \mathbb{P}(X^{k+1}_j = 1 | X^k)
 & = \mathbb{P}(\min_{i \in {\cal N}(j)} \exp(p_{i,j}) \leq \epsilon) \\
 & = \mathbb{P}(\exp( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\
- & = 1 - e^{- \epsilon \cdot \theta_j \cdot X^t}
+ & = 1 - e^{- \epsilon \Theta_j \cdot X^t}
 \end{align*}
 Note that here we implicitly let $\Theta_{i,j} = 0$ if $(i,j)$ is not
 a directed edge of the graph. Note that this formulation is also consistent
 when $X^k_j = 1$ by considering the dummy variables $\forall i, \Theta_{i,i} =
-+\infty$. Therefore, the $\epsilon-$binned-process of the continuous-time
-model with exponential transmission function is a Generalized Linear Cascade
-model with inverse link function $f:z\mapsto 1-e^{-\epsilon\cdot z}$.
++\infty$. Therefore, the $\epsilon$-time-binned CICE-induced process is a
+Generalized Linear Cascade model with inverse link function $f:z\mapsto
+1-e^{-\epsilon\cdot z}$.
 
 \subsubsection{Logistic Cascades}
 
 Consider the specific case of ``logistic cascades'' (where $f$ is the logistic
-function). This model captures the idea that there is a threshold after which,
-each additional neighbor's influence contribution, becomes significant. In this
-way, logistic cascades can be thought of approximating the more-commonly
-studied Linear Threshold model TODO: CITE. As we will see later in the
-analysis, the Hessian of our optimization program simplifies to
-$\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the
-observation matrix $[x^1 \ldots x^\mathcal{|T|}]$. The (RE)-condition is then
-equivalent to the assumption made in the Lasso analysis of \cite{bickel:2009}.
+function). This model captures the idea that there is a threshold around which
+each additional neighbor's contribution becomes significant: logistic
+cascades can be thought of as approximating the more-commonly studied Linear
+Threshold model~\cite{Kempe:03}. As we will see later in the
+analysis, the Hessian of our optimization program simplifies in the case of a
+logistic inverse link function to $\nabla^2\mathcal{L}(\theta^*) =
+\frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the observation matrix $[x^1 \ldots
+x^\mathcal{|T|}]$. The (RE)-condition we introduce subsequently is then
+equivalent to the assumption made in the Lasso analysis of~\cite{bickel:2009}.
@@ -230,7 +234,10 @@ equivalent to the assumption made in the Lasso analysis of \cite{bickel:2009}.
 
 \begin{figure}
 \includegraphics[scale=.4]{figures/drawing.pdf}
-\caption{Illustration of the sparse-recovery approach}
+\caption{Illustration of the sparse-recovery approach: the measurement matrix
+is known, as is a Bernoulli realization of the matrix-vector product via the
+non-linear transformation induced by $f$. Our objective is to recover the
+unknown weight vector $\theta_i$ for each node $i$.}
 \end{figure}
diff --git a/paper/sparse.bib b/paper/sparse.bib
index 5df4b59..1dd7589 100644
--- a/paper/sparse.bib
+++ b/paper/sparse.bib
@@ -501,3 +501,25 @@ year = "2009"
   year={2014},
   publisher={JMLR. org}
 }
+
+@article{donoho2006compressed,
+  title={Compressed sensing},
+  author={Donoho, David L},
+  journal={Information Theory, IEEE Transactions on},
+  volume={52},
+  number={4},
+  pages={1289--1306},
+  year={2006},
+  publisher={IEEE}
+}
+
+@article{candes2006near,
+  title={Near-optimal signal recovery from random projections: Universal encoding strategies?},
+  author={Candes, Emmanuel J and Tao, Terence},
+  journal={Information Theory, IEEE Transactions on},
+  volume={52},
+  number={12},
+  pages={5406--5425},
+  year={2006},
+  publisher={IEEE}
+}
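As a concrete illustration of the Generalized Linear Cascade transition defined in model.tex above, the following NumPy sketch simulates one step of the process and instantiates it with the independent-cascade inverse link $f(z) = 1 - e^z$ and $\Theta_{i,j} = \log(1-p_{i,j})$ from the patch; the function and variable names here are illustrative only and are not taken from the paper's code.

import numpy as np

def glc_step(Theta, x, f, rng):
    # One transition of a Generalized Linear Cascade (illustrative sketch):
    # node j becomes infected with probability f(<theta_j, X^t>).
    z = Theta.T @ x                                  # z[j] = <theta_j, X^t>
    p = f(z)                                         # P[X^{t+1}_j = 1 | X^t]
    newly_infected = rng.random(x.shape[0]) < p
    # Once infected, stay infected -- the epsilon-binned CICE convention
    # (X^k_j = 1 implies X^{k+1}_j = 1); the discrete-time IC model would
    # instead track which nodes are currently contagious.
    return np.maximum(x, newly_infected.astype(x.dtype))

# Independent-cascade instance: Theta[i, j] = log(1 - p_ij) <= 0, so that
# f(<theta_j, x>) = 1 - prod_i (1 - p_ij)^{x_i} with f(z) = 1 - e^z.
rng = np.random.default_rng(0)
m = 6
p_edge = 0.3 * (rng.random((m, m)) < 0.4)            # sparse infection probabilities
np.fill_diagonal(p_edge, 0.0)
Theta = np.log1p(-p_edge)                            # equals 0 on non-edges
x = np.zeros(m, dtype=int)
x[0] = 1                                             # a single source node
x = glc_step(Theta, x, lambda z: 1.0 - np.exp(z), rng)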
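The lemma $\|\hat{\theta} - \theta^*\|_2 \geq \|\hat{p} - p^*\|_2$ quoted in the model.tex hunk follows from the map $\theta \mapsto 1 - e^{\theta}$ being 1-Lipschitz on the nonpositive reals; a short sketch, assuming $\hat{\theta}$ is, like $\theta^* = \log(1 - p^*)$, coordinate-wise nonpositive (the argument below is a sketch, not part of the patch):

% For each coordinate i, the mean value theorem gives some c_i <= 0
% between \hat{\theta}_i and \theta^*_i such that
\begin{align*}
|\hat{p}_i - p^*_i| = |e^{\theta^*_i} - e^{\hat{\theta}_i}|
                    = e^{c_i}\,|\hat{\theta}_i - \theta^*_i|
                    \leq |\hat{\theta}_i - \theta^*_i|,
\end{align*}
% since e^{c_i} <= 1. Summing squares over the coordinates and taking
% square roots gives \|\hat{p} - p^*\|_2 <= \|\hat{\theta} - \theta^*\|_2.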
