diff options
| field | value | date |
|---|---|---|
| author | Jean Pouget-Abadie <jean.pougetabadie@gmail.com> | 2015-04-29 20:58:17 -0400 |
| committer | Jean Pouget-Abadie <jean.pougetabadie@gmail.com> | 2015-04-29 20:58:17 -0400 |
| commit | 7fd038c7b0f00f6db411e9c5037133fd09509a8d (patch) | |
| tree | 61bbbae6b68049cd75ecf8596b48d1007bdcd8bc /paper | |
| parent | 42276753eeb857b3c085cead2c10315048c29a62 (diff) | |
| download | cascades-7fd038c7b0f00f6db411e9c5037133fd09509a8d.tar.gz | |
small reviewer updates
Diffstat (limited to 'paper')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | paper/paper.tex | 3 |
| -rw-r--r-- | paper/sections/intro.tex | 116 |
| -rw-r--r-- | paper/sections/model.tex | 91 |
| -rw-r--r-- | paper/sections/results.tex | 39 |

4 files changed, 159 insertions, 90 deletions
diff --git a/paper/paper.tex b/paper/paper.tex index 96a1c58..ce8cde9 100644 --- a/paper/paper.tex +++ b/paper/paper.tex @@ -43,6 +43,9 @@ \usepackage[utf8]{inputenc} \usepackage{caption} \usepackage{array} + +% This is only temporary to take the reviewers' comments into account +\usepackage{color} % The \icmltitle you define below is probably too long as a header. % Therefore, a short form for the running title is supplied here: \icmltitlerunning{Inferring Graphs from Cascades: A Sparse Recovery Framework} diff --git a/paper/sections/intro.tex b/paper/sections/intro.tex index d263860..a04b5f1 100644 --- a/paper/sections/intro.tex +++ b/paper/sections/intro.tex @@ -1,11 +1,41 @@ \begin{comment} -A recent line of research has focused on applying advances in sparse recovery to graph analysis. A graph can be interpreted as a signal that one seeks to `compress' or `sketch' and then `recovered'. However, we could also consider the situation where the graph is unknown to us, and we dispose of few measurements to recover the signal. Which real-life processes allow us to `measure' the graph? +A recent line of research has focused on applying advances in sparse recovery to +graph analysis. A graph can be interpreted as a signal that one seeks to +`compress' or `sketch' and then `recovered'. However, we could also consider the +situation where the graph is unknown to us, and we dispose of few measurements +to recover the signal. Which real-life processes allow us to `measure' the +graph? -A diffusion process taking place on a graph can provide valuable information about the existence of edges and their edge weights. By observing the sequence of nodes which become `infected' over time without knowledge of who has `infected' whom, can we recover the graph on which the process takes place? The spread of a particular behavior through a network is known as an {\it Influence Cascade}. 
In this context, the {\it Graph Inference} problem is the recovery of the graph's edges from the observation of few influence cascades. We propose to show how sparse recovery can be applied to solve this recently introduced graph inference problem. +A diffusion process taking place on a graph can provide valuable information +about the existence of edges and their edge weights. By observing the sequence +of nodes which become `infected' over time without knowledge of who has +`infected' whom, can we recover the graph on which the process takes place? The +spread of a particular behavior through a network is known as an {\it Influence +Cascade}. In this context, the {\it Graph Inference}\ problem is the recovery of +the graph's edges from the observation of few influence cascades. We propose to +show how sparse recovery can be applied to solve this recently introduced graph +inference problem. -Tackling the graph inference problem means constructing a polynomial-time algorithm which recovers with high probability a large majority of edges correctly from very few observations or {\it cascades}. Prior work shown that the graph inference problem can be solved in ${\cal O}(poly(s) \log m)$ number of observed cascades, where $s$ is the maximum degree and $m$ is the number of nodes in the graph. Almost miraculously, the number of cascades required to reconstruct the graph is logarithmic in the number of nodes of the graph. Results in the sparse recovery literature lead us to believe that $\Omega(s \log m/s)$ cascade observations should be sufficient to recover the graph. In fact, we prove this lower bound in a very general sense: any non-adaptive graph inference algorithm which succeeds in recovering the graph with constant probability requires $\Omega(s \log m/s)$ observations. We show an almost tight upper-bound by applying standard sparse recovery techniques and assumptions: ${\cal O}(s \log m)$ are sufficient to recover the graph. 
We show that the edge weights themselves can also be recovered under the same assumptions. +{\color{red} Graph inference to Network inference} +Tackling the graph inference problem means constructing a polynomial-time +algorithm which recovers with high probability a large majority of edges +correctly from very few observations or {\it cascades}. Prior work shown that +the graph inference problem can be solved in ${\cal O}(poly(s) \log m)$ number +of observed cascades, where $s$ is the maximum degree and $m$ is the number of +nodes in the graph. Almost miraculously, the number of cascades required to +reconstruct the graph is logarithmic in the number of nodes of the graph. +Results in the sparse recovery literature lead us to believe that $\Omega(s \log +m/s)$ cascade observations should be sufficient to recover the graph. In fact, +we prove this lower bound in a very general sense: any non-adaptive graph +inference algorithm which succeeds in recovering the graph with constant +probability requires $\Omega(s \log m/s)$ observations. We show an almost tight +upper-bound by applying standard sparse recovery techniques and assumptions: +${\cal O}(s \log m)$ are sufficient to recover the graph. We show that the edge +weights themselves can also be recovered under the same assumptions. - Throughout this paper, we focus on the three main discrete-time diffusion processes: the independent cascade model, the voter model, and the linear threshold model... + Throughout this paper, we focus on the three main discrete-time diffusion + processes: the independent cascade model, the voter model, and the linear + threshold model\dots \end{comment} Graphs have been extensively studied for their propagative abilities: @@ -31,7 +61,8 @@ reconstruction. The Graph Inference problem can be decomposed and analyzed ``node-by-node''. Thus, we will focus on a single node of degree $s$ and discuss how to identify its parents among the $m$ nodes of the graph. 
Prior work has shown that the -required number of observed cascades is $\O(poly(s)\log m)$. +required number of observed cascades is $\O(poly(s)\log m)$. {\color{red} Add +reference} A more recent line of research has focused on applying advances in sparse recovery to the graph inference problem. Indeed, the graph can be interpreted @@ -39,7 +70,9 @@ as a ``sparse signal'' measured through influence cascades and then recovered. The challenge is that influence cascade models typically lead to non-linear inverse problems. The sparse recovery literature suggests that $\Omega(s\log\frac{m}{s})$ cascade observations should be sufficient to recover -the graph. However, the best known upper bound to this day is $\O(s^2\log m)$. +the graph. However, the best known upper bound to this day is $\O(s^2\log +m)$.{\color{red} Add reference} + The contributions of this paper are the following: \begin{itemize} @@ -69,44 +102,59 @@ Section~\ref{sec:experiments}. \paragraph{Related Work} The study of edge prediction in graphs has been an active field of research for -over a decade \cite{Nowell08, Leskovec07, AdarA05}. -\cite{GomezRodriguez:2010} introduced the {\scshape Netinf} algorithm, which -approximates the likelihood of cascades represented as a continuous process. -The algorithm was improved in later work \cite{gomezbalduzzi:2011}, but is not -known to have any theoretical guarantees beside empirical validation on synthetic -networks. \citet{Netrapalli:2012} studied the discrete-time version of the -independent cascade model and obtained the first ${\cal O}(s^2 \log m)$ -recovery guarantee on general networks. The algorithm is based on a -likelihood function similar to the one we propose, without the $\ell_1$-norm penalty. Their -analysis depends on a {\it correlation decay} assumption, which limits the -number of new infections at every step. 
In this setting, they show a lower -bound of the number of cascades needed for support recovery with constant -probability of the order $\Omega(s \log (m/s))$. They also suggest a {\scshape -Greedy} algorithm, which achieves a ${\cal O}(s \log m)$ guarantee in the case -of tree graphs. The work of \cite{Abrahao:13} studies the same continuous-model framework as -\cite{GomezRodriguez:2010} and obtains an ${\cal O}(s^9 \log^2 s \log m)$ support -recovery algorithm, without the \emph{correlation decay} assumption. +over a decade~\cite{Nowell08, Leskovec07, AdarA05}.~\cite{GomezRodriguez:2010} +introduced the {\scshape Netinf} algorithm, which approximates the likelihood of +cascades represented as a continuous process. The algorithm was improved in +later work~\cite{gomezbalduzzi:2011}, but is not known to have any theoretical +guarantees beside empirical validation on synthetic networks. +\citet{Netrapalli:2012} studied the discrete-time version of the independent +cascade model and obtained the first ${\cal O}(s^2 \log m)$ recovery guarantee +on general networks. The algorithm is based on a likelihood function similar to +the one we propose, without the $\ell_1$-norm penalty. Their analysis depends on +a {\it correlation decay\/} assumption, which limits the number of new infections +at every step. In this setting, they show a lower bound of the number of +cascades needed for support recovery with constant probability of the order +$\Omega(s \log (m/s))$. They also suggest a {\scshape Greedy} algorithm, which +achieves a ${\cal O}(s \log m)$ guarantee in the case of tree graphs. The work +of~\cite{Abrahao:13} studies the same continuous-model framework as +\cite{GomezRodriguez:2010} and obtains an ${\cal O}(s^9 \log^2 s \log m)$ +support recovery algorithm, without the \emph{correlation decay} assumption. 
\begin{comment} -They assume a single-source model, where only one source is selected at random, which is less realistic in practice since `patient 0' is rarely unknown to us. +They assume a single-source model, where only one source is selected at random, +which is less realistic in practice since `patient 0' is rarely unknown to us. \end{comment} -Closest to this work is a recent paper by \citet{Daneshmand:2014}, wherein the authors -consider a $\ell_1$-regularized objective function. They adapt standard results -from sparse recovery to obtain a recovery bound of ${\cal O}(s^3 \log m)$ under an irrepresentability condition \cite{Zhao:2006}. Under stronger -assumptions, they match the \cite{Netrapalli:2012} bound of ${\cal O}(s^2 \log +{\color{red} say they follow the same model as Gomez and abrahao} +Closest to this work is a recent paper by \citet{Daneshmand:2014}, wherein the +authors consider a $\ell_1$-regularized objective function. They adapt standard +results from sparse recovery to obtain a recovery bound of ${\cal O}(s^3 \log +m)$ under an irrepresentability condition~\cite{Zhao:2006}. Under stronger +assumptions, they match the~\cite{Netrapalli:2012} bound of ${\cal O}(s^2 \log m)$, by exploiting similar properties of the convex program's KKT conditions. In contrast, our work studies discrete-time diffusion processes including the -Independent Cascade model under weaker assumptions. Furthermore, we analyze -both the recovery of the graph's edges and the estimation of the model's -parameters, and achieve close to optimal bounds. +Independent Cascade model under weaker assumptions. Furthermore, we analyze both +the recovery of the graph's edges and the estimation of the model's parameters, +and achieve close to optimal bounds. \begin{comment} -Their work has the merit of studying a generalization of the discrete-time independent cascade model to continuous functions. 
Similarly to \cite{Abrahao:13}, they place themselves in -the restrictive single-source context. +Their work has the merit of studying a generalization of the discrete-time +independent cascade model to continuous functions. Similarly to +\cite{Abrahao:13}, they place themselves in the restrictive single-source +context. \end{comment} \begin{comment} \paragraph{Our contributions} -Though our work follows closely the spirit of \cite{Netrapalli:2012} and \cite{Daneshmand:2014}, we believe we provide several significant improvements to their work. We study sparse recovery under less restrictive assumptions and obtain the first ${\cal O}(s \log m)$ algorithm for graph inference from cascades. We also study the seemingly overlooked problem of weight recovery as well as the setting of the relaxed sparsity setting. Finally, we show these results are almost tight, by proving in section~\ref{sec:lowerbound} the first lower bound on the number of observations required to recover the edges and the edge weights of a graph in the general case. We study the case of the two best known diffusion processes for simplicity as outlined in \cite{Kempe:03}, but many arguments can be extended to more general diffusion processes. +Though our work follows closely the spirit of \cite{Netrapalli:2012} and +\cite{Daneshmand:2014}, we believe we provide several significant improvements +to their work. We study sparse recovery under less restrictive assumptions and +obtain the first ${\cal O}(s \log m)$ algorithm for graph inference from +cascades. We also study the seemingly overlooked problem of weight recovery as +well as the setting of the relaxed sparsity setting. Finally, we show these +results are almost tight, by proving in section~\ref{sec:lowerbound} the first +lower bound on the number of observations required to recover the edges and the +edge weights of a graph in the general case. 
We study the case of the two best +known diffusion processes for simplicity as outlined in \cite{Kempe:03}, but +many arguments can be extended to more general diffusion processes. \end{comment} diff --git a/paper/sections/model.tex b/paper/sections/model.tex index f4bb4c5..19d7506 100644 --- a/paper/sections/model.tex +++ b/paper/sections/model.tex @@ -2,8 +2,9 @@ We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times |V|$ matrix of parameters describing the edge weights of $\mathcal{G}$. Intuitively, $\Theta_{i,j}$ captures the ``influence'' of node $i$ on node $j$. Let $m\defeq |V|$. For each node $j$, let $\theta_{j}$ be the $j^{th}$ column -vector of $\Theta$. A \emph{Cascade model} is a Markov process over a finite -state space $\{0, 1, \dots, K-1\}^V$ with the following properties: +vector of $\Theta$. A {\color{red} discrete-time} \emph{Cascade model} is a +Markov process over a finite state space ${\{0, 1, \dots, K-1\}}^V$ with the +following properties: \begin{enumerate} \item Conditioned on the previous time step, the transition events between two states in $\{0,1,\dots, K-1\}$ for each $i \in V$ are mutually @@ -12,20 +13,23 @@ state space $\{0, 1, \dots, K-1\}^V$ with the following properties: that all transition probabilities of the Markov process are either constant or can be expressed as a function of the graph parameters $\Theta$ and the set of ``contagious nodes'' at the previous time step. -\item The initial probability over $\{0, 1, \dots, K-1\}^V$ is such that all nodes can eventually reach a \emph{contagious state} - with non-zero probability. The ``contagious'' nodes at $t=0$ are called + \item The initial probability over ${\{0, 1, \dots, K-1\}}^V$ is such that all + nodes can eventually reach a \emph{contagious state} with non-zero + probability. The ``contagious'' nodes at $t=0$ are called \emph{source nodes}. 
\end{enumerate} In other words, a cascade model describes a diffusion process where a set of -contagious nodes ``influence'' other nodes in the graph to become contagious. An \emph{influence cascade} is a realisation of this random process, \emph{i.e.} the successive states of the nodes in graph ${\cal G}$. Note that both -the ``single source'' assumption made in \cite{Daneshmand:2014} and -\cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption -made in \cite{Netrapalli:2012} verify condition 3. +contagious nodes ``influence'' other nodes in the graph to become contagious. An +\emph{influence cascade} is a realisation of this random process, \emph{i.e.} +the successive states of the nodes in graph ${\cal G}$. Note that both the +``single source'' assumption made in~\cite{Daneshmand:2014} and +\cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption made +in~\cite{Netrapalli:2012} verify condition 3. -In the context of Graph Inference, \cite{Netrapalli:2012} focus +In the context of Graph Inference,~\cite{Netrapalli:2012} focus on the well-known discrete-time independent cascade model recalled below, which -\cite{Abrahao:13} and \cite{Daneshmand:2014} generalize to continuous time. We +\cite{Abrahao:13} and~\cite{Daneshmand:2014} generalize to continuous time. We extend the independent cascade model in a different direction by considering a more general class of transition probabilities while staying in the discrete-time setting. 
We observe that despite their obvious differences, both @@ -51,14 +55,14 @@ becoming ``contagious'' at time step $t+1$ conditioned on $X^t$ is a Bernoulli variable of parameter $f(\theta_j \cdot X^t)$: \begin{equation} \label{eq:glm} - \mathbb{P}(X^{t+1}_j = 1|X^t) + \mathbb{P}(X^{t+1}_j = 1|X^t) = f(\theta_j \cdot X^t) \end{equation} where $f: \reals \rightarrow [0,1]$ \end{definition} In other words, each generalized linear cascade provides, for each node $j \in -V$ a series of measurements $(X^t, X^{t+1}_j)_{t \in {\cal T}_j}$ sampled from +V$ a series of measurements ${(X^t, X^{t+1}_j)}_{t \in {\cal T}_j}$ sampled from a generalized linear model. Note also that $\E[X^{t+1}_i\,|\,X^t] = f(\inprod{\theta_i}{X^t})$. As such, $f$ can be interpreted as the inverse link function of our generalized linear cascade model. @@ -70,7 +74,7 @@ link function of our generalized linear cascade model. % cascade. % \begin{definition} -% \label{def:glcm} +% \label{def:glcm} % Let us denote by $\{\mathcal{F}_t, t\in\ints\}$ the natural % filtration induced by $\{X_t, t\in\ints\}$. A \emph{generalized linear % cascade} is characterized by the following equation: @@ -99,17 +103,17 @@ In the independent cascade model, nodes can be either susceptible, contagious or immune. At $t=0$, all source nodes are ``contagious'' and all remaining nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is contagious, $i$ attempts to infect $j$ with -probability $p_{i,j}\in(0,1]$; the infection attempts are mutually independent. +probability $p_{i,j}\in]0,1]$; the infection attempts are mutually independent. If $i$ succeeds, $j$ will become contagious at time step $t+1$. Regardless of $i$'s success, node $i$ will be immune at time $t+1$. In other words, nodes stay contagious for only one time step. The cascade process terminates when no contagious nodes remain. 
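The one-step transition of the independent cascade model described in this hunk can be sketched as a small simulation. This is a hypothetical helper for illustration, not code from the paper; each contagious parent $i$ independently attempts to infect a susceptible child $j$ with probability $p_{i,j}$, and contagious nodes become immune after exactly one step:

```python
import numpy as np

def ic_step(P, contagious, susceptible, rng):
    """One step of the discrete-time independent cascade model.

    P[i, j] is the infection probability p_{i,j}; `contagious` and
    `susceptible` are boolean masks over the m nodes.
    """
    m = P.shape[0]
    newly_contagious = np.zeros(m, dtype=bool)
    parents = np.flatnonzero(contagious)
    for j in np.flatnonzero(susceptible):
        # Each contagious parent i attempts to infect j independently
        # with probability p_{i,j}; any single success infects j.
        attempts = rng.random(parents.size) < P[parents, j]
        if attempts.any():
            newly_contagious[j] = True
    # Contagious nodes are immune next step; infected nodes leave the
    # susceptible set. Returns next contagious and susceptible masks.
    susceptible = susceptible & ~newly_contagious
    return newly_contagious, susceptible
```

Iterating until no node is newly contagious yields one influence cascade; the pairs $(X^t, X^{t+1}_j)$ observed along the way are exactly the measurements the inference procedure consumes.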
-If we denote by $X^t$ the indicator variable of the set of contagious nodes at time -step $t$, then if $j$ is susceptible at time step $t+1$, we have: +If we denote by $X^t$ the indicator variable of the set of contagious nodes at +time step $t$, then if $j$ is susceptible at time step $t+1$, we have: \begin{displaymath} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] - = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}. + = 1 - \prod_{i = 1}^m {(1 - p_{i,j})}^{X^t_i}. \end{displaymath} Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as: \begin{equation}\label{eq:ic} @@ -124,7 +128,8 @@ with inverse link function $f : z \mapsto 1 - e^z$. \subsubsection{The Linear Voter Model} -In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}. Without loss of generality, we can suppose that the +In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}. +Without loss of generality, we can suppose that the \emph{blue} nodes are contagious. The parameters of the graph are normalized such that $\forall i, \ \sum_j \Theta_{i,j} = 1$ and we assume that $\Theta_{i,i}$ is always non-zero, meaning that all nodes have self-loops. @@ -134,7 +139,8 @@ cascades stops at a fixed horizon time $T$ or if all nodes are of the same color If we denote by $X^t$ the indicator variable of the set of blue nodes at time step $t$, then we have: \begin{equation} -\mathbb{P}\left[X^{t+1}_j = 1 | X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \inprod{\Theta_j}{X^t} +\mathbb{P}\left[X^{t+1}_j = 1 | X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t = +\inprod{\Theta_j}{X^t} \tag{V} \end{equation} @@ -143,25 +149,27 @@ with inverse link function $f: z \mapsto z$. % \subsection{The Linear Threshold Model} -% In the Linear Threshold Model, each node $j\in V$ has a threshold $t_j$ from the interval $[0,1]$ and for each node, the sum of incoming weights is less than $1$: $\forall j\in V$, $\sum_{i=1}^m \Theta_{i,j} \leq 1$. 
+% In the Linear Threshold Model, each node $j\in V$ has a threshold $t_j$ from +% the interval $[0,1]$ and for each node, the sum of incoming weights is less +% than $1$: $\forall j\in V$, $\sum_{i=1}^m \Theta_{i,j} \leq 1$. -% Nodes have two states: susceptible or contagious. At each time step, each susceptible -% node $j$ becomes contagious if the sum of the incoming weights from contagious parents is greater than $j$'s threshold $t_j$. Nodes remain contagious until the end of the cascade, which is reached when no new susceptible nodes become contagious. +% Nodes have two states: susceptible or contagious. At each time step, each +% susceptible node $j$ becomes contagious if the sum of the incoming weights +% from contagious parents is greater than $j$'s threshold $t_j$. Nodes remain +% contagious until the end of the cascade, which is reached when no new +% susceptible nodes become contagious. -% As such, the source nodes are chosen, the process is entirely deterministic. Let $X^t$ be the indicator variable of the set of contagious nodes at time step $t-1$, then: -% \begin{equation} -% \nonumber -% X^{t+1}_j = \mathbbm{1}_{\sum_{i=1}^m \Theta_{i,j}X^t_i > t_j} -% = \mathbbm{1}_{\inprod{\theta_j}{X^t} > t_j} -% \end{equation} -% where we defined again $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$. In other words, for every step in the linear threshold model, we observe the following signal: +% As such, the source nodes are chosen, the process is entirely deterministic. +% Let $X^t$ be the indicator variable of the set of contagious nodes at time +% step $t-1$, then: \begin{equation} \nonumber X^{t+1}_j = +% \mathbbm{1}_{\sum_{i=1}^m \Theta_{i,j}X^t_i > t_j} = +% \mathbbm{1}_{\inprod{\theta_j}{X^t} > t_j} \end{equation} where we defined +% again $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$. 
In other words, for +% every step in the linear threshold model, we observe the following signal: -% \begin{equation} -% \label{eq:lt} -% \tag{LT} -% \mathbb{P} \left[X^{t+1}_j = 1 | X^t\right] = \text{sign} \left(\inprod{\theta_j}{X^t} - t_j \right) -% \end{equation} -% where ``sign'' is the function $\mathbbm{1}_{\cdot > 0}$. +% \begin{equation} \label{eq:lt} \tag{LT} \mathbb{P} \left[X^{t+1}_j = 1 | +% X^t\right] = \text{sign} \left(\inprod{\theta_j}{X^t} - t_j \right) +% \end{equation} where ``sign'' is the function $\mathbbm{1}_{\cdot > 0}$. @@ -182,8 +190,8 @@ $\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the log-likelihood function, we consider the following $\ell_1$-regularized MLE problem: \begin{displaymath} - \hat{\Theta} \in \argmax_{\Theta} \frac{1}{n} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - - \lambda\|\Theta\|_1 + \hat{\Theta} \in \argmax_{\Theta} \frac{1}{n} + \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - \lambda\|\Theta\|_1 \end{displaymath} where $\lambda$ is the regularization factor which helps preventing overfitting and controls the sparsity of the solution. 
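The $\ell_1$-regularized MLE above can be sketched with proximal gradient descent (ISTA). For numerical convenience this sketch assumes a logistic inverse link $f(z) = 1/(1+e^{-z})$ rather than the paper's cascade-specific links, and all function names are hypothetical; the soft-thresholding step is what implements the $\ell_1$ penalty:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_mle_node(X, y, lam, step=0.1, iters=2000):
    """Estimate theta_i for one node by l1-regularized MLE via ISTA.

    X: (n, m) rows are contagious-set indicators x^t at the steps where
    node i is susceptible. y: (n,) indicators x_i^{t+1}. A logistic
    inverse link is assumed here; other links change only the gradient.
    """
    n, m = X.shape
    theta = np.zeros(m)
    for _ in range(iters):
        z = X @ theta
        p = 1.0 / (1.0 + np.exp(-z))
        grad = X.T @ (p - y) / n          # gradient of neg. log-likelihood
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta
```

As in the paper's formulation, `lam` trades goodness of fit against sparsity: larger values zero out more coordinates of the recovered weight vector.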
@@ -197,10 +205,12 @@ $\Theta$ can be estimated by a separate optimization program: \hat{\theta}_i \in \argmax_{\theta} \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) - \lambda\|\theta_i\|_1 \end{equation} -where we denote by ${\cal T}_i$ the time steps at which node $i$ is susceptible and: +where we denote by ${\cal T}_i$ the time steps at which node $i$ is susceptible +and: \begin{multline*} - \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} \sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\inprod{\theta_i}{x^{t}}) \\ - + (1 - x_i^{t+1})\log\big(1-f(\inprod{\theta_i}{x^t})\big) + \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} + \sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\inprod{\theta_i}{x^{t}}) \\ + (1 - + x_i^{t+1})\log\big(1-f(\inprod{\theta_i}{x^t})\big) \end{multline*} In the case of the voter model, the measurements include all time steps until @@ -217,3 +227,4 @@ a twice-differentiable function $f$ is log concave iff. $f''f \leq f'^2$. It is easy to verify this property for $f$ and $(1-f)$ in the Independent Cascade Model and Voter Model. +{\color{red} TODO: talk about the different constraints} diff --git a/paper/sections/results.tex b/paper/sections/results.tex index f3c5366..b156897 100644 --- a/paper/sections/results.tex +++ b/paper/sections/results.tex @@ -20,7 +20,12 @@ a set $\mathcal{T}$ of observations. We will write $n\defeq |\mathcal{T}|$. \label{sec:main_theorem} In this section, we analyze the case where $\theta^*$ is exactly sparse. We -write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall, that $\theta_i$ is the vector of weights for all edges \emph{directed at} the node we are solving for. In other words, $S$ is the set of all nodes susceptible to influence node $i$, also referred to as its parents. Our main theorem will rely on the now standard \emph{restricted eigenvalue condition}. +write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. 
Recall, that $\theta_i$ is the +vector of weights for all edges \emph{directed at} the node we are solving for. +In other words, $S$ is the set of all nodes susceptible to influence node $i$, +also referred to as its parents. Our main theorem will rely on the now standard +\emph{restricted eigenvalue condition}. +{\color{red} Add reference} \begin{definition} Let $\Sigma\in\mathcal{S}_m(\reals)$ be a real symmetric matrix and $S$ be @@ -29,7 +34,7 @@ write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall, that $\theta_i$ is the 3\|X_S\|_1\}$. We say that $\Sigma$ satisfies the $(S,\gamma)$-\emph{restricted eigenvalue condition} iff: \begin{equation} - \forall X \in {\cal C(S)}, X^T \Sigma X \geq \gamma \|X\|_2^2 + \forall X \in {\cal C(S)}, X^T \Sigma X \geq \gamma \|X\|_2^2 \tag{RE} \label{eq:re} \end{equation} @@ -38,10 +43,12 @@ write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall, that $\theta_i$ is the A discussion of the $(S,\gamma)$-{\bf(RE)} assumption in the context of generalized linear cascade models can be found in Section~\ref{sec:re}. In our setting we require that the (RE)-condition holds for the Hessian of the -log-likelihood function $\mathcal{L}$: it essentially captures the fact that the binary -vectors of the set of active nodes are not \emph{too} collinear. +log-likelihood function $\mathcal{L}$: it essentially captures the fact that the +binary vectors of the set of active nodes are not \emph{too} collinear. 
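A sampled numerical check of the (RE) condition defined in this hunk is easy to sketch. The helper below is hypothetical and can only refute (RE), never certify it, since the cone is infinite; it draws random directions, rescales the off-support part so the cone constraint $\|X_{S^c}\|_1 \leq 3\|X_S\|_1$ holds, and tests the quadratic-form lower bound:

```python
import numpy as np

def re_condition_holds(Sigma, S, gamma, trials=10000, seed=0):
    """Monte-Carlo check of the (S, gamma)-restricted eigenvalue condition.

    Samples directions, pushes them into the cone
    C(S) = {x : ||x_{S^c}||_1 <= 3 ||x_S||_1} by shrinking the
    off-support coordinates, and tests x^T Sigma x >= gamma ||x||_2^2.
    Returns False as soon as a violating direction is found.
    """
    rng = np.random.default_rng(seed)
    m = Sigma.shape[0]
    Sc = np.setdiff1d(np.arange(m), S)
    for _ in range(trials):
        x = rng.standard_normal(m)
        budget = 3.0 * np.abs(x[S]).sum()
        off = np.abs(x[Sc]).sum()
        if off > budget:
            # Rescale so the cone constraint holds with equality.
            x[Sc] *= budget / off
        if x @ Sigma @ x < gamma * (x @ x):
            return False
    return True
```

In the paper's setting `Sigma` would be the Hessian of the log-likelihood, whose rows are built from the binary vectors of active nodes; near-collinear measurements make the quadratic form degenerate and the check fail.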
-We will also need the following assumption on the inverse link function $f$ of the generalized linear cascade model: +{\color{red} Rewrite the minimal assumptions necessary} +We will also need the following assumption on the inverse link function $f$ of +the generalized linear cascade model: \begin{equation} \tag{LF} \max\left(\left|\frac{f'}{f}(\inprod{\theta^*}{x})\right|, @@ -51,11 +58,11 @@ We will also need the following assumption on the inverse link function $f$ of t for some $\alpha\in(0, 1)$ and for all $x\in\reals^m$ such that $f(\inprod{\theta^*}{x})\notin\{0,1\}$. -In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{f}(1-z) -= \frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha\leq \Theta_{i,j}\leq -1-\alpha$ for all $(i,j)\in E$ which is always satisfied for some $\alpha$ for non-isolated nodes. -In the Independent Cascade Model, $\frac{f'}{f}(z) -= \frac{1}{1-e^{-z}}$ and $\frac{f'}{1-f}(z) = -1$ and (LF) is not restrictive. +In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{f}(1-z) = +\frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha\leq \Theta_{i,j}\leq +1-\alpha$ for all $(i,j)\in E$ which is always satisfied for some $\alpha$ for +non-isolated nodes. In the Independent Cascade Model, $\frac{f'}{f}(z) = +\frac{1}{1-e^{-z}}$ and $\frac{f'}{1-f}(z) = -1$ and (LF) is not restrictive. \begin{comment} Remember that in this case we have $\Theta_{i,j} = \log(1-p_{i,j})$. These @@ -189,12 +196,12 @@ solve the Graph Inference problem. Under the same assumptions as Theorem~\ref{thm:main}, let $\hat {\cal S}_\eta \defeq \{ j \in \{1,\ldots, m\} : \hat{\theta}_j > \eta\}$ for $\eta > 0$. For $\epsilon>0$ and $\epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ -i \in \{1,\ldots,m\} :\theta_i^* > \eta +\epsilon \}$ be the set of all true `strong' -parents. Suppose the number of measurements verifies: -$ n > \frac{9s\log m}{\alpha\gamma^2\epsilon^2}$. 
-Then with probability $1-\frac{1}{m}$, ${\cal S}^*_{\eta + \epsilon} \subseteq -\hat {\cal S}_\eta \subseteq {\cal S}^*$. In other words we recover all `strong' -parents and no `false' parents. +i \in \{1,\ldots,m\} :\theta_i^* > \eta +\epsilon \}$ be the set of all true +`strong' parents. Suppose the number of measurements verifies: $ n > +\frac{9s\log m}{\alpha\gamma^2\epsilon^2}$. Then with probability +$1-\frac{1}{m}$, ${\cal S}^*_{\eta + \epsilon} \subseteq \hat {\cal S}_\eta +\subseteq {\cal S}^*$. In other words, we recover all `strong' parents and no +`false' parents. \end{corollary} \begin{proof}

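The support-recovery corollary in the final hunk reduces to two small computations, sketched below with hypothetical helper names. The corollary states its threshold for the weights $\hat{\theta}_j$ directly; magnitudes are used here as an assumption for symmetry with signed weights:

```python
import math
import numpy as np

def cascades_needed(s, m, alpha, gamma, eps):
    """Measurement bound from the corollary:
    n > 9 s log(m) / (alpha * gamma^2 * eps^2)."""
    return math.ceil(9 * s * math.log(m) / (alpha * gamma ** 2 * eps ** 2))

def strong_parents(theta_hat, eta):
    """Thresholded support estimate S_hat_eta = {j : |theta_hat_j| > eta}."""
    return set(np.flatnonzero(np.abs(theta_hat) > eta).tolist())
```

With at least `cascades_needed(...)` observations, the thresholded set contains every `strong' parent (true weight above $\eta + \epsilon$) and no false parent, with probability $1 - 1/m$.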