1 files changed, 48 insertions, 41 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index 41c00da..01c7860 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -8,11 +8,11 @@ following properties:
 \begin{enumerate}
     \item Conditioned on the previous time step, the transition events between
         two states in $\{0,1,\dots, K-1\}$ for each $i \in V$ are mutually
-        independent.
+        independent across $i\in V$.
 \item Of the $K$ possible states, there exists a \emph{contagious state} such
-    that all transition probabilities of the Markov process are either constant
-    or can be expressed as a function of the graph parameters $\Theta$ and the
-    set of ``contagious nodes'' at the previous time step.
+    that all transition probabilities of the Markov process can be expressed as
+    a function of the graph parameters $\Theta$ and the set of ``contagious
+    nodes'' at the previous time step.
   \item The initial probability over ${\{0, 1, \dots, K-1\}}^V$ is such that all
   nodes can eventually reach a \emph{contagious state} with non-zero
   probability. The ``contagious'' nodes at $t=0$ are called
@@ -20,16 +20,18 @@ following properties:
 \end{enumerate}
 
 In other words, a cascade model describes a diffusion process where a set of
-contagious nodes ``influence'' other nodes in the graph to become contagious. An
-\emph{influence cascade} is a realisation of this random process, \emph{i.e.}
-the successive states of the nodes in graph ${\cal G}$. Note that both the
-``single source'' assumption made in~\cite{Daneshmand:2014} and
-\cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption made
-in~\cite{Netrapalli:2012} verify condition 3. Also note that the multiple-source
-node assumption does not reduce to the single-source assumption: even
-non-overlapping cascades started from two nodes that are far apart cannot in
-practice be attributed to either process without knowledge of the newtork
-topology.
+contagious nodes ``influence'' other nodes in the graph to become contagious.
+An \emph{influence cascade} is a realisation of this random process,
+\emph{i.e.} the successive states of the nodes in graph ${\cal G}$. Note that
+both the ``single source'' assumption made in~\cite{Daneshmand:2014} and
+\cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption
+made in~\cite{Netrapalli:2012} verify condition 3. Also note that the
+multiple-source node assumption does not reduce to the single-source
+assumption, even under the assumption that cascades do not overlap. Imagining
+for example two cascades starting from two different nodes; since we do not
+observe which node propagated the contagion to which node, we cannot attribute
+an infected node to either cascade and treat the problem as two independent
+cascades.
 
 In the context of Network Inference,~\cite{Netrapalli:2012} focus
 on the well-known discrete-time independent cascade model recalled below, which
@@ -130,10 +132,12 @@ Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
 Therefore, the independent cascade model is a Generalized Linear Cascade model
 with inverse link function $f : z \mapsto 1 - e^z$.
 
-In Section~\ref{sec:results}, we show bounds on the \emph{transformed}
-parameters $\Theta$ of the graph.  Fortunately, a bound on $\|\hat\theta -
-\theta^*\|_2$ directly implies a bound on $\|\hat p - p^*\|_2$ as shown by the
-following lemma:
+Note that to write the Independent Cascade Model as a Generalized Linear
+Cascade Model, we had to introduce the change of variable $\Theta_{i,j}
+= \log(1-p_{i,j})$. The recovery results in Section~\ref{sec:results} hold on
+the $\Theta_j$ parameters. Fortunately, the following lemma shows that the
+recovery error on $\Theta_j$ is an upper bound on the error on the original
+$p_j$ parameters.
 
 \begin{lemma}
     $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
@@ -169,39 +173,42 @@ with inverse link function $f: z \mapsto z$.
 Another motivation for the Generalized Linear Cascade model is that it captures
 the time-discretized formulation of the well-studied  continuous-time
 independent cascade model with exponential transmission function (CICE)
-of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}.  Suppose that time
-is binned into intervals of length $\epsilon$ and let $X^k$ be the set of nodes
-`infected' before or during the $k^{th}$ time interval.  Note that contrary to
-the discrete-time independent cascade model, $X^k_j = 1 \implies X^{k+1}_j =
-1$.
+of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}. Assume that the
+temporal resolution of the discretization is $\varepsilon$, \emph{i.e.} all
+nodes whose (continuous) infection time is within the interval $[k\varepsilon,
+    (k+1)\varepsilon)$ are considered infected at (discrete) time step $t$.
+    pLet $X^k$ be the indicator vector of the set of nodes `infected' before or
+    during the $k^{th}$ time interval.  Note that contrary to the discrete-time
+    independent cascade model, $X^k_j = 1 \implies X^{k+1}_j = 1$, that is,
+    there is no immune state and nodes remain contagious forever.
 
-Let $\exp(p)$ be an exponentially-distributed random variable of parameter $p$
+Let $\text{Exp}(p)$ be an exponentially-distributed random variable of parameter $p$
 and let $\Theta_{i,j}$ be the rate of transmission along directed edge $(i,j)$
 in the CICE model.  By the memoryless property of the exponential, if $X^k_j
 \neq 1$:
 \begin{align*}
   \mathbb{P}(X^{k+1}_j = 1 | X^k) & = \mathbb{P}(\min_{i \in {\cal N}(j)}
-  \exp(\Theta_{i,j}) \leq \epsilon) \\
-  & = \mathbb{P}(\exp( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\
+    \text{Exp}(\Theta_{i,j}) \leq \epsilon) \\
+    & = \mathbb{P}(\text{Exp}( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\
   & = 1 - e^{- \epsilon \Theta_j \cdot X^t}
 \end{align*}
-Note that here we implicitly let $\Theta_{i,j} = 0$ if $(i,j)$ is not a
-directed edge of the graph. Note that this formulation is also consistent when
-$X^k_j = 1$ by considering the dummy variables $\forall i, \Theta_{i,i} =
-+\infty$.  Therefore, the $\epsilon$-time-binned CICE-induced process is a
+Therefore, the $\epsilon$-discretized CICE-induced process is a
 Generalized Linear Cascade model with inverse link function $f:z\mapsto
 1-e^{-\epsilon\cdot z}$.
 
 \subsubsection{Logistic Cascades}
 \label{sec:logistic_cascades}
-``Logistic cascades'' (where the inverse link function $f$ is the logistic
-function) capture the idea that there is a threshold around which each
-additional neighbor's contribution becomes significant: they constitue a
-differentiable approximation to the more-commonly studied Linear Threshold
-model~\cite{Kempe:03}. As we will see later in the analysis, the Hessian of our
-optimization program simplifies in the case of a logistic inverse link function
-to $\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is
-the observation matrix $[x^1 \ldots x^\mathcal{|T|}]$.
+``Logistic cascades'' is the specific case where the inverse link function is
+given by the logistic function:
+\begin{displaymath}
+    f(z) = \frac{1}{1+e^{-z + t}}
+\end{displaymath}
+Intuitively, this captures the idea that there is a threshold $t$ such that
+when the sum of the parameters of the infected parents of a node is larger than
+the threshold, the probability of getting infected is close to one. This is
+a smooth approximation of the hard threshold rule of the Linear Threshold
+Model~\cite{Kempe:03}. As we will see later in the analysis, for logistic
+cascades, the graph inference problem becomes a linear inverse problem.
 
 % \subsection{The Linear Threshold Model}
 
@@ -233,11 +240,11 @@ the observation matrix $[x^1 \ldots x^\mathcal{|T|}]$.
   \includegraphics[scale=.4]{figures/drawing.pdf}
   \caption{Illustration of the sparse-recovery approach. Our objective is to
   recover the unknown weight vector $\theta_j$ for each node $j$. We observe a
-Bernoulli realization of the $f$ transform of the matrix-vector product, where
-the measurement matrix encodes which nodes are ``contagious'' at each time step}
+Bernoulli realization whose parameters are given by applying $f$ to the
+matrix-vector product, where the measurement matrix encodes which nodes are
+``contagious'' at each time step.}
 \end{figure}
 
-
 Inferring the model parameter $\Theta$ from observed influence cascades is the
 central question of the present work. Recovering the edges in $E$ from observed
 influence cascades is a well-identified problem known as the \emph{Network