1 files changed, 36 insertions, 29 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index 43de785..b478e94 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -2,7 +2,7 @@ We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times
 |V|$ matrix of parameters describing the edge weights of $\mathcal{G}$.
 Intuitively, $\Theta_{i,j}$ captures the ``influence'' of node $i$ on node $j$.
 Let $m\defeq |V|$. For each node $j$, let $\theta_{j}$ be the $j^{th}$ column
-vector of $\Theta$.  A {\color{red} discrete-time} \emph{Cascade model} is a
+vector of $\Theta$.  A discrete-time \emph{Cascade model} is a
 Markov process over a finite state space ${\{0, 1, \dots, K-1\}}^V$ with the
 following properties:
 \begin{enumerate}
@@ -25,16 +25,19 @@ contagious nodes ``influence'' other nodes in the graph to become contagious. An
 the successive states of the nodes in graph ${\cal G}$. Note that both the
 ``single source'' assumption made in~\cite{Daneshmand:2014} and
 \cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption made
-in~\cite{Netrapalli:2012} verify condition 3. {\color{red} why is it less
-restrictive? explain}
+in~\cite{Netrapalli:2012} verify condition 3. Also note that the multiple-source
+node assumption does not reduce to the single-source assumption: even
+non-overlapping cascades started from two nodes that are far apart cannot in
+practice be attributed to either process without knowledge of the newtork
+topology.
 
-In the context of Graph Inference,~\cite{Netrapalli:2012} focus
+In the context of Network Inference,~\cite{Netrapalli:2012} focus
 on the well-known discrete-time independent cascade model recalled below, which
 \cite{Abrahao:13} and~\cite{Daneshmand:2014} generalize to continuous time. We
 extend the independent cascade model in a different direction by considering
 a more general class of transition probabilities while staying in the
 discrete-time setting. We observe that despite their obvious differences, both
-the independent cascade and the voter models make the graph inference problem
+the independent cascade and the voter models make the network inference problem
 similar to the standard generalized linear model inference problem. In fact, we
 define a class of diffusion processes for which this is true: the
 \emph{Generalized Linear Cascade Models}. The linear threshold model is
@@ -121,18 +124,15 @@ Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
     \tag{IC}
     \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
     = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i}
-    = 1 - e^{\inprod{\theta_j}{X^t}}
+    = 1 - e^{\inprod{\Theta_j}{X^t}}
 \end{equation}
 
 Therefore, the independent cascade model is a Generalized Linear Cascade model
 with inverse link function $f : z \mapsto 1 - e^z$.
 
-Note that interpreting it in the case of the Independent Cascade Model requires
-one more step. Indeed, to cast it as a generalized linear cascade model, we had
-to perform the transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where $p_{i,j}$
-are the infection probabilities. Fortunately, if we estimate $p_j$ through
-$\hat{p}_{j} = \hat{\theta}_j$, a bound on $\|\hat\theta - \theta^*\|_2$
-directly implies a bound on $\|\hat p - p^*\|_2$:
+In Section~\ref{sec:results}, we show bounds on the parameter of the graph.
+Fortunately, a bound on $\|\hat\theta - \theta^*\|_2$ directly implies a bound
+on $\|\hat p - p^*\|_2$ as shown by the following lemma:
 
 \begin{lemma}
     $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
@@ -166,39 +166,43 @@ with inverse link function $f: z \mapsto z$.
 
 \subsubsection{Discretization of Continous Model}
 
-Anoter motivation for the Generalized Linear Cascade model is that it captures
+Another motivation for the Generalized Linear Cascade model is that it captures
 the time-discretized formulation of the well-studied  continuous-time
 independent cascade model with exponential transmission function (CICE)
 of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}.  Suppose that time
 is binned into intervals of length $\epsilon$ and let $X^k$ be the set of nodes
 `infected' before or during the $k^{th}$ time interval.  Note that contrary to
-the discrete-time independent cascade model, $X^k_i = 1 \implies X^{k+1}_i =
-1$.  Let $\exp(p)$ be an exponentially-distributed random variable of parameter
-$p$.  By the memoryless property of the exponential, if $X^k_j \neq 1$:
+the discrete-time independent cascade model, $X^k_j = 1 \implies X^{k+1}_j =
+1$.
+
+Let $\exp(p)$ be an exponentially-distributed random variable of parameter $p$
+and let $p_{i,j}$ be the rate of transmission along directed edge $(i,j)$ in the
+CICE model.  By the memoryless property of the exponential, if $X^k_j \neq 1$:
 \begin{align*}
   \mathbb{P}(X^{k+1}_j = 1 | X^k) & = \mathbb{P}(\min_{i \in {\cal N}(j)}
   \exp(p_{i,j}) \leq \epsilon) \\
   & = \mathbb{P}(\exp( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\
-  & = 1 - e^{- \epsilon \cdot \theta_j \cdot X^t}
+  & = 1 - e^{- \epsilon \Theta_j \cdot X^t}
 \end{align*}
 Note that here we implicitly let $\Theta_{i,j} = 0$ if $(i,j)$ is not a
 directed edge of the graph. Note that this formulation is also consistent when
 $X^k_j = 1$ by considering the dummy variables $\forall i, \Theta_{i,i} =
-+\infty$.  Therefore, the $\epsilon-$binned-process of the continuous-time
-model with exponential transmission function is a Generalized Linear Cascade
-model with inverse link function $f:z\mapsto 1-e^{-\epsilon\cdot z}$.
++\infty$.  Therefore, the $\epsilon$-time-binned CICE-induced process is a
+Generalized Linear Cascade model with inverse link function $f:z\mapsto
+1-e^{-\epsilon\cdot z}$.
 
 \subsubsection{Logistic Cascades}
 
 Consider the specific case of ``logistic cascades'' (where $f$ is the logistic
-function). This model captures the idea that there is a threshold after which,
-each additional neighbor's influence contribution, becomes significant. In this
-way, logistic cascades can be thought of approximating the more-commonly
-studied Linear Threshold model TODO: CITE. As we will see later in the
-analysis, the Hessian of our optimization program simplifies to
-$\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the
-observation matrix $[x^1 \ldots x^\mathcal{|T|}]$. The (RE)-condition is then
-equivalent to the assumption made in the Lasso analysis of \cite{bickel:2009}.
+function). This model captures the idea that there is a threshold around which
+each additional neighbor's contribution becomes significant: logistic
+cascades can be thought of approximating the more-commonly studied Linear
+Threshold model~\cite{Kempe:03}. As we will see later in the
+analysis, the Hessian of our optimization program simplifies in the case of a
+logistic inverse link function to $\nabla^2\mathcal{L}(\theta^*) =
+\frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the observation matrix $[x^1 \ldots
+x^\mathcal{|T|}]$. The (RE)-condition we introduce subsequently is then
+equivalent to the assumption made in the Lasso analysis of~\cite{bickel:2009}.
 
 
 
@@ -230,7 +234,10 @@ equivalent to the assumption made in the Lasso analysis of \cite{bickel:2009}.
 
 \begin{figure}
   \includegraphics[scale=.4]{figures/drawing.pdf}
-  \caption{Illustration of the sparse-recovery approach}
+  \caption{Illustration of the sparse-recovery approach: the measurement matrix
+  is known, as is a Bernoulli realization of the matrix-vector product via the
+non-linear transformation induced by $f$. Our objective is to recover the
+unknown weight vector $\theta_i$ for each node $i$.}
 \end{figure}