-rw-r--r--paper/sections/model.tex154
1 files changed, 89 insertions, 65 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index 5097c63..98712d7 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -1,55 +1,35 @@
-\paragraph{Influence Cascades}
+We consider a graph $G= (V, E)$ and we define $m\defeq |V|$. A \emph{cascade
+model} is a discrete-time homogeneous Markov process over $\{0,1\}^m$. Denoting
+by $X^t$ the state of the process at time step $t-1$, we interpret $X^t_j = 1$
+for some node $j\in V$ as ``node $j$ is infected at time step $t-1$''. An
+\emph{influence cascade} is simply a realisation of the random process
+described by a cascade model.
-We consider a weighted graph $G$ over the set of vertices $V$ and we define
-$m\defeq |V|$. The edge weights are given by an (unknown) weight matrix
-$\Theta:V^2\to\reals^+$. $\Theta_{i,j} = 0$ simply expresses that there is no
-edge from $i$ to $j$ For $j\in V$, we write $\theta_j$ the $j$-th column of
-$\Theta$: this is the indicator vector of the parents of $j$ in $G$.
+In what follows, we describe two widely used cascade models, the Independent
+Cascade model (IC) and the Linear Threshold model (LT). In these models, the
+transition probabilities of the Markov process depend on unknown parameters
+describing how much the nodes in $G$ influence each other.
-An \emph{influence cascade} is a homogeneous discrete-time Markov process over
-$\{0,1\}^m$. More precisely, we represent the set of nodes infected at time
-step $t-1\in\ints^+$ by a random vector $X^{t}\in\{0,1\}^{m}$ where $X^{t}_k
-= 1$ can be interpreted as \emph{node $k$ was infected at time step $t-1$}.
-
-\paragraph{Problem} In what follows, we assume that we know how to express the
-transition probabilities as a function of the weight matrix $\Theta$ and the
-goal is to infer this matrix by observing influence cascades.
-
-We use standard maximum likelihood estimation (MLE). However, we assume that
-the parameter we wish to recover, namely the matrix $\Theta$, is sparse in the
-following sense: each node in the graph $G$ has at most $s$ parents. In other
-words, the columns of $G$ have at most $s$ non-zero entries. To factor this
-additional assumption in the maximum likelihood estimation we use the now
-classic $\ell_1$ regularization. For an observation $(x_1,\ldots,x_n)$, we
-estimate $\hat{\Theta}$ by:
-\begin{equation}\label{eq:mle}
- \hat{\Theta} = \argmax_{\Theta}\mathcal{L}(\Theta\,|\, x^1,\ldots, x^n)
- + \lambda\|\Theta\|_1
-\end{equation}
-where $\mathcal{L}$ is the log-likelihood function and $\lambda$ is the
-regularization factor.
-
-The goal of this paper is to characterize $\hat{\Theta}$ in terms of:
-\begin{itemize}
- \item \emph{efficiency}: how well does $\hat{\Theta}$ approximate the true
- matrix $\Theta$? In particular can we bound its accuracy as a function
- of the number of observations?
- \item \emph{model consistency}: assuming that the true matrix $\Theta$ is
- $s$-sparse as defined above, can we recover the support of $\Theta$
- from the estimator $\hat{\Theta}$, and if so, how many observations are
- required?
-\end{itemize}
-
-We now turn to two widely used influence cascade models, the Independent
-Cascade Model (IC) and the Linear Threshold Model (LT) and specify the MLE
-problem \eqref{eq:mle} for these models.
+Recovering these parameters from observed influence cascades is the central
+question of the present work and is important for the following two reasons:
+\begin{enumerate}
+ \item knowing the influence parameters allows for future prediction of
+ influence cascades.
+ \item when the set of edges $E$ is unknown, it can be recovered from the
+ influence parameters.
+\end{enumerate}
+We note that recovering the edges in $E$ from observed influence cascades is
+a well-identified problem known as the \emph{Graph Inference} problem. However,
+recovering the influence parameters is no less important and has seemingly been
+overlooked so far. The machinery developed in this work addresses both
+problems.
\subsection{Independent Cascade Model}
In the independent cascade model, nodes can be either uninfected, active or
infected. All nodes start either as uninfected or active. At each time step
-$t$, for each edge $(i,j)$ where $j$ is uninfected and $i$ is active, $j$
-attempts to infect $j$ with probability $p_{i,j}$. If $i$ succeeds, $j$ will
+$t$, for each edge $(i,j)$ where $j$ is uninfected and $i$ is active, $i$
+attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will
become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be
infected at time $t+1$: nodes stay active for only one time step. The cascade
continues until no active nodes remain.
@@ -58,35 +38,79 @@ If we denote by $X^t$ the indicator variable of the set of active nodes at time
step $t-1$, then if $j$ is uninfected at time step $t-1$, we have:
\begin{displaymath}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
- = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}
- = 1 - e^{\inprod{\theta_j}{X^t}}
+ = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}.
\end{displaymath}
-where we defined $\Theta_{i,j} \defeq \log(1-p_{i,j})$.
+Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten
+as:
+\begin{equation}\label{eq:ic}
+ \tag{IC}
+ \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
+ = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i}
+ = 1 - e^{\inprod{\theta_j}{X^t}}
+\end{equation}
+where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$.
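To make the reparametrization concrete, here is a minimal Python sketch (the helper names are hypothetical, not from the paper) checking that the product form and the exponential form of the IC activation probability agree. It assumes $p_{i,j} < 1$ so that $\log(1-p_{i,j})$ is finite.

```python
import math

def ic_infection_prob(p_col, x):
    """P[X^{t+1}_j = 1 | X^t] in the IC model: node j escapes infection
    only if every active node's independent attempt fails."""
    escape = 1.0
    for p_ij, x_i in zip(p_col, x):
        if x_i:
            escape *= (1.0 - p_ij)
    return 1.0 - escape

def ic_infection_prob_theta(p_col, x):
    """Equivalent form 1 - exp(<theta_j, x>) with Theta_ij = log(1 - p_ij)."""
    theta = [math.log(1.0 - p) for p in p_col]
    return 1.0 - math.exp(sum(t_i * x_i for t_i, x_i in zip(theta, x)))
```

For instance, with two active parents attacking with probabilities $0.5$ and $0.2$, both forms give $1 - 0.5 \cdot 0.8 = 0.6$.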
\subsection{Linear Threshold Model}
In the Linear Threshold Model, each node $j\in V$ has a threshold $t_j$ drawn
-uniformly from the interval $[0,1]$. We assume that the weights given by
-$\Theta$ are such that for each node $j\in V$, $\sum_{i=1}^m \Theta_{i,j} \leq
-1$.
+uniformly from the interval $[0,1]$. Furthermore, there is a weight
+$\Theta_{i,j}\in[0,1]$ for each edge $(i,j)$. We assume that the weights are
+such that for each node $j\in V$, $\sum_{i=1}^m \Theta_{i,j} \leq 1$.
-Nodes can be either infected or uninfected. At each time step, each uninfected
-node $j$ becomes infected if the sum of the weights $\Theta_{i,j}$ where $i$ is
-an infected parent of $j$ is larger than $j$'s threshold $t_j$. Denoting by
-$X^t$ the indicator variable of the set of infected nodes at time step $t-1$,
-if $j\in V$ is uninfected at time step $t-1$, then:
-\begin{displaymath}
+Nodes can be either uninfected or active. At each time step, each uninfected
+node $j$ becomes active if the sum of the weights $\Theta_{i,j}$ for $i$ an
+active parent of $j$ is larger than $j$'s threshold $t_j$. Denoting by $X^t$
+the indicator variable of the set of active nodes at time step $t-1$, if
+$j\in V$ is uninfected at time step $t-1$, then:
+\begin{equation}
+ \label{eq:lt}
+ \tag{LT}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = \sum_{i=1}^m \Theta_{i,j}X^t_i
= \inprod{\theta_j}{X^t}
-\end{displaymath}
+\end{equation}
+where we defined again $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$.
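The LT transition can be sketched in the same style (hypothetical helper names): the activation probability is simply the inner product above, and a single transition compares the active-parent weight against the node's threshold.

```python
def lt_activation_prob(theta_col, x):
    """P[X^{t+1}_j = 1 | X^t] in the LT model: total weight of j's active
    parents.  Valid as a probability because the column sums of Theta
    are assumed to be at most 1."""
    return sum(w * x_i for w, x_i in zip(theta_col, x))

def lt_step(theta_col, x, t_j):
    """One LT transition for node j with threshold t_j ~ Uniform[0,1]:
    j becomes active iff its active-parent weight exceeds the threshold."""
    return lt_activation_prob(theta_col, x) > t_j
```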
-\subsection{Unification}
+\subsection{Maximum Likelihood Estimation}
+
+In both models, the unknown influence parameters are described by an $m\times
+m$ matrix $\Theta$. Furthermore, the information of the edges of $G$ is fully
+contained in $\Theta$:
+\begin{displaymath}
+ (i,j)\in E\Leftrightarrow \Theta_{i,j} \neq 0
+\end{displaymath}
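This equivalence means edge recovery reduces to reading off the support of $\Theta$; a one-line sketch (hypothetical function name):

```python
def edges_from_theta(theta):
    """Recover the edge set from the parameter matrix:
    (i, j) is in E iff Theta[i][j] != 0."""
    return {(i, j)
            for i, row in enumerate(theta)
            for j, w in enumerate(row)
            if w != 0}
```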
-$X_j^{t+1}$ is a Bernoulli variable whose parameter can be written:
-a function $f$ of $\inprod{\theta_j}{x^t}$, where $\inprod{a}{b}$ is the inner
-product of $a$ and $b$ in $\reals^m$:
-$\theta_j$ and $x^t$:
+Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover
+$\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the
+log-likelihood function, we consider the following $\ell_1$-regularized MLE
+problem:
+\begin{displaymath}
+ \hat{\Theta} \in \argmax_{\Theta} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n)
+  - \lambda\|\Theta\|_1
+\end{displaymath}
+where $\lambda$ is the regularization parameter, which helps prevent
+overfitting and controls the sparsity of the solution.
-\begin{equation}\label{eq:markov}
- \P\big[x_j^{t+1} = 1\,|\, x^t\big] \defeq f(\inprod{\theta_j}{x^t}).
+Note that both the (IC) and the (LT) models are decomposable in the following
+sense: at each time step, the state of each node evolves independently of the
+other nodes \emph{and} the state evolution of node $j\in V$ only
+depends on $\theta_j$, the $j$-th column of $\Theta$. Consequently, the
+log-likelihood function $\mathcal{L}$ can be written as a sum of $m$ terms,
+each term $j\in\{1,\ldots,m\}$ only involving $\theta_j$. Since this is equally
+true for $\|\Theta\|_1$, each column $\theta_j$ of $\Theta$ can be estimated by
+a separate optimization program:
+\begin{equation}\label{eq:pre-mle}
+  \hat{\theta}_j \in \argmax_{\theta_j} \mathcal{L}_j(\theta_j\,|\,x^1,\ldots,x^n)
+  - \lambda\|\theta_j\|_1
\end{equation}
+
+Furthermore, the state evolution of a node $j\in V$ has the same structure in
+both models: the transition from uninfected to active at time step $t+1$ is
+controlled by a Bernoulli variable whose parameter can be written
+$f(\inprod{\theta_j}{x^t})$ for some function $f$. Hence, if we denote by $n_j$
+the first time step at which $j$ becomes active, we can rewrite the MLE
+program \eqref{eq:pre-mle} as:
+\begin{multline}
+  \hat{\theta}_j \in \argmax_{\theta_j}
+  \sum_{t=1}^{n_j-1}\log\big(1-f(\inprod{\theta_j}{x^t})\big)\\
+  + \log f(\inprod{\theta_j}{x^{n_j}}) - \lambda\|\theta_j\|_1
+\end{multline}
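The per-node objective above can be evaluated directly; the following sketch (hypothetical names; `f` stands for the model's link function, e.g. $f(u) = 1 - e^u$ for IC) computes the unregularized log-likelihood term for a node $j$ that becomes active at step $n_j$.

```python
import math

def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))

def node_log_likelihood(theta_j, xs, n_j, f):
    """Log-likelihood term for node j: uninfected through steps 1..n_j-1,
    active at step n_j.  xs[t-1] is the observed state x^t; f maps
    <theta_j, x^t> to the Bernoulli activation parameter."""
    ll = sum(math.log(1.0 - f(dot(theta_j, xs[t]))) for t in range(n_j - 1))
    ll += math.log(f(dot(theta_j, xs[n_j - 1])))
    return ll
```

For example, under the IC link $f(u) = 1 - e^u$ with a single parent of weight $\log 0.5$ (i.e. $p = 0.5$) active at both steps and $n_j = 2$, each step contributes $\log 0.5$, so the term equals $2\log 0.5$.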