-rw-r--r--  paper/paper.tex            1
-rw-r--r--  paper/sections/model.tex  86
2 files changed, 48 insertions, 39 deletions
diff --git a/paper/paper.tex b/paper/paper.tex
index fbc6521..c546ed7 100644
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -56,6 +56,7 @@
\newcommand{\prob}[1]{\P\left[#1\right]}
\newcommand{\inprod}[2]{#1 \cdot #2}
\newcommand{\defeq}{\equiv}
+\DeclareMathOperator*{\argmax}{argmax}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index 3c0a354..11de7d2 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -1,52 +1,60 @@
-\subsection{Influence Cascades}
+\paragraph{Influence Cascades}
-We consider a weighted graph $G$ over the set of vertices $V$. The edge weights
-are given by an (unknown) weight matrix $\Theta:V^2\to\reals^+$. $\Theta_{i,j}
-= 0$ simply expresses that there is no edge from $i$ to $j$. We define $m=|V|$,
-and for $j\in V$, we write $\theta_j$ the $j$-th column of $\Theta$: this is
-the indicator vector of the parents of $j$ in $G$.
+We consider a weighted graph $G$ over the set of vertices $V$ and we define
+$m\defeq |V|$. The edge weights are given by an (unknown) weight matrix
+$\Theta:V^2\to\reals^+$. $\Theta_{i,j} = 0$ simply expresses that there is no
+edge from $i$ to $j$. For $j\in V$, we write $\theta_j$ for the $j$-th column
+of $\Theta$: its support is the set of parents of $j$ in $G$.
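+As a small illustration of this notation (the specific weights are
+arbitrary), a graph on $V = \{1, 2, 3\}$ with edges $1\to 3$, $2\to 3$, and
+$3\to 2$ could have
+\begin{displaymath}
+    \Theta =
+    \begin{pmatrix}
+        0 & 0   & 0.5\\
+        0 & 0   & 0.2\\
+        0 & 0.3 & 0
+    \end{pmatrix},
+    \qquad
+    \theta_3 = (0.5, 0.2, 0),
+\end{displaymath}
+so that the support $\{1, 2\}$ of $\theta_3$ is indeed the set of parents of
+node $3$.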
An \emph{influence cascade} is a homogeneous discrete-time Markov process over
$\{0,1\}^m$. More precisely, we represent the set of nodes infected at time
-step $t-1\in\ints^+$ by a vector $x^{t}\in\{0,1\}^{m}$ where $x^{t}_k = 1$ can
-be interpreted as \emph{node $k$ was infected at time step $t-1$}.
+step $t-1\in\ints^+$ by a random vector $X^{t}\in\{0,1\}^{m}$ where $X^{t}_k
+= 1$ can be interpreted as \emph{node $k$ was infected at time step $t-1$}.
-We will restrict ourselves to influence processes such that for $j\in V$,
-$x_j^{t+1}$ is a Bernouilli variable whose parameter can be expressed as
-a function $f$ of $\inprod{\theta_j}{x^t}$, where $\inprod{a}{b}$ is the inner
-product of $a$ and $b$ in $\reals^m$:
-$\theta_j$ and $x^t$:
-\begin{equation}\label{eq:markov}
- \P\big[x_j^{t+1} = 1\,|\, x^t\big] \defeq f(\inprod{\theta_j}{x^t}).
-\end{equation}
-As we will later see, this type of influence processes encompasses the most
-widely used influence processes: the Voter model (V), the Independent Cascade
-model (IC), and the Linear Threshold model (T).
+\paragraph{Problem} In what follows, we assume that we know how to express the
+transition probabilities as a function of the weight matrix $\Theta$, and the
+goal is to infer this matrix by observing influence cascades.
-\subsection{Problem}
-
-Our goal is to infer the weight matrix $\Theta$ by observing influence
-cascades. Given the form of \eqref{eq:markov}, we will focus on a single row
-$\theta_j$ of $\Theta$ and we will write $y^t_j \defeq x_j^{t+1}$. When there
-is no ambiguity, we will simply write $y^t = y^t_j$ and $\theta= \theta_j$.
+We use standard maximum likelihood estimation (MLE). However, we assume that
+the parameter we wish to recover, namely the matrix $\Theta$, is sparse in the
+following sense: each node in the graph $G$ has at most $s$ parents. In other
+words, the columns of $\Theta$ have at most $s$ non-zero entries. To factor
+this additional assumption into the maximum likelihood estimation, we use the
+now classic $\ell_1$ regularization. For an observation $(x^1,\ldots,x^n)$, we
+estimate $\hat{\Theta}$ by:
+\begin{equation}\label{eq:mle}
+    \hat{\Theta} = \argmax_{\Theta}\mathcal{L}(\Theta\,|\, x^1,\ldots, x^n)
+    - \lambda\|\Theta\|_1
+\end{equation}
+where $\mathcal{L}$ is the log-likelihood function and $\lambda$ is the
+regularization factor.
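+
+Given transition probabilities of the form~\eqref{eq:markov} below, and
+assuming that, conditionally on $x^t$, infections at step $t+1$ occur
+independently across nodes, the log-likelihood of a single cascade
+$(x^1,\ldots,x^n)$ decomposes over nodes and time steps:
+\begin{displaymath}
+    \mathcal{L}(\Theta\,|\, x^1,\ldots, x^n)
+    = \sum_{j\in V}\sum_{t=1}^{n-1} x^{t+1}_j \log f(\inprod{\theta_j}{x^t})
+    + \big(1-x^{t+1}_j\big)\log\big(1-f(\inprod{\theta_j}{x^t})\big).
+\end{displaymath}
+Since $\|\Theta\|_1 = \sum_{j\in V}\|\theta_j\|_1$, the estimation
+problem~\eqref{eq:mle} then separates into $m$ independent problems, one for
+each column $\theta_j$ of $\Theta$.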
-The input of our problem is a set of observations $(x^1, y^1),\ldots,(x^n,
-y^n)$. If we observe more than one cascade, we simply concatenate the
-observations from each cascade to form a single longer cascade.
-Equation~\eqref{eq:markov} still holds for each observation.
+The goal of this paper is to characterize $\hat{\Theta}$ in terms of:
+\begin{itemize}
+ \item \emph{efficiency}: how well does $\hat{\Theta}$ approximate the true
+ matrix $\Theta$? In particular can we bound its accuracy as a function
+ of the number of observations?
+ \item \emph{model consistency}: assuming that the true matrix $\Theta$ is
+ $s$-sparse as defined above, can we recover the support of $\Theta$
+ from the estimator $\hat{\Theta}$, and if so, how many observations are
+ required?
+\end{itemize}
-With these notations, the set of observations follows a Generalized Linear
-Model with Bernoulli distribution and link function $g$ such that $g^{-1}
-= f$. The log-likelihood of $\theta$ can be written:
-\begin{displaymath}
- \mathcal{L}(\theta\,|\, x, y) = \sum_{t=1}^ny^t \log f(\inprod{\theta}{x^t})
- + (1-y^t)\log \big(1-f(\inprod{\theta}{x^t})\big)
-\end{displaymath}
+We now turn to two widely used influence cascade models, the Independent
+Cascade Model (IC) and the Linear Threshold Model (LT), and specify the MLE
+problem~\eqref{eq:mle} for these models.
-\paragraph{Sparsity}
+\subsection{Linear Threshold Model}
+\subsection{Independent Cascade Model}
-\subsection{Voter Model}
+\subsection{Unification}
-\subsection{Independent Cascade Model}
+In both models, conditionally on $X^t = x^t$, $X_j^{t+1}$ is a Bernoulli
+variable whose parameter can be written as a function $f$ of
+$\inprod{\theta_j}{x^t}$, where $\inprod{a}{b}$ is the inner product of $a$
+and $b$ in $\reals^m$:
+\begin{equation}\label{eq:markov}
+    \P\big[X_j^{t+1} = 1\,|\, X^t = x^t\big] \defeq f(\inprod{\theta_j}{x^t}).
+\end{equation}
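+For instance, in the IC model every node $i$ infected at step $t$
+independently infects each out-neighbor $j$ with probability $p_{i,j}$ at
+step $t+1$. One standard way to fit this model into the form above is to set
+$\Theta_{i,j} \defeq -\log(1 - p_{i,j})$, so that
+\begin{displaymath}
+    \P\big[X_j^{t+1} = 1\,|\, X^t = x^t\big]
+    = 1 - \prod_{i\,:\, x^t_i = 1}(1 - p_{i,j})
+    = 1 - \exp\big(-\inprod{\theta_j}{x^t}\big),
+\end{displaymath}
+that is, \eqref{eq:markov} holds with $f(z) = 1 - e^{-z}$.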