aboutsummaryrefslogtreecommitdiffstats
path: root/paper/sections/model.tex
diff options
context:
space:
mode:
Diffstat (limited to 'paper/sections/model.tex')
-rw-r--r--paper/sections/model.tex63
1 files changed, 31 insertions, 32 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index acec974..b704b9e 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -107,9 +107,9 @@ In the independent cascade model, nodes can be either susceptible, contagious
or immune. At $t=0$, all source nodes are ``contagious'' and all remaining
nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where
$j$ is susceptible and $i$ is contagious, $i$ attempts to infect $j$ with
-probability $p_{i,j}\in(0,1]$; the infection attempts are mutually independent.
+probability $p_{i,j}\in[0,1]$; the infection attempts are mutually independent.
If $i$ succeeds, $j$ will become contagious at time step $t+1$. Regardless of
-$i$'s success, node $i$ will be immune at time $t+1$. In other words, nodes
+$i$'s success, node $i$ will be immune at time $t+1$, such that nodes
stay contagious for only one time step. The cascade process terminates when no
contagious nodes remain.
@@ -130,9 +130,10 @@ Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
Therefore, the independent cascade model is a Generalized Linear Cascade model
with inverse link function $f : z \mapsto 1 - e^z$.
-In Section~\ref{sec:results}, we show bounds on the parameter of the graph.
-Fortunately, a bound on $\|\hat\theta - \theta^*\|_2$ directly implies a bound
-on $\|\hat p - p^*\|_2$ as shown by the following lemma:
+In Section~\ref{sec:results}, we show bounds on the \emph{transformed}
+parameters $\Theta$ of the graph. Fortunately, a bound on $\|\hat\theta -
+\theta^*\|_2$ directly implies a bound on $\|\hat p - p^*\|_2$ as shown by the
+following lemma:
\begin{lemma}
$\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
@@ -146,15 +147,14 @@ $|\log (1 - p) - \log (1-p')| \geq \max(1 - \frac{1-p}{1-p'},
\subsubsection{The Linear Voter Model}
In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}.
-Without loss of generality, we can suppose that the
-\emph{blue} nodes are contagious. The parameters of the graph are normalized
-such that $\forall i, \ \sum_j \Theta_{i,j} = 1$ and we assume that
-$\Theta_{i,i}$ is always non-zero, meaning that all nodes have self-loops.
-Each round, every node $j$ independently chooses
-one of its neighbors with probability $\Theta_{i,j}$ and adopts their color. The
-cascades stops at a fixed horizon time $T$ or if all nodes are of the same color.
-If we denote by $X^t$ the indicator variable of the set of blue nodes at time
-step $t$, then we have:
+Without loss of generality, we can suppose that the \emph{blue} nodes are
+contagious. The parameters of the graph are normalized such that $\forall i, \
+\sum_j \Theta_{i,j} = 1$ and we assume that $\Theta_{i,i}$ is always non-zero,
+meaning that all nodes have self-loops. Each round, every node $j$
+independently chooses one of its neighbors with probability $\Theta_{i,j}$ and
+adopts their color. The cascades stops at a fixed horizon time $T$ or if all
+nodes are of the same color. If we denote by $X^t$ the indicator variable of
+the set of blue nodes at time step $t$, then we have:
\begin{equation}
\mathbb{P}\left[X^{t+1}_j = 1 | X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t =
\inprod{\Theta_j}{X^t}
@@ -176,11 +176,12 @@ the discrete-time independent cascade model, $X^k_j = 1 \implies X^{k+1}_j =
1$.
Let $\exp(p)$ be an exponentially-distributed random variable of parameter $p$
-and let $p_{i,j}$ be the rate of transmission along directed edge $(i,j)$ in the
-CICE model. By the memoryless property of the exponential, if $X^k_j \neq 1$:
+and let $\Theta_{i,j}$ be the rate of transmission along directed edge $(i,j)$
+in the CICE model. By the memoryless property of the exponential, if $X^k_j
+\neq 1$:
\begin{align*}
\mathbb{P}(X^{k+1}_j = 1 | X^k) & = \mathbb{P}(\min_{i \in {\cal N}(j)}
- \exp(p_{i,j}) \leq \epsilon) \\
+ \exp(\Theta_{i,j}) \leq \epsilon) \\
& = \mathbb{P}(\exp( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\
& = 1 - e^{- \epsilon \Theta_j \cdot X^t}
\end{align*}
@@ -193,15 +194,14 @@ Generalized Linear Cascade model with inverse link function $f:z\mapsto
\subsubsection{Logistic Cascades}
\label{sec:logistic_cascades}
-Consider the specific case of ``logistic cascades'' (where $f$ is the logistic
-function). This model captures the idea that there is a threshold around which
-each additional neighbor's contribution becomes significant: logistic
-cascades can be thought of approximating the more-commonly studied Linear
-Threshold model~\cite{Kempe:03}. As we will see later in the
-analysis, the Hessian of our optimization program simplifies in the case of a
-logistic inverse link function to $\nabla^2\mathcal{L}(\theta^*) =
-\frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the observation matrix $[x^1 \ldots
-x^\mathcal{|T|}]$.
+``Logistic cascades'' (where the inverse link function $f$ is the logistic
+function) capture the idea that there is a threshold around which each
+additional neighbor's contribution becomes significant: they constitue a
+differentiable approximation to the more-commonly studied Linear Threshold
+model~\cite{Kempe:03}. As we will see later in the analysis, the Hessian of our
+optimization program simplifies in the case of a logistic inverse link function
+to $\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is
+the observation matrix $[x^1 \ldots x^\mathcal{|T|}]$.
% \subsection{The Linear Threshold Model}
@@ -234,18 +234,17 @@ x^\mathcal{|T|}]$.
\caption{Illustration of the sparse-recovery approach. Our objective is to
recover the unknown weight vector $\theta_j$ for each node $j$. We observe a
Bernoulli realization of the $f$ transform of the matrix-vector product, where
-the measurement matrix encodes which nodes are ``contagious''}
+the measurement matrix encodes which nodes are ``contagious'' at each time step}
\end{figure}
Inferring the model parameter $\Theta$ from observed influence cascades is the
central question of the present work. Recovering the edges in $E$ from observed
-influence cascades is a well-identified problem known as the \emph{Graph
+influence cascades is a well-identified problem known as the \emph{Network
Inference} problem. However, recovering the influence parameters is no less
-important and has been seemingly overlooked so far. In this work we focus on
-recovering $\Theta$, noting that the set of edges $E$ can then be recovered
-through the following equivalence: $(i,j)\in E\Leftrightarrow \Theta_{i,j}
-\neq 0$
+important. In this work we focus on recovering $\Theta$, noting that the set of
+edges $E$ can then be recovered through the following equivalence: $(i,j)\in
+E\Leftrightarrow \Theta_{i,j} \neq 0$
Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover
$\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the