aboutsummaryrefslogtreecommitdiffstats
path: root/paper/sections/model.tex
diff options
context:
space:
mode:
Diffstat (limited to 'paper/sections/model.tex')
-rw-r--r--paper/sections/model.tex54
1 files changed, 24 insertions, 30 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index 5a0a96f..a645fcf 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -1,58 +1,52 @@
-We consider a graph $G= (V, E)$ and we define $m\defeq |V|$. A \emph{cascade
-model} is a discrete-time homogeneous Markov process over $\{0,1\}^m$. Denoting
-by $X^t$ the state of the process at time step $t-1$, we interpret $X^t_j = 1$
-for some node $j\in V$ as ``node $j$ is infected at time step $t-1$''. An
-\emph{influence cascade} is simply a realisation of the random process
-described by a cascade model.
+We consider a graph ${\cal G}= (V, E, \Theta)$, with $\Theta$ a set of parameters to the graph. Let $m\defeq |V|$. A \emph{cascade model} is a finite state space Markov process over $\{0, 1, \dots \}^V$. An \emph{influence cascade} is a realisation of the random process described by a cascade model.
-In what follows, we describe two widely used cascade models, the Independent
-Cascade model (IC) and the Linear Threshold model (LT). In these models, the
-transition probabilities of the Markov process depend on unknown parameters
-describing how much the nodes in $G$ influence each other.
+In practice, we restrict ourselves to discrete-time homogeneous cascade models. $\Theta$ usually represents the edge weights of ${\cal G}$. Finally, the state of each node can often be reduced to two classes: able to infect neighboring nodes at the next round, or unable to infect its neighbors.
-Recovering these parameters from observed influence cascades is the central
-question of the present work and is important for the following two reasons:
+Recovering the intrinsic graph parameters $\Theta$ from observed influence cascades is the central question of the present work and is important for the following two reasons:
\begin{enumerate}
\item knowing the influence parameters allows for future prediction of
influence cascades.
\item when the set of edges $E$ is unknown, it can be recovered from the
influence parameters.
\end{enumerate}
-We note that recovering the edges in $E$ from observed influence casccades is
+We note that recovering the edges in $E$ from observed influence cascades is
a well-identified problem known as the \emph{Graph Inference} problem. However,
recovering the influence parameters is no less important and has been seemingly
overlooked so far. The machinery developped in this work addresses both
problems.
-\subsection{Independent Cascade Model}
+\subsection{Generalized Linear Models}
+
+Denoting by $X^t$ the state of the cascade at time step $t-1$, we interpret $X^t_j = 1$ for some node $j\in V$ as ``node $j$ can infect other nodes at time step $t+1$''. By the Markov property of cascades, $$\mathbb{P}(X^t | X^0, X^1, \dots, X^{t-1}) = \mathbb{P}(X^t | X^{t-1})$$
+We consider the following \emph{generalized linear model} (GLM) framework for cascades.
+
+\begin{definition}
+Let .... A \emph{generalized linear cascade} is:
+\end{definition}
+
+\subsection{Examples}
+\label{subsec:examples}
-In the independent cascade model, nodes can be either susceptible, active or
-infected. All nodes start either as susceptible or active. At each time step
-$t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is active, $i$
-attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will
-become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be
-infected at time $t+1$: nodes stay active for only one time step. The cascade
-continues until no active nodes remain.
+\subsubsection{Independent Cascade Model}
-If we denote by $X^t$ the indicator variable of the set of active nodes at time
-step $t-1$, then if $j$ is susceptible at time step $t-1$, we have:
+In the independent cascade model, nodes can be either susceptible, active/infected or inactive. All nodes start either as susceptible or active. At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is active, $i$ attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be inactive at time $t+1$: nodes stay active for only one time step. The cascade continues until no active nodes remain.
+
+If we denote by $X^t$ the indicator variable of the set of active nodes at time step $t-1$, then if $j$ is susceptible at time step $t-1$, we have:
\begin{displaymath}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
= 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}.
\end{displaymath}
-Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten
-as:
+Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
\begin{equation}\label{eq:ic}
\tag{IC}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
= 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i}
= 1 - e^{\inprod{\theta_j}{X^t}}
-\end{equation}
-where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$.
+\end{equation} where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$.
+
+\subsubsection{The Voter Model}
-\subsection{The Voter Model}
-\subsection{Generalized Linear Models}
\subsection{Maximum Likelihood Estimation}
@@ -94,7 +88,7 @@ $f(\inprod{\theta_j}{x^t})$ for some function $f$. Hence, if we denote by $n_j$
the first step at which $j$ becomes active, we can rewrite the MLE program
\eqref{eq:pre-mle}:
\begin{multline}
- \hat{\theta}_j \in \argmax_{\theta}
+ \hat{\theta}_j \in \argmax_{\theta} \frac{1}{n_j}
\sum_{t=1}^{n_j-1}\log\big(1-f(\inprod{\theta_j}{x^t})\big)\\
\log f(\inprod{\theta_j}{x^{n_j}})+ \lambda\|\theta_j\|
\end{multline}