diff options
Diffstat (limited to 'paper/sections/model.tex')
| -rw-r--r-- | paper/sections/model.tex | 54 |
1 files changed, 24 insertions, 30 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex index 5a0a96f..a645fcf 100644 --- a/paper/sections/model.tex +++ b/paper/sections/model.tex @@ -1,58 +1,52 @@ -We consider a graph $G= (V, E)$ and we define $m\defeq |V|$. A \emph{cascade -model} is a discrete-time homogeneous Markov process over $\{0,1\}^m$. Denoting -by $X^t$ the state of the process at time step $t-1$, we interpret $X^t_j = 1$ -for some node $j\in V$ as ``node $j$ is infected at time step $t-1$''. An -\emph{influence cascade} is simply a realisation of the random process -described by a cascade model. +We consider a graph ${\cal G}= (V, E, \Theta)$, with $\Theta$ a set of parameters to the graph. Let $m\defeq |V|$. A \emph{cascade model} is a finite state space Markov process over $\{0, 1, \dots \}^V$. An \emph{influence cascade} is a realisation of the random process described by a cascade model. -In what follows, we describe two widely used cascade models, the Independent -Cascade model (IC) and the Linear Threshold model (LT). In these models, the -transition probabilities of the Markov process depend on unknown parameters -describing how much the nodes in $G$ influence each other. +In practice, we restrict ourselves to discrete-time homogeneous cascade models. $\Theta$ usually represents the edge weights of ${\cal G}$. Finally, the state of each node can often be reduced to two classes: able to infect neighboring nodes at the next round, or unable to infect its neighbors. -Recovering these parameters from observed influence cascades is the central -question of the present work and is important for the following two reasons: +Recovering the intrinsic graph parameters $\Theta$ from observed influence cascades is the central question of the present work and is important for the following two reasons: \begin{enumerate} \item knowing the influence parameters allows for future prediction of influence cascades. \item when the set of edges $E$ is unknown, it can be recovered from the influence parameters. \end{enumerate} -We note that recovering the edges in $E$ from observed influence casccades is +We note that recovering the edges in $E$ from observed influence cascades is a well-identified problem known as the \emph{Graph Inference} problem. However, recovering the influence parameters is no less important and has been seemingly overlooked so far. The machinery developped in this work addresses both problems. -\subsection{Independent Cascade Model} +\subsection{Generalized Linear Models} + +Denoting by $X^t$ the state of the cascade at time step $t-1$, we interpret $X^t_j = 1$ for some node $j\in V$ as ``node $j$ can infect other nodes at time step $t+1$''. By the Markov property of cascades, $$\mathbb{P}(X^t | X^0, X^1, \dots, X^{t-1}) = \mathbb{P}(X^t | X^{t-1})$$ +We consider the following \emph{generalized linear model} (GLM) framework for cascades. + +\begin{definition} +Let .... A \emph{generalized linear cascade} is: +\end{definition} + +\subsection{Examples} +\label{subsec:examples} -In the independent cascade model, nodes can be either susceptible, active or -infected. All nodes start either as susceptible or active. At each time step -$t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is active, $i$ -attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will -become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be -infected at time $t+1$: nodes stay active for only one time step. The cascade -continues until no active nodes remain. +\subsubsection{Independent Cascade Model} -If we denote by $X^t$ the indicator variable of the set of active nodes at time -step $t-1$, then if $j$ is susceptible at time step $t-1$, we have: +In the independent cascade model, nodes can be either susceptible, active/infected or inactive. All nodes start either as susceptible or active. At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is active, $i$ attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be inactive at time $t+1$: nodes stay active for only one time step. The cascade continues until no active nodes remain. + +If we denote by $X^t$ the indicator variable of the set of active nodes at time step $t-1$, then if $j$ is susceptible at time step $t-1$, we have: \begin{displaymath} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}. \end{displaymath} -Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten -as: +Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as: \begin{equation}\label{eq:ic} \tag{IC} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i} = 1 - e^{\inprod{\theta_j}{X^t}} -\end{equation} -where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$. +\end{equation} where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$. + +\subsubsection{The Voter Model} -\subsection{The Voter Model} -\subsection{Generalized Linear Models} \subsection{Maximum Likelihood Estimation} @@ -94,7 +88,7 @@ $f(\inprod{\theta_j}{x^t})$ for some function $f$. Hence, if we denote by $n_j$ the first step at which $j$ becomes active, we can rewrite the MLE program \eqref{eq:pre-mle}: \begin{multline} - \hat{\theta}_j \in \argmax_{\theta} + \hat{\theta}_j \in \argmax_{\theta} \frac{1}{n_j} \sum_{t=1}^{n_j-1}\log\big(1-f(\inprod{\theta_j}{x^t})\big)\\ \log f(\inprod{\theta_j}{x^{n_j}})+ \lambda\|\theta_j\| \end{multline} |
