We consider a graph $G= (V, E)$ and we define $m\defeq |V|$. A \emph{cascade
model} is a discrete-time homogeneous Markov process over $\{0,1\}^m$. Denoting
by $X^t$ the state of the process at time step $t-1$, we interpret $X^t_j = 1$
for some node $j\in V$ as ``node $j$ is infected at time step $t-1$''. An
\emph{influence cascade} is simply a realization of the random process
described by a cascade model.

In what follows, we describe two widely used cascade models, the Independent
Cascade model (IC) and the Linear Threshold model (LT). In these models, the
transition probabilities of the Markov process depend on unknown parameters
describing how much the nodes in $G$ influence each other.

Recovering these parameters from observed influence cascades is the central
question of the present work and is important for the following two reasons:
\begin{enumerate}
    \item knowing the influence parameters allows for future prediction of
        influence cascades.
    \item when the set of edges $E$ is unknown, it can be recovered from the
        influence parameters.
\end{enumerate}
We note that recovering the edges in $E$ from observed influence cascades is
a well-identified problem known as the \emph{Graph Inference} problem. However,
recovering the influence parameters is no less important and has seemingly been
overlooked so far. The machinery developed in this work addresses both
problems.

\subsection{Independent Cascade Model}

In the independent cascade model, a node can be in one of three states:
susceptible, active, or infected. All nodes start either susceptible or
active. At each time step
$t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is active, $i$
attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will
become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be
infected at time $t+1$: nodes stay active for only one time step. The cascade
continues until no active nodes remain.
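To make the dynamics concrete, the following sketch simulates one influence
cascade under the IC model; the function name, edge probabilities, and seed set
are illustrative assumptions, not part of the model specification.

```python
import random

def simulate_ic(p, seeds):
    """Simulate one influence cascade under the Independent Cascade model.

    p:     dict mapping directed edges (i, j) to probabilities p_{i,j}
    seeds: initial set of active nodes; every other node starts susceptible
    Returns the list of active sets, one per time step.
    """
    nodes = {v for edge in p for v in edge} | set(seeds)
    active = set(seeds)
    susceptible = nodes - active
    history = [set(active)]
    while active:
        newly_active = set()
        for (i, j), p_ij in p.items():
            # each active i makes one infection attempt on susceptible j
            if i in active and j in susceptible and random.random() < p_ij:
                newly_active.add(j)
        susceptible -= newly_active
        active = newly_active  # previously active nodes become infected
        history.append(set(active))
    return history
```

With $p_{0,1} = 1$ and seed set $\{0\}$, the cascade deterministically
activates node $1$ at the next step and then dies out, since nodes stay active
for a single time step.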

If we denote by $X^t$ the indicator variable of the set of active nodes at time
step $t-1$, then if $j$ is susceptible at time step $t-1$, we have:
\begin{displaymath}
    \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
    = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}.
\end{displaymath}
Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten
as:
\begin{equation}\label{eq:ic}
    \tag{IC}
    \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
    = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i}
    = 1 - e^{\inprod{\theta_j}{X^t}}
\end{equation}
where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$.
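The equivalence between the product form and the reparametrised form can be
checked numerically; the influence probabilities below are hypothetical.

```python
import math

# hypothetical influence probabilities of the m = 3 potential parents of node j
p_j = [0.3, 0.5, 0.0]
x_t = [1, 1, 0]  # parents 0 and 1 are active at time step t

# product form: 1 - prod_i (1 - p_{i,j})^{x^t_i}
direct = 1 - math.prod((1 - p) ** x for p, x in zip(p_j, x_t))

# reparametrised form with Theta_{i,j} = log(1 - p_{i,j})
theta_j = [math.log(1 - p) for p in p_j]
reparam = 1 - math.exp(sum(th * x for th, x in zip(theta_j, x_t)))

assert abs(direct - reparam) < 1e-12  # both equal 0.65
```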

\subsection{The Voter Model}

\subsection{Generalized Linear Models}

\subsection{Maximum Likelihood Estimation}

In both models, the unknown influence parameters are described by an $m\times
m$ matrix $\Theta$. Furthermore, the edge set of $G$ is fully encoded in
$\Theta$:
\begin{displaymath}
    (i,j)\in E\Leftrightarrow  \Theta_{i,j} \neq 0
\end{displaymath}
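Concretely, once an estimate of $\Theta$ is available, $E$ is read off as its
support; the matrix below is a hypothetical example for $m = 3$.

```python
# hypothetical estimated parameter matrix for m = 3 nodes
Theta = [[0.0, -0.7, 0.0],
         [0.0, 0.0, -0.2],
         [-0.4, 0.0, 0.0]]

# recover the edge set as the support of Theta
E = {(i, j) for i, row in enumerate(Theta)
     for j, v in enumerate(row) if v != 0}
assert E == {(0, 1), (1, 2), (2, 0)}
```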

Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover
$\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the
log-likelihood function, we consider the following $\ell_1$-regularized MLE
problem:
\begin{displaymath}
    \hat{\Theta} \in \argmax_{\Theta} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n)
    - \lambda\|\Theta\|_1
\end{displaymath}
where $\lambda$ is the regularization parameter, which helps prevent
overfitting and controls the sparsity of the solution.

Note that both the (IC) and the (LT) models are decomposable in the following
sense: the state evolution of each node at each time step happens independently
of the other nodes, \emph{and} the state evolution of node $j\in V$ only
depends on $\theta_j$, the $j$-th column of $\Theta$. Consequently, the
log-likelihood function $\mathcal{L}$ can be written as a sum of $m$ terms,
each term $j\in\{1,\ldots,m\}$ only involving $\theta_j$. Since this is equally
true for $\|\Theta\|_1$, each column $\theta_j$ of $\Theta$ can be estimated by
a separate optimization program:
\begin{equation}\label{eq:pre-mle}
    \hat{\theta}_j \in \argmax_{\theta_j} \mathcal{L}_j(\theta_j\,|\,x^1,\ldots,x^n)
    - \lambda\|\theta_j\|_1
\end{equation}

Furthermore, the state evolution of a node $j\in V$ has the same structure in
both models: the transition from susceptible to active at time step $t+1$ is
controlled by a Bernoulli variable whose parameter can be written
$f(\inprod{\theta_j}{x^t})$ for some function $f$. Hence, if we denote by $n_j$
the first step at which $j$ becomes active, we can rewrite the MLE program
\eqref{eq:pre-mle} as:
\begin{multline}
    \hat{\theta}_j \in \argmax_{\theta_j}
    \sum_{t=1}^{n_j-1}\log\big(1-f(\inprod{\theta_j}{x^t})\big)\\
    + \log f(\inprod{\theta_j}{x^{n_j}}) - \lambda\|\theta_j\|_1
\end{multline}
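For the IC model, equation \eqref{eq:ic} gives $f(z) = 1 - e^z$, so
$\log(1 - f(z)) = z$ and the smooth part of the objective is concave. A minimal
sketch of solving the per-node program, assuming proximal gradient ascent with
soft-thresholding for the $\ell_1$ term; the step size, iteration count, and
initialisation are hand-picked for illustration, not part of the method
described above.

```python
import math

def fit_theta_j(X, n_j, lam=0.01, lr=0.05, iters=2000):
    """l1-regularised MLE of theta_j under the IC choice f(z) = 1 - e^z.

    X:   observed state vectors x^1, ..., x^{n_j} (0/1 lists)
    n_j: first time step at which node j is active (1-indexed into X)
    Each iteration takes a gradient step on the smooth log-likelihood,
    soft-thresholds for the l1 penalty, and projects onto theta <= 0
    (Theta_{i,j} = log(1 - p_{i,j}) is non-positive by definition).
    """
    m = len(X[0])
    theta = [-0.1] * m
    # survival terms: log(1 - f(z)) = z, so their gradient is sum_t x^t
    pre = [sum(x[i] for x in X[:n_j - 1]) for i in range(m)]
    for _ in range(iters):
        z = sum(th * xi for th, xi in zip(theta, X[n_j - 1]))
        z = min(z, -1e-9)                     # keep 1 - e^z strictly positive
        w = -math.exp(z) / (1 - math.exp(z))  # d/dz log(1 - e^z)
        for i in range(m):
            g = pre[i] + w * X[n_j - 1][i]    # gradient of the smooth part
            v = theta[i] + lr * g
            v = math.copysign(max(abs(v) - lr * lam, 0.0), v)  # soft-threshold
            theta[i] = min(v, 0.0)            # project onto theta <= 0
    return theta
```

As a sanity check: with a single parent that is active while $j$ survives one
step and is infected at the next, the unregularised MLE is $p_{i,j} = 1/2$, and
the fitted value stays close to $\log(1/2)$ for small $\lambda$, while the
coefficient of an always-inactive parent is driven exactly to zero.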