We consider a graph ${\cal G}= (V, E, \Theta)$, with $\Theta$ a set of parameters to the graph. Let $m\defeq |V|$. A \emph{cascade model} is a finite state space Markov process over $\{0, 1, \dots \}^V$. An \emph{influence cascade} is a realisation of the random process described by a cascade model.

In practice, we restrict ourselves to discrete-time homogeneous cascade models. $\Theta$ usually represents the edge weights of ${\cal G}$. Each influence cascade begins at $t=0$ with a designated source state (e.g.\ ``active''). All nodes in the source state at $t=0$ are called \emph{source nodes}. A standard theoretical assumption is that each node at $t=0$ has a constant probability $p_{\text{init}}$ of being in the source state. This is more realistic and less restrictive than the ``single source'' assumption made in \cite{Daneshmand:2014} and \cite{Abrahao:13}.

Recovering the intrinsic graph parameters $\Theta$ from observed influence cascades is the central question of the present work and is important for the following two reasons:
\begin{enumerate}
    \item knowing the influence parameters allows for future prediction of
        influence cascades.
    \item when the set of edges $E$ is unknown, it can be recovered from the
        influence parameters.
\end{enumerate}
We note that recovering the edges in $E$ from observed influence cascades is
a well-identified problem known as the \emph{Graph Inference} problem. However,
recovering the influence parameters is no less important and has seemingly been
overlooked so far. The machinery developed in this work addresses both
problems for a large class of well-known cascade models.

\subsection{Generalized Linear Cascade Models}

Denoting by $X^t$ the state of the cascade at time step $t$, we interpret $X^t_j = 1$ for some node $j\in V$ as ``node $j$ exhibits the source nodes' behavior at time step $t$''. By the Markov property of cascades, $$\mathbb{P}(X^t | X^0, X^1, \dots, X^{t-1}) = \mathbb{P}(X^t | X^{t-1})$$
We draw inspiration from \emph{generalized linear models} (GLM) to define the following:

\begin{definition}
Let $f: \mathbb{R} \rightarrow \mathbb{R}$ be an inverse link function. Let ${\cal F}_{< t}$ be the filtration defined by $\{ X^0, X^1, \dots, X^{t-1} \}$. A \emph{generalized linear cascade} is a cascade model such that, for each node $j \in V$:
$$\mathbb{P}\big(X^t_j = 1\,|\,{\cal F}_{< t}\big) = f\big(\inprod{\theta_j}{X^{t-1}}\big)$$
where $\theta_j$ denotes the $j$-th column of $\Theta$.
\end{definition}
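
As a concrete illustration, a single transition of a generalized linear cascade can be sketched in Python; the function name and the use of NumPy are our own choices, not part of the models discussed here:

```python
import numpy as np

def glc_step(Theta, x, f, rng):
    """One transition of a generalized linear cascade.

    Theta : (m, m) parameter matrix whose j-th column is theta_j.
    x     : (m,) 0/1 state vector X^{t-1}.
    f     : vectorized inverse link mapping <theta_j, x> to a probability.
    """
    p = f(Theta.T @ x)  # p[j] = f(<theta_j, X^{t-1}>)
    return (rng.random(len(x)) < p).astype(int)
```

With $f(s) = 1 - e^{s}$ this matches the Independent Cascade transition probabilities derived below, and with $f$ the identity it matches the Voter model.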

\subsection{Examples}
\label{subsec:examples}

In this section, we show that both the Independent Cascade Model and the Voter model are Generalized Linear Cascades. We discuss the Linear Threshold model in Section~\ref{sec:linear_threshold}.

\subsubsection{Independent Cascade Model}

In the independent cascade model, nodes can be either susceptible, active or inactive. At $t=0$, all source nodes are ``active'' and all remaining nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is active, $i$ attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be inactive at time $t+1$. In other words, nodes stay active for only one time step. The cascade process terminates when no active nodes remain.
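
The process just described can be simulated directly. The following sketch (ours, in NumPy) assumes the infection probabilities are given as a dense matrix `P`:

```python
import numpy as np

def simulate_ic(P, sources, rng):
    """Simulate an independent cascade on m nodes.

    P       : (m, m) matrix of infection probabilities p[i, j].
    sources : iterable of node indices active at t = 0.
    Returns the list of active-node sets, one per time step.
    """
    m = P.shape[0]
    susceptible = np.ones(m, dtype=bool)
    active = np.zeros(m, dtype=bool)
    active[list(sources)] = True
    susceptible[active] = False
    history = [set(np.flatnonzero(active))]
    while active.any():
        # Susceptible j escapes infection w.p. prod over active i of (1 - p[i, j])
        escape = np.prod(1 - P[active][:, susceptible], axis=0)
        infected = rng.random(escape.shape) > escape
        new_active = np.zeros(m, dtype=bool)
        new_active[np.flatnonzero(susceptible)[infected]] = True
        susceptible &= ~new_active  # nodes never return to susceptible
        active = new_active         # nodes stay active for one step only
        history.append(set(np.flatnonzero(active)))
    return history
```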

If we denote by $X^t$ the indicator vector of the set of active nodes at time step $t$, then if $j$ is susceptible at time step $t$, we have:
\begin{displaymath}
    \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
    = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}.
\end{displaymath}
Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
\begin{equation}\label{eq:ic}
    \tag{IC}
    \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
    = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i}
    = 1 - e^{\inprod{\theta_j}{X^t}}
\end{equation} where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$.
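
The change of variables $\Theta_{i,j} = \log(1-p_{i,j})$ can be checked numerically; the matrix `p` below is a randomly drawn, hypothetical example:

```python
import numpy as np

# Check that 1 - prod_i (1 - p[i,j])**x[i] equals 1 - exp(<theta_j, x>)
# under the reparametrization Theta[i, j] = log(1 - p[i, j]).
rng = np.random.default_rng(1)
p = rng.uniform(0, 0.9, size=(4, 4))   # hypothetical infection probabilities
Theta = np.log(1 - p)
x = np.array([1, 0, 1, 1])             # hypothetical active-set indicator

direct = 1 - np.prod((1 - p) ** x[:, None], axis=0)
glm = 1 - np.exp(Theta.T @ x)
assert np.allclose(direct, glm)
```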

\subsubsection{The Voter Model}

In the Voter Model, nodes can be either red or blue, where ``blue'' is the source state. The parameters of the graph are normalized such that $\forall j, \ \sum_i \Theta_{i,j} = 1$. Each round, every node $j$ chooses one of its neighbors $i$ with probability $\Theta_{i,j}$ and adopts that neighbor's color. The cascade stops at a fixed time horizon $T$ or when all nodes have the same color. If we denote by $X^t$ the indicator vector of the set of blue nodes at time step $t$, then we have:
\begin{equation}
\mathbb{P}\left[X^{t+1}_j = 1 | X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \inprod{\theta_j}{X^t}
\tag{V}
\end{equation}
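
A single round of this process can be sketched as follows (our NumPy sketch; for each node $j$ it samples the neighbor that $j$ copies according to column $j$ of $\Theta$):

```python
import numpy as np

def voter_step(Theta, x, rng):
    """One round of the Voter model.

    Theta : (m, m) matrix with unit column sums; Theta[i, j] is the
            probability that node j copies node i.
    x     : (m,) 0/1 vector, 1 marking blue nodes.
    """
    m = len(x)
    new_x = np.empty(m, dtype=int)
    for j in range(m):
        i = rng.choice(m, p=Theta[:, j])  # neighbor that j listens to
        new_x[j] = x[i]
    return new_x
```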

\subsection{Maximum Likelihood Estimation}

In both models, the unknown influence parameters are described by an $m\times
m$ matrix $\Theta$. Furthermore, the information of the edges of $G$ is fully
contained in $\Theta$:
\begin{displaymath}
    (i,j)\in E\Leftrightarrow  \Theta_{i,j} \neq 0
\end{displaymath}

Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover
$\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the
log-likelihood function, we consider the following $\ell_1$-regularized MLE
problem:
\begin{displaymath}
    \hat{\Theta} \in \argmax_{\Theta} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n)
    - \lambda\|\Theta\|_1
\end{displaymath}
where $\lambda$ is the regularization parameter, which helps prevent
overfitting and controls the sparsity of the solution.

Note that both the (IC) and the (V) models are decomposable in the following
sense: the state evolution of each node at each time step happens independently
of the other nodes \emph{and} the state evolution of node $j\in V$ only
depends on $\theta_j$, the $j$-th column of $\Theta$. Consequently, the
log-likelihood function $\mathcal{L}$ can be written as a sum of $m$ terms,
each term $j\in\{1,\ldots,m\}$ only involving $\theta_j$. Since this is equally
true for $\|\Theta\|_1$, each column $\theta_j$ of $\Theta$ can be estimated by
a separate optimization program:
\begin{equation}\label{eq:pre-mle}
    \hat{\theta}_j \in \argmax_{\theta_j} \mathcal{L}_j(\theta_j\,|\,x^1,\ldots,x^n)
    - \lambda\|\theta_j\|_1
\end{equation}

Furthermore, the state evolution of a node $j\in V$ has the same structure in
both models: the transition of node $j$ to the source state at time step $t+1$
is controlled by a Bernoulli variable whose parameter can be written
$f(\inprod{\theta_j}{x^t})$ for some function $f$. Hence, if we denote by $n_j$
the first time step at which $j$ becomes active, we can rewrite the MLE program
\eqref{eq:pre-mle} as:
\begin{multline}
    \hat{\theta}_j \in \argmax_{\theta_j} \frac{1}{n_j}
    \bigg(\sum_{t=1}^{n_j-1}\log\big(1-f(\inprod{\theta_j}{x^t})\big)\\
    + \log f(\inprod{\theta_j}{x^{n_j}})\bigg) - \lambda\|\theta_j\|_1
\end{multline}
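
To make this program concrete, here is a minimal proximal-gradient (ISTA) sketch of the per-node estimation, specialized to the Voter model where $f$ is the identity. The step size, iteration count, and unconstrained parametrization are our simplifying assumptions, not choices made in the text:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def estimate_theta_j(X, n_j, lam, steps=500, lr=0.01):
    """ISTA sketch of the l1-regularized per-node MLE for the Voter model.

    X   : (n, m) array whose row t-1 is the observed state x^t.
    n_j : first time step at which node j is in the source state.
    lam : l1 regularization parameter.
    """
    m = X.shape[1]
    theta = np.full(m, 1.0 / m)
    for _ in range(steps):
        # Gradient of the negated average log-likelihood
        grad = np.zeros(m)
        for t in range(n_j - 1):
            grad += X[t] / (1.0 - X[t] @ theta)        # from -log(1 - <theta, x^t>)
        grad -= X[n_j - 1] / (X[n_j - 1] @ theta)      # from -log <theta, x^{n_j}>
        grad /= n_j
        # Gradient step on the smooth part, prox step on the l1 part
        theta = soft_threshold(theta - lr * grad, lr * lam)
    return theta
```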