We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times
|V|$ matrix of parameters describing the edge weights of $\mathcal{G}$. We
define $m\defeq |V|$. A \emph{cascade model} is a Markov process over the
finite state space $\{0, 1, \dots, K\}^V$. An \emph{influence cascade} is
a realisation of the random process described by a cascade model.
In practice, we restrict ourselves to discrete-time homogeneous cascade models.
At $t=0$, each node in the graph independently has a constant probability
$p_{init}$ of being in the ``active'' state. The nodes active at $t=0$ are
called the source nodes. This probabilistic model for the source nodes is more
realistic and less restrictive than the ``single source'' assumption made in
\cite{Daneshmand:2014} and \cite{Abrahao:13}.
\subsection{Generalized Linear Cascade Models}
Denoting by $X^t$ the state of the cascade at time step $t$, we interpret
$X^t_j = 1$ for node $j\in V$ as ``node $j$ is active'', \emph{i.e.}, ``$j$
exhibits the source nodes' behavior at time step $t$''. We draw inspiration
from \emph{generalized linear models} (GLM) to define a generalized linear
cascade.
\begin{definition}
\label{def:glcm}
Let us denote by $\{\mathcal{F}_t, t\in\ints\}$ the natural
filtration induced by $\{X^t, t\in\ints\}$. A \emph{generalized linear
cascade} is characterized by the following equation:
\begin{displaymath}
\P[X^{t+1}=x\,|\, \mathcal{F}_t] =
\prod_{i=1}^m f(\inprod{\theta_i}{X^{t}})^{x_i}
\big(1-f(\inprod{\theta_i}{X^{t}})\big)^{1-x_i}
\end{displaymath}
where $f:\mathbb{R}\to[0,1]$ and $\theta_i$ is the $i$-th column of $\Theta$.
\end{definition}
It follows immediately from this definition that a generalized linear cascade
satisfies the Markov property:
\begin{displaymath}
\P[X^{t+1}=x\,|\mathcal{F}_t] = \P[X^{t+1}=x\,|\, X^t]
\end{displaymath}
Note also that $\E[X^{t+1}_i\,|\,X^t] = f(\inprod{\theta_i}{X^t})$. As such,
$f$ can be interpreted as the inverse link function of our generalized linear
cascade model.
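As a sanity check, the one-step dynamics of a generalized linear cascade can be simulated directly. The following sketch is purely illustrative: the function name \texttt{cascade\_step}, the random weights, and the clipped-linear choice of $f$ are ours, not part of the model.

```python
import numpy as np

def cascade_step(Theta, x, f, rng):
    # One step of a generalized linear cascade: each coordinate of
    # X^{t+1} is an independent Bernoulli with mean f(<theta_j, X^t>),
    # where theta_j is the j-th column of Theta.
    probs = f(Theta.T @ x)
    return (rng.random(len(x)) < probs).astype(int)

rng = np.random.default_rng(0)
m = 5
Theta = rng.random((m, m)) / m          # arbitrary illustrative weights
x0 = np.array([1, 0, 0, 0, 0])          # one source node
f = lambda z: np.clip(z, 0.0, 1.0)      # any map from R to [0, 1] works here
x1 = cascade_step(Theta, x0, f, rng)
```

By construction, $\E[X^{t+1}_j\,|\,X^t]$ in this simulation equals $f(\inprod{\theta_j}{X^t})$, matching the inverse-link interpretation above.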
\subsection{Examples}
\label{subsec:examples}
In this section, we show that both the Independent Cascade Model and the Voter
model are Generalized Linear Cascades. The Linear Threshold model will be
discussed in Section~\ref{sec:linear_threshold}.
\subsubsection{Independent Cascade Model}
In the independent cascade model, nodes can be either susceptible, active or
inactive. At $t=0$, all source nodes are ``active'' and all remaining nodes are
``susceptible''. At each time step $t$, for each edge $(i,j)$ where $j$ is
susceptible and $i$ is active, $i$ attempts to infect $j$ with probability
$p_{i,j}\in[0,1]$, the infection attempts are independent of each other. If $i$
succeeds, $j$ will become active at time step $t+1$. Regardless of $i$'s
success, node $i$ will be inactive at time $t+1$. In other words, nodes stay
active for only one time step. The cascade process terminates when no active
nodes remain.
If we denote by $X^t$ the indicator variable of the set of active nodes at time
step $t$, then if $j$ is susceptible at time step $t+1$, we have:
\begin{displaymath}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
= 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}.
\end{displaymath}
Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
\begin{equation}\label{eq:ic}
\tag{IC}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
= 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i}
= 1 - e^{\inprod{\theta_j}{X^t}}
\end{equation}
which is a Generalized Linear Cascade model with inverse link function $f(z)
= 1 - e^{z}$.
\subsubsection{The Voter Model}
In the Voter Model, nodes can be either red or blue, where ``blue'' is the
source state. The parameters of the graph are normalized such that $\forall j,
\ \sum_i \Theta_{i,j} = 1$. In each round, every node $j$ independently chooses
one of its neighbors $i$ with probability $\Theta_{i,j}$ and adopts its color.
The cascade stops at a fixed time horizon $T$ or when all nodes are of the same
color.
If we denote by $X^t$ the indicator variable of the set of blue nodes at time
step $t$, then we have:
\begin{equation}
\mathbb{P}\left[X^{t+1}_j = 1 | X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \inprod{\theta_j}{X^t}
\tag{V}
\end{equation}
which is again a Generalized Linear Cascade model with inverse link function
$f(z) = z$.
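The transition probability in (V) can be checked empirically: with column-normalized weights, the fraction of sampled steps in which node $j$ becomes blue converges to $\inprod{\theta_j}{X^t}$. The sketch below (the function name \texttt{voter\_step} and all values are illustrative) compares Monte Carlo frequencies against this linear formula.

```python
import numpy as np

def voter_step(Theta, x, rng):
    # One voter-model step: node j copies the color of a neighbor i
    # drawn with probability Theta_{i,j} (column j of Theta).
    m = len(x)
    nxt = np.empty(m, dtype=int)
    for j in range(m):
        i = rng.choice(m, p=Theta[:, j])
        nxt[j] = x[i]
    return nxt

rng = np.random.default_rng(2)
m = 4
Theta = rng.random((m, m))
Theta /= Theta.sum(axis=0, keepdims=True)  # normalize: columns sum to 1
x = np.array([1, 1, 0, 0])                 # blue-node indicator X^t
samples = np.array([voter_step(Theta, x, rng) for _ in range(20000)])
# Empirical P[X^{t+1}_j = 1] should be close to <theta_j, X^t>:
assert np.allclose(samples.mean(axis=0), Theta.T @ x, atol=0.02)
```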
\subsection{Maximum Likelihood Estimation}
Recovering the model parameter $\Theta$ from observed influence cascades is the
central question of the present work. Recovering the edges in $E$ from observed
influence cascades is a well-identified problem known as the \emph{Graph
Inference} problem. However, recovering the influence parameters is no less
important and has seemingly been overlooked so far. In this work, we focus on
recovering $\Theta$, noting that the set of edges $E$ can then be recovered
through the following equivalence:
\begin{displaymath}
(i,j)\in E\Leftrightarrow \Theta_{i,j} \neq 0
\end{displaymath}
Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover
$\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the
log-likelihood function, we consider the following $\ell_1$-regularized MLE
problem:
\begin{displaymath}
\hat{\Theta} \in \argmax_{\Theta} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n)
- \lambda\|\Theta\|_1
\end{displaymath}
where $\lambda$ is the regularization parameter, which helps prevent
overfitting and controls the sparsity of the solution.
The generalized linear cascade model is decomposable in the following sense:
given Definition~\ref{def:glcm}, the log-likelihood can be written as the sum
of $m$ terms, each term $i\in\{1,\ldots,m\}$ only depending on $\theta_i$.
Since this is equally true for $\|\Theta\|_1$, each column $\theta_i$ of
$\Theta$ can be estimated by a separate optimization program:
\begin{equation}\label{eq:pre-mle}
\hat{\theta}_i \in \argmax_{\theta_i}\,\mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n)
- \lambda\|\theta_i\|_1
\end{equation}
where we denote by $n_i$ the first step at which node $i$ becomes active and
where:
\begin{multline}
\mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{n_i}
\bigg(\sum_{t=1}^{n_i-1}\log\big(1-f(\inprod{\theta_i}{x^t})\big)\\
+\log f(\inprod{\theta_i}{x^{n_i}})\bigg)
\end{multline}
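For concreteness, the $\ell_1$-regularized per-node objective can be written down directly. The sketch below (the function name \texttt{node\_objective} and the data are illustrative, not from the text) evaluates the negative per-node log-likelihood plus penalty for the voter-model link $f(z)=z$; any convex solver could then minimize it over $\theta$.

```python
import numpy as np

def node_objective(theta, X, n_i, f, lam):
    # Negative per-node log-likelihood plus l1 penalty: node i stays
    # inactive through step n_i - 1 and becomes active at step n_i.
    # Rows of X are the observed states x^1, ..., x^n.
    inactive = sum(np.log(1.0 - f(X[t - 1] @ theta)) for t in range(1, n_i))
    active = np.log(f(X[n_i - 1] @ theta))
    return -(inactive + active) / n_i + lam * np.abs(theta).sum()

# Illustrative data, voter-model link f(z) = z.
rng = np.random.default_rng(3)
m, n = 5, 8
X = rng.integers(0, 2, size=(n, m)).astype(float)
X[:, 0] = 1.0                    # keep <theta, x^t> strictly inside (0, 1)
theta = np.full(m, 0.1)
val = node_objective(theta, X, n_i=4, f=lambda z: z, lam=0.1)
assert np.isfinite(val)
```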