We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times
|V|$ matrix of parameters describing the edge weights of $\mathcal{G}$.
Intuitively, $\Theta_{i,j}$ captures the ``influence'' of node $i$ on node $j$.
Let $m\defeq |V|$. For each node $j$, let $\theta_{j}$ be the $j^{th}$ column
vector of $\Theta$.  A discrete-time \emph{Cascade model} is a
Markov process over a finite state space ${\{0, 1, \dots, K-1\}}^V$ with the
following properties:
\begin{enumerate}
    \item Conditioned on the previous time step, the state transitions in
        $\{0,1,\dots, K-1\}$ of the nodes $i \in V$ are mutually
        independent.
\item Of the $K$ possible states, there exists a \emph{contagious state} such
    that all transition probabilities of the Markov process can be expressed as
    a function of the graph parameters $\Theta$ and the set of ``contagious
    nodes'' at the previous time step.
  \item The initial distribution over ${\{0, 1, \dots, K-1\}}^V$ is such that all
  nodes can eventually reach the \emph{contagious state} with non-zero
  probability. The ``contagious'' nodes at $t=0$ are called
    \emph{source nodes}.
\end{enumerate}

In other words, a cascade model describes a diffusion process where a set of
contagious nodes ``influence'' other nodes in the graph to become contagious.
An \emph{influence cascade} is a realisation of this random process,
\emph{i.e.} the successive states of the nodes in graph ${\cal G}$. Note that
both the ``single source'' assumption made in~\cite{Daneshmand:2014} and
\cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption
made in~\cite{Netrapalli:2012} satisfy condition 3. Also note that the
multiple-source assumption does not reduce to the single-source
assumption, even when cascades do not overlap. Imagine,
for example, two cascades starting from two different nodes: since we do not
observe which node propagated the contagion to which node, we cannot attribute
an infected node to either cascade and treat the problem as two independent
single-source cascades.

In the context of Network Inference,~\cite{Netrapalli:2012} focus
on the well-known discrete-time independent cascade model recalled below, which
\cite{Abrahao:13} and~\cite{Daneshmand:2014} generalize to continuous time. We
extend the independent cascade model in a different direction by considering
a more general class of transition probabilities while staying in the
discrete-time setting. We observe that despite their obvious differences, both
the independent cascade and the voter models make the network inference problem
similar to the standard generalized linear model inference problem. In fact, we
define a class of diffusion processes for which this is true: the
\emph{Generalized Linear Cascade Models}. The linear threshold model is
a special case and is discussed in Section~\ref{sec:linear_threshold}.

\subsection{Generalized Linear Cascade Models}
\label{sec:GLC}

Let \emph{susceptible} denote any state from which a node can become
contagious at the next time step with non-zero probability. We draw
inspiration from generalized linear models to introduce Generalized Linear
Cascades:

\begin{definition}
\label{def:glcm}
Let $X^t$ be the indicator variable of ``contagious nodes'' at time step $t$.
A \emph{generalized linear cascade model} is a cascade model such that for each
node $j$ susceptible at time step $t$, the indicator of $j$
becoming ``contagious'' at time step $t+1$, conditioned on $X^t$, is a Bernoulli
random variable with parameter $f(\theta_j \cdot X^t)$:
\begin{equation}
    \label{eq:glm}
    \mathbb{P}(X^{t+1}_j = 1|X^t)
    = f(\theta_j \cdot X^t)
\end{equation}
where $f: \reals \rightarrow [0,1]$ is a fixed function.
\end{definition}

In other words, each generalized linear cascade provides, for each node $j \in
V$, a series of measurements ${(X^t, X^{t+1}_j)}_{t \in {\cal T}_j}$ sampled from
a generalized linear model. Note also that $\E[X^{t+1}_j\,|\,X^t]
= f(\inprod{\theta_j}{X^t})$. As such, $f$ can be interpreted as the inverse
link function of our generalized linear cascade model.
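As a concrete illustration, one transition of such a cascade can be simulated as follows. This is a minimal sketch with placeholder weights, using the independent-cascade link $f(z) = 1-e^{-z}$ discussed below, not code from the paper:

```python
import numpy as np

def glc_step(Theta, x, f, rng):
    """One transition of a generalized linear cascade.

    Theta : (m, m) weight matrix, Theta[i, j] = influence of node i on node j.
    x     : (m,) 0/1 indicator of contagious nodes at time t.
    f     : inverse link function mapping the reals to [0, 1].
    Each node j becomes contagious with probability f(theta_j . x),
    where theta_j is the j-th column of Theta.
    """
    p = f(Theta.T @ x)                          # theta_j . x for every node j
    draws = (rng.random(len(x)) < p).astype(int)
    # SI-style variant: once contagious, a node stays contagious
    return np.maximum(x, draws)

rng = np.random.default_rng(0)
Theta = np.array([[0.0, 0.8, 0.0],
                  [0.0, 0.0, 0.5],
                  [0.3, 0.0, 0.0]])
f = lambda z: 1.0 - np.exp(-z)                  # independent-cascade link
x1 = glc_step(Theta, np.array([1, 0, 0]), f, rng)
```

Iterating `glc_step` from a source-node indicator yields one influence cascade, i.e., one realization of the process above.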


\subsection{Examples}

\subsubsection{Independent Cascade Model}

In the independent cascade model, nodes can be either susceptible, contagious
or immune. At $t=0$, all source nodes are ``contagious'' and all remaining
nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where
$j$ is susceptible and $i$ is contagious, $i$ attempts to infect $j$ with
probability $p_{i,j}\in[0,1]$; the infection attempts are mutually independent.
If $i$ succeeds, $j$ will become contagious at time step $t+1$. Regardless of
$i$'s success, node $i$ becomes immune at time $t+1$, so that nodes
stay contagious for only one time step. The cascade process terminates when no
contagious nodes remain.

If we denote by $X^t$ the indicator variable of the set of contagious nodes at
time step $t$, then if $j$ is susceptible at time step $t+1$, we have:
\begin{displaymath}
    \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
    = 1 - \prod_{i = 1}^m {(1 - p_{i,j})}^{X^t_i}.
\end{displaymath}
Defining $\Theta_{i,j} \defeq \log(\frac{1}{1-p_{i,j}})$, this can be rewritten as:
\begin{align*}\label{eq:ic}
    \tag{IC}
    \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
    &= 1 - \prod_{i = 1}^m e^{-\Theta_{i,j}X^t_i}\\
    &= 1 - e^{-\inprod{\Theta_j}{X^t}}
\end{align*}

Therefore, the independent cascade model is a Generalized Linear Cascade model
with inverse link function $f : z \mapsto 1 - e^{-z}$. Note that to write the
Independent Cascade Model as a Generalized Linear Cascade Model, we had to
introduce the change of variable $\Theta_{i,j} = \log(\frac{1}{1-p_{i,j}})$.
The recovery results in Section~\ref{sec:results} pertain to the $\Theta_j$
parameters. Fortunately, the following lemma shows that the recovery error on
$\Theta_j$ is an upper bound on the error on the original $p_j$ parameters.

\begin{lemma}
    \label{lem:transform}
    $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
\end{lemma}
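The inequality holds because the map $p \mapsto \log\frac{1}{1-p}$ has derivative $\frac{1}{1-p} \geq 1$ on $[0,1)$, hence expands each coordinate. A quick numerical spot check, with arbitrary synthetic values for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = rng.uniform(0.05, 0.90, size=50)
p_hat = np.clip(p_true + rng.normal(0.0, 0.02, size=50), 0.01, 0.95)

to_theta = lambda p: np.log(1.0 / (1.0 - p))   # Theta_{ij} = log(1/(1 - p_{ij}))
err_theta = np.linalg.norm(to_theta(p_hat) - to_theta(p_true))
err_p = np.linalg.norm(p_hat - p_true)
# the reparametrization expands distances coordinate-wise, so the
# error on theta upper-bounds the error on the edge probabilities p
assert err_theta >= err_p
```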

\subsubsection{The Linear Voter Model}

In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}.
Without loss of generality, we can suppose that the \emph{blue} nodes are
contagious. The parameters of the graph are normalized such that $\forall j, \
\sum_i \Theta_{i,j} = 1$.  At each round, every node $j$
independently chooses one of its neighbors $i$ with probability $\Theta_{i,j}$ and
adopts that neighbor's color. The cascade stops at a fixed time horizon $T$ or
when all nodes are of the same color.  If we denote by $X^t$ the indicator
variable of the set of blue nodes at time step $t$, then we have:
\begin{equation}
\mathbb{P}\left[X^{t+1}_j = 1 | X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t =
\inprod{\Theta_j}{X^t}
\tag{V}
\end{equation}

Thus, the linear voter model is a Generalized Linear Cascade model
with inverse link function $f: z \mapsto z$.
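With column-normalized weights, the voter transition probabilities reduce to a matrix-vector product. A sketch with illustrative values only:

```python
import numpy as np

# column j holds node j's choice distribution over its neighbors i
Theta = np.array([[0.0, 0.5, 0.2],
                  [0.7, 0.0, 0.8],
                  [0.3, 0.5, 0.0]])
assert np.allclose(Theta.sum(axis=0), 1.0)     # each column sums to one

x = np.array([1, 0, 1])                        # blue nodes at time t
p_next = Theta.T @ x                           # P[X^{t+1}_j = 1 | X^t]
# the normalization guarantees these values lie in [0, 1]
```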

\subsubsection{Discretization of a Continuous-Time Model}

Another motivation for the Generalized Linear Cascade model is that it captures
the time-discretized formulation of the well-studied continuous-time
independent cascade model with exponential transmission function (CICE)
of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}. Assume that the
temporal resolution of the discretization is $\varepsilon$, \emph{i.e.} all
nodes whose (continuous) infection time is within the interval $[k\varepsilon,
(k+1)\varepsilon)$ are considered infected at (discrete) time step $k$. Let
$X^k$ be the indicator vector of the set of nodes infected before or during the
$k^{th}$ time interval. Note that contrary to the discrete-time independent
cascade model, $X^k_j = 1 \implies X^{k+1}_j = 1$, that is, there is no immune
state and nodes remain contagious forever.

Let $\text{Exp}(p)$ denote an exponentially-distributed random variable of rate $p$
and let $\Theta_{i,j}$ be the rate of transmission along directed edge $(i,j)$
in the CICE model.  By the memoryless property of the exponential distribution,
if $X^k_j \neq 1$:
\begin{multline*}
  \mathbb{P}(X^{k+1}_j = 1 | X^k) = \mathbb{P}\Big(\min_{i\,:\,X^k_i = 1}
    \text{Exp}(\Theta_{i,j}) \leq \varepsilon\Big) \\
    = \mathbb{P}\Big(\text{Exp}\Big( \sum_{i=1}^m \Theta_{i,j} X^k_i\Big) \leq \varepsilon\Big)
  = 1 - e^{- \varepsilon \inprod{\theta_j}{X^k}}
\end{multline*}
Therefore, the $\varepsilon$-discretized CICE-induced process is a
Generalized Linear Cascade model with inverse link function $f:z\mapsto
1-e^{-\varepsilon z}$.
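The min-of-exponentials step can be verified numerically: the minimum of independent exponentials is itself exponential with the summed rate. A Monte Carlo sketch with arbitrary rates:

```python
import numpy as np

rng = np.random.default_rng(2)
rates = np.array([0.5, 1.2, 0.8])   # transmission rates of infected parents
eps = 0.3                           # discretization resolution
n = 200_000

# numpy's exponential sampler is parametrized by the scale 1/rate
samples = rng.exponential(1.0 / rates, size=(n, len(rates))).min(axis=1)
mc_estimate = (samples <= eps).mean()
closed_form = 1.0 - np.exp(-eps * rates.sum())
assert abs(mc_estimate - closed_form) < 0.01
```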

\subsubsection{Logistic Cascades}
\label{sec:logistic_cascades}
``Logistic cascades'' is the specific case where the inverse link function is
given by the logistic function
$f(z) = 1/(1+e^{-z + t})$.
Intuitively, this captures the idea that there is a threshold $t$ such that
when the sum of the parameters of the infected parents of a node is larger than
the threshold, the probability of getting infected is close to one. This is
a smooth approximation of the hard threshold rule of the Linear Threshold
Model~\cite{Kempe:03}. As we will see later in the analysis, for logistic
cascades, the graph inference problem becomes a linear inverse problem.
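The smooth-threshold behavior is easy to check numerically, using a hypothetical threshold value:

```python
import numpy as np

t = 2.0                                          # threshold (hypothetical)
f = lambda z: 1.0 / (1.0 + np.exp(-(z - t)))     # logistic inverse link
# far below the threshold the infection probability is near 0,
# far above it is near 1, with a smooth transition around z = t
assert f(t - 5.0) < 0.01
assert f(t + 5.0) > 0.99
assert abs(f(t) - 0.5) < 1e-12
```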

\subsection{Maximum Likelihood Estimation}
\label{sec:mle}

\begin{figure}
  \includegraphics[scale=.4]{figures/drawing.pdf}
  \caption{Illustration of the sparse-recovery approach. Our objective is to
  recover the unknown weight vector $\theta_j$ for each node $j$. We observe a
Bernoulli realization whose parameters are given by applying $f$ to the
matrix-vector product, where the measurement matrix encodes which nodes are
``contagious'' at each time step.}
\vspace{-1em}
\end{figure}

Inferring the model parameter $\Theta$ from observed influence cascades is the
central question of the present work. Recovering the edges in $E$ from observed
influence cascades is a well-identified problem known as the \emph{Network
Inference} problem. However, recovering the influence parameters is no less
important. In this work, we focus on recovering $\Theta$, noting that the set of
edges $E$ can then be recovered through the following equivalence: $(i,j)\in
E\Leftrightarrow  \Theta_{i,j} \neq 0$.

Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover
$\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the
log-likelihood function, we consider the following $\ell_1$-regularized MLE
problem:
\begin{displaymath}
    \hat{\Theta} \in \argmax_{\Theta} \frac{1}{n}
    \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - \lambda\|\Theta\|_1
\end{displaymath}
where $\lambda$ is the regularization factor which helps prevent
overfitting and controls the sparsity of the solution.

The generalized linear cascade model is decomposable in the following sense:
given Definition~\ref{def:glcm}, the log-likelihood can be written as the sum
of $m$ terms, each term $i\in\{1,\ldots,m\}$ only depending on $\theta_i$.
Since this is equally true for $\|\Theta\|_1$, each column $\theta_i$ of
$\Theta$ can be estimated by a separate optimization program:
\begin{equation}\label{eq:pre-mle}
    \hat{\theta}_i \in \argmax_{\theta_i} \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n)
    - \lambda\|\theta_i\|_1
\end{equation}
where we denote by ${\cal T}_i$ the time steps at which node $i$ is susceptible
and:
\begin{multline*}
    \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|}
    \sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\inprod{\theta_i}{x^{t}}) \\ + (1 -
    x_i^{t+1})\log\big(1-f(\inprod{\theta_i}{x^t})\big)
\end{multline*}

In the case of the voter model, the measurements include all time steps until
we reach the time horizon $T$ or the graph coalesces to a single state. For the
independent cascade model, the measurements include all time steps until node
$i$ becomes contagious, after which its behavior is deterministic.  Contrary to
prior work, our results depend on the number of measurements and not the number
of cascades.
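As an illustration, the per-node program~\eqref{eq:pre-mle} can be solved by plain proximal gradient descent (ISTA). The sketch below uses the logistic-cascade link on synthetic data with arbitrary constants; it is our own illustrative solver, not the optimization procedure analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, t = 10, 2000, 2.0
theta_star = np.zeros(m)
theta_star[:3] = [2.0, 1.5, 1.0]                    # sparse ground truth
X = rng.integers(0, 2, size=(n, m)).astype(float)   # contagious indicators x^t
f = lambda z: 1.0 / (1.0 + np.exp(-(z - t)))        # logistic inverse link
y = (rng.random(n) < f(X @ theta_star)).astype(float)

def penalized_nll(theta, lam):
    p = np.clip(f(X @ theta), 1e-12, 1 - 1e-12)
    nll = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()
    return nll + lam * np.abs(theta).sum()

lam, step = 0.01, 0.1
theta = np.zeros(m)
loss0 = penalized_nll(theta, lam)
for _ in range(300):
    grad = X.T @ (f(X @ theta) - y) / n             # gradient of the NLL
    theta = theta - step * grad
    # soft-thresholding: proximal operator of the l1 penalty
    theta = np.sign(theta) * np.maximum(np.abs(theta) - step * lam, 0.0)
loss1 = penalized_nll(theta, lam)
assert loss1 < loss0
```

The soft-thresholding step is what drives small coordinates of $\hat{\theta}_i$ exactly to zero, recovering a sparse support.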

\paragraph{Regularity assumptions}

To solve program~\eqref{eq:pre-mle} efficiently, we would like it to be convex.
A sufficient condition is that $\mathcal{L}_i$ be concave, which is
the case if $f$ and $(1-f)$ are both log-concave. Recall that a
twice-differentiable function $f$ is log-concave if and only if $f''f \leq
f'^2$.  It is easy to verify this property for $f$ and $(1-f)$ in the
Independent Cascade Model and the Voter Model.
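For instance, for the independent-cascade link $f(z) = 1 - e^{-z}$ the condition can be spot-checked on a grid (a simple numerical sanity check):

```python
import numpy as np

z = np.linspace(0.01, 5.0, 500)
f = 1.0 - np.exp(-z)
fp = np.exp(-z)                    # f'
fpp = -np.exp(-z)                  # f''
assert np.all(fpp * f <= fp ** 2)  # f'' f <= f'^2, so f is log-concave
# 1 - f = e^{-z}: its log is the linear (hence concave) function -z
assert np.allclose(np.log(1.0 - f), -z)
```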

Furthermore, the data-dependent bounds in Section~\ref{sec:main_theorem} will
require the following regularity assumption on the inverse link function $f$:
there exists $\alpha\in(0,1)$ such that
\begin{equation}
  \tag{LF}
  \max \big\{ | (\log f)'(z_x) |, |(\log (1-f))'(z_x) | \big\}
  \leq \frac{1}{\alpha}
\end{equation}
for all $z_x\defeq\inprod{\theta^*}{x}$ such that
$f(z_x)\notin\{0,1\}$.

In the voter model, $\frac{f'(z)}{f(z)} = \frac{1}{z}$ and
$\frac{f'(z)}{(1-f)(z)} =
\frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha\leq \Theta_{i,j}\leq
1-\alpha$ for all $(i,j)\in E$ which is always satisfied for some $\alpha$ for
non-isolated nodes.  In the Independent Cascade Model, $\frac{f'(z)}{f(z)} =
\frac{1}{e^{z}-1}$ and $\frac{f'(z)}{(1-f)(z)} = 1$. Hence (LF) holds as soon
as $p_{i,j}\geq \alpha$ for all $(i,j)\in E$ which is always satisfied for some
$\alpha\in(0,1)$.
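This bound can be spot-checked numerically for the independent-cascade link, using an illustrative value of $\alpha$: with $p_{i,j} \geq \alpha$, any active inner product satisfies $z \geq \log\frac{1}{1-\alpha}$, so $(\log f)'(z) = \frac{1}{e^z-1} \leq \frac{1-\alpha}{\alpha} \leq \frac{1}{\alpha}$:

```python
import numpy as np

alpha = 0.2
# p >= alpha translates to z = <theta, x> >= log(1/(1 - alpha))
# whenever at least one parent is contagious
z = np.linspace(np.log(1.0 / (1.0 - alpha)), 10.0, 500)
dlog_f = 1.0 / (np.exp(z) - 1.0)   # (log f)'(z) for f(z) = 1 - e^{-z}
dlog_1mf = np.ones_like(z)         # |(log(1 - f))'(z)| = 1
assert np.all(np.maximum(dlog_f, dlog_1mf) <= 1.0 / alpha)
```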

For the data-independent bound of Proposition~\ref{prop:fi}, we will require the
following additional regularity assumption:
\begin{equation}
  \tag{LF2}
  \max \big\{ | (\log f)''(z_x) |, |(\log (1-f))''(z_x) | \big\}
  \leq \frac{1}{\alpha}
\end{equation}
for some $\alpha\in(0, 1)$ and for all $z_x\defeq\inprod{\theta^*}{x}$ such
that $f(z_x)\notin\{0,1\}$. It is again easy to see that this condition is
verified for the Independent Cascade Model and the Voter model for the same
$\alpha\in(0,1)$.

\paragraph{Convex constraints} The voter model is only defined when
$\Theta_{i,j}\in (0,1)$ for all $(i,j)\in E$. Similarly the independent cascade
model is only defined when $\Theta_{i,j}> 0$. Because the likelihood function
$\mathcal{L}_i$ is equal to $-\infty$ when the parameters are outside of the
domain of definition of the models, these constraints do not need to appear
explicitly in the optimization program.

In the specific case of the voter model, the normalization constraint $\sum_j
\Theta_{j,i} = 1$ will not necessarily be satisfied by the estimator obtained in
\eqref{eq:pre-mle}. In some applications, the experimenter might not need this
constraint to hold, in which case the results in
Section~\ref{sec:results} still give a bound on the recovery error. If this
constraint needs to be satisfied, then by Lagrangian duality, there exists
a $\lambda\in \reals$ such that adding $\lambda\big(\sum_{j}\Theta_{j,i}
- 1\big)$ to the objective function of~\eqref{eq:pre-mle} enforces the
constraint. Then, it suffices to apply the results of Section~\ref{sec:results}
to the augmented objective to obtain the same recovery guarantees. Note that
the added term is linear in $\theta_i$ and easily satisfies all the required
regularity assumptions.