paper/sections/model.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291

We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times
|V|$ matrix of parameters describing the edge weights of $\mathcal{G}$.
Intuitively, $\Theta_{i,j}$ captures the ``influence'' of node $i$ on node $j$.
Let $m\defeq |V|$. For each node $j$, let $\theta_{j}$ be the $j^{th}$ column
vector of $\Theta$.  A discrete-time \emph{Cascade model} is a
Markov process over a finite state space ${\{0, 1, \dots, K-1\}}^V$ with the
following properties:
\begin{enumerate}
    \item Conditioned on the previous time step, the transition events between
        two states in $\{0,1,\dots, K-1\}$ for each $i \in V$ are mutually
        independent.
\item Of the $K$ possible states, there exists a \emph{contagious state} such
    that all transition probabilities of the Markov process are either constant
    or can be expressed as a function of the graph parameters $\Theta$ and the
    set of ``contagious nodes'' at the previous time step.
  \item The initial probability over ${\{0, 1, \dots, K-1\}}^V$ is such that all
  nodes can eventually reach a \emph{contagious state} with non-zero
  probability. The ``contagious'' nodes at $t=0$ are called
    \emph{source nodes}.
\end{enumerate}

In other words, a cascade model describes a diffusion process where a set of
contagious nodes ``influence'' other nodes in the graph to become contagious. An
\emph{influence cascade} is a realisation of this random process, \emph{i.e.}
the successive states of the nodes in graph ${\cal G}$. Note that both the
``single source'' assumption made in~\cite{Daneshmand:2014} and
\cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption made
in~\cite{Netrapalli:2012} verify condition 3. Also note that the multiple-source
node assumption does not reduce to the single-source assumption: even
non-overlapping cascades started from two nodes that are far apart cannot in
practice be attributed to either process without knowledge of the newtork
topology.

In the context of Network Inference,~\cite{Netrapalli:2012} focus
on the well-known discrete-time independent cascade model recalled below, which
\cite{Abrahao:13} and~\cite{Daneshmand:2014} generalize to continuous time. We
extend the independent cascade model in a different direction by considering
a more general class of transition probabilities while staying in the
discrete-time setting. We observe that despite their obvious differences, both
the independent cascade and the voter models make the network inference problem
similar to the standard generalized linear model inference problem. In fact, we
define a class of diffusion processes for which this is true: the
\emph{Generalized Linear Cascade Models}. The linear threshold model is
a special case and is discussed in Section~\ref{sec:linear_threshold}.

\subsection{Generalized Linear Cascade Models}
\label{sec:GLC}

Let \emph{susceptible} denote any state which can become contagious at
the next time step with a non-zero probability. We draw inspiration from
generalized linear models to introduce Generalized Linear Cascades:

\begin{definition}
\label{def:glcm}
Let $X^t$ be the indicator variable of ``contagious nodes'' at time step $t$.
A \emph{generalized linear cascade model} is a cascade model such that for each
susceptible node $j$ in state $s$ at time step $t$, the probability of $j$
becoming ``contagious'' at time step $t+1$ conditioned on $X^t$ is a Bernoulli
variable of parameter $f(\theta_j \cdot X^t)$:
\begin{equation}
    \label{eq:glm}
    \mathbb{P}(X^{t+1}_j = 1|X^t)
    = f(\theta_j \cdot X^t)
\end{equation}
where $f: \reals \rightarrow [0,1]$
\end{definition}

In other words, each generalized linear cascade provides, for each node $j \in
V$ a series of measurements ${(X^t, X^{t+1}_j)}_{t \in {\cal T}_j}$ sampled from
a generalized linear model. Note also that $\E[X^{t+1}_i\,|\,X^t]
= f(\inprod{\theta_i}{X^t})$. As such, $f$ can be interpreted as the inverse
link function of our generalized linear cascade model.

% Denoting by $X^t$ the state of the cascade at time step $t$, we interpret
% $X^t_j = 1$ for node $j\in V$ as ``node $j$ is contagious'', \emph{i.e}, ``$j$
% exhibits the source nodes' behavior at time step $t$''.  We draw inspiration
% from \emph{generalized linear models} (GLM) to define a generalized linear
% cascade.

% \begin{definition}
%     \label{def:glcm}
%     Let us denote by $\{\mathcal{F}_t, t\in\ints\}$ the natural
%     filtration induced by $\{X_t, t\in\ints\}$. A \emph{generalized linear
%     cascade} is characterized by the following equation:
%     \begin{displaymath}
%         \P[X^{t+1}=x\,|\, \mathcal{F}_t] =
%         \prod_{i=1}^m f(\inprod{\theta_i}{X^{t}})^{x_i}
%         \big(1-f(\inprod{\theta_i}{X^{t}}\big)^{1-x_i}
%     \end{displaymath}
% where $f:\mathbb{R}\to[0,1]$ and $\theta_i$ is the $i$-th column of $\Theta$.
% \end{definition}

% It follows immediately from this definition that a generalized linear cascade
% satisfies the Markov property:
% \begin{displaymath}
%     \P[X^{t+1}=x\,|\mathcal{F}_t] = \P[X^{t+1}=x\,|\, X^t]
% \end{displaymath}
% Note also that $\E[X^{t+1}_i\,|\,X^t] = f(\inprod{\theta_i}{X^t})$. As such,
% $f$ can be interpreted as the inverse link function of our generalized linear
% cascade model.

\subsection{Examples}

\subsubsection{Independent Cascade Model}

In the independent cascade model, nodes can be either susceptible, contagious
or immune. At $t=0$, all source nodes are ``contagious'' and all remaining
nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where
$j$ is susceptible and $i$ is contagious, $i$ attempts to infect $j$ with
probability $p_{i,j}\in[0,1]$; the infection attempts are mutually independent.
If $i$ succeeds, $j$ will become contagious at time step $t+1$. Regardless of
$i$'s success, node $i$ will be immune at time $t+1$, such that nodes
stay contagious for only one time step. The cascade process terminates when no
contagious nodes remain.

If we denote by $X^t$ the indicator variable of the set of contagious nodes at
time step $t$, then if $j$ is susceptible at time step $t+1$, we have:
\begin{displaymath}
    \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
    = 1 - \prod_{i = 1}^m {(1 - p_{i,j})}^{X^t_i}.
\end{displaymath}
Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
\begin{equation}\label{eq:ic}
    \tag{IC}
    \P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
    = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i}
    = 1 - e^{\inprod{\Theta_j}{X^t}}
\end{equation}

Therefore, the independent cascade model is a Generalized Linear Cascade model
with inverse link function $f : z \mapsto 1 - e^z$.

In Section~\ref{sec:results}, we show bounds on the \emph{transformed}
parameters $\Theta$ of the graph.  Fortunately, a bound on $\|\hat\theta -
\theta^*\|_2$ directly implies a bound on $\|\hat p - p^*\|_2$ as shown by the
following lemma:

\begin{lemma}
    $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
\end{lemma}
\begin{proof}
Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have
$|\log (1 - p) - \log (1-p')| \geq \max(1 - \frac{1-p}{1-p'},
1 - \frac{1-p'}{1-p}) \geq \max( p-p', p'-p)$.
\end{proof}

\subsubsection{The Linear Voter Model}

In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}.
Without loss of generality, we can suppose that the \emph{blue} nodes are
contagious. The parameters of the graph are normalized such that $\forall i, \
\sum_j \Theta_{i,j} = 1$ and we assume that $\Theta_{i,i}$ is always non-zero,
meaning that all nodes have self-loops.  Each round, every node $j$
independently chooses one of its neighbors with probability $\Theta_{i,j}$ and
adopts their color. The cascades stops at a fixed horizon time $T$ or if all
nodes are of the same color.  If we denote by $X^t$ the indicator variable of
the set of blue nodes at time step $t$, then we have:
\begin{equation}
\mathbb{P}\left[X^{t+1}_j = 1 | X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t =
\inprod{\Theta_j}{X^t}
\tag{V}
\end{equation}

Thus, the linear voter model is a Generalized Linear Cascade model
with inverse link function $f: z \mapsto z$.

\subsubsection{Discretization of Continous Model}

Another motivation for the Generalized Linear Cascade model is that it captures
the time-discretized formulation of the well-studied  continuous-time
independent cascade model with exponential transmission function (CICE)
of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}.  Suppose that time
is binned into intervals of length $\epsilon$ and let $X^k$ be the set of nodes
`infected' before or during the $k^{th}$ time interval.  Note that contrary to
the discrete-time independent cascade model, $X^k_j = 1 \implies X^{k+1}_j =
1$.

Let $\exp(p)$ be an exponentially-distributed random variable of parameter $p$
and let $\Theta_{i,j}$ be the rate of transmission along directed edge $(i,j)$
in the CICE model.  By the memoryless property of the exponential, if $X^k_j
\neq 1$:
\begin{align*}
  \mathbb{P}(X^{k+1}_j = 1 | X^k) & = \mathbb{P}(\min_{i \in {\cal N}(j)}
  \exp(\Theta_{i,j}) \leq \epsilon) \\
  & = \mathbb{P}(\exp( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\
  & = 1 - e^{- \epsilon \Theta_j \cdot X^t}
\end{align*}
Note that here we implicitly let $\Theta_{i,j} = 0$ if $(i,j)$ is not a
directed edge of the graph. Note that this formulation is also consistent when
$X^k_j = 1$ by considering the dummy variables $\forall i, \Theta_{i,i} =
+\infty$.  Therefore, the $\epsilon$-time-binned CICE-induced process is a
Generalized Linear Cascade model with inverse link function $f:z\mapsto
1-e^{-\epsilon\cdot z}$.

\subsubsection{Logistic Cascades}
\label{sec:logistic_cascades}
``Logistic cascades'' (where the inverse link function $f$ is the logistic
function) capture the idea that there is a threshold around which each
additional neighbor's contribution becomes significant: they constitue a
differentiable approximation to the more-commonly studied Linear Threshold
model~\cite{Kempe:03}. As we will see later in the analysis, the Hessian of our
optimization program simplifies in the case of a logistic inverse link function
to $\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is
the observation matrix $[x^1 \ldots x^\mathcal{|T|}]$.

% \subsection{The Linear Threshold Model}

% In the Linear Threshold Model, each node $j\in V$ has a threshold $t_j$ from
% the interval $[0,1]$ and for each node, the sum of incoming weights is less
% than $1$: $\forall j\in V$, $\sum_{i=1}^m \Theta_{i,j} \leq 1$.

% Nodes have two states: susceptible or contagious. At each time step, each
% susceptible node $j$ becomes contagious if the sum of the incoming weights
% from contagious parents is greater than $j$'s threshold $t_j$. Nodes remain
% contagious until the end of the cascade, which is reached when no new
% susceptible nodes become contagious.

% As such, the source nodes are chosen, the process is entirely deterministic.
% Let $X^t$ be the indicator variable of the set of contagious nodes at time
% step $t-1$, then: \begin{equation} \nonumber X^{t+1}_j =
% \mathbbm{1}_{\sum_{i=1}^m \Theta_{i,j}X^t_i > t_j} =
% \mathbbm{1}_{\inprod{\theta_j}{X^t} > t_j} \end{equation} where we defined
% again $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$. In other words, for
% every step in the linear threshold model, we observe the following signal:

% \begin{equation} \label{eq:lt} \tag{LT} \mathbb{P} \left[X^{t+1}_j = 1 |
% X^t\right] = \text{sign} \left(\inprod{\theta_j}{X^t} - t_j \right)
% \end{equation} where ``sign'' is the function $\mathbbm{1}_{\cdot > 0}$.
\subsection{Maximum Likelihood Estimation}
\label{sec:mle}

\begin{figure}
  \includegraphics[scale=.4]{figures/drawing.pdf}
  \caption{Illustration of the sparse-recovery approach. Our objective is to
  recover the unknown weight vector $\theta_j$ for each node $j$. We observe a
Bernoulli realization of the $f$ transform of the matrix-vector product, where
the measurement matrix encodes which nodes are ``contagious'' at each time step}
\end{figure}


Inferring the model parameter $\Theta$ from observed influence cascades is the
central question of the present work. Recovering the edges in $E$ from observed
influence cascades is a well-identified problem known as the \emph{Network
Inference} problem. However, recovering the influence parameters is no less
important. In this work we focus on recovering $\Theta$, noting that the set of
edges $E$ can then be recovered through the following equivalence: $(i,j)\in
E\Leftrightarrow  \Theta_{i,j} \neq 0$

Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover
$\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the
log-likelihood function, we consider the following $\ell_1$-regularized MLE
problem:
\begin{displaymath}
    \hat{\Theta} \in \argmax_{\Theta} \frac{1}{n}
    \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - \lambda\|\Theta\|_1
\end{displaymath}
where $\lambda$ is the regularization factor which helps preventing
overfitting and controls the sparsity of the solution.

The generalized linear cascade model is decomposable in the following sense:
given Definition~\ref{def:glcm}, the log-likelihood can be written as the sum
of $m$ terms, each term $i\in\{1,\ldots,m\}$ only depending on $\theta_i$.
Since this is equally true for $\|\Theta\|_1$, each column $\theta_i$ of
$\Theta$ can be estimated by a separate optimization program:
\begin{equation}\label{eq:pre-mle}
    \hat{\theta}_i \in \argmax_{\theta} \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n)
    - \lambda\|\theta_i\|_1
\end{equation}
where we denote by ${\cal T}_i$ the time steps at which node $i$ is susceptible
and:
\begin{multline*}
    \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|}
    \sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\inprod{\theta_i}{x^{t}}) \\ + (1 -
    x_i^{t+1})\log\big(1-f(\inprod{\theta_i}{x^t})\big)
\end{multline*}

In the case of the voter model, the measurements include all time steps until
we reach the time horizon $T$ or the graph coalesces to a single state. In the
case of the independent cascade model, the measurements include all time steps
until node $i$ becomes contagious, after which its behavior is deterministic.
Contrary to prior work, our results depend on the number of
measurements and not the number of cascades.

A sufficient condition for program~\eqref{eq:pre-mle} to be a convex program is
that $\mathcal{L}_i$ is a concave function. This will be the case if, for
example, $f$ and $(1-f)$ are both log-concave functions. Remember that
a twice-differentiable function $f$ is log concave iff. $f''f \leq f'^2$. It is
easy to verify this property for $f$ and $(1-f)$ in the Independent Cascade
Model and Voter Model.

{\color{red} TODO:~talk about the different constraints}