\paragraph{Influence Cascades}
We consider a weighted graph $G$ over the set of vertices $V$ and we define
$m\defeq |V|$. The edge weights are given by an (unknown) weight matrix
$\Theta:V^2\to\reals$; $\Theta_{i,j} = 0$ simply expresses that there is no
edge from $i$ to $j$. For $j\in V$, we write $\theta_j$ for the $j$-th column of
$\Theta$: its support is the set of parents of $j$ in $G$.
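To fix ideas, the following small \texttt{numpy} sketch (with purely
illustrative values) shows a weight matrix $\Theta$, a column $\theta_j$, and
how the support of $\theta_j$ encodes the parents of $j$:
\begin{verbatim}
import numpy as np

# Toy weight matrix over m = 4 nodes; Theta[i, j] = 0 means no edge i -> j.
Theta = np.array([
    [0.0, 0.5, 0.0, 0.2],
    [0.0, 0.0, 0.3, 0.0],
    [0.4, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.0],
])

j = 3
theta_j = Theta[:, j]              # j-th column of Theta
parents = np.flatnonzero(theta_j)  # support of theta_j = parents of j in G
print(parents)                     # [0 2]
\end{verbatim}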
An \emph{influence cascade} is a homogeneous discrete-time Markov process over
$\{0,1\}^m$. More precisely, we represent the set of nodes infected at time
step $t-1\in\ints^+$ by a random vector $X^{t}\in\{0,1\}^{m}$, where $X^{t}_k
= 1$ means that node $k$ was infected at time step $t-1$.
\paragraph{Problem} In what follows, we assume that we know how to express the
transition probabilities as a function of the weight matrix $\Theta$, and the
goal is to infer this matrix by observing influence cascades.
We use standard maximum likelihood estimation (MLE). However, we assume that
the parameter we wish to recover, namely the matrix $\Theta$, is sparse in the
following sense: each node in the graph $G$ has at most $s$ parents. In other
words, the columns of $\Theta$ have at most $s$ non-zero entries. To factor this
additional assumption into the maximum likelihood estimation, we use the now
classic $\ell_1$ regularization. For an observation $(x^1,\ldots,x^n)$, we
estimate $\hat{\Theta}$ by:
\begin{equation}\label{eq:mle}
\hat{\Theta} = \argmax_{\Theta}\mathcal{L}(\Theta\,|\, x^1,\ldots, x^n)
- \lambda\|\Theta\|_1
\end{equation}
where $\mathcal{L}$ is the log-likelihood function and $\lambda > 0$ is the
regularization parameter.
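As a concrete, non-authoritative illustration, the following Python sketch
evaluates the penalized objective of \eqref{eq:mle} for a single column
$\theta_j$, anticipating the Bernoulli transition form \eqref{eq:markov} made
explicit below; the name \texttt{penalized\_objective}, the data layout, and
the default identity link are our own choices:
\begin{verbatim}
import numpy as np

def penalized_objective(theta_j, X, y, lam, f=lambda z: z, eps=1e-12):
    """Penalized log-likelihood for one column theta_j (to be maximized).

    X[t] is the infection vector x^t at the steps where node j could still
    become infected, y[t] = x^{t+1}_j, and f maps <theta_j, x^t> to the
    Bernoulli parameter of the transition (identity link by default).
    """
    z = np.clip(f(X @ theta_j), eps, 1.0 - eps)
    loglik = np.sum(y * np.log(z) + (1.0 - y) * np.log1p(-z))
    return loglik - lam * np.abs(theta_j).sum()
\end{verbatim}
Note that under the models below each column $\theta_j$ only enters through
the transitions of node $j$, so the estimation can be carried out one column
at a time.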
The goal of this paper is to characterize $\hat{\Theta}$ in terms of:
\begin{itemize}
\item \emph{efficiency}: how well does $\hat{\Theta}$ approximate the true
matrix $\Theta$? In particular, can we bound its accuracy as a function
of the number of observations?
\item \emph{model consistency}: assuming that the true matrix $\Theta$ is
$s$-sparse as defined above, can we recover the support of $\Theta$
from the estimator $\hat{\Theta}$, and if so, how many observations are
required?
\end{itemize}
We now turn to two widely used influence cascade models, the Independent
Cascade Model (IC) and the Linear Threshold Model (LT), and specify the MLE
problem \eqref{eq:mle} for these models.
\subsection{Independent Cascade Model}
In the independent cascade model, nodes can be either uninfected, active or
infected. All nodes start either as uninfected or active. At each time step
$t$, for each edge $(i,j)$ where $j$ is uninfected and $i$ is active, $i$
attempts to infect $j$ with probability $p_{i,j}$. If $i$ succeeds, $j$ will
become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be
infected at time $t+1$: nodes stay active for only one time step. The cascade
continues until no active nodes remain.
If we denote by $X^t$ the indicator variable of the set of active nodes at time
step $t-1$, then if $j$ is uninfected at time step $t-1$, we have:
\begin{displaymath}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big]
= 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}
= 1 - e^{\inprod{\theta_j}{X^t}}
\end{displaymath}
where we defined $\Theta_{i,j} \defeq \log(1-p_{i,j})$.
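To make the dynamics concrete, here is a minimal simulation sketch of one IC
cascade, assuming a matrix \texttt{P} of probabilities $p_{i,j}$ and a boolean
vector of initially active nodes; the function \texttt{simulate\_ic} and its
interface are ours, not part of the model:
\begin{verbatim}
import numpy as np

def simulate_ic(P, active0, seed=0):
    """Simulate one IC cascade; P[i, j] = p_{i,j}.

    Returns the boolean vectors of active nodes [X^1, X^2, ...].
    """
    rng = np.random.default_rng(seed)
    active = active0.astype(bool)
    uninfected = ~active
    history = [active.copy()]
    while True:
        # each active i tries to infect each uninfected j w.p. P[i, j]
        attempts = rng.random(P.shape) < P
        newly = (attempts & active[:, None]).any(axis=0) & uninfected
        if not newly.any():
            break                 # no active nodes remain: cascade is over
        uninfected &= ~newly
        active = newly            # previous active nodes become infected
        history.append(active.copy())
    return history
\end{verbatim}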
\subsection{Linear Threshold Model}
In the Linear Threshold Model, each node $j\in V$ has a threshold $t_j$ drawn
uniformly from the interval $[0,1]$. We assume that the weights given by
$\Theta$ are such that for each node $j\in V$, $\sum_{i=1}^m \Theta_{i,j} \leq
1$.
Nodes can be either infected or uninfected. At each time step, each uninfected
node $j$ becomes infected if the sum of the weights $\Theta_{i,j}$ where $i$ is
an infected parent of $j$ is larger than $j$'s threshold $t_j$. Denoting by
$X^t$ the indicator variable of the set of infected nodes at time step $t-1$,
if $j\in V$ is uninfected at time step $t-1$, then:
\begin{displaymath}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = \sum_{i=1}^m \Theta_{i,j}X^t_i
= \inprod{\theta_j}{X^t}
\end{displaymath}
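Again for concreteness, a matching simulation sketch for one LT cascade,
drawing each threshold once; the name \texttt{simulate\_lt} and its interface
are our own choices:
\begin{verbatim}
import numpy as np

def simulate_lt(Theta, infected0, seed=0):
    """Simulate one LT cascade; Theta[i, j] = weight of edge i -> j.

    Returns the boolean vectors of infected nodes [X^1, X^2, ...].
    """
    rng = np.random.default_rng(seed)
    thresholds = rng.random(Theta.shape[0])  # t_j ~ U[0, 1], drawn once
    infected = infected0.astype(bool)
    history = [infected.copy()]
    while True:
        # total weight of the infected parents of each node j
        parent_weight = Theta.T @ infected.astype(float)
        newly = (parent_weight > thresholds) & ~infected
        if not newly.any():
            break
        infected |= newly
        history.append(infected.copy())
    return history
\end{verbatim}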
\subsection{Unification}
In both models, conditional on $X^t = x^t$, $X_j^{t+1}$ is a Bernoulli
variable whose parameter can be written as a function $f$ of
$\inprod{\theta_j}{x^t}$, where $\inprod{a}{b}$ denotes the inner product of
$a$ and $b$ in $\reals^m$:
\begin{equation}\label{eq:markov}
\P\big[x_j^{t+1} = 1\,|\, x^t\big] \defeq f(\inprod{\theta_j}{x^t}).
\end{equation}
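Specifically, $f\colon z\mapsto 1 - e^{z}$ for the Independent Cascade model
and $f\colon z\mapsto z$ for the Linear Threshold model. As a minimal sketch
of how \eqref{eq:mle} could then be solved column by column under this unified
form, the following Python code runs proximal gradient descent (ISTA),
handling the $\ell_1$ term by soft-thresholding; the link table, the fixed
step size, and the iteration count are our own choices, not part of the model:
\begin{verbatim}
import numpy as np

# Links f and their derivatives under the unified transition above:
# IC: Theta_{i,j} = log(1 - p_{i,j}), so f(z) = 1 - e^z;  LT: f(z) = z.
LINKS = {"ic": (lambda z: 1.0 - np.exp(z), lambda z: -np.exp(z)),
         "lt": (lambda z: z, lambda z: np.ones_like(z))}

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def estimate_column(X, y, lam, model="lt", step=0.1, n_iter=5000, eps=1e-6):
    """l1-penalized MLE of one column theta_j by proximal gradient (ISTA).

    X[t] is x^t at the steps where node j was still susceptible and
    y[t] = x^{t+1}_j.  The fixed step size is for simplicity only; the
    model constraints (sign, column sums) are not enforced in this sketch.
    """
    f, fprime = LINKS[model]
    n, m = X.shape
    # start from an interior point so the Bernoulli parameters are not ~0
    theta = np.full(m, -1.0 / m if model == "ic" else 1.0 / (2 * m))
    for _ in range(n_iter):
        u = X @ theta
        z = np.clip(f(u), eps, 1.0 - eps)
        # gradient of the (averaged) negative log-likelihood, by the chain rule
        grad = -X.T @ ((y / z - (1.0 - y) / (1.0 - z)) * fprime(u)) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta
\end{verbatim}
In practice one would add a line search and project onto the model
constraints; we omit both for brevity.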