\paragraph{Influence Cascades}
We consider a weighted graph $G$ over the set of vertices $V$ and we define
$m\defeq |V|$. The edge weights are given by an (unknown) weight matrix
$\Theta:V^2\to\reals^+$. $\Theta_{i,j} = 0$ simply expresses that there is no
edge from $i$ to $j$. For $j\in V$, we write $\theta_j$ for the $j$-th column of
$\Theta$: its support is the set of parents of $j$ in $G$.
An \emph{influence cascade} is a homogeneous discrete-time Markov process over
$\{0,1\}^m$. More precisely, we represent the set of nodes infected at time
step $t-1\in\ints^+$ by a random vector $X^{t}\in\{0,1\}^{m}$ where $X^{t}_k
= 1$ can be interpreted as \emph{node $k$ was infected at time step $t-1$}.
\paragraph{Problem} In what follows, we assume that we know how to express the
transition probabilities as a function of the weight matrix $\Theta$ and the
goal is to infer this matrix by observing influence cascades.
We use standard maximum likelihood estimation (MLE). However, we assume that
the parameter we wish to recover, namely the matrix $\Theta$, is sparse in the
following sense: each node in the graph $G$ has at most $s$ parents. In other
words, the columns of $\Theta$ have at most $s$ non-zero entries. To factor this
additional assumption into the maximum likelihood estimation, we use the now
classic $\ell_1$ regularization. For an observation $(x^1,\ldots,x^n)$, we
estimate $\hat{\Theta}$ by:
\begin{equation}\label{eq:mle}
\hat{\Theta} = \argmax_{\Theta}\mathcal{L}(\Theta\,|\, x^1,\ldots, x^n)
- \lambda\|\Theta\|_1
\end{equation}
where $\mathcal{L}$ is the log-likelihood function and $\lambda$ is the
regularization factor.
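Note that since the log-likelihood decomposes over the coordinates of $X^{t+1}$ and $\|\Theta\|_1 = \sum_j \|\theta_j\|_1$, problem \eqref{eq:mle} splits into one independent subproblem per column $\theta_j$. The sketch below solves one such column problem by proximal gradient descent (ISTA); it assumes, purely for illustration, a logistic link between $\inprod{\theta_j}{x^t}$ and the infection probability, and the data, step size, and $\lambda$ are made up.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logistic_mle(X, y, lam, step=0.1, iters=500):
    """ISTA for max_theta (average log-likelihood) - lam * ||theta||_1,
    with an illustrative logistic link P[y=1 | x] = sigmoid(<theta, x>)."""
    n, m = X.shape
    theta = np.zeros(m)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - p) / n          # gradient of the average log-likelihood
        theta = soft_threshold(theta + step * grad, step * lam)
    return theta

# Made-up data: n observations of (x^t, x_j^{t+1}) for a single node j.
rng = np.random.default_rng(1)
n, m = 200, 10
X = rng.integers(0, 2, size=(n, m)).astype(float)
true_theta = np.zeros(m)
true_theta[[1, 4]] = 2.0                  # hypothetical 2-sparse ground truth
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_theta))).astype(float)
theta_hat = l1_logistic_mle(X, y, lam=0.05)
```

The soft-thresholding step is what drives coordinates of $\hat\theta_j$ exactly to zero, which is the mechanism behind the support-recovery question raised below.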
The goal of this paper is to characterize $\hat{\Theta}$ in terms of:
\begin{itemize}
\item \emph{efficiency}: how well does $\hat{\Theta}$ approximate the true
matrix $\Theta$? In particular can we bound its accuracy as a function
of the number of observations?
\item \emph{model consistency}: assuming that the true matrix $\Theta$ is
$s$-sparse as defined above, can we recover the support of $\Theta$
from the estimator $\hat{\Theta}$, and if so, how many observations are
required?
\end{itemize}
We now turn to two widely used influence cascade models, the Linear Threshold
Model (LT) and the Independent Cascade Model (IC), and specify the MLE
problem \eqref{eq:mle} for these models.
\subsection{Linear Threshold Model}
\subsection{Independent Cascade Model}
\subsection{Unification}
In both models, $X_j^{t+1}$ is a Bernoulli random variable whose parameter can
be written as a function $f$ of $\inprod{\theta_j}{x^t}$, where $\inprod{a}{b}$
denotes the inner product of $a$ and $b$ in $\reals^m$:
\begin{equation}\label{eq:markov}
\P\big[x_j^{t+1} = 1\,|\, x^t\big] \defeq f(\inprod{\theta_j}{x^t}).
\end{equation}
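As an illustration of \eqref{eq:markov}, a cascade can be simulated by drawing each coordinate of $X^{t+1}$ independently given $x^t$. The sketch below uses a logistic choice of $f$ and a made-up sparse weight matrix, both purely illustrative; the specific forms of $f$ for the LT and IC models are given in the corresponding subsections.

```python
import numpy as np

def simulate_cascade(Theta, x0, T, f, rng):
    """Draw T steps of the chain X^{t+1}_j ~ Bernoulli(f(<theta_j, x^t>))."""
    m = Theta.shape[0]
    xs = [x0]
    x = x0
    for _ in range(T):
        # x @ Theta gives <theta_j, x> for every node j at once,
        # since theta_j is the j-th COLUMN of Theta.
        p = f(x @ Theta)
        x = (rng.random(m) < p).astype(int)
        xs.append(x)
    return np.array(xs)

rng = np.random.default_rng(0)
m = 5
# Hypothetical sparse weight matrix and seed set (not from the paper).
Theta = rng.random((m, m)) * (rng.random((m, m)) < 0.3)
x0 = np.zeros(m, dtype=int)
x0[0] = 1                                # node 0 infected at the first step
f = lambda z: 1.0 / (1.0 + np.exp(-z))   # illustrative logistic f
cascade = simulate_cascade(Theta, x0, T=3, f=f, rng=rng)
```

Each row of the returned array is one realization of $X^t$; running many such cascades yields the observations $(x^1,\ldots,x^n)$ used in \eqref{eq:mle}.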