\begin{figure}
\centering
\includegraphics[scale=.8]{graphical.pdf}
\caption{Graphical model representation of the Network Inference Problem with
edge weights $\theta_{ij}$, observed cascade indicator vectors $X^c_t$, edge
prior parameters $\mu_{ij}$ and $\sigma_{ij}$. The source distribution,
parameterized by $\phi$, is considered fixed here.}
\label{fig:graphical}
\end{figure}
In this section, we develop a Bayesian approach to the Network Inference Problem
by placing priors on the edge weights of the graph. The quantity of interest is
the posterior distribution, given through Bayes' rule by:
\begin{equation}
\label{eq:bayesrule}
p(\Theta \mid \bx) \propto \text{prior}(\Theta) \times \mathcal{L}_\Theta(\bx)
\end{equation}
where $\mathcal{L}_\Theta(\bx)$ is the likelihood expressed in
Eq.~\ref{eq:dist}.
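Written out in full, the proportionality constant is the model evidence,
\[
p(\Theta \mid \bx) =
\frac{\text{prior}(\Theta)\, \mathcal{L}_\Theta(\bx)}
{\int \text{prior}(\Theta')\, \mathcal{L}_{\Theta'}(\bx)\, d\Theta'},
\]
and since this constant does not depend on $\Theta$, the inference algorithms
discussed below only ever require the un-normalized product on the right-hand
side of Eq.~\ref{eq:bayesrule}.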
One advantage of the Bayesian approach is its ability to convey information
about the uncertainty surrounding each edge parameter. In the next section, we
will explore how to exploit this knowledge to improve the rate at which we
decrease our uncertainty by focusing on the most relevant parts of the network.
Another advantage of the Bayesian approach is the ability to encode
domain-knowledge through well-chosen prior distributions. For example, there is
an extensive literature~\cite{} on parametric representations of social
networks, which attempt to reproduce certain properties of such networks:
density of triangles, diameter, degree distribution, clustering coefficient, etc.
Accounting for known graph properties, such as reciprocal links or a high
density of triangles, has the potential to greatly increase the information we
leverage from each cascade. Of course, such priors couple the edge parameters
and no longer allow us to perform inference in parallel, a property leveraged
in prior work.
\paragraph{The IC model.}
As mentioned above, the IC model (cf. Section~\ref{sec:model}) admits no conjugate
prior. We consider a product of truncated Gaussian priors:
\begin{equation}
\label{eq:gaussianprior}
\text{prior}(\Theta) = \prod_{ij} \mathcal{N}^+(\theta_{ij} | \mu_{ij},
\sigma_{ij})
\end{equation}
where $\mathcal{N}^+(\cdot)$ is a Gaussian truncated to lie on $\mathbb{R}^+$,
since $\Theta$ is obtained through the transformation $z \mapsto -\log(1 - z)$.
This model is depicted in the graphical model of Figure~\ref{fig:graphical}.
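Concretely, $\mathcal{N}^+$ is simply the Gaussian density re-normalized over
the positive reals,
\[
\mathcal{N}^+(\theta \mid \mu, \sigma) =
\frac{\mathcal{N}(\theta \mid \mu, \sigma)}
{\int_0^{\infty} \mathcal{N}(t \mid \mu, \sigma)\, dt}
\quad \text{for } \theta \geq 0,
\]
and $0$ otherwise; the normalization term depends only on the fixed
hyperparameters $(\mu_{ij}, \sigma_{ij})$ and can therefore be treated as a
constant during inference.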
Since the prior in Eq.~\ref{eq:gaussianprior} is non-conjugate, we resort to
sampling algorithms (MCMC) and approximate Bayesian methods
(variational inference), which we cover next.
\paragraph{MCMC.}
The Metropolis-Hastings algorithm, an instance of Markov chain Monte Carlo
(MCMC), allows us to draw samples from the posterior using only its
un-normalized density. The advantages of this method are the ability to sample
from the exact posterior and the wide availability of software packages that
work `out-of-the-box'. However, vanilla MCMC scales poorly and is unsuitable
for Bayesian learning of large networks ($\geq 100$ nodes). We therefore resort
to fitting an approximate posterior distribution using a variational inference
algorithm.
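For illustration, the following sketch shows a minimal random-walk
Metropolis-Hastings sampler over the edge weights; \texttt{log\_likelihood} is a
placeholder for an implementation of Eq.~\ref{eq:dist}, and the remaining names
and settings (\texttt{cascades}, step size, number of samples) are illustrative
assumptions rather than the configuration used in our experiments.
\begin{verbatim}
import numpy as np

def log_prior(theta, mu, sigma):
    # Truncated Gaussian prior of Eq. (gaussianprior), up to an additive constant.
    if np.any(theta < 0):
        return -np.inf
    return -0.5 * np.sum(((theta - mu) / sigma) ** 2)

def metropolis_hastings(log_likelihood, cascades, mu, sigma,
                        n_samples=5000, step=0.05, seed=0):
    # Random-walk MH over the matrix of edge weights theta.
    rng = np.random.default_rng(seed)
    theta = np.abs(mu).astype(float)   # start at the absolute value of the prior mean
    log_post = log_prior(theta, mu, sigma) + log_likelihood(theta, cascades)
    samples = []
    for _ in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.shape)
        new_log_post = log_prior(proposal, mu, sigma)
        if np.isfinite(new_log_post):  # only evaluate the likelihood on the support
            new_log_post += log_likelihood(proposal, cascades)
        if np.log(rng.uniform()) < new_log_post - log_post:
            theta, log_post = proposal, new_log_post   # accept the proposal
        samples.append(theta.copy())
    return np.array(samples)
\end{verbatim}
Every iteration evaluates the likelihood over all cascades and all edges, which
is the main reason this approach scales poorly with network size.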
\paragraph{Variational inference.}
Variational inference algorithms approximate the exact posterior by fitting a
tractable family of distributions to it.
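For instance, with a factorized variational family
$q_\lambda(\Theta) = \prod_{ij} q_{\lambda_{ij}}(\theta_{ij})$ (the per-edge
factors and their parameterization are a design choice left open here), the fit
is obtained by maximizing the evidence lower bound
\[
\mathcal{F}(\lambda) =
\mathbb{E}_{q_\lambda}\left[\log \text{prior}(\Theta) + \log \mathcal{L}_\Theta(\bx)\right]
- \mathbb{E}_{q_\lambda}\left[\log q_\lambda(\Theta)\right],
\]
which is equivalent to minimizing the Kullback-Leibler divergence
$\mathrm{KL}\left(q_\lambda \,\|\, p(\Theta \mid \bx)\right)$ from the
approximation to the exact posterior.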