Graphs have been extensively studied for their propagative abilities:
connectivity, routing, gossip algorithms, etc. A diffusion process taking
place over a graph provides valuable information about the presence and weights
of its edges. \emph{Cascades} are a specific type of diffusion process in
which a particular infectious behavior spreads over the nodes of the graph. By
observing only the ``infection times'' of the nodes in the graph, one might
hope to recover the underlying graph and the parameters of the cascade model.
This problem is known in the literature as the \emph{Network Inference problem}.
More precisely, the cascade models studied here are discrete-time random
processes, and the observations are snapshots of the states of the nodes in the
network at each time step. The cascade model specifies the transition
probabilities between states as a function of the edge weights of the network.
Recovering the edge weights of the network given observations of cascades then
amounts to parametric inference in the cascade model.
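For example, writing $x^t \in \{0,1\}^n$ for the snapshot of node states at
time $t$ and $\theta_i$ for the weights of the edges pointing into node $i$, a
representative transition rule of the generalized linear form, made precise in
Section 2, is
\begin{equation*}
  \Pr\left[x^{t+1}_i = 1 \,\middle|\, x^t\right]
  = f\left(\langle \theta_i, x^t \rangle\right),
\end{equation*}
where the link function $f$ maps the aggregated influence of the currently
infected neighbors of node $i$ to an infection probability.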
A recent line of work \cite{GomezRodriguez:2010,Netrapalli:2012,pouget} has
focused on maximum likelihood estimation (MLE) of the edge weights and on
guarantees for the convergence rate of the MLE estimator. In this work, we
depart from this approach by studying the Network Inference problem in the
Bayesian setting. Specifically:
\begin{itemize}
\item we propose a Bayesian inference formulation of the Network Inference
problem in the Generalized Linear Cascade (GLC) model of \cite{pouget} and show
how to apply MCMC and variational inference to it; the resulting posterior is
sketched below;
\item we show how to leverage this Bayesian formulation to design active
learning heuristics in which the experimenter dynamically chooses the source
node at which the observed cascades originate;
\item we show empirically that active learning greatly improves the speed of
learning compared to i.i.d.\ observations.
\end{itemize}
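To make the first contribution concrete: if $\mathcal{D}$ denotes the
collection of observed snapshots and, as is standard for this model class,
nodes are assumed to transition independently given the previous snapshot, the
posterior over the edge weights factorizes as
\begin{equation*}
  p(\theta \mid \mathcal{D}) \;\propto\;
  p(\theta) \prod_{t} \prod_{i=1}^{n} p\left(x^{t+1}_i \mid x^t, \theta_i\right),
\end{equation*}
where $p(\theta)$ is a prior over edge weights and the likelihood terms are the
transition probabilities of the cascade model; Section 3 instantiates this
posterior for the GLC model and describes the associated MCMC and variational
inference schemes. The active learning heuristics of Section 4 build on this
posterior, for instance by selecting the next source node so that the cascade
it triggers is expected to reduce posterior uncertainty the most.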
The paper is organized as follows: we conclude this introduction with a
review of related work. Section 2 introduces the notation and the Generalized
Linear Cascade model, and Section 3 presents our Bayesian inference
formulation. The active learning approach is described in Section 4, and
Section 5 reports our experimental results. Finally, we conclude with a
discussion in Section 6.
\input{sections/related.tex}