\paragraph{Related works.}
The study of edge prediction in graphs has been an active field of research for
over a decade~\cite{Nowell08, Leskovec07, AdarA05}. \cite{GomezRodriguez:2010}
introduced the {\scshape Netinf} algorithm, which approximates the likelihood of
cascades modeled as a continuous-time process. The algorithm, which relies on
maximum-likelihood estimation and convex relaxations, was improved in later
work~\cite{gomezbalduzzi:2011}, but it is not known to have any theoretical
guarantees and has only been validated empirically on synthetic networks.

Let $m$ denote the number of nodes in the graph and, for a given node, let $s$
denote its in-degree. In what follows, we call a \emph{recovery guarantee} an
upper bound on the number of observations required (expressed as a function of
$s$ and $m$) to learn the incoming edges of a node to an arbitrary fixed
accuracy level. \cite{Netrapalli:2012} studied the discrete-time version of the
independent cascade model and obtained the first ${\cal O}(s^2 \log m)$
recovery guarantee on general networks, using unregularized MLE and a
\emph{correlation decay} assumption, which limits the number of new infections
at every step. They also proposed a {\scshape Greedy} algorithm, with provably
good performance on tree graphs, which maintains a counter of each node's
antecedents, i.e.\ the set of nodes infected at a time step prior to its own
infection time (a minimal sketch is given below). The work of~\cite{Abrahao:13}
studies the same continuous-time framework as \cite{GomezRodriguez:2010} and
gives a support recovery algorithm with an ${\cal O}(s^9 \log^2 s \log m)$
recovery guarantee, without the \emph{correlation decay} assumption.
\cite{du2013uncover,Daneshmand:2014} propose Lasso-regularized MLE for
recovering the edge weights of the graph under a continuous-time independent
cascade model. The work of~\cite{du2014influence} is somewhat orthogonal to
ours: it learns the \emph{influence} function rather than the parameters of the
network directly.
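
To make the antecedent-counting idea concrete, the sketch below illustrates it
in Python. It is a simplification written for this exposition, not the exact
{\scshape Greedy} procedure of~\cite{Netrapalli:2012}; in particular, the rule
of keeping the $s$ most frequent antecedents of a node as its predicted parents
is an assumption made purely for illustration.
\begin{verbatim}
# Illustrative antecedent-counting heuristic (hypothetical sketch).
# Each cascade is a dict mapping an infected node to its discrete
# infection time.
from collections import Counter, defaultdict

def antecedent_counts(cascades):
    """Count, for every node u, how often each other node was infected
    strictly before u across the observed cascades."""
    counts = defaultdict(Counter)
    for cascade in cascades:
        for u, t_u in cascade.items():
            for v, t_v in cascade.items():
                if t_v < t_u:          # v is an antecedent of u
                    counts[u][v] += 1
    return counts

def predict_parents(cascades, s):
    """Keep the s most frequent antecedents of each node as its parents
    (this selection rule is a simplification made for illustration)."""
    counts = antecedent_counts(cascades)
    return {u: [v for v, _ in c.most_common(s)] for u, c in counts.items()}

# Example: two discrete-time cascades over nodes 'a', 'b', 'c'.
cascades = [{'a': 0, 'b': 1, 'c': 2}, {'a': 0, 'c': 1}]
print(predict_parents(cascades, s=1))  # e.g. {'b': ['a'], 'c': ['a']}
\end{verbatim}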

More recently, the continuous-time process studied in previous work has been
reformulated as a Hawkes process, with recent
papers~\cite{linderman2014discovering, linderman2015scalable, simma2012modeling}
focusing on Expectation-Maximization and Variational Inference methods to learn
these processes. Because of the discrete nature of the GLC model, we hope to
shed light on the link between the properties of a graph and its
\emph{learnability}. We wish to further explore the idea of non-product priors
raised in~\cite{linderman2014discovering, linderman2015scalable}, since the
experimental validation in those works focused on simple graph priors.
Finally, the \emph{Active Learning} formulation is, to the best of our
knowledge, novel in this context.