Many real-world phenomena can be modeled as complex diffusion processes on networks: the adoption of behaviors, the sharing of internet memes, the citation of articles, and the spread of infectious diseases. Oftentimes, the exact network is unknown to us: we observe only the behavior of the nodes through time, without knowing who `influenced' whom. The spread of a particular behavior through a network is known as an {\it Influence Cascade}. In this context, the {\it Graph Inference} problem is to recover the edges of the graph, and, as we will argue, its weights as well, from the observation of a few influence cascades.
An effective algorithm recovers most of the edges correctly from very few cascades. The problem is made all the more interesting by the fact that graph inference has been shown to be possible, under various assumptions, from $O(\mathrm{poly}(\Delta) \log p)$ observed cascades, where $\Delta$ is the maximum degree of the graph and $p$ is the number of nodes. In other words, the dependence on $p$ is (almost miraculously) logarithmic.
Since the first few papers on link prediction in networks, the research community has made good progress in defining the {\it Graph Inference} problem more precisely and in suggesting effective algorithms. However, not only can the known sample complexity be improved to $O(\Delta \log p)$, but this bound can be shown to be tight to a certain extent. Furthermore, not only can the support of the graph be recovered effectively, but the edge weights themselves can be estimated under the same assumptions, and this too is tight to a certain extent.
Throughout this paper, we focus on the two most widely studied diffusion models, introduced by Kempe et al. (...) in their seminal work [cite]: the independent cascade (IC) model and the linear threshold (LT) model. More generally, we aim to cast the graph inference problem as a {\it generalized linear model} regression problem.
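To fix intuition, consider the one-step activation probabilities (a sketch; the notation here is ours, and the definitions follow the standard formulation of these models). In the IC model, each node $u$ activated at time $t$ attempts, independently with probability $p_{uv}$, to activate each of its neighbors $v$ at time $t+1$. Writing $A_t$ for the set of nodes activated at time $t$, the probability that $v$ becomes active at time $t+1$ is
\[
\Pr\left[v \in A_{t+1} \mid A_t\right] \;=\; 1 - \prod_{u \in A_t} (1 - p_{uv}) \;=\; 1 - \exp\Big(\sum_{u \in A_t} \log(1 - p_{uv})\Big),
\]
so that with coefficients $\log(1 - p_{uv})$ and binary covariates $\mathbf{1}\{u \in A_t\}$, this is one standard way to view the model as a generalized linear model with inverse link $f(z) = 1 - e^{z}$. In the LT model, each node $v$ draws a threshold $\theta_v$ uniformly at random in $[0,1]$ and becomes active as soon as the total weight of its active in-neighbors reaches $\theta_v$.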
\paragraph{Cascades and parameters}
\subsection{Related Work}
\paragraph{Past work}
\begin{itemize}
\item Gomez
\item Netrapalli: $d^2 \log n$ + correlation decay + very bad dependence on
\item Kleinberg/Abrahao: $d^9 \log n$; single-source model...
\item Gomez: $d^3 \log n$ + assumptions on the Hessian of the diffusion process (upper and lower bounds on its eigenvalues) + same proof concept as Netrapalli
\end{itemize}
\paragraph{Our contribution}
\begin{itemize}
\item Better assumptions: easy to understand, to verify (?), and much less restrictive
\item Oracle inequality rather than support recovery (first such result)
\item Algorithm for recovery from $O(d \log n)$ cascades (first such result)
\item Practical confidence intervals
\item Practical lower bound
\item Comparison on generic networks
\end{itemize}
To justify:
\begin{itemize}
\item why the discrete-time model isn't so bad;
\item why Gomez's assumptions are not reasonable;
\end{itemize}