aboutsummaryrefslogtreecommitdiffstats
path: root/paper
diff options
context:
space:
mode:
authorjeanpouget-abadie <jean.pougetabadie@gmail.com>2015-02-05 16:40:10 -0500
committerjeanpouget-abadie <jean.pougetabadie@gmail.com>2015-02-05 16:40:10 -0500
commit7f07f11fb307909dc68bb679b0370b3f06866dc8 (patch)
tree6480b2756f456dca578658a96d307265e6076b32 /paper
parent5d70de2deb5f7e8c0184c816a6d9da8d2d7d309f (diff)
downloadcascades-7f07f11fb307909dc68bb679b0370b3f06866dc8.tar.gz
adding citations to intro
Diffstat (limited to 'paper')
-rw-r--r--paper/sections/intro.tex13
-rw-r--r--paper/sparse.bib50
2 files changed, 55 insertions, 8 deletions
diff --git a/paper/sections/intro.tex b/paper/sections/intro.tex
index 9da8b3d..20b40f3 100644
--- a/paper/sections/intro.tex
+++ b/paper/sections/intro.tex
@@ -8,7 +8,6 @@ Tackling the graph inference problem means constructing a polynomial-time algori
Throughout this paper, we focus on the three main discrete-time diffusion processes: the independent cascade model, the voter model, and the linear threshold model...
\end{comment}
-
A diffusion process taking place over a graph provides valuable information
about the presence and weights of its edges. \emph{Influence cascades} are a
specific type of diffusions processes in which a particular infection behavior
@@ -29,7 +28,7 @@ Thus, we will focus on a single node of degree $s$ and discuss how to identify
its parents among the $m$ nodes of the graph. Prior work has shown that the
required number of observed cascades is $\O(poly(s)\log m)$.
-A more recent line of research has focused on applying advances on sparse
+A more recent line of research has focused on applying advances in sparse
recovery to the graph inference problem. Indeed, the graph can be interpreted
as a ``sparse signal'' measured through influence cascades and then recovered.
The challenge is that influence cascade models typically lead to non-linear
@@ -58,19 +57,17 @@ The organization of the paper is as follows: ...
\paragraph{Related Work}
-The study of edge prediction in graph has been an active field of research for over a decade. \cite{GomezRodriguez:2010} was one of the first papers to study graph prediction from cascades. They introduce the {\scshape netinf} algorithm, which approximates the likelihood of cascades represented as a continuous process. The algorithm was later improved/modified in later work. Beside validation on generic networks, {\scshape netinf} is not known to have any theoretical recovery guarantees. \cite{Netrapalli:2012} studied solely the independent cascade model and obtained the first ${\cal O}(s^2 \log m)$ guarantee on general networks. The algorithm is based around the same likelihood function we suggest, without the $\ell1$-norm penalty. However, the analysis depended strongly on a restrictive {\it correlation decay} assumption, which strongly restricts the number of new infections at every step. In this restricted setting, they show a complex lower bound of the number of cascades needed for perfect support recovery with constant probability of the order $\Omega(s \log (m/s))$.
+The study of edge prediction in graph has been an active field of research for over a decade \cite{Nowell08} \cite{Leskovec07} \cite{AdarA05}. \cite{GomezRodriguez:2010} introduced the submodular {\scshape netinf} algorithm, which approximates the likelihood of cascades represented as a continuous process. The algorithm was later improved in later work \cite{gomezbalduzzi:2011}, but is not known to have any recovery guarantees beside empirical validation on synthetic networks. \cite{Netrapalli:2012} studied the discrete-time version of the independent cascade model and obtained the first ${\cal O}(s^2 \log m)$ guarantee on general networks. The algorithm is based around the same likelihood function we suggest, without the $\ell1$-norm penalty. However, the analysis depended strongly on a restrictive {\it correlation decay} assumption, which strongly limits the number of new infections at every step. In this restricted setting, they show a complex lower bound of the number of cascades needed for perfect support recovery with constant probability of the order $\Omega(s \log (m/s))$. They also suggest a {\scshape greedy} algorithm, which matches ${\cal O}(s \log m)$ guarantee in the case of \emph{tree} graphs.
-The work of \cite{Abrahao:13} study the same continuous-model framework as \cite{GomezRodriguez:2010} and obtain a ${\cal O}(s^9 \log^2 s \log m)$ support recovery algorithm. Their work also studies the information leveraged by different `parts' of the cascade, showing that a surprisingly important amount of information can be gleaned from the first `infections' of the cascade. We will reach a similar conclusion in section~\ref{sec:assumptions}. However, their model supposes a single-source model, where only one source is selected at random, which is less realistic in practice. Oftentimes, the `patient 0' is unknown to us, and a multi-source model intuitively models the more common situation of a delayed observation of the cascade.
+The work of \cite{Abrahao:13} study the same continuous-model framework as \cite{GomezRodriguez:2010} and obtain a ${\cal O}(s^9 \log^2 s \log m)$ support recovery algorithm without the \emph{correlation decay} assumption. Their work also studies the information leveraged by different `parts' of the cascade, showing that a surprisingly important amount of information can be gleaned from the first and last `infections' of the cascade. They assume a single-source model, where only one source is selected at random, which is less realistic in practice since `patient 0' is rarely unknown to us.
Closest to this work is a recent paper by \cite{Daneshmand:2014}, they consider
a $\ell_1$-norm penalty to their objective function, adapting standard results
from sparse recovery to obtain a ${\cal O}(s^3 \log m)$ algorithm under an
irrepresentability condition. With stronger assumptions, they match the
\cite{Netrapalli:2012} bound of ${\cal O}(s^2 \log m)$, by exploiting
-a similar proof technique based around the KKT conditions of the objective
-function. Their work has the merit of studying a general framework of
-continuous functions. Similarly to \cite{Abrahao:13}, they place themselves in
-the restrictive single-source context.
+a similar properties of the objective's KKT conditions. Their work has the merit of studying a generalization of the discrete-time independent cascade model to continuous functions. Similarly to \cite{Abrahao:13}, they place themselves in
+the restrictive single-source context.
TODO: add related works on lower bounds.
diff --git a/paper/sparse.bib b/paper/sparse.bib
index 1de78d6..e622b52 100644
--- a/paper/sparse.bib
+++ b/paper/sparse.bib
@@ -415,3 +415,53 @@ year = "2009"
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/abs-1105-0697},
bibsource = {dblp computer science bibliography, http://dblp.org}
}
+
+@article{Nowell08,
+ author = {Liben-Nowell, David and Kleinberg, Jon},
+ biburl = {http://www.bibsonomy.org/bibtex/250b9b1ca1849fa9cb8bb92d6d9031436/mkroell},
+ doi = {10.1073/pnas.0708471105},
+ eprint = {http://www.pnas.org/content/105/12/4633.full.pdf+html},
+ journal = {Proceedings of the National Academy of Sciences},
+ keywords = {SNA graph networks},
+ number = 12,
+ pages = {4633-4638},
+ timestamp = {2008-10-09T10:32:56.000+0200},
+ title = {{Tracing information flow on a global scale using Internet chain-letter data}},
+ url = {http://www.pnas.org/content/105/12/4633.abstract},
+ volume = 105,
+ year = 2008
+}
+
+@inproceedings{Leskovec07,
+ author = {Jure Leskovec and
+ Mary McGlohon and
+ Christos Faloutsos and
+ Natalie S. Glance and
+ Matthew Hurst},
+ title = {Patterns of Cascading Behavior in Large Blog Graphs},
+ booktitle = {Proceedings of the Seventh {SIAM} International Conference on Data
+ Mining, April 26-28, 2007, Minneapolis, Minnesota, {USA}},
+ pages = {551--556},
+ year = {2007},
+ url = {http://dx.doi.org/10.1137/1.9781611972771.60},
+ doi = {10.1137/1.9781611972771.60},
+ timestamp = {Wed, 12 Feb 2014 17:08:15 +0100},
+ biburl = {http://dblp.uni-trier.de/rec/bib/conf/sdm/LeskovecMFGH07},
+ bibsource = {dblp computer science bibliography, http://dblp.org}
+}
+
+
+@inproceedings{AdarA05,
+ author = {Eytan Adar and
+ Lada A. Adamic},
+ title = {Tracking Information Epidemics in Blogspace},
+ booktitle = {2005 {IEEE} / {WIC} / {ACM} International Conference on Web Intelligence
+ {(WI} 2005), 19-22 September 2005, Compiegne, France},
+ pages = {207--214},
+ year = {2005},
+ url = {http://dx.doi.org/10.1109/WI.2005.151},
+ doi = {10.1109/WI.2005.151},
+ timestamp = {Tue, 12 Aug 2014 16:59:16 +0200},
+ biburl = {http://dblp.uni-trier.de/rec/bib/conf/webi/AdarA05},
+ bibsource = {dblp computer science bibliography, http://dblp.org}
+} \ No newline at end of file