adding citations to intro

author: jeanpouget-abadie <jean.pougetabadie@gmail.com> 2015-02-05 16:40:10 -0500
committer: jeanpouget-abadie <jean.pougetabadie@gmail.com> 2015-02-05 16:40:10 -0500
commit: 7f07f11fb307909dc68bb679b0370b3f06866dc8 (patch)
tree: 6480b2756f456dca578658a96d307265e6076b32
parent: 5d70de2deb5f7e8c0184c816a6d9da8d2d7d309f (diff)
download: cascades-7f07f11fb307909dc68bb679b0370b3f06866dc8.tar.gz
3 files changed, 59 insertions, 12 deletions
diff --git a/paper/sections/intro.tex b/paper/sections/intro.tex
index 9da8b3d..20b40f3 100644
--- a/paper/sections/intro.tex
+++ b/paper/sections/intro.tex
@@ -8,7 +8,6 @@ Tackling the graph inference problem means constructing a polynomial-time algori
  Throughout this paper, we focus on the three main discrete-time diffusion processes: the independent cascade model, the voter model, and the linear threshold model...
  \end{comment}
 
-
 A diffusion process taking place over a graph provides valuable information
 about the presence and weights of its edges. \emph{Influence cascades} are a
 specific type of diffusions processes in which a particular infection behavior
@@ -29,7 +28,7 @@ Thus, we will focus on a single node of degree $s$ and discuss how to identify
 its parents among the $m$ nodes of the graph. Prior work has shown that the
 required number of observed cascades is $\O(poly(s)\log m)$.
 
-A more recent line of research has focused on applying advances on sparse
+A more recent line of research has focused on applying advances in sparse
 recovery to the graph inference problem. Indeed, the graph can be interpreted
 as a ``sparse signal'' measured through influence cascades and then recovered.
 The challenge is that influence cascade models typically lead to non-linear
@@ -58,19 +57,17 @@ The organization of the paper is as follows: ...
 
 \paragraph{Related Work}
 
-The study of edge prediction in graph has been an active field of research for over a decade. \cite{GomezRodriguez:2010} was one of the first papers to study graph prediction from cascades. They introduce the {\scshape netinf} algorithm, which approximates the likelihood of cascades represented as a continuous process. The algorithm was later improved/modified in later work. Beside validation on generic networks, {\scshape netinf} is not known to have any theoretical recovery guarantees. \cite{Netrapalli:2012} studied solely the independent cascade model and obtained the first ${\cal O}(s^2 \log m)$ guarantee on general networks. The algorithm is based around the same likelihood function we suggest, without the $\ell1$-norm penalty. However, the analysis depended strongly on a restrictive {\it correlation decay} assumption, which strongly restricts the number of new infections at every step. In this restricted setting, they show a complex lower bound of the number of cascades needed for perfect support recovery with constant probability of the order $\Omega(s \log (m/s))$.
+The study of edge prediction in graph has been an active field of research for over a decade \cite{Nowell08} \cite{Leskovec07} \cite{AdarA05}. \cite{GomezRodriguez:2010} introduced the submodular {\scshape netinf} algorithm, which approximates the likelihood of cascades represented as a continuous process. The algorithm was later improved in later work \cite{gomezbalduzzi:2011}, but is not known to have any recovery guarantees beside empirical validation on synthetic networks. \cite{Netrapalli:2012} studied the discrete-time version of the independent cascade model and obtained the first ${\cal O}(s^2 \log m)$ guarantee on general networks. The algorithm is based around the same likelihood function we suggest, without the $\ell1$-norm penalty. However, the analysis depended strongly on a restrictive {\it correlation decay} assumption, which strongly limits the number of new infections at every step. In this restricted setting, they show a complex lower bound of the number of cascades needed for perfect support recovery with constant probability of the order $\Omega(s \log (m/s))$. They also suggest a {\scshape greedy} algorithm, which matches ${\cal O}(s \log m)$ guarantee in the case of \emph{tree} graphs.
 
-The work of \cite{Abrahao:13} study the same continuous-model framework as \cite{GomezRodriguez:2010} and obtain a ${\cal O}(s^9 \log^2 s \log m)$ support recovery algorithm. Their work also studies the information leveraged by different `parts' of the cascade, showing that a surprisingly important amount of information can be gleaned from the first `infections' of the cascade. We will reach a similar conclusion in section~\ref{sec:assumptions}. However, their model supposes a single-source model, where only one source is selected at random, which is less realistic in practice. Oftentimes, the `patient 0' is unknown to us, and a multi-source model intuitively models the more common situation of a delayed observation of the cascade.
+The work of \cite{Abrahao:13} study the same continuous-model framework as \cite{GomezRodriguez:2010} and obtain a ${\cal O}(s^9 \log^2 s \log m)$ support recovery algorithm without the \emph{correlation decay} assumption. Their work also studies the information leveraged by different `parts' of the cascade, showing that a surprisingly important amount of information can be gleaned from the first and last `infections' of the cascade. They assume a single-source model, where only one source is selected at random, which is less realistic in practice since `patient 0' is rarely unknown to us.
 
 Closest to this work is a recent paper by \cite{Daneshmand:2014}, they consider
 a $\ell_1$-norm penalty to their objective function, adapting standard results
 from sparse recovery to obtain a ${\cal O}(s^3 \log m)$ algorithm under an
 irrepresentability condition. With stronger assumptions, they match the
 \cite{Netrapalli:2012} bound of ${\cal O}(s^2 \log m)$, by exploiting
-a similar proof technique based around the KKT conditions of the objective
-function. Their work has the merit of studying a general framework of
-continuous functions. Similarly to \cite{Abrahao:13}, they place themselves in
-the restrictive single-source context. 
+a similar properties of the objective's KKT conditions. Their work has the merit of studying a generalization of the discrete-time independent cascade model to continuous functions. Similarly to \cite{Abrahao:13}, they place themselves in
+the restrictive single-source context.
 
 TODO: add related works on lower bounds.
 
diff --git a/paper/sparse.bib b/paper/sparse.bib
index 1de78d6..e622b52 100644
--- a/paper/sparse.bib
+++ b/paper/sparse.bib
@@ -415,3 +415,53 @@ year = "2009"
   biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/abs-1105-0697},
   bibsource = {dblp computer science bibliography, http://dblp.org}
 }
+
+@article{Nowell08,
+  author = {Liben-Nowell, David and Kleinberg, Jon},
+  biburl = {http://www.bibsonomy.org/bibtex/250b9b1ca1849fa9cb8bb92d6d9031436/mkroell},
+  doi = {10.1073/pnas.0708471105},
+  eprint = {http://www.pnas.org/content/105/12/4633.full.pdf+html},
+  journal = {Proceedings of the National Academy of Sciences},
+  keywords = {SNA graph networks},
+  number = 12,
+  pages = {4633-4638},
+  timestamp = {2008-10-09T10:32:56.000+0200},
+  title = {{Tracing information flow on a global scale using Internet chain-letter data}},
+  url = {http://www.pnas.org/content/105/12/4633.abstract},
+  volume = 105,
+  year = 2008
+}
+
+@inproceedings{Leskovec07,
+  author    = {Jure Leskovec and
+               Mary McGlohon and
+               Christos Faloutsos and
+               Natalie S. Glance and
+               Matthew Hurst},
+  title     = {Patterns of Cascading Behavior in Large Blog Graphs},
+  booktitle = {Proceedings of the Seventh {SIAM} International Conference on Data
+               Mining, April 26-28, 2007, Minneapolis, Minnesota, {USA}},
+  pages     = {551--556},
+  year      = {2007},
+  url       = {http://dx.doi.org/10.1137/1.9781611972771.60},
+  doi       = {10.1137/1.9781611972771.60},
+  timestamp = {Wed, 12 Feb 2014 17:08:15 +0100},
+  biburl    = {http://dblp.uni-trier.de/rec/bib/conf/sdm/LeskovecMFGH07},
+  bibsource = {dblp computer science bibliography, http://dblp.org}
+}
+
+
+@inproceedings{AdarA05,
+  author    = {Eytan Adar and
+               Lada A. Adamic},
+  title     = {Tracking Information Epidemics in Blogspace},
+  booktitle = {2005 {IEEE} / {WIC} / {ACM} International Conference on Web Intelligence
+               {(WI} 2005), 19-22 September 2005, Compiegne, France},
+  pages     = {207--214},
+  year      = {2005},
+  url       = {http://dx.doi.org/10.1109/WI.2005.151},
+  doi       = {10.1109/WI.2005.151},
+  timestamp = {Tue, 12 Aug 2014 16:59:16 +0200},
+  biburl    = {http://dblp.uni-trier.de/rec/bib/conf/webi/AdarA05},
+  bibsource = {dblp computer science bibliography, http://dblp.org}
+}
+\ No newline at end of file
diff --git a/src/make_plots.py b/src/make_plots.py
index 792e325..840cc94 100644
--- a/src/make_plots.py
+++ b/src/make_plots.py
@@ -197,13 +197,13 @@ def plot_ROC_curve(figure_name):
 
 
 if __name__=="__main__":
-    if 0:
+    if 1:
         compute_graph("../datasets/watts_strogatz_300_30_point3.txt",
-                    n_cascades=100, lbda=.01, min_proba=.2, max_proba=.7,
+                    n_cascades=300, lbda=.01, min_proba=.2, max_proba=.7,
                     passed_function=
                     #convex_optimization.sparse_recovery)
                     #algorithms.greedy_prediction)
-                    convex_optimization.sparse_recovery)
+                    convex_optimization.sparse_recovery, p_init=.15)
     if 0:
         compute_graph("../datasets/powerlaw_200_30_point3.txt",
             n_cascades=200, lbda=.01, min_proba=.2, max_proba=.7,
@@ -218,7 +218,7 @@ if __name__=="__main__":
             convex_optimization.sparse_recovery)
             #algorithms.greedy_prediction)
             #convex_optimization.type_lasso)
-    if 1:
+    if 0:
         compute_graph("../datasets/kronecker_graph_256_cross.txt", 
             n_cascades=50, lbda=0., min_proba=.2, max_proba=.7,
             passed_function=
author	jeanpouget-abadie <jean.pougetabadie@gmail.com>	2015-02-05 16:40:10 -0500
committer	jeanpouget-abadie <jean.pougetabadie@gmail.com>	2015-02-05 16:40:10 -0500
commit	7f07f11fb307909dc68bb679b0370b3f06866dc8 (patch)
tree	6480b2756f456dca578658a96d307265e6076b32
parent	5d70de2deb5f7e8c0184c816a6d9da8d2d7d309f (diff)
download	cascades-7f07f11fb307909dc68bb679b0370b3f06866dc8.tar.gz