diff options
| author | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-02-05 16:40:10 -0500 |
|---|---|---|
| committer | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-02-05 16:40:10 -0500 |
| commit | 7f07f11fb307909dc68bb679b0370b3f06866dc8 (patch) | |
| tree | 6480b2756f456dca578658a96d307265e6076b32 | |
| parent | 5d70de2deb5f7e8c0184c816a6d9da8d2d7d309f (diff) | |
| download | cascades-7f07f11fb307909dc68bb679b0370b3f06866dc8.tar.gz | |
adding citations to intro
| -rw-r--r-- | paper/sections/intro.tex | 13 | ||||
| -rw-r--r-- | paper/sparse.bib | 50 | ||||
| -rw-r--r-- | src/make_plots.py | 8 |
3 files changed, 59 insertions, 12 deletions
diff --git a/paper/sections/intro.tex b/paper/sections/intro.tex index 9da8b3d..20b40f3 100644 --- a/paper/sections/intro.tex +++ b/paper/sections/intro.tex @@ -8,7 +8,6 @@ Tackling the graph inference problem means constructing a polynomial-time algori Throughout this paper, we focus on the three main discrete-time diffusion processes: the independent cascade model, the voter model, and the linear threshold model... \end{comment} - A diffusion process taking place over a graph provides valuable information about the presence and weights of its edges. \emph{Influence cascades} are a specific type of diffusions processes in which a particular infection behavior @@ -29,7 +28,7 @@ Thus, we will focus on a single node of degree $s$ and discuss how to identify its parents among the $m$ nodes of the graph. Prior work has shown that the required number of observed cascades is $\O(poly(s)\log m)$. -A more recent line of research has focused on applying advances on sparse +A more recent line of research has focused on applying advances in sparse recovery to the graph inference problem. Indeed, the graph can be interpreted as a ``sparse signal'' measured through influence cascades and then recovered. The challenge is that influence cascade models typically lead to non-linear @@ -58,19 +57,17 @@ The organization of the paper is as follows: ... \paragraph{Related Work} -The study of edge prediction in graph has been an active field of research for over a decade. \cite{GomezRodriguez:2010} was one of the first papers to study graph prediction from cascades. They introduce the {\scshape netinf} algorithm, which approximates the likelihood of cascades represented as a continuous process. The algorithm was later improved/modified in later work. Beside validation on generic networks, {\scshape netinf} is not known to have any theoretical recovery guarantees. \cite{Netrapalli:2012} studied solely the independent cascade model and obtained the first ${\cal O}(s^2 \log m)$ guarantee on general networks. The algorithm is based around the same likelihood function we suggest, without the $\ell1$-norm penalty. However, the analysis depended strongly on a restrictive {\it correlation decay} assumption, which strongly restricts the number of new infections at every step. In this restricted setting, they show a complex lower bound of the number of cascades needed for perfect support recovery with constant probability of the order $\Omega(s \log (m/s))$. +The study of edge prediction in graph has been an active field of research for over a decade \cite{Nowell08} \cite{Leskovec07} \cite{AdarA05}. \cite{GomezRodriguez:2010} introduced the submodular {\scshape netinf} algorithm, which approximates the likelihood of cascades represented as a continuous process. The algorithm was later improved in later work \cite{gomezbalduzzi:2011}, but is not known to have any recovery guarantees beside empirical validation on synthetic networks. \cite{Netrapalli:2012} studied the discrete-time version of the independent cascade model and obtained the first ${\cal O}(s^2 \log m)$ guarantee on general networks. The algorithm is based around the same likelihood function we suggest, without the $\ell1$-norm penalty. However, the analysis depended strongly on a restrictive {\it correlation decay} assumption, which strongly limits the number of new infections at every step. In this restricted setting, they show a complex lower bound of the number of cascades needed for perfect support recovery with constant probability of the order $\Omega(s \log (m/s))$. They also suggest a {\scshape greedy} algorithm, which matches ${\cal O}(s \log m)$ guarantee in the case of \emph{tree} graphs. -The work of \cite{Abrahao:13} study the same continuous-model framework as \cite{GomezRodriguez:2010} and obtain a ${\cal O}(s^9 \log^2 s \log m)$ support recovery algorithm. Their work also studies the information leveraged by different `parts' of the cascade, showing that a surprisingly important amount of information can be gleaned from the first `infections' of the cascade. We will reach a similar conclusion in section~\ref{sec:assumptions}. However, their model supposes a single-source model, where only one source is selected at random, which is less realistic in practice. Oftentimes, the `patient 0' is unknown to us, and a multi-source model intuitively models the more common situation of a delayed observation of the cascade. +The work of \cite{Abrahao:13} study the same continuous-model framework as \cite{GomezRodriguez:2010} and obtain a ${\cal O}(s^9 \log^2 s \log m)$ support recovery algorithm without the \emph{correlation decay} assumption. Their work also studies the information leveraged by different `parts' of the cascade, showing that a surprisingly important amount of information can be gleaned from the first and last `infections' of the cascade. They assume a single-source model, where only one source is selected at random, which is less realistic in practice since `patient 0' is rarely unknown to us. Closest to this work is a recent paper by \cite{Daneshmand:2014}, they consider a $\ell_1$-norm penalty to their objective function, adapting standard results from sparse recovery to obtain a ${\cal O}(s^3 \log m)$ algorithm under an irrepresentability condition. With stronger assumptions, they match the \cite{Netrapalli:2012} bound of ${\cal O}(s^2 \log m)$, by exploiting -a similar proof technique based around the KKT conditions of the objective -function. Their work has the merit of studying a general framework of -continuous functions. Similarly to \cite{Abrahao:13}, they place themselves in -the restrictive single-source context. +a similar properties of the objective's KKT conditions. Their work has the merit of studying a generalization of the discrete-time independent cascade model to continuous functions. Similarly to \cite{Abrahao:13}, they place themselves in +the restrictive single-source context. TODO: add related works on lower bounds. diff --git a/paper/sparse.bib b/paper/sparse.bib index 1de78d6..e622b52 100644 --- a/paper/sparse.bib +++ b/paper/sparse.bib @@ -415,3 +415,53 @@ year = "2009" biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/abs-1105-0697}, bibsource = {dblp computer science bibliography, http://dblp.org} } + +@article{Nowell08, + author = {Liben-Nowell, David and Kleinberg, Jon}, + biburl = {http://www.bibsonomy.org/bibtex/250b9b1ca1849fa9cb8bb92d6d9031436/mkroell}, + doi = {10.1073/pnas.0708471105}, + eprint = {http://www.pnas.org/content/105/12/4633.full.pdf+html}, + journal = {Proceedings of the National Academy of Sciences}, + keywords = {SNA graph networks}, + number = 12, + pages = {4633-4638}, + timestamp = {2008-10-09T10:32:56.000+0200}, + title = {{Tracing information flow on a global scale using Internet chain-letter data}}, + url = {http://www.pnas.org/content/105/12/4633.abstract}, + volume = 105, + year = 2008 +} + +@inproceedings{Leskovec07, + author = {Jure Leskovec and + Mary McGlohon and + Christos Faloutsos and + Natalie S. Glance and + Matthew Hurst}, + title = {Patterns of Cascading Behavior in Large Blog Graphs}, + booktitle = {Proceedings of the Seventh {SIAM} International Conference on Data + Mining, April 26-28, 2007, Minneapolis, Minnesota, {USA}}, + pages = {551--556}, + year = {2007}, + url = {http://dx.doi.org/10.1137/1.9781611972771.60}, + doi = {10.1137/1.9781611972771.60}, + timestamp = {Wed, 12 Feb 2014 17:08:15 +0100}, + biburl = {http://dblp.uni-trier.de/rec/bib/conf/sdm/LeskovecMFGH07}, + bibsource = {dblp computer science bibliography, http://dblp.org} +} + + +@inproceedings{AdarA05, + author = {Eytan Adar and + Lada A. Adamic}, + title = {Tracking Information Epidemics in Blogspace}, + booktitle = {2005 {IEEE} / {WIC} / {ACM} International Conference on Web Intelligence + {(WI} 2005), 19-22 September 2005, Compiegne, France}, + pages = {207--214}, + year = {2005}, + url = {http://dx.doi.org/10.1109/WI.2005.151}, + doi = {10.1109/WI.2005.151}, + timestamp = {Tue, 12 Aug 2014 16:59:16 +0200}, + biburl = {http://dblp.uni-trier.de/rec/bib/conf/webi/AdarA05}, + bibsource = {dblp computer science bibliography, http://dblp.org} +}
\ No newline at end of file diff --git a/src/make_plots.py b/src/make_plots.py index 792e325..840cc94 100644 --- a/src/make_plots.py +++ b/src/make_plots.py @@ -197,13 +197,13 @@ def plot_ROC_curve(figure_name): if __name__=="__main__": - if 0: + if 1: compute_graph("../datasets/watts_strogatz_300_30_point3.txt", - n_cascades=100, lbda=.01, min_proba=.2, max_proba=.7, + n_cascades=300, lbda=.01, min_proba=.2, max_proba=.7, passed_function= #convex_optimization.sparse_recovery) #algorithms.greedy_prediction) - convex_optimization.sparse_recovery) + convex_optimization.sparse_recovery, p_init=.15) if 0: compute_graph("../datasets/powerlaw_200_30_point3.txt", n_cascades=200, lbda=.01, min_proba=.2, max_proba=.7, @@ -218,7 +218,7 @@ if __name__=="__main__": convex_optimization.sparse_recovery) #algorithms.greedy_prediction) #convex_optimization.type_lasso) - if 1: + if 0: compute_graph("../datasets/kronecker_graph_256_cross.txt", n_cascades=50, lbda=0., min_proba=.2, max_proba=.7, passed_function= |
