author    jeanpouget-abadie <jean.pougetabadie@gmail.com> 2015-11-06 14:00:41 -0500
committer jeanpouget-abadie <jean.pougetabadie@gmail.com> 2015-11-06 14:00:41 -0500
commit    4afe22be3af6106b41366c2e76172014337fed7e (patch)
tree      779add36f1cf738724cfb3649eeb7bb1c7734a68
parent    a743631400ef997a4748b83bb363f84624cfeb10 (diff)
download  cascades-4afe22be3af6106b41366c2e76172014337fed7e.tar.gz

adding bayes description + references

-rw-r--r--  finale/mid_report.tex  72
1 file changed, 44 insertions(+), 28 deletions(-)
diff --git a/finale/mid_report.tex b/finale/mid_report.tex
index 38d9020..9873abd 100644
--- a/finale/mid_report.tex
+++ b/finale/mid_report.tex
@@ -73,30 +73,37 @@ Two subquestions:
The study of edge prediction in graphs has been an active field of research for
over a decade~\cite{Nowell08, Leskovec07, AdarA05}.~\cite{GomezRodriguez:2010}
introduced the {\scshape Netinf} algorithm, which approximates the likelihood of
-cascades represented as a continuous process. The algorithm was improved in
-later work~\cite{gomezbalduzzi:2011}, but is not known to have any theoretical
+cascades represented as a continuous process. The algorithm, relying on
+Maximum-Likelihood estimation and convex relaxations, was improved in later
+work~\cite{gomezbalduzzi:2011}, but is not known to have any theoretical
guarantees beyond empirical validation on synthetic networks.
\cite{Netrapalli:2012} studied the discrete-time version of the independent
cascade model and obtained the first ${\cal O}(s^2 \log m)$ recovery guarantee
on general networks, by using unregularized MLE and making a {\it correlation
-decay\/} assumption, which limits the number of new infections at every step. In
-this setting, they show a lower bound of the number of cascades needed for
-support recovery with constant probability of the order $\Omega(s \log (m/s))$.
-They also suggest a {\scshape Greedy} algorithm, which achieves a ${\cal O}(s
-\log m)$ guarantee in the case of tree graphs. The work of~\cite{Abrahao:13}
-studies the same continuous-model framework as \cite{GomezRodriguez:2010} and
-obtains an ${\cal O}(s^9 \log^2 s \log m)$ support recovery algorithm, without
-the \emph{correlation decay} assumption. \cite{du2013uncover,Daneshmand:2014}
-propose Lasso regularization of MLE for recovering the weights of the graph
-under a continuous-time independent cascade model. The work
-of~\cite{du2014influence} is slightly orthogonal to ours since they suggest
-learning the \emph{influence} function, rather than the parameters of the
-network directly.
+decay\/} assumption, which limits the number of new infections at every step.
+They also suggest a {\scshape Greedy} algorithm, with provably good performance
+on tree graphs, which maintains for each node a counter of its antecedents,
+i.e.\ the set of nodes infected at a time step prior to its own infection time.
+The work
+of~\cite{Abrahao:13} studies the same continuous-model framework as
+\cite{GomezRodriguez:2010} and obtains an ${\cal O}(s^9 \log^2 s \log m)$
+support recovery algorithm, without the \emph{correlation decay} assumption.
+\cite{du2013uncover,Daneshmand:2014} propose Lasso regularization of MLE for
+recovering the weights of the graph under a continuous-time independent cascade
+model. The work of~\cite{du2014influence} is slightly orthogonal to ours since
+they suggest learning the \emph{influence} function, rather than the parameters
+of the network directly.
-More recently, the inference of Hawkes process is perhaps the closest related
-topic to our work, with recent papers \cite{linderman2014discovering,
-linderman2015scalable} , focusing on MLE, EM, and Variational
-Inference methods to learning these processes.
+More recently, the continuous process studied in previous work has been
+reformulated as a Hawkes process, with recent
+papers~\cite{linderman2014discovering, linderman2015scalable, simma2012modeling}
+focusing on Expectation-Maximization and Variational Inference methods to learn
+these processes. Because of the discrete nature of the GLC model, we hope to
+bring a better understanding of the link between the properties of a graph and
+its \emph{learnability}. We wish to further explore the idea of non-product
+priors raised in \cite{linderman2014discovering, linderman2015scalable}, since
+the experimental validation of their work focused on simple graph priors.
+Finally, the \emph{Active Learning} formulation is, to the best of the
+authors' knowledge, novel in this context.
\section{Model}
@@ -228,6 +235,7 @@ then the model is identifiable.
\section{Inference}
\subsection{Maximum-Likelihood Estimation}
+\label{MLEsec}
Because transitions occur independently for each node $i$, the log-likelihood
is a sum of $k$ terms, each term $i$ only depending on $\bt_i$. Hence, each
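For clarity, the decomposition this hunk references can be written out (a schematic sketch in the paper's notation, where $\bt_i$ denotes the parameters of node $i$ and $\mathcal{L}_i$ the corresponding per-node term):
\[
  \mathcal{L}(\bt_1, \dots, \bt_k) = \sum_{i=1}^{k} \mathcal{L}_i(\bt_i),
  \qquad
  \hat{\bt}_i \in \arg\max_{\bt_i} \mathcal{L}_i(\bt_i)
  \quad \text{independently for each } i,
\]
so the global MLE splits into $k$ smaller per-node problems.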
@@ -282,11 +290,19 @@ Though straightforward MCMC could be applied here, recent
work~\cite{caimo2011bayesian, koskinen2010analysing, robins2007recent} has shown
that ERGM inference suffers from slow convergence and a lack of robustness, and
has developed better alternatives to naive MCMC formulations. Experiments using such a prior
-are ongoing, but we present only simple product distribution-type priors here.
+are ongoing, but we present results only for simple product-distribution
+priors here.
\paragraph{Inference}
-We can sample from the posterior by MCMC\@. This might not be the fastest
-solution however. We could greatly benefit from using an alternative method:
+We can sample from the posterior by MCMC using the PyMC library, which works
+relatively well (see the paragraph on \emph{experiments}). If the prior is a
+product distribution, then we can simply do Bayesian GLM regression, as in
+Section~\ref{MLEsec}. However, computing the posterior edge-by-edge does not
+accommodate complex priors, which could encourage, for example, the presence
+of triangles or two-stars. For such priors, we must sample from the posterior
+of the Markov chain directly, which is what we have done here. This method is
+very slow, however, and scales badly with the number of cascades and the
+number of nodes. We could greatly benefit from using an alternative method:
\begin{itemize}
\item EM\@. This approach was used in \cite{linderman2014discovering,
simma2012modeling} to learn
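To make the product-prior MCMC described above concrete, here is a minimal sketch, not the authors' code: it assumes a toy data encoding (active-set indicators per step) and an independent-cascade-style Bernoulli likelihood, and uses the current PyMC API rather than whatever PyMC version the paper originally ran.

# A toy sketch (assumed encoding, not the authors' code): Bayesian inference
# of edge weights Theta with independent Beta(1, 1) priors per directed edge.
import numpy as np
import pymc as pm

n, T = 4, 200                                # toy sizes: nodes, observed steps
rng = np.random.default_rng(0)

# Hypothetical data: X[t, i] = 1 if node i was active at step t,
# y[t, j] = 1 if node j became infected at step t + 1.
true_theta = rng.uniform(0.2, 0.8, (n, n)) * (rng.random((n, n)) < 0.4)
X = rng.integers(0, 2, size=(T, n))
p_true = 1.0 - np.prod(1.0 - true_theta[None, :, :] * X[:, :, None], axis=1)
y = (rng.random((T, n)) < p_true).astype(int)

with pm.Model():
    # Product prior: one independent Beta(1, 1) weight per directed edge.
    theta = pm.Beta("theta", alpha=1.0, beta=1.0, shape=(n, n))
    # Independent-cascade-style likelihood: node j escapes infection at step t
    # with probability prod_i (1 - theta[i, j])^{X[t, i]}.
    p = 1.0 - pm.math.prod(1.0 - theta[None, :, :] * X[:, :, None], axis=1)
    pm.Bernoulli("obs", p=p, observed=y)
    trace = pm.sample(1000, tune=1000, chains=2, random_seed=0)

For a non-product (e.g.\ ERGM-style) prior, the per-edge Beta lines would give way to a joint prior, for instance a pm.Potential over graph statistics, at which point sampling targets the full posterior directly, matching the slowdown described above.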
@@ -314,12 +330,12 @@ parameter space.
\includegraphics[scale=.4]{../simulation/plots/2015-11-05_22:57:26.pdf}}
\subfloat[][1000 cascades]{
\includegraphics[scale=.4]{../simulation/plots/2015-11-05_22:58:29.pdf}}
-\caption{Bayesian Inference of $\Theta$ with MCMC using a $Beta(1, 1)$ prior.
-For each figure, the plot $(i, j)$ on the $i^{th}$ row and $j^{th}$ column
-represent a histogram of samples taken from the posterior of the corresponding
-edge $\Theta_{i, j}$. The red line indicates the true value of the edge weight.
-If an edge does not exist (has weight $0$) the red line is confounded with the y
-axis.}
+\caption{Bayesian Inference of $\Theta$ with MCMC using a $Beta(1, 1)$ prior on
+each edge. For each figure, the plot $(i, j)$ on the $i^{th}$ row and $j^{th}$
+column represents a histogram of samples taken from the posterior of the
+corresponding edge $\Theta_{i, j}$. The red line indicates the true value of
+the edge weight. If an edge does not exist (has weight $0$), the red line
+coincides with the $y$-axis.}
\label{betapriorbayeslearning}
\end{figure}
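A grid of histograms like the one this caption describes could be rendered roughly as follows; this is a sketch reusing the hypothetical `trace`, `n`, and `true_theta` from the earlier snippet, with `posterior_grid.pdf` an illustrative output name.

# A possible rendering (not the authors' script) of the caption's layout:
# panel (i, j) shows posterior samples of edge Theta_{i, j}, with a red
# vertical line at the true weight.
import matplotlib.pyplot as plt

samples = trace.posterior["theta"].stack(s=("chain", "draw")).values  # (n, n, S)
fig, axes = plt.subplots(n, n, figsize=(2 * n, 2 * n), sharex=True)
for i in range(n):
    for j in range(n):
        axes[i, j].hist(samples[i, j], bins=30, density=True)
        axes[i, j].axvline(true_theta[i, j], color="red")
fig.savefig("posterior_grid.pdf")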