Diffstat (limited to 'finale')
| -rw-r--r-- | finale/mid_report.tex | 72 |
1 file changed, 44 insertions, 28 deletions
diff --git a/finale/mid_report.tex b/finale/mid_report.tex
index 38d9020..9873abd 100644
--- a/finale/mid_report.tex
+++ b/finale/mid_report.tex
@@ -73,30 +73,37 @@ Two subquestions:
 The study of edge prediction in graphs has been an active field of research for
 over a decade~\cite{Nowell08, Leskovec07, AdarA05}.~\cite{GomezRodriguez:2010}
 introduced the {\scshape Netinf} algorithm, which approximates the likelihood of
-cascades represented as a continuous process. The algorithm was improved in
-later work~\cite{gomezbalduzzi:2011}, but is not known to have any theoretical
+cascades represented as a continuous process. The algorithm, relying on
+maximum-likelihood estimation and convex relaxations, was improved in later
+work~\cite{gomezbalduzzi:2011}, but is not known to have any theoretical
 guarantees beside empirical validation on synthetic networks.
 \cite{Netrapalli:2012} studied the discrete-time version of the independent
 cascade model and obtained the first ${\cal O}(s^2 \log m)$ recovery guarantee
 on general networks, by using unregularized MLE and making a {\it correlation
-decay\/} assumption, which limits the number of new infections at every step. In
-this setting, they show a lower bound of the number of cascades needed for
-support recovery with constant probability of the order $\Omega(s \log (m/s))$.
-They also suggest a {\scshape Greedy} algorithm, which achieves a ${\cal O}(s
-\log m)$ guarantee in the case of tree graphs. The work of~\cite{Abrahao:13}
-studies the same continuous-model framework as \cite{GomezRodriguez:2010} and
-obtains an ${\cal O}(s^9 \log^2 s \log m)$ support recovery algorithm, without
-the \emph{correlation decay} assumption. \cite{du2013uncover,Daneshmand:2014}
-propose Lasso regularization of MLE for recovering the weights of the graph
-under a continuous-time independent cascade model. The work
-of~\cite{du2014influence} is slightly orthogonal to ours since they suggest
-learning the \emph{influence} function, rather than the parameters of the
-network directly.
+decay\/} assumption, which limits the number of new infections at every step.
+They also suggest a {\scshape Greedy} algorithm, with provably good performance
+in tree graphs, which maintains a counter of each node's antecedents, i.e.\ the
+set of nodes infected at a time step prior to its own infection time. The work
+of~\cite{Abrahao:13} studies the same continuous-model framework as
+\cite{GomezRodriguez:2010} and obtains an ${\cal O}(s^9 \log^2 s \log m)$
+support recovery algorithm, without the \emph{correlation decay} assumption.
+\cite{du2013uncover,Daneshmand:2014} propose Lasso regularization of MLE for
+recovering the weights of the graph under a continuous-time independent cascade
+model. The work of~\cite{du2014influence} is slightly orthogonal to ours since
+they suggest learning the \emph{influence} function, rather than the parameters
+of the network directly.
 
-More recently, the inference of Hawkes process is perhaps the closest related
-topic to our work, with recent papers \cite{linderman2014discovering,
-linderman2015scalable} , focusing on MLE, EM, and Variational
-Inference methods to learning these processes.
+More recently, the continuous process studied in previous work has been
+reformulated as a Hawkes process, with several papers
+\cite{linderman2014discovering, linderman2015scalable, simma2012modeling}
+focusing on Expectation-Maximization and Variational Inference methods to learn
+these processes. Because of the discrete nature of the GLC model, we hope to
+bring a better understanding of the link between the properties of a graph and
+its \emph{learnability}. We wish to further explore the idea of non-product
+priors raised in \cite{linderman2014discovering, linderman2015scalable}, since
+the experimental validation of their work focused on simple graph priors.
+Finally, the \emph{Active Learning} formulation is, to the best of the
+authors' knowledge, novel in this context.
 
 \section{Model}
 
@@ -228,6 +235,7 @@ then the model is identifiable.
 \section{Inference}
 
 \subsection{Maximum-Likelihood Estimation}
+\label{MLEsec}
 
 Because transitions occur independently for each node $i$, the log-likelihood
 is a sum of $k$ terms, each term $i$ only depending on $\bt_i$. Hence, each
@@ -282,11 +290,19 @@ Though straightforward MCMC could be applied here, recent
 work~\cite{caimo2011bayesian, koskinen2010analysing, robins2007recent} has shown
 that ERGM inference has slow convergence and lack of robustness, developing
 better alternatives to naive MCMC formulations. Experiments using such a prior
-are ongoing, but we present only simple product distribution-type priors here.
+are ongoing, but we present results only for simple product distribution-type
+priors here.
 
 \paragraph{Inference}
-We can sample from the posterior by MCMC\@. This might not be the fastest
-solution however. We could greatly benefit from using an alternative method:
+We can sample from the posterior by MCMC using the PyMC library, which works
+relatively well (see the paragraph on \emph{experiments}). If the prior is a
+product distribution, then we can simply do Bayesian GLM regression, as in
+Section~\ref{MLEsec}. However, computing the posterior edge by edge does not
+accommodate complex priors, which could encourage, for example, the presence of
+triangles or two-stars. For such priors, we must sample from the posterior of
+the Markov chain directly, which is what we have done here. This method is very
+slow, however, and scales poorly with the number of cascades and the number of
+nodes. We could greatly benefit from using an alternative method:
 \begin{itemize}
 \item EM\@. This approach was used in
 \cite{linderman2014discovering, simma2012modeling} to learn
@@ -314,12 +330,12 @@ parameter space.
 \includegraphics[scale=.4]{../simulation/plots/2015-11-05_22:57:26.pdf}}
 \subfloat[][1000 cascades]{
 \includegraphics[scale=.4]{../simulation/plots/2015-11-05_22:58:29.pdf}}
-\caption{Bayesian Inference of $\Theta$ with MCMC using a $Beta(1, 1)$ prior.
-For each figure, the plot $(i, j)$ on the $i^{th}$ row and $j^{th}$ column
-represent a histogram of samples taken from the posterior of the corresponding
-edge $\Theta_{i, j}$. The red line indicates the true value of the edge weight.
-If an edge does not exist (has weight $0$) the red line is confounded with the y
-axis.}
+\caption{Bayesian Inference of $\Theta$ with MCMC using a $Beta(1, 1)$ prior on
+each edge. For each figure, the plot $(i, j)$ on the $i^{th}$ row and $j^{th}$
+column represents a histogram of samples taken from the posterior of the
+corresponding edge $\Theta_{i, j}$. The red line indicates the true value of the
+edge weight. If an edge does not exist (has weight $0$), the red line coincides
+with the $y$ axis.}
 \label{betapriorbayeslearning}
 \end{figure}
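The Maximum-Likelihood Estimation hunk above states that the log-likelihood is a sum of $k$ terms, one per node, but the excerpt stops before the expression itself. As a hedged reconstruction consistent with the later mention of per-node Bayesian GLM regression (the link function $f$ and the indicators $x^{c,t}$, $y_i^{c,t+1}$ are introduced here for illustration and do not appear in the diff), the decomposition would read

\[
  \mathcal{L}(\Theta) = \sum_{i=1}^{k} \mathcal{L}_i(\bt_i),
  \qquad
  \mathcal{L}_i(\bt_i) = \sum_{c} \sum_{t}
    \Big[\, y_i^{c,t+1} \log f\big(\langle \bt_i, x^{c,t} \rangle\big)
        + \big(1 - y_i^{c,t+1}\big) \log\big(1 - f\big(\langle \bt_i, x^{c,t} \rangle\big)\big) \Big],
\]

where $x^{c,t}$ marks the nodes infected at step $t$ of cascade $c$ and $y_i^{c,t+1}$ indicates whether node $i$ becomes infected at the following step. Since each $\mathcal{L}_i$ depends on $\bt_i$ alone, both maximization and posterior inference under a product prior reduce to $k$ independent GLM-style problems, one per node.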
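The inference paragraph in the diff says the posterior over $\Theta$ is sampled with MCMC through the PyMC library under a $Beta(1, 1)$ prior on each edge (Figure~\ref{betapriorbayeslearning}), but the code itself is not part of this change. The snippet below is a minimal, self-contained sketch of that kind of sampler, not the authors' implementation: it assumes a discrete-time independent-cascade likelihood (the report's GLC likelihood may differ in its exact form), uses the current PyMC API rather than the 2015-era release the report would have used, and every name in it (simulate_cascade, theta_true, and so on) is illustrative.

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
m, n_cascades, T = 5, 200, 4

# Illustrative ground truth: a small directed chain with edge weight 0.7.
theta_true = np.zeros((m, m))
for i in range(m - 1):
    theta_true[i, i + 1] = 0.7

def simulate_cascade(theta, rng, T):
    """Discrete-time independent cascade: newly infected nodes get one chance
    to infect each still-susceptible node at the next step."""
    m = theta.shape[0]
    infected = np.zeros(m, dtype=bool)
    new = np.zeros(m, dtype=bool)
    new[rng.integers(m)] = True          # random seed node
    infected |= new
    X, Y, S = [], [], []                 # exposures, outcomes, susceptibility mask
    for _ in range(T):
        # P(j infected next step) = 1 - prod_{i newly infected} (1 - theta[i, j])
        p = 1.0 - np.exp(new.astype(float) @ np.log1p(-theta))
        y = (rng.random(m) < p) & ~infected
        X.append(new.astype(float))
        Y.append(y.astype(float))
        S.append(~infected)
        infected |= y
        new = y
    return np.array(X), np.array(Y), np.array(S)

sims = [simulate_cascade(theta_true, rng, T) for _ in range(n_cascades)]
X = np.concatenate([s[0] for s in sims])   # (R, m): who was newly infected at each step
Y = np.concatenate([s[1] for s in sims])   # (R, m): who became infected at the next step
S = np.concatenate([s[2] for s in sims])   # (R, m): who was still susceptible
obs_idx = np.flatnonzero(S)                # only susceptible (step, node) pairs enter the likelihood

with pm.Model():
    theta = pm.Beta("theta", alpha=1.0, beta=1.0, shape=(m, m))
    # log survival probability: sum_i X[r, i] * log(1 - theta[i, j])
    log_surv = pm.math.dot(X, pm.math.log(1.0 - theta))
    p = pm.math.clip(1.0 - pm.math.exp(log_surv), 1e-6, 1.0 - 1e-6)
    pm.Bernoulli("y", p=p.flatten()[obs_idx], observed=Y.flatten()[obs_idx])
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

post_mean = idata.posterior["theta"].mean(dim=("chain", "draw")).values
print(np.round(post_mean, 2))              # compare against theta_true

Because the $Beta(1, 1)$ prior is a product over edges, the posterior above factorizes over the columns of $\Theta$, which matches the remark in the diff that per-node Bayesian GLM regression suffices in that case; a non-product (e.g.\ ERGM-style) prior would couple the columns and force joint sampling of $\Theta$, the slow regime the added paragraph describes.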
