-rw-r--r--  finale/final_report.tex          | 24
-rw-r--r--  finale/sections/experiments.tex  | 21
2 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/finale/final_report.tex b/finale/final_report.tex
index 9e1e33c..159badc 100644
--- a/finale/final_report.tex
+++ b/finale/final_report.tex
@@ -7,7 +7,7 @@
\usepackage{graphicx}
\usepackage{bbm}
%\usepackage{fullpage}
-\input{def}
+\input{def}
\usepackage{icml2015}
%\usepackage{algpseudocode}
\DeclareMathOperator*{\argmax}{arg\,max}
@@ -43,6 +43,18 @@
\begin{document}
\maketitle
+\begin{abstract}
+  The Network Inference Problem (NIP) is the machine learning challenge of
+  recovering the edges and edge weights of an unknown weighted graph from
+  observations of a random contagion process propagating over that graph.
+  While previous work has focused on provable convergence guarantees for the
+  maximum-likelihood estimator of the edge weights, a Bayesian treatment of
+  the problem is still lacking. In this work, we establish a scalable Bayesian
+  framework for the unified NIP formulation of \cite{pouget}. Furthermore, we
+  show how this Bayesian framework leads to intuitive and effective heuristics
+  that greatly speed up learning.
+\end{abstract}
+
\section{Introduction}
\input{sections/intro.tex}
@@ -51,14 +63,24 @@
\input{sections/model.tex}
\section{Bayesian Inference}
+\label{sec:bayes}
\input{sections/bayesian.tex}
\section{Active Learning}
+\label{sec:active}
\input{sections/active.tex}
\section{Experiments}
\input{sections/experiments.tex}
+\section{Discussion}
+
\bibliography{sparse}
\bibliographystyle{icml2015}
+
+\newpage
+\section{Appendix}
+\label{sec:appendix}
+\input{sections/appendix.tex}
+
\end{document}
diff --git a/finale/sections/experiments.tex b/finale/sections/experiments.tex
index c9cf762..14c83f6 100644
--- a/finale/sections/experiments.tex
+++ b/finale/sections/experiments.tex
@@ -1,7 +1,26 @@
-implementation: PyMC (scalability), blocks
+In this section, we apply the framework from Sections~\ref{sec:bayes}
+and~\ref{sec:active} to synthetic graphs and cascades, validating both the
+Bayesian approach and the effectiveness of the active learning heuristics.
+
+We started by using the PyMC library to sample directly from the posterior
+distribution. This method scaled poorly with the number of nodes in the
+graph, so that graphs with $\geq 100$ nodes could not be learned in a
+reasonable amount of time. In Section~\ref{sec:appendix}, we show, for a
+graph of size $4$, the progressive convergence of the posterior around the
+true values of the edge weights.
+
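As a minimal sketch of what posterior sampling over an edge weight involves, the following toy example runs a random-walk Metropolis sampler for a single Bernoulli transmission probability. This is an illustration, not the report's PyMC model: the single-edge setup, the flat prior, `theta_true`, and the simulated cascade-step outcomes are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy: one edge with unknown transmission probability theta;
# each cascade step with an infected source yields a Bernoulli(theta)
# observation of whether the edge fired.
theta_true = 0.3
obs = rng.random(500) < theta_true  # simulated cascade-step outcomes

def log_post(theta, obs):
    """Log posterior under a flat Beta(1, 1) prior."""
    if not 0.0 < theta < 1.0:
        return -np.inf
    k = obs.sum()
    n = obs.size
    return k * np.log(theta) + (n - k) * np.log1p(-theta)

# Random-walk Metropolis, standing in for a generic MCMC sampler.
samples = []
theta = 0.5
lp = log_post(theta, obs)
for _ in range(5000):
    prop = theta + 0.05 * rng.standard_normal()
    lp_prop = log_post(prop, obs)
    if np.log(rng.random()) < lp_prop - lp:  # accept/reject step
        theta, lp = prop, lp_prop
    samples.append(theta)

posterior = np.array(samples[1000:])  # drop burn-in
print(posterior.mean())               # concentrates near theta_true
```

Even in this one-parameter toy, thousands of likelihood evaluations are needed, which is consistent with the poor scaling observed for sampling on larger graphs.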
+In order to show the effect of the active learning policies, we needed to
+scale the experiments to graphs with $\geq 1000$ nodes, which required the
+variational inference procedure. A graph of size $1000$ has $1$M parameters
+to be learned ($2$M in the product-prior of Eq.~\ref{eq:gaussianprior}). The
+maximum-likelihood estimator converges to an $\ell_\infty$-error of $0.05$
+for most graphs after observing at least $100$M distinct cascade-steps.
+
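To illustrate why a factorized approximation scales to this regime (two variational parameters, a mean and a standard deviation, per edge), here is a hedged sketch that fits an independent Gaussian to each edge's posterior with a Laplace approximation (mode plus local curvature). This is a cheap stand-in, not the report's variational procedure; the per-edge Bernoulli likelihood, the counts, and all variable names are assumptions.

```python
import numpy as np

# Hypothetical toy: per-edge Bernoulli transmission counts from cascades.
# A factorized Gaussian q(theta_e) = N(mu_e, sigma_e^2) is fit per edge,
# so a graph with E edges needs 2E parameters (cf. 2M for 1M edges).
n_edges = 1000
rng = np.random.default_rng(1)
theta_true = rng.uniform(0.05, 0.95, n_edges)
n_obs = 200                                  # cascade steps per edge
k = rng.binomial(n_obs, theta_true)          # observed transmissions

# Laplace approximation for a Binomial likelihood with flat prior:
# mode k/n, and variance from the negative inverse Hessian at the mode.
mu = k / n_obs
sigma = np.sqrt(mu * (1 - mu) / n_obs)

print(mu.size + sigma.size)  # 2000 variational parameters for 1000 edges
```

The update above is a closed-form vectorized pass over all edges at once, which is the kind of per-edge factorization that makes million-parameter graphs tractable.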
baseline
+fair comparison of online learning
+
graphs/datasets