| author | Thibaut Horel <thibaut.horel@gmail.com> | 2015-12-12 00:28:23 -0500 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2015-12-12 00:28:23 -0500 |
| commit | 6f876aa0fa8596eec71eec640ca9556b48034b6f (patch) | |
| tree | 20e8c339802978a7e2f564704e2a75888c3dd05a /finale/sections/experiments.tex | |
| parent | c760d533c1a44f98b2f660f64ea8354a7f896121 (diff) | |
| download | cascades-6f876aa0fa8596eec71eec640ca9556b48034b6f.tar.gz | |
Abandoned by my project partner... finishing up experiments section
Diffstat (limited to 'finale/sections/experiments.tex')
| -rw-r--r-- | finale/sections/experiments.tex | 69 |
1 file changed, 39 insertions(+), 30 deletions(-)
diff --git a/finale/sections/experiments.tex b/finale/sections/experiments.tex
index 74025f2..534ea7b 100644
--- a/finale/sections/experiments.tex
+++ b/finale/sections/experiments.tex
@@ -2,31 +2,35 @@
 In this section, we apply the framework from Sections~\ref{sec:bayes}
 and~\ref{sec:active} on synthetic graphs and cascades to validate the Bayesian
 approach as well as the effectiveness of the Active Learning heuristics.
-We started with using the library PyMC to sample from the posterior distribution
-directly. This method was shown to scale poorly with the number of nodes in the
-graph, such that graphs of size $\geq 100$ could not be reasonably be learned
-quickly. In Section~\ref{sec:appendix}, we show the progressive convergence of
-the posterior around the true values of the edge weights of the graph for a
-graph of size $4$.
+We started by using the PyMC library~\cite{pymc}, which implements MCMC
+algorithms to sample from the posterior distribution directly. We found that
+this method scales poorly with the number of nodes in the graph: graphs of
+size $\geq 100$ could not reasonably be learned in the amount of time we had.
+In the appendix, we show the progressive convergence of the posterior around
+the true values of the edge weights for a graph of size $4$.
 
-In order to show the effect of the active learning policies, we needed to scale
-the experiments to graphs of size $\geq 1000$, which required the use of the
-variational inference procedure. A graph of size $1000$ has $1M$ parameters to
-be learned ($2M$ in the product-prior in Eq.~\ref{eq:gaussianprior}). The
-maximum-likelihood estimator converges to an $l_\infty$-error of $.05$ for most
-graphs after having observed at least $100M$ distinct cascade-steps.
+\paragraph{VI implementation.} In order to show the effect of the active
+learning policies, we needed to scale the experiments to graphs of size
+$\geq 1000$. A graph of size $1000$ has $1M$ parameters to be learned ($2M$
+in the product-prior of Eq.~\ref{eq:gaussianprior}). The maximum-likelihood
+estimator converges to an $\ell_\infty$-error of $.05$ for most graphs after
+having observed at least $100M$ distinct cascade-steps. Given the number of
+parameters to be learned and the number of cascades to process, we had to
+resort to Variational Inference.
 
 We therefore used the Blocks~\cite{blocks} framework, written on top of Theano
-for highly efficient SGD routines with minimal memory overhead. By encoding a
-cascade step as a tuple of two binary vectors, one for the infected nodes and
-one for the susceptible nodes, the variational inference objective can be
-written as a sum of two matrix multiplications, which Theano optimizes for
-on GPU.
+for automatic differentiation and highly efficient SGD routines with minimal
+memory overhead. By encoding a cascade step as a tuple of two binary vectors,
+one for the infected nodes and one for the susceptible nodes, the variational
+inference objective can be written as a simple computation graph on matrices
+that Theano compiles to machine code optimized for GPU execution.
 
-Since intuitively if nodes are exchangeable in our graph, the active learning
-policy will have little impact over the uniform-source policy, we decided to
-test our algorithms on an unbalanced graph $\mathcal{G}_A$ whose adjacency
-matrix $A$ is as follows:
+\paragraph{Synthetic graph.} Intuitively, if nodes are exchangeable in our
+graph (\emph{i.e.} their topological properties, such as degree, are
+identical), the active learning policy will make little difference compared
+to the uniform-source policy. We therefore test our algorithms on an
+unbalanced graph $\mathcal{G}_A$ whose adjacency matrix $A$ is as follows:
 \begin{equation*}
 A = \left( \begin{array}{cccccc}
 0 & 1 & 1 & 1 & \dots & 1 \\
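The hunk above says the variational objective reduces to matrix algebra over the two binary vectors of a cascade step. The following is a minimal NumPy sketch of that reduction, not the authors' Theano/Blocks code: the independent-cascade parametrization (edge $(i, j)$ infects with probability `theta[i, j]`) and the convention that a node absent from both vectors was infected at this step are our assumptions, since the diff does not spell the model out.

```python
import numpy as np

def step_log_likelihood(theta, infected, susceptible):
    """Log-likelihood of a mini-batch of cascade steps.

    theta:       (n, n) matrix of edge infection probabilities (assumption).
    infected:    (B, n) binary rows marking currently infected nodes.
    susceptible: (B, n) binary rows marking nodes that stayed susceptible.
    """
    # Assumed convention: nodes that are neither infected nor still
    # susceptible became infected at this step.
    newly = 1 - infected - susceptible
    # log P(node j survives the step) = sum_i infected_i * log(1 - theta_ij):
    # one dense matrix multiplication over the whole mini-batch.
    log_survive = infected @ np.log1p(-theta)            # (B, n)
    # Survivors contribute log_survive; newly infected nodes contribute the
    # log-complement log(1 - P(survive)). Assumes every newly infected node
    # has at least one infected in-neighbor, so the argument stays positive.
    return (susceptible * log_survive).sum() \
         + (newly * np.log1p(-np.exp(log_survive))).sum()
```

In the setup described above, a scalar of this form would be the data term of the variational objective that Theano differentiates and compiles for the GPU; NumPy stands in here only to make the two matrix products explicit.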
@@ -37,15 +41,20 @@ A = \left( \begin{array}{cccccc}
 \end{array} \right)
 \end{equation*}
-In other words, graph $\mathcal{G}_A$ is a star graph where every node, except
-for the center node, points to its (clock-wise) neighbor. In order for the
-baseline to be fair, we choose to create cascades starting from the source node
-on the fly both in the case of the uniform source and for the active learning
-policy. Each cascade is therefore `observed' only once. We plot the RMSE of the
-graph i.e. $RMSE^2 = \frac{1}{n^2} \|\hat{\mathbf{\Theta}} -
-\mathbf{\Theta}\|^2_2$.
+In other words, graph $\mathcal{G}_A$ is a wheel graph where every node, except
+for the center node, points to its (clockwise) neighbor.
+
+\paragraph{Baseline.} In order for the baseline to be fair, we chose to create
+cascades starting from the source node on the fly, both for the uniform-source
+policy and for the active learning policy. Each cascade is therefore
+`observed' only once. We plot the RMSE of the graph, i.e.\
+$\mathrm{RMSE}^2 = \frac{1}{n^2} \|\hat{\mathbf{\Theta}}
+- \mathbf{\Theta}\|^2_2$. The results are presented in
+Figure~\ref{fig:graphical}.
 
-graphs/datasets
+\begin{figure}
+\centering
+\includegraphics[scale=.4]{finale.png}
+\caption{RMSE of the estimated edge weights $\hat{\mathbf{\Theta}}$ under the
+uniform-source and active learning policies.}
+\label{fig:graphical}
+\end{figure}
 
-bullshit
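For concreteness, here is a short Python sketch of the unbalanced graph $\mathcal{G}_A$ and the RMSE metric defined in the hunk above. The function names are ours, and since the diff only shows the first row of $A$ (the center pointing to every rim node) plus the "clockwise neighbor" description, the sketch assumes rim nodes do not point back to the center.

```python
import numpy as np

def wheel_adjacency(n):
    """Adjacency matrix of G_A for n >= 3 nodes: node 0 (the center) points
    to every other node, and rim node i points to its clockwise neighbor on
    the cycle 1 -> 2 -> ... -> n-1 -> 1."""
    A = np.zeros((n, n), dtype=int)
    A[0, 1:] = 1                     # first row of A: 0 1 1 1 ... 1
    for i in range(1, n):
        A[i, i % (n - 1) + 1] = 1    # rim edge i -> i+1, wrapping n-1 -> 1
    return A

def rmse(theta_hat, theta):
    """RMSE^2 = (1/n^2) * ||theta_hat - theta||_2^2, as defined above;
    np.linalg.norm gives the Frobenius norm for matrices."""
    return np.linalg.norm(theta_hat - theta) / theta.shape[0]
```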
