Diffstat (limited to 'finale/sections')
-rw-r--r--  finale/sections/active.tex  71
1 file changed, 66 insertions, 5 deletions
diff --git a/finale/sections/active.tex b/finale/sections/active.tex
index 84cf4c6..43ed75d 100644
--- a/finale/sections/active.tex
+++ b/finale/sections/active.tex
@@ -1,9 +1,70 @@
-intro (motivation, description)
+In this section we study the problem of Graph Inference in the context of
+Active Learning: instead of assuming that the source of a cascade is drawn from
+a source distribution $p_s$, we assume that after each cascade observation, the
+experimenter is able to select the node which will become the source of the
+next cascade.
-mutual information approach
+This setting is motivated by the following \emph{data stream} scenario: if the
+cascades are observed sequentially in an online manner at a very high rate,
+then it is impossible for the experimenter to process and use all the available
+data for learning. One can instead cherry-pick from the stream the cascades
+originating at the nodes which are most helpful for the learning problem.
-two approx:
+We note that the above scenario is more accurately described as \emph{Online
+Active Learning}: not only does the experimenter decide the node at which
+each cascade originates, but the decisions are also made sequentially, based
+only on past observations. We address both the online and the active aspects
+below.
-- first step truncation
+\paragraph{Bayesian Experimental Design.} Bayesian Inference provides a natural
+framework for Active Learning: the posterior distribution quantifies the
+uncertainty left in the parameters given the observations, and can thus be
+used to design criteria for deciding which experiment to perform next. This
+approach is known as \emph{Bayesian Experimental Design}. We refer the reader
+to \cite{chaloner} for a survey of the relevant literature.
+
+A well-studied criterion in Bayesian Experimental Design is the mutual
+information between the parameters and the observation, \emph{i.e.}, the
+difference between the entropy of the prior distribution and the expected
+entropy of the posterior distribution. In the specific case of Graph
+Inference, we define the utility function $U$ for selecting node $i$, given
+the current prior distribution over the parameters $\Theta$, as:
+\begin{displaymath}
+  U(i) = I(\Theta;\, \mathbf{x}\,|\, x^0 = e_i) = H(\Theta) - H(\Theta\,|\,
+  \mathbf{x},\, x^0 = e_i)
+\end{displaymath}
+where $x^0 = e_i$ expresses that at time $t=0$ only node $i$ is infected, and
+$\mathbf{x}$ is a cascade whose conditional distribution
+$\mathbf{x}\,|\,\Theta$ is given by \cref{eq:dist}. Given the utility
+function $U$, our online active learning policy selects at each time step
+a node $i^*$ such that $i^*\in\argmax_{i\in V} U(i)$.
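+
+Note that, by symmetry of the mutual information, the same utility can be
+written in terms of the entropy of the cascade itself:
+\begin{displaymath}
+  U(i) = H(\mathbf{x}\,|\, x^0 = e_i) - H(\mathbf{x}\,|\,\Theta,\, x^0 = e_i)
+\end{displaymath}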
+
+Unfortunately, the utility function $U$ defined above is computationally hard
+to evaluate: the entropy $H(\mathbf{x}\,|\,x^0 = e_i)$ involves marginalizing
+over $\Theta$ and summing over all possible cascades. We propose instead the
+following surrogate utility function $\tilde{U}$:
+\begin{displaymath}
+ \tilde{U}(i) = \sum_{j\in V} \var(\Theta_{i,j}),\quad i\in V
+\end{displaymath}
+The justification for this surrogate utility function is given by
+\cref{prop:active}, which shows that maximizing $\tilde{U}$ amounts to
+maximizing a lower bound on $U$. Note that when performing Variational
+Inference with Gaussian approximations as described in \cref{sec:inference},
+computing $\tilde{U}$ simply amounts to summing one row of the matrix of
+current variance estimates.
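+
+As an illustration, the following sketch (a minimal example, not part of the
+algorithm specification) computes the surrogate utilities from a matrix
+\texttt{V} of current variance estimates, where \texttt{V[i, j]} estimates
+$\var(\Theta_{i,j})$, and selects the next source:
+\begin{verbatim}
+import numpy as np
+
+def select_next_source(V):
+    # V: (n, n) array; V[i, j] estimates Var(Theta_ij).
+    # U~(i) is the sum of row i of the variance matrix.
+    utilities = V.sum(axis=1)
+    # Select i* in argmax_i U~(i).
+    return int(np.argmax(utilities))
+\end{verbatim}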
+
+\begin{proposition}
+ \label{prop:active} For all $i\in V$, $\tilde{U}(i) \leq U(i)$.
+\end{proposition}
+
+\begin{proof} \end{proof}
+
+\begin{remark} In the first part of the proof, we show that the mutual
+  information of a cascade truncated to its first time step is a lower bound
+  on the mutual information of the complete cascade. It has been noted in
+  a different context \cite{} that, for the purpose of graph inference, the
+  first step of a cascade is the most informative.
+\end{remark}
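+
+This truncation step can be made precise: writing $x^1$ for the state of the
+cascade after its first time step, $x^1$ is a deterministic function of
+$\mathbf{x}$, so that $\Theta \to \mathbf{x} \to x^1$ forms a Markov chain
+and the data processing inequality gives
+\begin{displaymath}
+  I(\Theta;\, x^1 \,|\, x^0 = e_i) \,\leq\, I(\Theta;\, \mathbf{x}\,|\, x^0
+  = e_i) = U(i).
+\end{displaymath}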
+
+\paragraph{Online Bayesian Updates.} The active policy above still requires
+updating the posterior distribution after each observed cascade. Stochastic
+gradient descent can be interpreted as an online algorithm for data streams
+\cite{}: each incoming observation is used in a single gradient step and can
+then be discarded. This ties in naturally with our Variational Inference
+algorithm of \cref{sec:inference}: its stochastic gradient updates can be
+applied to each newly observed cascade, yielding an online update of the
+variational posterior.
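+
+A minimal sketch of the resulting online active learning loop is given below;
+the helpers \texttt{observe\_cascade} (draws the next cascade from the chosen
+source) and \texttt{vi\_step} (performs one stochastic variational update)
+are hypothetical placeholders for the primitives of \cref{sec:inference}:
+\begin{verbatim}
+def online_active_loop(mu, V, n_rounds):
+    # mu, V: (n, n) variational means / variances of Theta.
+    for _ in range(n_rounds):
+        # Active step: pick the most informative source node.
+        source = select_next_source(V)
+        # Online step: observe one cascade, discard it after the update.
+        cascade = observe_cascade(source)
+        mu, V = vi_step(mu, V, cascade)
+    return mu, V
+\end{verbatim}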
-- variance in lieu of MI