| author | Thibaut Horel <thibaut.horel@gmail.com> | 2015-12-09 18:28:34 -0500 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2015-12-09 18:28:34 -0500 |
| commit | 7140bf8f92ff2fa997cb563a6515a53238971e6a (patch) | |
| tree | 4b612df4e0236cf34c6cbff0679fc20465b55058 | |
| parent | 489fb722254fe791317f318130690ddd4b9c779e (diff) | |
| download | cascades-7140bf8f92ff2fa997cb563a6515a53238971e6a.tar.gz | |
Beginning of Active Learning
| -rw-r--r-- | finale/sections/active.tex | 71 |
1 files changed, 66 insertions, 5 deletions
diff --git a/finale/sections/active.tex b/finale/sections/active.tex
index 84cf4c6..43ed75d 100644
--- a/finale/sections/active.tex
+++ b/finale/sections/active.tex
@@ -1,9 +1,70 @@
-intro (motivation, description)
+In this section we study the problem of Graph Inference in the context of
+Active Learning: instead of assuming that the source of a cascade is drawn from
+a source distribution $p_s$, we assume that after each cascade observation, the
+experimenter is able to select the node which will become the source of the
+next cascade.
 
-mutual information approach
+This setting is motivated by the following \emph{data stream} scenario: if the
+cascades are observed sequentially in an online manner at a very high rate,
+then it is impossible for the experimenter to process and use all the available
+data for learning. One can instead cherry-pick from the stream the cascades
+originating at the nodes that are most helpful for the learning problem.
 
-two approx:
+We note that the above scenario is more accurately described as
+\emph{Online Active Learning}: not only does the experimenter decide the node
+at which each cascade originates, but the decisions are also made sequentially
+based only on the past observations. We address both the online and active
+aspects below.
 
-- first step truncation
+\paragraph{Bayesian Experimental Design.} Bayesian Inference provides a natural
+framework for Active Learning: indeed, the posterior distribution quantifies
+the amount of uncertainty left in the parameters given the observations and can
+be used to design criteria to decide which experiment to perform next. This is
+known as \emph{Bayesian Experimental Design}. We refer the reader to
+\cite{chaloner} for a survey of the relevant literature.
+
+A well-studied criterion in Bayesian Experimental Design is the mutual
+information between the parameters and the observation, \emph{i.e.}, the
+difference between the entropy of the prior and the posterior distributions. In
+the specific case of Graph Inference, we define the utility function $U$ for
+selecting node $i$ under the current prior distribution on $\Theta$:
+\begin{displaymath}
+  U(i) = I(\Theta, \mathbf{x}\,|\, x^0 = e_i) = H(\Theta) - H(\Theta\,|\,
+  \mathbf{x}, x^0 = e_i)
+\end{displaymath}
+where $x^0 = e_i$ simply expresses that at time $t=0$ only node $i$ is infected
+and $\bx$ is a cascade whose conditional distribution $\bx\,|\,\Theta$ is given
+by \cref{eq:dist}. Given utility function $U$, our online active learning
+policy simply selects at each time step a node $i^*$ such that
+$i^*\in\argmax_{i\in V} U(i)$.
+
+Unfortunately, the utility function $U$ defined above is computationally hard
+to evaluate. We propose instead the following surrogate utility function
+$\tilde{U}$:
+\begin{displaymath}
+  \tilde{U}(i) = \sum_{j\in V} \var(\Theta_{i,j}),\quad i\in V
+\end{displaymath}
+The justification of this surrogate utility function is given by
+\cref{prop:active}, which shows that maximizing $\tilde{U}$ amounts to
+maximizing a lower bound on $U$. Note that when doing Variational Inference
+with Gaussian approximations as described in \cref{sec:inference}, computing
+$\tilde{U}$ simply involves summing one row of the matrix of the current
+variance estimates.
+
+\begin{proposition}
+  \label{prop:active} For all $i\in V$, $\tilde{U}(i) \leq U(i)$.
+\end{proposition}
+
+\begin{proof} \end{proof}
+
+\begin{remark} In the first part of the proof, we show that the mutual
+  information of a cascade truncated to its first time step is a lower bound
+  on the complete mutual information. It has been noted in a different context
+  \cite{} that for the purpose of graph inference, the first step of a cascade
+  is the most informative.
+\end{remark}
+
+\paragraph{Online Bayesian Updates.} TODO: discuss SGD on data streams. Cite
+"SGD as an online algorithm for data streams". Should tie this with our VI
+algorithm.
 
-- variance in lieu of MI
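As a side note on the selection rule added in this commit: a minimal sketch (not part of the committed LaTeX, and using a hypothetical variable name `var_theta`) of how the surrogate utility $\tilde{U}$ could be evaluated when the Gaussian variational approximation exposes its current variance estimates $\var(\Theta_{i,j})$ as an $n \times n$ array. The policy sums each row and picks the arg-max node as the next cascade source.

```python
import numpy as np

def select_next_source(var_theta):
    """Pick the next cascade source using the variance-based surrogate utility.

    var_theta : (n, n) array of current variance estimates Var(Theta[i, j])
    coming from the Gaussian variational approximation (hypothetical name).
    Returns a node i maximizing U_tilde(i) = sum_j Var(Theta[i, j]).
    """
    u_tilde = var_theta.sum(axis=1)   # one row sum per candidate source node
    return int(np.argmax(u_tilde))    # i* in argmax_i U_tilde(i)

# Toy usage: 4 nodes, with inflated variances in row 2, so node 2 is selected.
rng = np.random.default_rng(0)
var_theta = rng.uniform(0.0, 0.1, size=(4, 4))
var_theta[2] += 1.0
print(select_next_source(var_theta))  # -> 2
```

In the online setting described above, the variational posterior (and hence `var_theta`) would be updated after each observed cascade before the rule is applied again.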
