author    jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-11-03 16:40:24 -0500
committer jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-11-03 16:40:24 -0500
commit    d4fff5add651e98a1ce2e7c7aa6a2223c5771ca9 (patch)
tree      98f7558dfe9ed455f16025d5749bc660efd09324
parent    1533c8a87015c2c8b6ebf99321d5e197e7ba62e6 (diff)
download  cascades-d4fff5add651e98a1ce2e7c7aa6a2223c5771ca9.tar.gz
adding bayesian framework to extensions file
-rw-r--r--  notes/extensions.tex        |  71
-rw-r--r--  old_work/formalisation.pdf  | bin 312600 -> 0 bytes
2 files changed, 67 insertions, 4 deletions
diff --git a/notes/extensions.tex b/notes/extensions.tex
index 344555d..dd9933a 100644
--- a/notes/extensions.tex
+++ b/notes/extensions.tex
@@ -1,10 +1,73 @@
\documentclass{article}
\usepackage{fullpage, amsmath, amssymb}
-
+\usepackage{algpseudocode, algorithm, algorithmicx}
\title{Extensions}
\begin{document}
+\section{Bayesian Framework}
+
+\subsection{Bayesian Learning}
+
+Let $\theta$ be the parameters of the graph $\mathcal{G}$, let $S$ be a source node, and let
+$X = (X_1, \dots, X_T)$ (where $T$ is potentially a stopping time) be the vectors
+encoding a cascade on $\mathcal{G}$. Assuming a prior $\mathcal{D}$ on
+$\theta$, we have the following setup:
+
+\begin{alignat*}{2}
+ \theta & \sim \mathcal{D}\qquad &&\text{(prior on $\theta$)} \\
+    X_0 = S & \sim \mathcal{D}'' &&\text{(random source distribution)}\\
+    \forall t\geq 0,\; X_{t+1} \mid X_t, \theta &\sim \mathcal{D}'
+    &&\text{(cascade process)}
+\end{alignat*}
+
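For concreteness, here is a minimal end-to-end sketch of this generative setup in NumPy. It is illustrative only: the independent Laplace prior, the uniform source choice, the hazard-style link, and the progressive (once active, stay active) rule are assumptions standing in for $\mathcal{D}$, $\mathcal{D}''$, and $\mathcal{D}'$, not choices made in this note.

import numpy as np

rng = np.random.default_rng(0)
m, T = 20, 10                                  # number of nodes and horizon (illustrative values)

# Prior D on theta: independent Laplace on each directed edge weight (an assumption).
theta = rng.laplace(loc=0.0, scale=0.1, size=(m, m))

# Random source D'': X_0 = S drawn uniformly at random (one possible choice).
X = np.zeros((T + 1, m))
X[0, rng.integers(m)] = 1.0

# Cascade process D': each node activates with a probability driven by its active parents.
for t in range(T):
    rates = np.clip(X[t] @ theta, 0.0, None)   # non-negative influence from active parents
    p = 1.0 - np.exp(-rates)                   # hazard-style activation probability (assumed link)
    X[t + 1] = np.maximum(X[t], (rng.random(m) < p).astype(float))  # progressive activation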
+Prior work has focused on $\ell_1$-regularized (lasso) estimation, which is equivalent to
+choosing an independent Laplace prior on $\theta$. In the context of social
+network learning, however, we can place more informative priors on $\theta$. We can:
+
+\begin{itemize}
+\item Take into account the existence of edges
+\item Take into account the distinction between weak and strong ties
+\item Take into account common graph structures, such as triangles
+\end{itemize}
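As a purely illustrative example, the first two points can be encoded by keeping the Laplace form but letting its scale depend on side information about each edge (the scales $b_{\text{strong}} > b_{\text{weak}} > b_{0}$ are hypothetical hyperparameters):
$$\theta_{ij} \sim \mathrm{Laplace}(0, b_{ij}), \qquad
b_{ij} =
\begin{cases}
b_{\text{strong}} & \text{if $(i,j)$ is a known strong tie,}\\
b_{\text{weak}} & \text{if $(i,j)$ is a known weak tie,}\\
b_{0} & \text{if the edge $(i,j)$ is not believed to exist.}
\end{cases}$$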
+
+Under any of these priors, we can sample from the posterior over $\theta$ by MCMC.
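A minimal random-walk Metropolis sketch of that sampling step follows; the prior and cascade log-likelihood callables (and their names) are assumptions, not quantities defined in this note.

import numpy as np

def mcmc_posterior(log_prior, log_likelihood, cascades, theta0,
                   n_steps=5000, step=0.05, seed=0):
    """Random-walk Metropolis over theta given observed cascades (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    theta, samples = theta0.copy(), []
    logp = log_prior(theta) + log_likelihood(theta, cascades)
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)      # symmetric proposal
        logp_prop = log_prior(prop) + log_likelihood(prop, cascades)
        if np.log(rng.random()) < logp_prop - logp:                  # accept/reject
            theta, logp = prop, logp_prop
        samples.append(theta.copy())
    return samples

In practice one would discard burn-in and thin the chain; the point is only that the active-learning loop below needs samples of $\theta$, not the posterior in closed form.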
+
+
+\subsection{Active Learning}
+
+In this setup, $S$ is no longer drawn from the random source distribution $\mathcal{D}''$
+but is chosen by the designer of the experiment. Once a source is chosen, a
+cascade is drawn from the Markovian process $\mathcal{D}'$. We wish to choose
+the source which maximises the information gain on the parameter $\theta$, so as
+to speed up learning. We suggest choosing, at each round, the source which
+maximises the mutual information between $\theta$ and the resulting cascade $X$:
+
+\begin{algorithm}
+\caption{Active Bayesian Learning through Cascades (ABC)}
+\begin{algorithmic}[1]
+\State $\mathcal{D}_0 \gets \mathcal{D}$ \Comment{Initial prior on $\theta$}
+\For{$n = 0, 1, 2, \dots$}
+\State $i \leftarrow \arg\max_{i} I(\theta, (X_t)_{t \geq 1} \mid X_0 = i)$, with $\theta \sim \mathcal{D}_n$
+\Comment{Maximise conditional mutual information}
+\State Observe $(X_t)_{t\geq1} \mid X_0 = i$ \Comment{Observe a cascade from the chosen
+source node}
+\State $\mathcal{D}_{n+1} \gets \text{posterior of $\theta \sim \mathcal{D}_n$
+given $(X_t)_{t\geq1}$}$ \Comment{Update the distribution of $\theta$}
+\EndFor
+\end{algorithmic}
+\end{algorithm}
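In code, one round of ABC might look like the following skeleton. Here `estimate_mi`, `observe_cascade`, and `update_posterior` are hypothetical helpers standing in for the quantities defined in this section, and the posterior $\mathcal{D}_n$ is represented by a pool of samples rather than a closed-form distribution.

def abc_round(posterior_samples, nodes, estimate_mi, observe_cascade, update_posterior):
    """One round of Active Bayesian learning through Cascades (illustrative skeleton)."""
    # Pick the source maximising the estimated mutual information I(theta, X | X_0 = i).
    source = max(nodes, key=lambda i: estimate_mi(posterior_samples, i))
    cascade = observe_cascade(source)                     # observe a cascade seeded at that source
    return update_posterior(posterior_samples, cascade)   # e.g. rerun MCMC including the new data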
+
+Note that the mutual information can be estimated for each candidate source node $i$ by sampling:
+$$I((X_t)_{t\geq 1}, \theta \mid X_0 = i) = \mathbb{E}_{\theta \sim \mathcal{D}_n;\, X \mid
+\theta, i \sim \mathcal{D}'}[\log p(X\mid\theta)] - \mathbb{E}_{\theta \sim \mathcal{D}_n;\, X' \mid \theta,
+i \sim \mathcal{D}'}\big[\log \mathbb{E}_{\theta\sim \mathcal{D}_n}[p(X'\mid\theta)] \big]$$
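A nested Monte Carlo sketch of this estimator is given below. It assumes a cascade sampler `sample_cascade(theta, i)` and a log-likelihood `log_lik(x, theta)` (both hypothetical names) and represents $\mathcal{D}_n$ by a pool of posterior samples; the inner average over fresh samples of $\theta$ estimates the marginal likelihood of the cascade.

import numpy as np
from scipy.special import logsumexp

def estimate_mi(posterior_samples, i, sample_cascade, log_lik,
                n_outer=100, n_inner=50, seed=0):
    """Monte Carlo estimate of I((X_t)_{t>=1}, theta | X_0 = i) (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    first, second = [], []
    for _ in range(n_outer):
        theta = posterior_samples[rng.integers(len(posterior_samples))]
        x = sample_cascade(theta, i)                       # cascade from source i under theta
        first.append(log_lik(x, theta))                    # term E[log p(X | theta)]
        inner = [log_lik(x, posterior_samples[rng.integers(len(posterior_samples))])
                 for _ in range(n_inner)]
        second.append(logsumexp(inner) - np.log(n_inner))  # term log E_theta[p(X | theta)]
    return np.mean(first) - np.mean(second)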
+
+The update to the posterior can be done implicitly by MCMC, since we only need
+to sample $\theta\sim \mathcal{D}_n$ in order to run the algorithm.
+
+
\section{Gram matrix}
In the ICML paper, we showed that the graph could be recovered as long as the
@@ -25,9 +88,9 @@ In the voter model, $\mathbb{P}[X_j^{t+1} = 1 | X^t] = \sum_{i=1}^m A_{i,j}
X^t_i$, where $A$ is the weighted adjacency matrix of the graph. Furthermore,
the model runs until time $T$, a hyperparameter of the model.
Therefore, $\mathbb{E}[X^{t+1} \mid X^t] = A^T X^t$ and $\mathcal{G} = \sum_i A^i
-\mathbb{E}[X_0 X_0^T] (A^T)^i$. In the case of the single-source model,
+\mathbb{E}[X_0 X_0^T] {(A^T)}^i$. In the case of the single-source model,
$\mathbb{E}[X_0X_0^T] = \frac{1}{m} I_m$ and it follows that $$\mathcal{G} =
-\frac{1}{m} \sum_{i=1}^T A^i (A^T)^i$$
+\frac{1}{m} \sum_{i=1}^T A^i {(A^T)}^i$$
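As a sanity check, the single-source Gram matrix above can be computed directly from a weighted adjacency matrix. A small NumPy sketch (the function name and the dense-matrix representation are illustrative):

import numpy as np

def gram_matrix(A, T):
    """G = (1/m) * sum_{i=1}^{T} A^i (A^T)^i for the single-source voter model."""
    m = A.shape[0]
    G = np.zeros((m, m))
    Ai = np.eye(m)
    for _ in range(T):
        Ai = Ai @ A          # A^i
        G += Ai @ Ai.T       # A^i (A^i)^T, which equals A^i (A^T)^i
    return G / m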
\section{Submodularity of Generalized Linear Cascades}
@@ -48,7 +111,7 @@ the random draw of these thresholds.
\section{Logistic regression}
Let's try to fit a logit model to the cascades. The variance of the parameters
-is approximated by $\hat V (\hat \beta) = (\sum p_i(1-p_i)X_i X_i^T)^{-1}$
+is approximated by $\hat V (\hat \beta) = {(\sum_t p_t(1-p_t)X_t X_t^T)}^{-1}$.
Let's have a look at the matrix we are taking the inverse of. If none of the
parents of node $i$ are active at step $t$, then the $t^{th}$ term is $0$.
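A small NumPy sketch of this variance approximation, assuming the rows of `X` are the per-step feature vectors (parent activations) and `beta` the fitted logit coefficients (both names are illustrative):

import numpy as np

def logit_variance(X, beta):
    """Approximate covariance of the logit MLE: (sum_t p_t (1 - p_t) x_t x_t^T)^{-1}."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))   # p_t = sigmoid(x_t^T beta), one row of X per step
    w = p * (1.0 - p)
    info = (X * w[:, None]).T @ X         # observed Fisher information; a zero row adds nothing
    return np.linalg.inv(info)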
diff --git a/old_work/formalisation.pdf b/old_work/formalisation.pdf
deleted file mode 100644
index 760abe0..0000000
--- a/old_work/formalisation.pdf
+++ /dev/null
Binary files differ