| author | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-11-03 16:40:24 -0500 |
|---|---|---|
| committer | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-11-03 16:40:24 -0500 |
| commit | d4fff5add651e98a1ce2e7c7aa6a2223c5771ca9 (patch) | |
| tree | 98f7558dfe9ed455f16025d5749bc660efd09324 /notes | |
| parent | 1533c8a87015c2c8b6ebf99321d5e197e7ba62e6 (diff) | |
| download | cascades-d4fff5add651e98a1ce2e7c7aa6a2223c5771ca9.tar.gz | |
adding bayesian framework to extensions file
Diffstat (limited to 'notes')
| -rw-r--r-- | notes/extensions.tex | 71 |
1 file changed, 67 insertions, 4 deletions
diff --git a/notes/extensions.tex b/notes/extensions.tex
index 344555d..dd9933a 100644
--- a/notes/extensions.tex
+++ b/notes/extensions.tex
@@ -1,10 +1,73 @@
 \documentclass{article}
 \usepackage{fullpage, amsmath, amssymb}
-
+\usepackage{algpseudocode, algorithm, algorithmicx}
 
 \title{Extensions}
 
 \begin{document}
 
+\section{Bayesian Framework}
+
+\subsection{Bayesian Learning}
+
+Let $\theta$ be the parameters of the graph, let $S$ be a source node, and let
+$X = (X_1, \dots, X_T)$ (where $T$ is potentially a stopping time) be the
+vectors encoding a cascade on the graph $\mathcal{G}$. Assuming a prior
+$\mathcal{D}$ on $\theta$, we have the following setup:
+
+\begin{alignat*}{2}
+    \theta & \sim \mathcal{D} \qquad && \text{(prior on $\theta$)} \\
+    X_0 = S & \sim \mathcal{D}'' && \text{(random source distribution)} \\
+    \forall t \geq 0,\ X_{t+1} \mid X_t, \theta & \sim \mathcal{D}'
+        && \text{(cascade process)}
+\end{alignat*}
+
+Prior work has focused on regularized lasso estimation, which is equivalent to
+choosing an independent Laplace prior for $\theta$. In the context of social
+network learning, however, we can place more informative priors. We can
+
+\begin{itemize}
+\item take into account the existence of edges,
+\item take into account the distinction between weak and strong ties,
+\item take into account common graph structures, such as triangles.
+\end{itemize}
+
+We can sample from the posterior by MCMC.
+
+
+\subsection{Active Learning}
+
+In this setup, $S$ is no longer drawn from a random distribution $\mathcal{D}''$
+but is chosen by the designer of the experiment. Once a source is chosen, a
+cascade is drawn from the Markovian process $\mathcal{D}'$. We wish to choose
+the source that maximises the information gain on the parameter $\theta$, so as
+to speed up learning. We suggest choosing, at each step, the source that
+maximises the mutual information between $\theta$ and $X$:
+
+\begin{algorithm}
+\caption{Active Bayesian Learning through Cascades (ABC)}
+\begin{algorithmic}[1]
+\State $\theta \sim \mathcal{D}_0 = \mathcal{D}$ \Comment{Initial prior on
+$\theta$}
+\While{True}
+\State $i \leftarrow \arg\max_{i} I(\theta; (X_t)_{t \geq 1} \mid X_0 = i)$
+\Comment{Maximize conditional mutual information}
+\State Observe $(X_t)_{t\geq1} \mid S = i$ \Comment{Observe a cascade from the
+chosen source node}
+\State $\mathcal{D}_{t+1} \gets \text{posterior of $\theta \sim \mathcal{D}_t$
+given $(X_t)_{t\geq1}$}$ \Comment{Update the distribution of $\theta$}
+\EndWhile
+\end{algorithmic}
+\end{algorithm}
+
+Note that the mutual information can be computed node-by-node by sampling:
+$$I((X_t)_{t\geq 1}; \theta \mid X_0 = i) = \mathbb{E}_{\theta \sim \mathcal{D}_t;\, X \mid
+\theta, i \sim \mathcal{D}'}[\log p(X \mid \theta)] - \mathbb{E}_{\theta \sim \mathcal{D}_t;\, X' \mid \theta,
+i \sim \mathcal{D}'}\left[\log \mathbb{E}_{\theta' \sim \mathcal{D}_t}[p(X' \mid \theta')]\right]$$
+
+The update to the posterior can be done implicitly by MCMC, since we only need
+to sample $\theta \sim \mathcal{D}_t$ in order to run the algorithm.
+
+
 \section{Gram matrix}
 In the ICML paper, we showed that the graph could be recovered as long as the
@@ -25,9 +88,9 @@
 In the voter model, $\mathbb{P}[X_j^{t+1} = 1 | X^t] = \sum_{i=1}^m A_{i,j}
 X^t_i$, where $A$ is the weighted adjacency matrix of the graph. Furthermore,
 the model runs indefinitely until time $T$, a hyperparameter of the model.
 Therefore, $\mathbb{E}[X_{n+1} | X_n] = A^T X_n$ and $\mathcal{G} = \sum_i A^i
-\mathbb{E}[X_0 X_0^T] (A^T)^i$. In the case of the single-source model,
+\mathbb{E}[X_0 X_0^T] {(A^T)}^i$. In the case of the single-source model,
 $\mathbb{E}[X_0X_0^T] = \frac{1}{m} I_m$ and it follows that $$\mathcal{G} =
-\frac{1}{m} \sum_{i=1}^T A^i (A^T)^i$$
+\frac{1}{m} \sum_{i=1}^T A^i {(A^T)}^i$$
 
 \section{Submodularity of Generalized Linear Cascades}
@@ -48,7 +111,7 @@ the random draw of these thresholds.
 
 \section{Logistic regression}
 
 Let's try to fit a logit model to the cascades. The variance of the parameters
-is approximated by $\hat V (\hat \beta) = (\sum p_i(1-p_i)X_i X_i^T)^{-1}$
+is approximated by $\hat V (\hat \beta) = {(\sum p_i(1-p_i)X_i X_i^T)}^{-1}$
 Let's have a look at the matrix we are taking the inverse of. If none of the
 parents of node $i$ are active at step t, then the $t^{th}$ term is $0$.
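The mutual-information step of the ABC loop is the only quantity in the added section without a closed form; the display above estimates it by nested sampling. Below is a minimal Python sketch of that estimator, not part of the commit: `sample_theta`, `simulate_cascade`, and `cascade_loglik` are hypothetical callables standing in for a posterior (MCMC) sampler over theta, the cascade process D', and the cascade log-likelihood.

```python
import numpy as np
from scipy.special import logsumexp

def estimate_mutual_information(sample_theta, simulate_cascade, cascade_loglik,
                                source, n_outer=200, n_inner=200):
    """Monte Carlo estimate of I((X_t)_{t>=1}; theta | X_0 = source).

    sample_theta()             -- one draw of theta from the current posterior D_t
    simulate_cascade(theta, s) -- one cascade X drawn from D' given theta and source s
    cascade_loglik(X, theta)   -- log p(X | theta)
    """
    # Fixed pool of posterior samples used to approximate the marginal
    # E_{theta'}[p(X | theta')] inside the second expectation.
    inner_thetas = [sample_theta() for _ in range(n_inner)]

    total = 0.0
    for _ in range(n_outer):
        theta = sample_theta()                 # theta ~ D_t
        X = simulate_cascade(theta, source)    # X | theta, source ~ D'
        loglik = cascade_loglik(X, theta)      # first term: log p(X | theta)
        inner = np.array([cascade_loglik(X, t) for t in inner_thetas])
        log_marginal = logsumexp(inner) - np.log(n_inner)  # log E_{theta'}[p(X | theta')]
        total += loglik - log_marginal
    return total / n_outer

def pick_source(nodes, sample_theta, simulate_cascade, cascade_loglik):
    # One iteration of the ABC loop: choose the source with the largest
    # estimated conditional mutual information.
    return max(nodes, key=lambda i: estimate_mutual_information(
        sample_theta, simulate_cascade, cascade_loglik, i))
```

The inner average is taken with a log-sum-exp to avoid underflow when the cascade likelihoods are small.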

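As a side note on the Gram-matrix hunk: the single-source expression G = (1/m) * sum_{i=1}^{T} A^i (A^T)^i is straightforward to evaluate numerically. A minimal numpy sketch, not part of the commit, assuming `A` is the weighted adjacency matrix as an (m, m) array and `T` the time horizon:

```python
import numpy as np

def voter_model_gram(A, T):
    """G = (1/m) * sum_{i=1..T} A^i (A^T)^i, using E[X_0 X_0^T] = I_m / m."""
    m = A.shape[0]
    G = np.zeros((m, m))
    A_pow = np.eye(m)
    for _ in range(T):
        A_pow = A_pow @ A      # A^i
        G += A_pow @ A_pow.T   # A^i (A^T)^i, since (A^T)^i = (A^i)^T
    return G / m

# Example: a small symmetric weighted graph on three nodes, horizon T = 3.
A = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
G = voter_model_gram(A, T=3)
```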