\documentclass{article}
\usepackage{fullpage, amsmath, amssymb}
\usepackage{algpseudocode, algorithm, algorithmicx}
\title{Extensions}
\begin{document}

\section{Bayesian Framework}

\subsection{Bayesian Learning}
Let $\theta$ denote the parameters of the graph $\mathcal{G}$, let $S$ be a source node, and let $X = (X_1, \dots, X_T)$ (where $T$ is potentially a stopping time) be the vectors encoding a cascade on $\mathcal{G}$. Assuming a prior $\mathcal{D}$ on $\theta$, we have the following setup:
\begin{alignat*}{2}
\theta & \sim \mathcal{D} \qquad && \text{(prior on $\theta$)} \\
X_0 = S & \sim \mathcal{D}'' && \text{(random source distribution)}\\
\forall t \geq 0,\ X_{t+1} \mid X_t, \theta & \sim \mathcal{D}' && \text{(cascade process)}
\end{alignat*}
Prior work has focused on $\ell_1$-regularized (lasso) estimation, which is equivalent to choosing an independent Laplace prior on $\theta$. In the context of social network learning, however, we can place more informative priors. We can
\begin{itemize}
\item take into account the existence of edges,
\item take into account the distinction between weak and strong ties,
\item take into account common graph structures, such as triangles.
\end{itemize}
We can sample from the posterior by MCMC, although this may be slow. In the case of the independent cascade model, there is an easier solution: we can compute the posterior with the EM algorithm, using the identity of the parent that \emph{does} infect a node (if any) as the latent variable of the model.

\subsection{Active Learning}
In this setup, $S$ is no longer drawn from the distribution $\mathcal{D}''$ but is chosen by the designer of the experiment. Once a source is chosen, a cascade is drawn from the Markovian process $\mathcal{D}'$. We wish to choose the source which maximises the information gained about the parameter $\theta$, so as to speed up learning. We suggest choosing, at each round, the source which maximises the mutual information between $\theta$ and $X$:
\begin{algorithm}
\caption{Active Bayesian Learning through Cascades (ABC)}
\begin{algorithmic}[1]
\State $\theta \sim \mathcal{D}_0 = \mathcal{D}$ \Comment{Initial prior on $\theta$}
\While{True}
\State $i \leftarrow \arg\max_{i} I(\theta, (X_t)_{t \geq 1} \mid X_0 = i)$ \Comment{Maximize conditional mutual information}
\State Observe $(X_t)_{t\geq1} \mid S = i$ \Comment{Observe a cascade from the chosen source node}
\State $\mathcal{D}_{t+1} \gets \text{posterior of } \theta \sim \mathcal{D}_t \text{ given } (X_t)_{t\geq1}$ \Comment{Update the distribution of $\theta$}
\EndWhile
\end{algorithmic}
\end{algorithm}
Note that the mutual information can be estimated node by node (one candidate source at a time) by sampling:
$$I((X_t)_{t\geq 1}, \theta \mid X_0 = i) = \mathbb{E}_{\theta \sim \mathcal{D}_t;\, X \mid \theta, i \sim \mathcal{D}'}\left[\log p(X \mid \theta)\right] - \mathbb{E}_{\theta \sim \mathcal{D}_t;\, X' \mid \theta, i \sim \mathcal{D}'}\left[\log \mathbb{E}_{\theta' \sim \mathcal{D}_t}\, p(X' \mid \theta')\right]$$
The posterior update can be done implicitly by MCMC, since we only need to be able to sample $\theta \sim \mathcal{D}_t$ in order to run the algorithm.
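For concreteness, the mutual-information criterion above can be estimated with a nested Monte Carlo scheme. The following is a minimal Python sketch, assuming an independent cascade model with edge-probability matrix \texttt{theta} and a set of posterior samples (e.g.\ obtained by MCMC); the helper names \texttt{simulate\_cascade} and \texttt{log\_likelihood} are illustrative, not part of the framework above.
\begin{verbatim}
import numpy as np

def simulate_cascade(theta, source, rng):
    """Draw one independent-cascade realization; return the newly-infected
    indicator vectors X_1, X_2, ... as a list of 0/1 arrays."""
    m = theta.shape[0]
    active = np.zeros(m, dtype=bool)
    active[source] = True
    frontier = active.copy()
    steps = []
    while frontier.any():
        # probability that each node escapes infection by every frontier node
        p_escape = np.prod(1.0 - theta[frontier], axis=0)
        newly = (~active) & (rng.random(m) < 1.0 - p_escape)
        if not newly.any():
            break
        steps.append(newly.astype(float))
        active |= newly
        frontier = newly
    return steps

def log_likelihood(theta, source, steps):
    """log p((X_t)_{t>=1} | theta, X_0 = source) under the IC model."""
    m = theta.shape[0]
    active = np.zeros(m, dtype=bool)
    active[source] = True
    frontier = active.copy()
    ll = 0.0
    # the trailing all-zero step accounts for the cascade stopping
    for newly in steps + [np.zeros(m)]:
        newly = newly.astype(bool)
        p_escape = np.prod(1.0 - theta[frontier], axis=0)
        p_infect = np.clip(1.0 - p_escape, 1e-12, 1.0 - 1e-12)
        cand = ~active
        ll += np.log(p_infect[cand & newly]).sum()
        ll += np.log(1.0 - p_infect[cand & ~newly]).sum()
        active |= newly
        frontier = newly
        if not frontier.any():
            break
    return ll

def mutual_information(posterior_samples, source, rng, n_outer=200):
    """Nested Monte Carlo estimate of I((X_t)_{t>=1}, theta | X_0 = source)."""
    first, second = [], []
    for _ in range(n_outer):
        theta = posterior_samples[rng.integers(len(posterior_samples))]
        steps = simulate_cascade(theta, source, rng)
        first.append(log_likelihood(theta, source, steps))
        # inner expectation: posterior-mixture likelihood via log-mean-exp
        lls = np.array([log_likelihood(t, source, steps)
                        for t in posterior_samples])
        second.append(np.log(np.mean(np.exp(lls - lls.max()))) + lls.max())
    return np.mean(first) - np.mean(second)
\end{verbatim}
The inner expectation $\mathbb{E}_{\theta' \sim \mathcal{D}_t}\, p(X' \mid \theta')$ is approximated by a log-mean-exp over the same posterior samples, which is the dominant cost of the estimator.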
\section{Gram matrix}
In the ICML paper, we showed that the graph can be recovered as long as the expected Gram matrix $\mathcal{G}$ of the observations has its restricted eigenvalues bounded away from $0$. Recall that the Gram matrix $G$ is given by $X^T X$, where $X$ is an $n \times m$ matrix, $n$ is the number of measurements, $m$ is the number of nodes, and if $X_i$ is the column vector indicating the nodes infected at step $i$, then the rows of $X$ are the $X_i^T$. It follows that
$$\mathcal{G} = \mathbb{E}\left[\sum_i X_i X_i^T\right].$$
Note that the number of terms in this sum is itself a random variable! Furthermore, recall that the restricted eigenvalue condition only requires the quadratic form of $\mathcal{G}$ to be bounded below over vectors $v$ in the cone $\|v_{S^c}\|_1 \leq 3 \|v_S\|_1$.

\subsection{Voter model}
In the voter model, $\mathbb{P}[X_j^{t+1} = 1 \mid X^t] = \sum_{i=1}^m A_{i,j} X^t_i$, where $A$ is the weighted adjacency matrix of the graph. The model runs until a fixed horizon $T$, a hyperparameter of the model. Therefore, $\mathbb{E}[X_{t+1} \mid X_t] = A^T X_t$ and
$$\mathcal{G} = \sum_{i=1}^T {(A^T)}^i\, \mathbb{E}[X_0 X_0^T]\, A^i.$$
In the case of the single-source model, $\mathbb{E}[X_0 X_0^T] = \frac{1}{m} I_m$ and it follows that
$$\mathcal{G} = \frac{1}{m} \sum_{i=1}^T {(A^T)}^i A^i.$$

\section{Submodularity of Generalized Linear Cascades}
For which link functions is the resulting influence function submodular? We know this to be true for the IC and LT models; what about the logistic cascade model? The answer is no. Take the example of three nodes $(A, B, C)$ which can influence a remaining node $D$ with respective edge weights $a$, $b$, and $c$. Then the following inequality must be verified:
$$2\sigma(a+b) \geq \sigma(a+b+c) + \sigma(a).$$
For $a = 0.5$, $b = 0.5$, and $c = 1$, the inequality is violated: $2\sigma(1) \approx 1.462$ while $\sigma(2) + \sigma(0.5) \approx 1.503$. Interestingly, if we let the scale parameter of the sigmoid go to infinity, the inequality becomes harder to violate. (TH) Note that the LT model is \emph{not} submodular for every fixed value of the thresholds, but only in expectation over the random draw of these thresholds.

\section{Logistic regression}
Let us try to fit a logit model to the cascades, one target node at a time. The variance of the estimated parameters is approximated by $\hat{V}(\hat{\beta}) = {\left(\sum_t g_t(1-g_t)\, X_t X_t^T\right)}^{-1}$, where $X_t$ is the vector of nodes active at step $t$ and $g_t$ is the probability that the target node becomes active at step $t+1$. Let us have a look at the matrix we are inverting. If none of the parents of the target node are active at step $t$, then $g_t = 0$ and the $t$-th term is $0$. Similarly, if the probability that the target node becomes active is $1$, the term also cancels out. We can therefore write
$$A = \sum_{t:\, g_t \notin \{0,1\}} g_t (1-g_t)\, X_t X_t^T.$$
Furthermore, we notice that $x^T A x = \sum_{t:\, g_t \notin \{0,1\}} g_t(1-g_t)\, x^T X_t X_t^T x = \sum_{t:\, g_t \notin \{0,1\}} g_t(1-g_t)\, {(X_t^T x)}^2$. This quadratic form is zero as soon as we can find one node $a$ which can never be active at a step where the parents of the target node are active (think of a line graph or a cycle with a single-source law). Suppose now that, with high probability, each parent has been active at least once. If we restrict attention to the union of all nodes which were active just before the target node was, then by considering only this subspace of parameters we avoid the previous pitfall. But is the matrix invertible nonetheless? Consider a line graph and a node $i \notin \{2, n-2\}$ along that line. Depending on whether the source satisfies $s \geq i$ or $s \leq i$, only one of the two parents of $i$ can be active before $i$ becomes active; they are never active simultaneously. The matrix $A$ is therefore block-diagonal.
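To make the block-diagonal structure concrete, here is a small Python sketch. It is an illustration only: it uses simple SI-style dynamics on a line graph (each active neighbour independently infects with probability $p$ at every step) rather than the exact cascade model above, and the helper names are hypothetical. For an interior node $j$, the entry of $A$ coupling its two parents $j-1$ and $j+1$ is identically zero, because a single-source cascade reaches $j$ from one side only.
\begin{verbatim}
import numpy as np

def si_cascade_on_line(n, source, p, rng, max_steps=1000):
    """Snapshots X_0, X_1, ... of the active nodes on a line of n nodes."""
    active = np.zeros(n, dtype=bool)
    active[source] = True
    snapshots = [active.copy()]
    for _ in range(max_steps):
        if active.all():
            break
        nxt = active.copy()
        for v in range(n):
            if not active[v]:
                k = sum(bool(active[u]) for u in (v - 1, v + 1) if 0 <= u < n)
                if k and rng.random() < 1.0 - (1.0 - p) ** k:
                    nxt[v] = True
        active = nxt
        snapshots.append(active.copy())
    return snapshots

def fisher_matrix_for_node(snapshots, j, p):
    """A = sum_t g_t (1 - g_t) X_t X_t^T over steps before node j activates."""
    n = len(snapshots[0])
    A = np.zeros((n, n))
    for X in snapshots:
        if X[j]:   # only steps at which j is still inactive are informative
            break
        k = int(X[j - 1]) + int(X[j + 1])   # number of active parents of j
        g = 1.0 - (1.0 - p) ** k            # activation probability of j
        if 0.0 < g < 1.0:
            x = X.astype(float)
            A += g * (1.0 - g) * np.outer(x, x)
    return A

rng = np.random.default_rng(0)
n, j, p = 11, 5, 0.7
A = np.zeros((n, n))
for _ in range(500):
    source = int(rng.integers(n))
    A += fisher_matrix_for_node(si_cascade_on_line(n, source, p, rng), j, p)
print(A[j - 1, j + 1])  # 0.0: the parents never co-occur before j is active
\end{verbatim}
Restricted to the nodes that are ever active before $j$, the accumulated matrix $A$ splits into one block for sources to the left of $j$ and one block for sources to the right, which is exactly the block-diagonal structure described above.

\end{document}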