\documentclass{article}
\usepackage{fullpage, amsmath, amssymb}
\title{Extensions}
\begin{document}

\section{Gram matrix}
In the ICML paper, we showed that the graph can be recovered as long as the expected Gram matrix $\mathcal{G}$ of the observations has its restricted eigenvalues lower bounded away from $0$. Recall that the Gram matrix $G$ is given by $X^\top X$, where $X$ is the $n \times m$ matrix whose rows are the $X_i^\top$: $n$ is the number of measurements, $m$ is the number of nodes, and $X_i \in \{0,1\}^m$ is the column vector indicating the nodes infected at step $i$. It follows that
$$\mathcal{G} = \mathbb{E}\Big[\sum_i X_i X_i^\top\Big].$$
Note that the range of the sum (the number of steps in the cascade) is itself random! Furthermore, recall that the restricted eigenvalue condition requires $x^\top \mathcal{G} x \geq \gamma \|x\|_2^2$ for some $\gamma > 0$ and all vectors $x$ in the cone $\{x : \|x_{S^c}\|_1 \leq 3 \|x_S\|_1\}$.

\subsection{Voter model}
In the voter model, $\mathbb{P}[X_j^{t+1} = 1 \mid X^t] = \sum_{i=1}^m A_{i,j} X^t_i$, where $A$ is the weighted adjacency matrix of the graph. Furthermore, the model runs for a fixed horizon of $T$ steps, a hyperparameter of the model. Therefore, $\mathbb{E}[X^{t+1} \mid X^t] = A^\top X^t$ and $\mathcal{G} = \sum_t (A^\top)^t\, \mathbb{E}[X_0 X_0^\top]\, A^t$. In the case of the single-source model, $\mathbb{E}[X_0 X_0^\top] = \frac{1}{m} I_m$ and it follows that
$$\mathcal{G} = \frac{1}{m} \sum_{t=1}^T (A^\top)^t A^t.$$

\section{Submodularity of Generalized Linear Cascades}
For which link functions is the resulting influence function submodular? We know this to be true for the IC and LT models; what about the logistic cascade model? The answer is no. Take the example of three nodes $(A, B, C)$ which can each influence a remaining node $D$ with respective edge weights $a$, $b$, and $c$. Then the following inequality must hold:
$$2\sigma(a+b) \geq \sigma(a+b+c) + \sigma(a).$$
For $a=.5$, $b=.5$, and $c=1$, the inequality is violated: with the standard logistic $\sigma(x) = 1/(1+e^{-x})$, we get $2\sigma(1) \approx 1.46$ while $\sigma(2) + \sigma(0.5) \approx 1.50$. Interestingly, if we let the scale parameter of the sigmoid go to infinity, the inequality becomes harder to violate. (TH) Note that the LT model is NOT submodular for every fixed value of the thresholds, but only in expectation over the random draw of these thresholds.

\section{Logistic regression}
Let us try to fit a logit model to the cascades. For a fixed node $i$, the variance of the estimated parameters is approximated by $\hat V(\hat\beta) = \big(\sum_t p_t(1-p_t) X_t X_t^\top\big)^{-1}$. Let us have a look at the matrix we are inverting. If none of the parents of node $i$ are active at step $t$, then the $t$-th term is $0$. Similarly, if the probability that node $i$ becomes active is $1$, the term cancels out. Writing $g_t$ for this probability, we can therefore write
$$A = \sum_{t:\, g_t \notin \{0,1\}} g_t (1-g_t)\, X_t X_t^\top.$$
Furthermore, we notice that $x^\top A x = \sum_{t:\, g_t \notin\{0,1\}} g_t(1-g_t)\, x^\top X_t X_t^\top x = \sum_{t:\, g_t \notin \{0,1\}} g_t(1-g_t)\, (X_t^\top x)^2$. This quadratic form vanishes (take $x = e_a$) as soon as we can find one node $a$ which is never active at a step where the parents of node $i$ are active (think of a line graph, or a cycle under the single-source law). Suppose now that, with high probability, each parent has been active at least once. If we restrict attention to the union of all nodes which were active just before node $i$ was, then by considering only this subspace of parameters we avoid the previous pitfall. But is the restricted matrix invertible nonetheless? Let us consider a line graph and a node $i \notin \{2, n-2\}$ along that line. The two parents of $i$ cannot be infected at the same time before $i$ is: which one is infected depends on whether the source $s \geq i$ or $s \leq i$. The matrix $A$ is therefore block-diagonal.
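As a quick sanity check of this last claim, here is a minimal simulation sketch; it is an illustration on our part, not part of the ICML paper. It assumes a deterministic single-source cascade on a line graph, drops the scalar weights $g_t(1-g_t)$ (being nonnegative, they cannot turn a zero cross term into a nonzero one), and accumulates $\sum_t X_t X_t^\top$ restricted to the two parent coordinates of an interior node $i$. The helper \texttt{line\_cascade} and the values $m = 11$, $i = 5$ are hypothetical choices for the example.

\begin{verbatim}
import numpy as np

def line_cascade(m, source):
    """Snapshots of a deterministic single-source cascade on the line 0..m-1."""
    active = np.zeros(m)
    active[source] = 1.0
    snapshots = [active.copy()]
    left, right = source, source
    while left > 0 or right < m - 1:
        left, right = max(left - 1, 0), min(right + 1, m - 1)
        active[left] = active[right] = 1.0
        snapshots.append(active.copy())
    return snapshots

m, i = 11, 5                       # interior node i, parents are i-1 and i+1
M = np.zeros((2, 2))               # sum of X_t X_t^T restricted to the parents
for source in range(m):            # enumerate every possible single source
    for x in line_cascade(m, source):
        if x[i] == 1.0:            # keep only the steps before i is infected
            break
        parents = x[[i - 1, i + 1]]
        M += np.outer(parents, parents)

print(M)   # off-diagonal entries are exactly zero: the restriction is diagonal
\end{verbatim}

Under these assumptions the printed $2 \times 2$ matrix has zero off-diagonal entries: sources to the left of $i$ only ever contribute to the $(i-1, i-1)$ entry and sources to the right only to the $(i+1, i+1)$ entry, which is consistent with the block structure described above.

\end{document}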