Diffstat (limited to 'notes')
 -rw-r--r--  notes/formalisation.tex  175
 -rw-r--r--  notes/sparse.bib          11
 2 files changed, 186 insertions, 0 deletions
diff --git a/notes/formalisation.tex b/notes/formalisation.tex
new file mode 100644
index 0000000..53786d3
--- /dev/null
+++ b/notes/formalisation.tex
@@ -0,0 +1,175 @@
+\documentclass[10pt]{article}
+\usepackage{fullpage, amsmath, amssymb, bbm, amsthm}
+
+\newtheorem{theorem}{Theorem}[section]
+\newtheorem{lemma}[theorem]{Lemma}
+\newtheorem{proposition}[theorem]{Proposition}
+\newtheorem{corollary}[theorem]{Corollary}
+\newenvironment{remark}[1][Remark]{\begin{trivlist}
+\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}
+
+
+\date{}
+\title{Cracking Cascades: Applying Sparse Recovery to Graph Reconstruction}
+\author{Jean Pouget-Abadie}
+
+\begin{document}
+\maketitle
+
+\section{Preliminaries: What is a measurement?}
+
+\begin{itemize}
+\item Introduction
+\item Simple graphs (cycle, star, path, tree?)
+\item Isotropic condition
+\item RIP condition
+\end{itemize}
+
+One last property we will look for in measurements is the {\bf Restricted Isometry Property} (RIP), which was introduced in \cite{???} and is ubiquitous in the compressed sensing literature. Suppose we are given a measurement matrix $M$. For every $T \in \mathbb{N}$, we define $\delta_T$ to be the smallest quantity such that, for any submatrix $M_T$ of $M$ obtained by extracting $T$ columns of $M$, and for any vector $c$ with $\| c\|_2 = 1$:
+
+$$1 - \delta_T \leq \| M_T c \|_2^2 \leq 1 + \delta_T$$
+
+For small $\delta_T$, the above inequality expresses a `loose' orthonormality property for the columns of $M$.
+
+\section{Voter Model}
+\subsection{The Model}
+
+Recap on what the model is.
+
+\subsection{Formulating the sparse recovery problem}
+
+\subsection{Results under strong assumptions}
+
+\subsection{Results under RIP condition}
+
+\subsection{Results under isotropic condition}
+
+\section{Independent Cascade Model}
+\subsection{The Model}
+Recall the independent cascade model, where nodes have three states: susceptible, active, and inactive. All nodes start either susceptible or active. At each time step $t$, for every (active, susceptible) pair $(j,i)$, $j$ infects $i$ with probability $p_{j,i}$. At $t+1$, $j$ becomes inactive, and $i$ becomes active if at least one of its active neighbors infected it. The cascade continues until no active nodes remain.
+
+Consider a node $i$. If $p_{j,i}$ is the probability that node $j$ infects $i$ at the next step, let $\theta_{j,i} := \log (1 - p_{j,i})$. Note that $\theta_{j,i} = 0 \Leftrightarrow p_{j,i} = 0$. Also note that
+
+$$\sum_{j\in S} \theta_{j,i} = \log \prod_{j \in S} (1 - p_{j,i}) = \log \mathbb{P}(i \text{ not infected by any } j \in S)$$
+
+Our objective is to recover the vector $\theta_{*,i}$ for every node $i$.
+
+\subsection{Formulating the sparse recovery problem}
+The idea behind what follows is the assumption that $p_{*,i}$, and therefore $\theta_{*,i}$, are sparse vectors: a node has few neighbors (parents in the directed graph) compared to the size of the graph.
+
+Let $\mathbbm{1}_S$ be the indicator vector of the set $S$ of nodes that are active at a given time step $t$ of a given cascade. Under the previous assumption, we have:
+$$\langle \mathbbm{1}_S, \theta_{*,i}\rangle = \log \mathbb{P}(i \text{ not infected at } t+1 \mid S \text{ active at } t)$$
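+
+As a quick sanity check of this identity on made-up numbers (not taken from anything above): suppose $S = \{1, 2\}$ is active at time $t$ and $p_{1,i} = p_{2,i} = 1/2$, so that $\theta_{1,i} = \theta_{2,i} = \log(1/2)$. Then
+$$\langle \mathbbm{1}_S, \theta_{*,i}\rangle = 2\log(1/2) = \log(1/4),$$
+which is indeed the log-probability that both of the two independent infection attempts on $i$ fail.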
+
+Suppose we observe a series of cascades $(I_k^t)$, where $I_k^t$ is the set of nodes infected in cascade $k$ at time $t$. Recall that once a node has been infected (becomes active), it cannot be infected again later (it becomes inactive). Let us therefore look, for every cascade $k$, at all the sets $I_k^t$ with $t < t_i^k$, where $t_i^k$ is the time at which $i$ was infected in cascade $k$ ($t_i^k = + \infty$ if $i$ was not infected in that cascade). If we concatenate the corresponding rows $\mathbbm{1}_{I_k^t}$ into a matrix $M$ (for measurement), we have:
+
+$$M \theta_{*,i} = b$$
+
+where $b$ is the column vector with entries $\log \mathbb{P}(i \text{ not infected at } t+1 \mid S \text{ active at } t)$. Note that even if we had access to $b$ directly, solving for $\theta_{*,i}$ is not trivial when $M$ does not have full column rank! However, under sparsity assumptions on $\theta_{*,i}$, we can apply results from the sparse recovery/sparse coding/compressed sensing literature.
+
+In reality, we do not have access to $b$. We have access to the realization vector $\vec w$ of the events $\{i \text{ infected at } t+1 \mid S \text{ active at } t\}$:
+
+$$\vec w \sim B(1 - e^{M \theta_{*,i}})$$
+
+where the operation $f: \vec v \rightarrow 1 - e^{\vec v}$ is applied element-wise and $B(\vec p)$ is a vector whose entries are Bernoulli variables with parameters given by the entries of $\vec p$.
+
+\subsection{Results under strong assumptions}
+
+What can we hope to have, etc.\ (if $M$ has full column rank or if the same sets $S$ occur frequently).
+
+\subsection{Results under RIP condition}
+
+For ease of notation, let $\theta := \theta_{*,i}$. The following treatment of the problem borrows heavily from \cite{2005math......3066C}. We can see $w$ as a bounded perturbation of $f(b)$:
+$$w = f(M\theta) + \epsilon, \text{ where } \|\epsilon\|_{\infty} < 1$$
+
+Define the following program, which we can solve with the available information $M$ and $w$:
+\begin{equation}
+(N_1) \quad \min_\theta \|\theta\|_1 \quad \text{subject to} \quad \|e^{M\theta} - (1 - w) \|_2 \leq \|\epsilon \|_\infty \nonumber
+\end{equation}
+
+While we can hope for the columns of $M$ to be `loosely' orthogonal, we must normalize them if we hope for $\delta_T$ to be small. Let $M'$ denote $M$ with normalized columns; we define $\epsilon'$ accordingly, such that:
+$$w = f(M'\theta) + \epsilon'$$
+We see that $\epsilon'= w - (w - \epsilon)^{\vec \lambda}$ (element-wise operation), where $\lambda_i = 1/\sqrt{\sum \text{col}(M)_i}$, so that $\|\epsilon'\|_{\infty} < 1$. We can define the following program:
+
+\begin{equation}
+(N_2) \quad \min_\theta \|\theta\|_1 \quad \text{subject to} \quad \|e^{M'\theta} - (1 - w) \|_2 \leq \|\epsilon' \|_\infty \nonumber
+\end{equation}
+
+With the corresponding $\delta_T'$ defined for $M'$, we are going to prove the following:
+
+\begin{theorem}
+Suppose that all nodes in our graph have {\bf degree bounded by $S$}, that no edge into $i$ has probability $1$, i.e.\ $p_{j,i} < 1 - \eta$ for all $j$, where $0<\eta<1$, and that $\delta'_{3S} + 3 \delta'_{4S} < 2$. If $\theta_0$ is the true vector for node $i$, the solution $\hat \theta$ to $(N_2)$ obeys:
+$$\|\hat \theta - \theta_0 \|_2 \leq \frac{C_S}{\eta} \|\epsilon' \|_{\infty}$$
+where $C_S$ is well behaved ($\approx 10$ for reasonable values of $\delta'_{4S}$).
+\end{theorem}
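+
+To unpack the ``well behaved'' claim, here is a back-of-the-envelope check of the constants quoted at the end of the proof below, assuming the choice $N = 3T_0$ with $T_0 = S$ used in \cite{2005math......3066C} (that choice is not stated explicitly above), and taking $\delta'_{3S} = \delta'_{4S}$ for the purpose of the computation only. The last display of the proof gives $C_S = 2\sqrt{1 + T_0/N}/C_{T_0,N}$ with $C_{T_0,N} = \sqrt{1 - \delta'_{4S}} - \sqrt{1/3}\sqrt{1 + \delta'_{3S}}$. For instance, with $\delta'_{3S} = \delta'_{4S} = \frac{1}{4}$:
+$$C_{T_0, N} \approx 0.866 - 0.577 \cdot 1.118 \approx 0.221, \qquad C_S \approx \frac{2 \cdot 1.155}{0.221} \approx 10.5,$$
+consistent with the value $C_S \approx 10.47$ quoted in the proof. Note also that, with this choice of $N$, $C_{T_0,N} > 0$ exactly when $\delta'_{3S} + 3 \delta'_{4S} < 2$, which is the condition assumed above.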
+
+\begin{proof}
+The proof borrows heavily from \cite{2005math......3066C}. Define $h := \theta - \theta_0$, where $\theta$ denotes the solution $\hat \theta$ of $(N_2)$. We list the results as they are given in \cite{2005math......3066C} or as they can be adapted to our case.
+
+\begin{itemize}
+\item $\|e^{M'\theta} - e^{M' \theta_0} \|_2 \leq \| e^{M'\theta} - ( 1 - w) \|_2 + \| e^{M'\theta_0} - (1 - w) \|_2 \leq 2 \|\epsilon' \|_{\infty}$
+\item $\|h\|_2^2 \leq (1 + |T_0|/N) \|h_{T_{01}}\|_2^2$, where $N$ is an arbitrary positive integer, and $h_{T_{01}}$ is the vector $h$ with all entries masked to $0$, except those corresponding to the non-zero entries of $\theta_0$ and the $N$ largest entries of $h_{T_0^c}$.
+\item $\|e^{M'\theta} - e^{M'\theta_0} \|_2 \geq \eta \|M'h\|_2 \geq \eta\, C_{T_0, N} \cdot \|h_{T_{01}} \|_2$, where $C_{T_0,N} := \sqrt{1 - \delta'_{N + T_0}} - \sqrt{T_0 / N} \sqrt{1 + \delta'_N}$
+\end{itemize}
+The first bullet point is obtained by the triangle inequality. The second bullet point is identical to the proof in \cite{2005math......3066C}. The first inequality in the third bullet point holds because, by assumption, all entries of $M'\theta$ are greater than $\ln \eta$ (remember that the $e^{\vec v}$ operation is done element-wise). The rest follows as in \cite{2005math......3066C}, and we obtain:
+$$\|h\|_2 \leq 2 \frac{\sqrt{1 + T_0/N}}{ \eta \cdot C_{T_0, N}} \|\epsilon'\|_{\infty} = \frac{C_S}{\eta} \|\epsilon'\|_{\infty}$$
+For $\delta'_{4S} \leq \frac{1}{5}$, $C_S \approx 8.82$; for $\delta'_{4S} \leq \frac{1}{4}$, $C_S \approx 10.47$.
+\end{proof}
+
+\begin{remark} A few notes on the result:
+\begin{itemize}
+\item With this naive reduction of $w$ to an $\epsilon$-perturbation of $f(b)$, the best $\|\epsilon\|_{\infty}$ we can hope for is $1 - \eta$. Treating $w$ as more than an arbitrary perturbation will most likely yield a much better upper bound.
+\item The upper bound is stated in $\log$-space for the true probabilities:
+$$\frac{C_S}{\eta} > \sum \left[ \log \left(\frac{1 - p_j}{1 - p_j'} \right) \right]^2 > \frac{\eta \log(1/\eta)}{1 - \eta} \left\| \frac{1 - p_j}{1 - p'_j} - 1 \right\| $$
+This bullet point needs to be finished; a partial sketch of the translation to probability space is given after this remark.
+\item To what extent can we approximate our measurement matrix by a matrix with independent Bernoulli entries? What $\delta_T$'s can we hope for? Which constant $C_S$ can we hope for? How does it change with the rows of $M$?
+\end{itemize}
+\end{remark}
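+
+As a partial sketch toward the second bullet point above (this is only the easy direction, and not necessarily the bound intended there): writing $\hat p_j := 1 - e^{\hat \theta_j}$ and $p_j^0 := 1 - e^{(\theta_0)_j}$, the mean value theorem gives, for each coordinate,
+$$|\hat \theta_j - (\theta_0)_j| = \left| \log \frac{1 - \hat p_j}{1 - p_j^0} \right| = \frac{|\hat p_j - p_j^0|}{1 - \xi_j} \geq |\hat p_j - p_j^0|$$
+for some $\xi_j$ between $\hat p_j$ and $p_j^0$, so the $\log$-space guarantee transfers directly to the probabilities: $\|\hat p - p^0\|_2 \leq \|\hat \theta - \theta_0\|_2 \leq \frac{C_S}{\eta} \|\epsilon'\|_{\infty}$.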
+
+
+\subsection{Results under isotropic condition}
+
+
+
+% Given $M$ and $w$, we can solve the following maximum log-likelihood problem:
+% \begin{equation}
+% \max_{\|\theta_{*,i}\|_0 \leq k} \langle w,\log f(M \theta_{*,i})\rangle + \langle 1 - w, M\theta_{*,i}\rangle
+% \label{eq:max_loglik_obj}
+% \end{equation}
+
+% Since optimizing under the constraint $\|\theta\|_0 \leq k$ is hard, we solve instead the Lagrangian relaxation:
+
+% \begin{equation}
+% \max_{\theta_{*,i}} \langle w,\log f(M \theta_{*,i})\rangle + \langle 1 - w, M\theta_{*,i}\rangle + \lambda \|\theta_{*,i}\|_0
+% \label{eq:lagra_max_loglik_obj}
+% \end{equation}
+
+% Eq~\ref{eq:lagra_max_loglik_obj} is concave in $\theta_{*,i}$ and can therefore be solved efficiently. We would like to exploit the sparsity of $\theta_{*,i}$ to show that we recover the true sparse $\theta_{*,i}$ with few measurements (rows of $M$).
+
+
+\bibliography{sparse}{}
+\bibliographystyle{plain}
+
+\end{document}
+
+
+
+
+% Let us now consider the matrix $M$, whose rows are the vectors $\mathbbm{1}_S$. If we could estimate the vector $X := \mathbb{P}(i \text{ not infected at } t+1 \mid S \text{ active at } t)$, we could solve for $\theta$:
+% $$M \theta_{*,i} = X, \text{ where }\theta_{*,i} \text{ is a sparse vector}$$
+
+% There are many powerful tools from the sparse coding literature which would allow us to recover the sparsest $\theta$ under certain assumptions on $M$. One such assumption is incoherence of the columns of $M$: if we can show that for all columns $M_i, M_j$ of $M$ we have $|\langle M_i, M_j\rangle| \leq \mu \|M_i\| \|M_j\|$, i.e.\ $M$ is $\mu$-incoherent, then for any node $i$ with fewer than $k = \frac{1}{2 \mu}$ parents, we can recover $\theta_{*,i}$ exactly. To get a better intuition of what this means: if the entries of $M$ were independent Gaussian variables, we would need approximately $k \ln(\frac{n}{k})$ measurements to recover $\theta_{*,i}$ exactly.
+
+% This would seem to suggest that the quality of observed cascades is much more important than quantity, a notion which has not truly been explored in graph reconstruction as of yet. Another, stronger assumption on the columns of $M$ is separability (see Moitra's work in Topic Modeling), leading to even stronger results.
+
+% In reality, of course, we do not know $X$. Instead, we have access to $\hat X$, which estimates $X$ in the following way. For any set $S$ in any cascade that was active before $i$ was, we obtain a binary signal: $\hat X_S:=0$ if $i$ was infected immediately afterwards and $\hat X_S:=1$ otherwise. As the number of times set $S$ was observed active grows, we have $$\frac{1}{m} \sum \hat X_S \rightarrow \mathbb{P}(i \text{ not infected at }t+1 \mid S \text{ active at } t)$$
+
+% Certain sets might have been infected many times in our set of cascades, certain sets only once. One possible way to rewrite the program above is: $$\hat X_j = H(w_j - e^{(M \theta_{*,i})_j}), \text{ where } w\sim {\cal U}[0,1] \text{ and } H \text{ is the step function } \mathbbm{1}_{\cdot \geq 0}$$
+
+% This problem seems to be closely related to the growing field of 1-bit compressed sensing, with at least one difference: when researchers consider 1-bit compressed sensing with noise, they are usually in a position to decide which measurement to take next to recover $\theta_{*,i}$, whereas in this framework, we are limited to an observer status\footnote{Research in the other direction could be very interesting: see discussion about exploration-exploitation with Thibaut}.
+
+% I need to do more reading on the subject, namely to understand what guarantees people have found for sample complexity.
+
diff --git a/notes/sparse.bib b/notes/sparse.bib
new file mode 100644
index 0000000..7e682a1
--- /dev/null
+++ b/notes/sparse.bib
@@ -0,0 +1,11 @@
+@ARTICLE{2005math......3066C,
+  author = {{Candes}, E. and {Romberg}, J. and {Tao}, T.},
+  title = "{Stable Signal Recovery from Incomplete and Inaccurate Measurements}",
+  journal = {ArXiv Mathematics e-prints},
+  eprint = {math/0503066},
+  keywords = {Mathematics - Numerical Analysis, 94A12, 41A45, 42A10},
+  year = 2005,
+  month = mar,
+  adsurl = {http://adsabs.harvard.edu/abs/2005math......3066C},
+  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
+}
\ No newline at end of file
