Diffstat (limited to 'notes')
 -rw-r--r--  notes/formalisation.tex  175
 -rw-r--r--  notes/sparse.bib          11
 2 files changed, 186 insertions, 0 deletions
diff --git a/notes/formalisation.tex b/notes/formalisation.tex
new file mode 100644
index 0000000..53786d3
--- /dev/null
+++ b/notes/formalisation.tex
@@ -0,0 +1,175 @@
+\documentclass[10pt]{article}
+\usepackage{fullpage, amsmath, amssymb, bbm, amsthm}
+
+\newtheorem{theorem}{Theorem}[section]
+\newtheorem{lemma}[theorem]{Lemma}
+\newtheorem{proposition}[theorem]{Proposition}
+\newtheorem{corollary}[theorem]{Corollary}
+\newenvironment{remark}[1][Remark]{\begin{trivlist}
+\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}
+
+
+\date{}
+\title{Cracking Cascades: Applying Sparse Recovery to Graph Reconstruction}
+\author{Jean Pouget-Abadie}
+
+\begin{document}
+\maketitle
+
+\section{Preliminaries: What is a measurement?}
+
+\begin{itemize}
+\item Introduction
+\item Simple graphs (cycle, star, path, tree?)
+\item Isotropic condition
+\item RIP condition
+\end{itemize}
+
+One last property we will look for in measurements is the {\bf Restricted Isometry Property} (RIP), which was introduced in \cite{???} and is ubiquitous in the compressed sensing literature. Suppose we are given a measurement matrix $M$. For every $T \in \mathbb{N}$, we define $\delta_T$ to be the smallest quantity such that, for any submatrix $M_T$ of $M$ obtained by extracting $T$ columns of $M$, and for any vector $c$ with $\| c\|_2 = 1$:
+
+$$1 - \delta_T \leq \| M_T c \|_2^2 \leq 1 + \delta_T$$
+
+For small $\delta_T$, the above inequality expresses a `loose' orthonormality property for the columns of $M$.
+
+\section{Voter Model}
+\subsection{The Model}
+
+Recap on what the model is.
+
+\subsection{Formulating the sparse recovery problem}
+
+\subsection{Results under strong assumptions}
+
+\subsection{Results under RIP condition}
+
+\subsection{Results under isotropic condition}
+
+\section{Independent Cascade Model}
+\subsection{The Model}
+Recall the independent cascade model, where nodes have three states: susceptible, active, and inactive. All nodes start either susceptible or active. At each time step $t$, for every (active, susceptible) pair $(j,i)$, $j$ infects $i$ with probability $p_{j,i}$. At $t+1$, $j$ becomes inactive, and $i$ becomes active if at least one of its active neighbors infected it. The cascade continues until no active nodes remain.
+
+Consider a node $i$. If $p_{j,i}$ is the probability that node $j$ infects $i$ at the next step, let $\theta_{j,i} := \log (1 - p_{j,i})$. Note that $\theta_{j,i} = 0 \Leftrightarrow p_{j,i} = 0$. Also note that
+
+$$\sum_{j\in S} \theta_{j,i} = \log \prod_{j \in S} (1 - p_{j,i}) = \log \mathbb{P}(i \text{ not infected by any } j \in S)$$
+
+Our objective is to recover the vector $\theta_{*,i}$ for every node $i$.
+
+\subsection{Formulating the sparse recovery problem}
+The idea behind what follows is the assumption that $p_{*,i}$, and therefore $\theta_{*,i}$, are sparse vectors: a node has few neighbors (parents in the directed graph) compared to the size of the graph.
+
+Let $\mathbbm{1}_S$ be the indicator vector of the set $S$ of nodes that are active at a given time step $t$ of a given cascade. Under the previous assumption, we have:
+$$\langle \mathbbm{1}_S, \theta_{*,i}\rangle = \log \mathbb{P}(i \text{ not infected at } t+1 \mid S \text{ active at } t)$$
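+
+As a quick sanity check of this identity on made-up numbers (not taken from anything above): suppose $S = \{1, 2\}$ is active at time $t$ and $p_{1,i} = p_{2,i} = 1/2$, so that $\theta_{1,i} = \theta_{2,i} = \log(1/2)$. Then
+$$\langle \mathbbm{1}_S, \theta_{*,i}\rangle = 2\log(1/2) = \log(1/4),$$
+which is indeed the log-probability that both of the two independent infection attempts on $i$ fail.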
+
+Suppose we observe a series of cascades $(I_k^t)$, where $I_k^t$ is the set of nodes infected in cascade $k$ at time $t$. Recall that once a node has been infected (becomes active), it cannot be infected again later (it becomes inactive). Let us therefore look, for every cascade $k$, at all the sets $I_k^t$ with $t < t_i^k$, where $t_i^k$ is the time at which $i$ was infected in cascade $k$ ($t_i^k = + \infty$ if $i$ was not infected in that cascade). If we concatenate the corresponding rows $\mathbbm{1}_{I_k^t}$ into a matrix $M$ (for measurement), we have:
+
+$$M \theta_{*,i} = b$$
+
+where $b$ is the column vector with entries $\log \mathbb{P}(i \text{ not infected at } t+1 \mid S \text{ active at } t)$. Note that even if we had access to $b$ directly, solving for $\theta_{*,i}$ is not trivial when $M$ does not have full column rank! However, under sparsity assumptions on $\theta_{*,i}$, we can apply results from the sparse recovery/sparse coding/compressed sensing literature.
+
+In reality, we do not have access to $b$. We have access to the realization vector $\vec w$ of the events $\{i \text{ infected at } t+1 \mid S \text{ active at } t\}$:
+
+$$\vec w \sim B(1 - e^{M \theta_{*,i}})$$
+
+where the operation $f: \vec v \rightarrow 1 - e^{\vec v}$ is applied element-wise and $B(\vec p)$ is a vector whose entries are Bernoulli variables with parameters given by the entries of $\vec p$.
+
+\subsection{Results under strong assumptions}
+
+What can we hope to have, etc.\ (if $M$ has full column rank or if the same sets $S$ occur frequently).
+
+\subsection{Results under RIP condition}
+
+For ease of notation, let $\theta := \theta_{*,i}$. The following treatment of the problem borrows heavily from \cite{2005math......3066C}. We can see $w$ as a bounded perturbation of $f(b)$:
+$$w = f(M\theta) + \epsilon, \text{ where } \|\epsilon\|_{\infty} < 1$$
+
+Define the following program, which we can solve with the available information $M$ and $w$:
+\begin{equation}
+(N_1) \quad \min_\theta \|\theta\|_1 \quad \text{subject to} \quad \|e^{M\theta} - (1 - w) \|_2 \leq \|\epsilon \|_\infty \nonumber
+\end{equation}
+
+While we can hope for the columns of $M$ to be `loosely' orthogonal, we must normalize them if we hope for $\delta_T$ to be small. Let $M'$ denote $M$ with normalized columns; we define $\epsilon'$ accordingly, such that:
+$$w = f(M'\theta) + \epsilon'$$
+We see that $\epsilon'= w - (w - \epsilon)^{\vec \lambda}$ (element-wise operation), where $\lambda_i = 1/\sqrt{\sum \text{col}(M)_i}$, so that $\|\epsilon'\|_{\infty} < 1$. We can define the following program:
+
+\begin{equation}
+(N_2) \quad \min_\theta \|\theta\|_1 \quad \text{subject to} \quad \|e^{M'\theta} - (1 - w) \|_2 \leq \|\epsilon' \|_\infty \nonumber
+\end{equation}
+
+With the corresponding $\delta_T'$ defined for $M'$, we are going to prove the following:
+
+\begin{theorem}
+Suppose that all nodes in our graph have {\bf degree bounded by $S$}, that no edge into $i$ has probability $1$, i.e.\ $p_{j,i} < 1 - \eta$ for all $j$, where $0<\eta<1$, and that $\delta'_{3S} + 3 \delta'_{4S} < 2$. If $\theta_0$ is the true vector for node $i$, the solution $\hat \theta$ to $(N_2)$ obeys:
+$$\|\hat \theta - \theta_0 \|_2 \leq \frac{C_S}{\eta} \|\epsilon' \|_{\infty}$$
+where $C_S$ is well behaved ($\approx 10$ for reasonable values of $\delta'_{4S}$).
+\end{theorem}
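+
+To unpack the ``well behaved'' claim, here is a back-of-the-envelope check of the constants quoted at the end of the proof below, assuming the choice $N = 3T_0$ with $T_0 = S$ used in \cite{2005math......3066C} (that choice is not stated explicitly above), and taking $\delta'_{3S} = \delta'_{4S}$ for the purpose of the computation only. The last display of the proof gives $C_S = 2\sqrt{1 + T_0/N}/C_{T_0,N}$ with $C_{T_0,N} = \sqrt{1 - \delta'_{4S}} - \sqrt{1/3}\sqrt{1 + \delta'_{3S}}$. For instance, with $\delta'_{3S} = \delta'_{4S} = \frac{1}{4}$:
+$$C_{T_0, N} \approx 0.866 - 0.577 \cdot 1.118 \approx 0.221, \qquad C_S \approx \frac{2 \cdot 1.155}{0.221} \approx 10.5,$$
+consistent with the value $C_S \approx 10.47$ quoted in the proof. Note also that, with this choice of $N$, $C_{T_0,N} > 0$ exactly when $\delta'_{3S} + 3 \delta'_{4S} < 2$, which is the condition assumed above.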
+
+\begin{proof}
+The proof borrows heavily from \cite{2005math......3066C}. Define $h := \theta - \theta_0$, where $\theta$ denotes the solution $\hat \theta$ of $(N_2)$. We list the results as they are given in \cite{2005math......3066C} or as they can be adapted to our case.
+
+\begin{itemize}
+\item $\|e^{M'\theta} - e^{M' \theta_0} \|_2 \leq \| e^{M'\theta} - ( 1 - w) \|_2 + \| e^{M'\theta_0} - (1 - w) \|_2 \leq 2 \|\epsilon' \|_{\infty}$
+\item $\|h\|_2^2 \leq (1 + |T_0|/N) \|h_{T_{01}}\|_2^2$, where $N$ is an arbitrary positive integer, and $h_{T_{01}}$ is the vector $h$ with all entries masked to $0$, except those corresponding to the non-zero entries of $\theta_0$ and the $N$ largest entries of $h_{T_0^c}$.
+\item $\|e^{M'\theta} - e^{M'\theta_0} \|_2 \geq \eta \|M'h\|_2 \geq \eta\, C_{T_0, N} \cdot \|h_{T_{01}} \|_2$, where $C_{T_0,N} := \sqrt{1 - \delta'_{N + T_0}} - \sqrt{T_0 / N} \sqrt{1 + \delta'_N}$
+\end{itemize}
+The first bullet point is obtained by the triangle inequality. The second bullet point is identical to the proof in \cite{2005math......3066C}. The first inequality in the third bullet point holds because, by assumption, all entries of $M'\theta$ are greater than $\ln \eta$ (remember that the $e^{\vec v}$ operation is done element-wise). The rest follows as in \cite{2005math......3066C}, and we obtain:
+$$\|h\|_2 \leq 2 \frac{\sqrt{1 + T_0/N}}{ \eta \cdot C_{T_0, N}} \|\epsilon'\|_{\infty} = \frac{C_S}{\eta} \|\epsilon'\|_{\infty}$$
+For $\delta'_{4S} \leq \frac{1}{5}$, $C_S \approx 8.82$; for $\delta'_{4S} \leq \frac{1}{4}$, $C_S \approx 10.47$.
+\end{proof}
+
+\begin{remark} A few notes on the result:
+\begin{itemize}
+\item With this naive reduction of $w$ to an $\epsilon$-perturbation of $f(b)$, the best $\|\epsilon\|_{\infty}$ we can hope for is $1 - \eta$. Treating $w$ as more than an arbitrary perturbation will most likely yield a much better upper bound.
+\item The upper bound is stated in $\log$-space for the true probabilities:
+$$\frac{C_S}{\eta} > \sum \left[ \log \left(\frac{1 - p_j}{1 - p_j'} \right) \right]^2 > \frac{\eta \log(1/\eta)}{1 - \eta} \left\| \frac{1 - p_j}{1 - p'_j} - 1 \right\| $$
+This bullet point needs to be finished; a partial sketch of the translation to probability space is given after this remark.
+\item To what extent can we approximate our measurement matrix by a matrix with independent Bernoulli entries? What $\delta_T$'s can we hope for? Which constant $C_S$ can we hope for? How does it change with the rows of $M$?
+\end{itemize}
+\end{remark}
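+
+As a partial sketch toward the second bullet point above (this is only the easy direction, and not necessarily the bound intended there): writing $\hat p_j := 1 - e^{\hat \theta_j}$ and $p_j^0 := 1 - e^{(\theta_0)_j}$, the mean value theorem gives, for each coordinate,
+$$|\hat \theta_j - (\theta_0)_j| = \left| \log \frac{1 - \hat p_j}{1 - p_j^0} \right| = \frac{|\hat p_j - p_j^0|}{1 - \xi_j} \geq |\hat p_j - p_j^0|$$
+for some $\xi_j$ between $\hat p_j$ and $p_j^0$, so the $\log$-space guarantee transfers directly to the probabilities: $\|\hat p - p^0\|_2 \leq \|\hat \theta - \theta_0\|_2 \leq \frac{C_S}{\eta} \|\epsilon'\|_{\infty}$.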
+
+
+\subsection{Results under isotropic condition}
+
+
+
+% Given $M$ and $w$, we can solve the following maximum log-likelihood problem:
+% \begin{equation}
+% \max_{\|\theta_{*,i}\|_0 \leq k} \langle w,\log f(M \theta_{*,i})\rangle + \langle 1 - w, M\theta_{*,i}\rangle
+% \label{eq:max_loglik_obj}
+% \end{equation}
+
+% Since optimizing under the constraint $\|\theta\|_0 \leq k$ is hard, we solve instead the Lagrangian relaxation:
+
+% \begin{equation}
+% \max_{\theta_{*,i}} \langle w,\log f(M \theta_{*,i})\rangle + \langle 1 - w, M\theta_{*,i}\rangle + \lambda \|\theta_{*,i}\|_0
+% \label{eq:lagra_max_loglik_obj}
+% \end{equation}
+
+% Eq~\ref{eq:lagra_max_loglik_obj} is concave in $\theta_{*,i}$ and can therefore be solved efficiently. We would like to exploit the sparsity of $\theta_{*,i}$ to show that we recover the true sparse $\theta_{*,i}$ with few measurements (rows of $M$).
+
+
+\bibliography{sparse}{}
+\bibliographystyle{plain}
+
+\end{document}
+
+
+
+
+% Let us now consider the matrix $M$, whose rows are the vectors $\mathbbm{1}_S$. If we could estimate the vector $X := \mathbb{P}(i \text{ not infected at } t+1 \mid S \text{ active at } t)$, we could solve for $\theta$:
+% $$M \theta_{*,i} = X, \text{ where }\theta_{*,i} \text{ is a sparse vector}$$
+
+% There are many powerful tools from the sparse coding literature which would allow us to recover the sparsest $\theta$ under certain assumptions on $M$. One such assumption is incoherence of the columns of $M$: if we can show that for all columns $M_i, M_j$ of $M$ we have $|\langle M_i, M_j\rangle| \leq \mu \|M_i\| \|M_j\|$, i.e.\ $M$ is $\mu$-incoherent, then for any node $i$ with fewer than $k = \frac{1}{2 \mu}$ parents, we can recover $\theta_{*,i}$ exactly. To get a better intuition of what this means: if the entries of $M$ were independent Gaussian variables, we would need approximately $k \ln(\frac{n}{k})$ measurements to recover $\theta_{*,i}$ exactly.
+
+% This would seem to suggest that the quality of observed cascades is much more important than quantity, a notion which has not truly been explored in graph reconstruction as of yet. Another, stronger assumption on the columns of $M$ is separability (see Moitra's work in Topic Modeling), leading to even stronger results.
+
+% In reality, of course, we do not know $X$. Instead, we have access to $\hat X$, which estimates $X$ in the following way. For any set $S$ in any cascade that was active before $i$ was, we obtain a binary signal: $\hat X_S:=0$ if $i$ was infected immediately afterwards and $\hat X_S:=1$ otherwise. As the number of times set $S$ was observed active grows, we have $$\frac{1}{m} \sum \hat X_S \rightarrow \mathbb{P}(i \text{ not infected at }t+1 \mid S \text{ active at } t)$$
+
+% Certain sets might have been infected many times in our set of cascades, certain sets only once. One possible way to rewrite the program above is: $$\hat X_j = H(w_j - e^{(M \theta_{*,i})_j}), \text{ where } w\sim {\cal U}[0,1] \text{ and } H \text{ is the step function } \mathbbm{1}_{\cdot \geq 0}$$
+
+% This problem seems to be closely related to the growing field of 1-bit compressed sensing, with at least one difference: when researchers consider 1-bit compressed sensing with noise, they are usually in a position to decide which measurement to take next to recover $\theta_{*,i}$, whereas in this framework, we are limited to an observer status\footnote{Research in the other direction could be very interesting: see discussion about exploration-exploitation with Thibaut}.
+
+% I need to do more reading on the subject, namely to understand what guarantees people have found for sample complexity.
+
diff --git a/notes/sparse.bib b/notes/sparse.bib
new file mode 100644
index 0000000..7e682a1
--- /dev/null
+++ b/notes/sparse.bib
@@ -0,0 +1,11 @@
+@ARTICLE{2005math......3066C,
+  author = {{Candes}, E. and {Romberg}, J. and {Tao}, T.},
+  title = "{Stable Signal Recovery from Incomplete and Inaccurate Measurements}",
+  journal = {ArXiv Mathematics e-prints},
+  eprint = {math/0503066},
+  keywords = {Mathematics - Numerical Analysis, 94A12, 41A45, 42A10},
+  year = 2005,
+  month = mar,
+  adsurl = {http://adsabs.harvard.edu/abs/2005math......3066C},
+  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
+}
\ No newline at end of file
