aboutsummaryrefslogtreecommitdiffstats
path: root/notes/formalisation.tex
diff options
context:
space:
mode:
Diffstat (limited to 'notes/formalisation.tex')
-rw-r--r--notes/formalisation.tex82
1 files changed, 73 insertions, 9 deletions
diff --git a/notes/formalisation.tex b/notes/formalisation.tex
index 53786d3..72e7cb9 100644
--- a/notes/formalisation.tex
+++ b/notes/formalisation.tex
@@ -1,13 +1,10 @@
\documentclass[10pt]{article}
-\usepackage{fullpage, amsmath, amssymb, bbm, amsthm}
-
-\newtheorem{theorem}{Theorem}[section]
-\newtheorem{lemma}[theorem]{Lemma}
-\newtheorem{proposition}[theorem]{Proposition}
-\newtheorem{corollary}[theorem]{Corollary}
-\newenvironment{remark}[1][Remark]{\begin{trivlist}
-\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}
-
+\usepackage[utf8x]{inputenc}
+\usepackage{amsmath, amssymb, bbm, amsthm}
+\usepackage[pagebackref=true,breaklinks=true,colorlinks=true,backref=false]{hyperref}
+\usepackage[capitalize, noabbrev]{cleveref}
+\usepackage{microtype}
+\input{defs}
\date{}
\title{Cracking Cascades: Applying Sparse Recovery to Graph Reconstruction}
@@ -18,6 +15,73 @@
\section{Preliminaries: What is a measurement?}
+There is an unknown graph $G=(V,E)$ that we wish to reconstruct by observing
+information cascades. An information cascade is a random discrete-time process
+where we get to observe the set of vertices infected at each time step.
+
+The key contribution is to formulate this problem as a sparse recovery problem
+akin to those found in the compressed sensing literature. The parallel between
+the two problems is as follows:
+\begin{itemize}
+ \item the unknown \emph{signal} $s$ is a vector in $\{0,1\}^{V\times V}$
+ with $s_{u,v} = 1$ iff. $(u,v)\in E$.
+ \item the \emph{sensing matrix} is given by the cascades. The \emph{sensing
+ matrix} maps the signal to the observation space. Since the cascades
+ are random time processes, they define a sequence of random
+ observations.
+ \item the \emph{observations} are the set of vertices infected at each time
+ step in each cascade.
+\end{itemize}
+
+Let us define $n\eqdef |V|$. If we assume that all the vertices in $G$ have
+a degree bounded by $d\in\N_+$, then the signal $s$ is $nd$-sparse. Assuming
+that the sensing matrix has good random properties, then we can hope to
+recover $s$ with $O(nd\log n)$ random observations.
+
+Two main goals:
+\begin{itemize}
+ \item understand to which extent the probabilistic cascade models translate
+ into incoherence properties of the sensing matrix and allow for an
+ accurate reconstruction of the unknown graph with a small number of
+ observations.
+ \item compare the empirical performance of this compressed sensing approach
+ to prior graph reconstruction approaches.
+\end{itemize}
+
+\begin{remark}
+\begin{itemize}
+ \item It seems that we will need a \emph{complete} probabilistic model for
+ the cascades. By complete I mean that we need to specify a joint
+ distribution on the source of the cascade (\emph{e.g.} the source is
+ chosen uniformly at random) and on the cascade diffusion process. Such
+ a model is probably not very realistic, but is necessary to obtain
+ theoretical guarantees. The unrealism of the model is not a problem
+ \emph{per se}; the eventual validity of this approach will be
+ experimental anyway.
+ \item It seems likely that we will want the rows of the sensing matrix to
+ be independent realizations of the same distribution. However,
+ \textbf{this will never be the case} if we have one row for each time
+ step of each cascade: the rows within the same cascade are obviously
+ highly correlated. We might have to \emph{aggregate} them (for example
+ through concatenation?) in a sensible way to have with one row per
+ cascade.
+ \item Something which makes this problem very different from the usual
+ compressed sensing problems is that the measurements depend on the
+ signal we wish to recover: cascades propagates along the edges
+ contained in the signal. More precisely, not only the rows of the
+ sensing matrix are correlated (if we do not aggregate them by cascade)
+ but they are correlated through the signal. Is this a problem? I don't
+ know, this looks scary though.
+ \item A slightly more general model which we will need in the independent
+ cascade case considers a signal $s\in[0,1]^{V\times V}$. We can always
+ go from this model to the binary case by first recovering the
+ non-binary $s$ and then applying a thresholding procedure. But if we
+ only care about recovering the support of $s$, can we do better? (TO
+ DO: read stuff about support recovery).
+\end{itemize}
+\end{remark}
+
+Previous stuff:
\begin{itemize}
\item Introduction
\item Simple graphs (cycle, star, path, tree?)