Diffstat (limited to 'notes')
-rw-r--r--  notes/defs.tex          |  17 +
-rw-r--r--  notes/formalisation.pdf | bin 179913 -> 194430 bytes
-rw-r--r--  notes/formalisation.tex |  82 +-
3 files changed, 90 insertions(+), 9 deletions(-)
diff --git a/notes/defs.tex b/notes/defs.tex
new file mode 100644
index 0000000..ca5dd55
--- /dev/null
+++ b/notes/defs.tex
@@ -0,0 +1,17 @@
+\newtheorem{theorem}{Theorem}[section]
+\newtheorem{lemma}[theorem]{Lemma}
+\newtheorem{proposition}[theorem]{Proposition}
+\newtheorem{corollary}[theorem]{Corollary}
+\theoremstyle{definition}
+\newtheorem*{remark}{Remark}
+
+\DeclareMathOperator{\E}{\mathbb{E}}
+\let\P\relax
+\DeclareMathOperator{\P}{\mathbb{P}}
+\newcommand{\ex}[1]{\E\left[#1\right]}
+\newcommand{\prob}[1]{\P\left[#1\right]}
+\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
+\newcommand{\R}{\mathbb{R}}
+\newcommand{\N}{\mathbb{N}}
+\newcommand{\eps}{\varepsilon}
+\newcommand{\eqdef}{\mathbin{\stackrel{\rm def}{=}}}
diff --git a/notes/formalisation.pdf b/notes/formalisation.pdf
index 02e0a47..75d5480 100644
--- a/notes/formalisation.pdf
+++ b/notes/formalisation.pdf
Binary files differ
diff --git a/notes/formalisation.tex b/notes/formalisation.tex
index 53786d3..72e7cb9 100644
--- a/notes/formalisation.tex
+++ b/notes/formalisation.tex
@@ -1,13 +1,10 @@
 \documentclass[10pt]{article}
-\usepackage{fullpage, amsmath, amssymb, bbm, amsthm}
-
-\newtheorem{theorem}{Theorem}[section]
-\newtheorem{lemma}[theorem]{Lemma}
-\newtheorem{proposition}[theorem]{Proposition}
-\newtheorem{corollary}[theorem]{Corollary}
-\newenvironment{remark}[1][Remark]{\begin{trivlist}
-\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}
-
+\usepackage[utf8x]{inputenc}
+\usepackage{amsmath, amssymb, bbm, amsthm}
+\usepackage[pagebackref=true,breaklinks=true,colorlinks=true,backref=false]{hyperref}
+\usepackage[capitalize, noabbrev]{cleveref}
+\usepackage{microtype}
+\input{defs}
 
 \date{}
 \title{Cracking Cascades: Applying Sparse Recovery to Graph Reconstruction}
@@ -18,6 +15,73 @@
 \section{Preliminaries: What is a measurement?}
 
+There is an unknown graph $G=(V,E)$ that we wish to reconstruct by observing
+information cascades. An information cascade is a random discrete-time
+process where we get to observe the set of vertices infected at each time
+step.
+
+The key contribution is to formulate this problem as a sparse recovery
+problem akin to those found in the compressed sensing literature. The
+parallel between the two problems is as follows:
+\begin{itemize}
+    \item the unknown \emph{signal} $s$ is a vector in $\{0,1\}^{V\times V}$
+        with $s_{u,v} = 1$ iff $(u,v)\in E$.
+    \item the \emph{sensing matrix} is given by the cascades: it maps the
+        signal to the observation space. Since the cascades are random time
+        processes, they define a sequence of random observations.
+    \item the \emph{observations} are the sets of vertices infected at each
+        time step of each cascade.
+\end{itemize}
+
+Let us define $n\eqdef |V|$. If we assume that every vertex of $G$ has
+degree at most $d\in\N_+$, then the signal $s$ is $nd$-sparse. Assuming
+that the sensing matrix has good random properties, we can hope to recover
+$s$ from $O(nd\log n)$ random observations.
+
+There are two main goals:
+\begin{itemize}
+    \item understand to what extent the probabilistic cascade models
+        translate into incoherence properties of the sensing matrix and
+        allow for an accurate reconstruction of the unknown graph from a
+        small number of observations;
+    \item compare the empirical performance of this compressed sensing
+        approach to prior graph reconstruction approaches.
+\end{itemize}
+
+\begin{remark}
+\begin{itemize}
+    \item It seems that we will need a \emph{complete} probabilistic model
+        for the cascades. By complete I mean that we need to specify a
+        joint distribution on the source of the cascade (\emph{e.g.} the
+        source is chosen uniformly at random) and on the cascade diffusion
+        process. Such a model is probably not very realistic, but it is
+        necessary to obtain theoretical guarantees. The unrealism of the
+        model is not a problem \emph{per se}; the eventual validity of this
+        approach will be established experimentally anyway.
+    \item It seems likely that we will want the rows of the sensing matrix
+        to be independent realizations of the same distribution. However,
+        \textbf{this will never be the case} if we have one row for each
+        time step of each cascade: the rows within the same cascade are
+        obviously highly correlated. We might have to \emph{aggregate} them
+        (for example through concatenation?) in a sensible way, so as to
+        have one row per cascade.
+    \item Something that makes this problem very different from the usual
+        compressed sensing problems is that the measurements depend on the
+        signal we wish to recover: cascades propagate along the edges
+        contained in the signal. More precisely, not only are the rows of
+        the sensing matrix correlated (if we do not aggregate them by
+        cascade), they are correlated \emph{through the signal}. Is this a
+        problem? I don't know; it looks scary, though.
+    \item A slightly more general model, which we will need in the
+        independent cascade case, considers a signal
+        $s\in[0,1]^{V\times V}$. We can always go from this model to the
+        binary case by first recovering the non-binary $s$ and then
+        applying a thresholding procedure. But if we only care about
+        recovering the support of $s$, can we do better? (TODO: read about
+        support recovery.)
+\end{itemize}
+\end{remark}
+
+Previous stuff:
 \begin{itemize}
 	\item Introduction
 	\item Simple graphs (cycle, star, path, tree?)
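As an editorial aside to the measurement model added in this patch: the cascade process the notes describe can be sketched in a few lines of Python. Everything below (function names, the transmission probability `p`, the toy path graph) is a hypothetical illustration, not part of the commit; it simulates an independent-cascade spread and records the newly infected set at each time step, which is exactly the raw observation the notes take as a measurement.

```python
import random

def simulate_cascade(edges, n, source, p=0.5, rng=None):
    """Simulate one independent cascade on a directed graph.

    edges: set of (u, v) pairs -- the support of the unknown signal s.
    Returns the list of newly infected vertex sets, one per time step.
    (Hypothetical sketch: names and model details are illustrative.)
    """
    rng = rng or random.Random(0)
    infected = {source}
    frontier = {source}
    steps = [frontier]
    while frontier:
        nxt = set()
        for u in frontier:
            for v in range(n):
                # Each edge (u, v) in the signal transmits independently
                # with probability p to a not-yet-infected vertex v.
                if (u, v) in edges and v not in infected and rng.random() < p:
                    nxt.add(v)
        infected |= nxt
        if nxt:
            steps.append(nxt)
        frontier = nxt
    return steps

# Unknown signal: a directed path 0 -> 1 -> 2 -> 3 (degree-bounded, so sparse).
n = 4
edges = {(0, 1), (1, 2), (2, 3)}
steps = simulate_cascade(edges, n, source=0, p=1.0)
print(steps)  # with p=1.0 the cascade walks the path: [{0}, {1}, {2}, {3}]
```

With `p = 1.0` the process is deterministic, which makes the signal-dependence of the measurements discussed in the remark very concrete: the observed sets are a function of the edge set being recovered.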

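On the aggregation question raised in the remark (one row per cascade rather than one per time step), one possible scheme is to collapse a cascade into a single indicator vector over ordered vertex pairs, setting pair $(u, v)$ whenever $u$ is infected exactly one step before $v$. The sketch below shows only this one option; the function name and encoding are made up for illustration and are not from the commit.

```python
def cascade_to_row(steps, n):
    """Aggregate a cascade's time steps into one measurement row.

    steps: list of sets of newly infected vertices, one per time step.
    Returns a 0/1 vector of length n*n over ordered pairs: entry (u, v)
    is 1 when u was infected exactly one step before v, i.e. a potential
    transmission. (Hypothetical aggregation, one of several options.)
    """
    row = [0] * (n * n)
    for t in range(len(steps) - 1):
        for u in steps[t]:
            for v in steps[t + 1]:
                row[u * n + v] = 1
    return row

# Cascade along the path 0 -> 1 -> 2 -> 3, observed as one set per step.
steps = [{0}, {1}, {2}, {3}]
row = cascade_to_row(steps, n=4)
print(sum(row))  # three potential transmissions: (0,1), (1,2), (2,3)
```

Under a model where only the previous step's frontier can transmit, every vertex infected at step $t+1$ has an actual infector in step $t$, so each row built this way has support intersecting the support of the signal $s$; how useful such rows are for recovery is exactly the open question in the notes.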