\documentclass[10pt]{article}
\usepackage[utf8x]{inputenc}
\usepackage{fullpage, amsmath, amssymb, amsthm}
\usepackage{bbm}
\DeclareMathOperator*{\argmax}{argmax}

\title{Regression Analysis with Network Data}
\author{Thibaut Horel \and Jean Pouget-Abadie}
\begin{document}
\maketitle
\section{Background: Cascade Models and Network Inference}
The Network Inference Problem concerns itself with learning the edges and the
edge weights of an unknown network on the set of vertices $V$. Formally, the
set of parameters to learn is a matrix $\theta\in\mathbb{R}^{V\times V}$ where
$\theta_{u,v}$ is the weight corresponding to the pair of vertices $(u, v)$.
The observations at our disposal for this learning task are samples drawn from
a probabilistic cascade model. In this project, we will focus on the
Generalized Linear Cascade (GLC) model introduced in~\cite{pouget}, which we
briefly present below.
\paragraph{GLC model.} The nodes in $V$ can be in one of three possible states:
susceptible, infected, and immune. Let $x^t$ be the indicator variable of
``infected nodes'' at time step $t$ and $S_t$ be the set of susceptible nodes
at time step $t$. A \emph{generalized linear cascade model} is a discrete-time
random process in which the conditional law of $x^{t+1}$ given $x^t$ takes the
following form: for each ``susceptible'' node $i$ at time step $t$, the
probability of $i$ becoming ``infected'' at time step $t+1$ conditioned on
$x^t$ is a Bernoulli variable of parameter $f(\theta_i \cdot x^t)$:
\begin{equation}
  \label{eq:model}
  \mathbb{P}(x^{t+1}_i = 1\,|\,x^t) = f(\theta_i \cdot x^t),
  \quad i\in S_t
\end{equation}
for some function $f: \mathbb{R} \rightarrow [0,1]$ and where $\theta_i
= (\theta_{u, i})_{u\in V}$. There is no a priori rule for the transitions to
and from the immune and susceptible states, but the most commonly studied GLC
model, known as the Independent Cascade model \cite{Kempe:03}, assumes that all
infected nodes at time step $t$ are immune for all time steps $t' \geq t+1$.
Under this assumption, a cascade ends when no ``infected'' nodes remain. The
designer is also free to decide the initial state of the network at the
beginning of every cascade process. For simplicity, we will suppose that, at
the start of each cascade, one node is chosen uniformly at random to be
``infected'', and that all other nodes are in the ``susceptible'' state.
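As a concrete illustration, these dynamics can be simulated directly. The
sketch below is a toy implementation, not part of our codebase; it assumes the
link function $f(z) = 1 - e^{-z}$ with nonnegative weights (which recovers the
Independent Cascade model with edge probabilities $1 - e^{-\theta_{u,i}}$), and
the graph and weights in the usage example are arbitrary.

```python
import math
import random

def simulate_cascade(theta, rng, f=lambda z: 1.0 - math.exp(-z)):
    """Simulate one generalized linear cascade.

    theta[u][i] is the weight of edge (u, i); the link f(z) = 1 - exp(-z)
    with nonnegative weights recovers the Independent Cascade model.
    A single seed is chosen uniformly at random; infected nodes become
    immune at the next step, and the cascade ends when no node is infected.
    Returns the list of sets of newly infected nodes, one per time step.
    """
    n = len(theta)
    infected = {rng.randrange(n)}   # x^0: the random seed
    immune = set()
    history = [set(infected)]
    while infected:
        susceptible = set(range(n)) - infected - immune
        newly = set()
        for i in susceptible:
            # theta_i . x^t: sum of weights from currently infected nodes
            activation = sum(theta[u][i] for u in infected)
            if rng.random() < f(activation):
                newly.add(i)
        immune |= infected          # infected nodes become immune
        infected = newly
        if infected:
            history.append(set(infected))
    return history
```

Since $f(0) = 0$ for this link, a susceptible node with no infected in-neighbors
can never become infected, so every cascade terminates.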
\paragraph{Inference.} A cascade is the collection of vectors $x^t$ for all
time steps until the cascade ends. Given a sample of independent cascades, the
parameters $\theta$ can be estimated using maximum likelihood estimation. Prior
work has observed that the MLE problem can be decomposed node-by-node,
\emph{i.e.}, $\theta_i$ can be learned independently for each node $i$ by
solving the following optimization program:
\begin{equation}
  \label{eq:mle}
  \hat{\theta}_{i} \in \argmax_{\theta_i}
  \log \mathcal{L}_i(\theta_i) =
  \sum_{t:i\in S_t} x_i^{t+1}\log f(\theta_i\cdot x^{t})
  + (1 - x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big)
\end{equation}
The reader will observe that the above problem reduces to fitting a generalized
linear model. It is important to note that even though the cascades are
independent, the vectors $x^t$ observed within a cascade are not independent,
since they follow the Markov process described in \eqref{eq:model}. In the
particular case where $f$ is the sigmoid function, we are performing
logistic regression:
\begin{displaymath}
\begin{cases}
  y^t = \theta_i \cdot x^t + \varepsilon^t
  &\text{where } \varepsilon^t\sim \text{Logistic}(0, 1)\\
  \tilde{y}^t = \mathbf{1}\{y^t > 0\} &\text{where } x_i^{t+1} = \tilde{y}^t
\end{cases}
\end{displaymath}
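To make this connection explicit, node $i$'s parameters can be fit exactly as
a logistic regression. The following is a minimal sketch under the sigmoid link
(plain gradient ascent on the per-node log-likelihood; the helper name and the
learning-rate choice are ours, for illustration only):

```python
import math

def fit_node(observations, i, lr=0.05, iters=3000):
    """Gradient ascent on the per-node log-likelihood with sigmoid link,
    i.e. ordinary logistic regression.

    observations: list of (x_t, x_next) pairs of 0/1 vectors, already
    restricted to time steps at which node i was susceptible.
    Returns the estimated row theta_i = (theta_{u,i})_u.
    """
    n = len(observations[0][0])
    theta_i = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for x_t, x_next in observations:
            z = sum(w * x for w, x in zip(theta_i, x_t))
            p = 1.0 / (1.0 + math.exp(-z))
            err = x_next[i] - p          # d(log L)/dz for this observation
            for u in range(n):
                grad[u] += err * x_t[u]
        theta_i = [w + lr * g for w, g in zip(theta_i, grad)]
    return theta_i
```

For instance, if node $i$ is infected in $8$ out of $10$ susceptible steps with
the same single active parent, the fitted weight converges to
$\sigma^{-1}(0.8) = \log 4$, as the MLE predicts.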
\section{Problem Statement}
A growing body of work (see for example \cite{Netrapalli:2012,
Daneshmand:2014, pouget}) has focused on solving the Network Inference Problem
under different cascade models and on obtaining estimators with close-to-optimal
convergence rates. However, these works often rely on hard-to-interpret
assumptions which hide subtle properties of the probabilistic model
\eqref{eq:model}. The overarching goal of this project is to analyze the
fundamental statistical properties of observations coming from model
\eqref{eq:model} and of the MLE estimator \eqref{eq:mle}. In particular, we are
interested in the following questions:
\begin{itemize}
\item Is the model \eqref{eq:model} identifiable and is the MLE estimator
  \eqref{eq:mle} consistent? Are there networks which are fundamentally
  ambiguous and for which the Network Inference Problem is unsolvable?
\item Use a Bayesian approach to estimate the parameters. Use the posterior
  predictive distribution to obtain confidence intervals for the edge
  parameters. Validate this with bootstrapping.
\item Are there networks which are harder to learn than others? Can the
  variance of the MLE estimator be related to certain graph-theoretic
  properties of the network?
\item Do the previous observations match the asymptotic normality predicted
  by the Fisher information matrix:
\begin{displaymath}
  \hat{\theta}_i \sim \mathcal{N}\big(\theta_i, I{(\theta_i)}^{-1}\big)
  \quad \text{where}\;
  I(\theta_i) = - \mathbb{E}\left[\frac{\partial^2\log \mathcal{L}_i}
  {\partial \theta_i^2}\right]
\end{displaymath}
\item Are there networks in which the Fisher information matrix is singular?
What happens to the estimation of $\theta_i$ in this case?
\item Is the estimator \eqref{eq:mle} robust to the choice of $f$? What if the
  observations are generated with $f_1$, but \eqref{eq:mle} is solved with
  $f_2\neq f_1$? Is there a regularization scheme which can mitigate any bias
  or lack of convergence in the estimated parameters?
\end{itemize}
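As a first step toward these questions, the Fisher information can be estimated
numerically from simulated observations. A minimal sketch under the sigmoid
link (the helper is hypothetical, for illustration only): for logistic
regression the observed information is $\sum_t f'(\theta_i\cdot x^t)\,
x^t {(x^t)}^\top$ with $f' = f(1-f)$, so a node that is never active in any
observation yields a zero row and column, \emph{i.e.} a singular information
matrix, which connects to the singularity question above.

```python
import math

def observed_information(xs, theta_i):
    """Observed Fisher information of the per-node logistic log-likelihood:
    I = sum_t p_t (1 - p_t) x^t (x^t)^T  with  p_t = sigmoid(theta_i . x^t).

    xs: list of 0/1 infection vectors x^t, restricted to the time steps at
    which node i was susceptible.
    """
    n = len(theta_i)
    info = [[0.0] * n for _ in range(n)]
    for x_t in xs:
        z = sum(t_u * x_u for t_u, x_u in zip(theta_i, x_t))
        p = 1.0 / (1.0 + math.exp(-z))
        w = p * (1.0 - p)               # f'(z) for the sigmoid link
        for u in range(n):
            for v in range(n):
                info[u][v] += w * x_t[u] * x_t[v]
    return info
```

Inverting this matrix (when it is nonsingular) gives the plug-in variance
estimate to compare against the bootstrap.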
\paragraph{Roadmap.} We plan to answer the above questions with a mixture of
theoretical and simulation-based approaches. We expect some of these questions
to be hard from a theoretical standpoint. In those cases, the difficulty can
be mitigated by resorting to simulations or by focusing on special cases:
toy networks (star graph, cycle, complete graph, etc.) or simpler cascade
models (the Voter Model, for example).
\bibliographystyle{abbrv}
\bibliography{sparse}
\end{document}