1 files changed, 31 insertions, 27 deletions
diff --git a/finale/project_proposal.tex b/finale/project_proposal.tex
index 55d44e6..830d63a 100644
--- a/finale/project_proposal.tex
+++ b/finale/project_proposal.tex
@@ -11,38 +11,42 @@
 \subsection*{The Network Inference problem}
 
 The network inference problem concerns itself with learning the edges and the
-edge weights of an unknown network. Each edge weight is a parameter to be
-estimated. The information at our disposal is the result of a cascade process on
-the network. Here, we will focus on the Generalized Linear Cascade (GLC) model
-introduced in~\cite{}.
+edge weights of an unknown network. Each edge weight $\theta_e, e\in E$ is a
+parameter to be estimated. The information at our disposal is the result of a
+cascade process on the network. Here, we will focus on the Generalized Linear
+Cascade (GLC) model introduced in~\cite{} presented below.
 
-\paragraph{Short description of the GLC model}
+\paragraph{The GLC model}
 
-Let $X^t$ be the indicator variable of ``contagious nodes'' at time step $t$.
-A \emph{generalized linear cascade model} is a cascade model such that for each
-susceptible node $j$ in state $s$ at time step $t$, the probability of $j$
-becoming ``contagious'' at time step $t+1$ conditioned on $X^t$ is a Bernoulli
-variable of parameter $f(\theta_j \cdot X^t)$:
+Consider a graph $\mathcal{G} = (N, E)$, where nodes $N$ can be one of three
+possible states: susceptible, infected, and immune.  Let $X^t$ be the indicator
+variable of ``infected nodes'' at time step $t$.  A \emph{generalized linear
+cascade model} is a cascade model such that for each ``susceptible'' node $i$ at
+time step $t$, the probability of $i$ becoming ``infected'' at time step $t+1$
+conditioned on $X^t$ is a Bernoulli variable of parameter $f(\theta_i
+\cdot X^t)$:
 \begin{equation}
-    \label{eq:glm}
-    \mathbb{P}(X^{t+1}_j = 1|X^t)
-    = f(\theta_j \cdot X^t)
+    \mathbb{P}(X^{t+1}_i = 1|X^t) = f(\theta_i \cdot X^t)
 \end{equation}
-where $f: \mathbb{R} \rightarrow [0,1]$
+where $f: \mathbb{R} \rightarrow [0,1]$. There is no a priori rule for the
+transition from and to the immune and susceptible state, but the most commonly
+studied GLC model, known as the Independent Cascade model, assumes that all
+infected nodes at time step $T$ are immune for all time steps $t \geq T+1$.
+Under this assumption, a cascade ends when no ``infected'' nodes remain.
 
-\paragraph{Problem statement}
-Assume that $X_t \sim \mathcal{D}$, where $D$ is the GLC process defined above.
-Identify the parents and estimate the edge weights for each node $i$ in the
-network $\mathcal{N}$. This can be solved using maximum likelihood:
-$$\log \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|}
-\sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 -
-x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big)$$
-
-In particular, it is known that an approximation of the variance for $\hat
-\beta$ is given by the inverse of the information matrix, which is given by:
-$$blabla$$
+The designer is also free to decide the initial state of graph $\mathcal{G}$ at
+the beginning of every cascade process. For simplicity, we will suppose that, at
+the start of each cascade, one node is chosen uniformly at random to be
+``infected'', and that all other nodes are in the ``susceptible'' state.
 
-In the case of logistic regression. In the case of the independent cascade
-model.
+\bigskip
+Prior work has focused on recovering the edges of the graph, which can be
+formalized as finding the support of the edge weight vector $\theta$. This almost
+always relies on solving the following maximum likelihood program:
+$$ \theta_{i*} \in \arg \max_\theta \log
+\mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} \sum_{t\in
+{\cal T}_i } x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 -
+x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big)$$
+where $\mathcal{T}_i = \{t: \text{node i is infected at time step } t\}$
 
 \end{document}