aboutsummaryrefslogtreecommitdiffstats
path: root/finale
diff options
context:
space:
mode:
Diffstat (limited to 'finale')
-rw-r--r--finale/project_proposal.tex43
1 files changed, 29 insertions, 14 deletions
diff --git a/finale/project_proposal.tex b/finale/project_proposal.tex
index 830d63a..c92bb36 100644
--- a/finale/project_proposal.tex
+++ b/finale/project_proposal.tex
@@ -19,34 +19,49 @@ Cascade (GLC) model introduced in~\cite{} presented below.
\paragraph{The GLC model}
Consider a graph $\mathcal{G} = (N, E)$, where nodes $N$ can be one of three
-possible states: susceptible, infected, and immune. Let $X^t$ be the indicator
+possible states: susceptible, infected, and immune. Let $x^t$ be the indicator
variable of ``infected nodes'' at time step $t$. A \emph{generalized linear
cascade model} is a cascade model such that for each ``susceptible'' node $i$ at
time step $t$, the probability of $i$ becoming ``infected'' at time step $t+1$
-conditioned on $X^t$ is a Bernoulli variable of parameter $f(\theta_i
+conditioned on $x^t$ is a Bernoulli variable of parameter $f(\theta_i
\cdot X^t)$:
\begin{equation}
- \mathbb{P}(X^{t+1}_i = 1|X^t) = f(\theta_i \cdot X^t)
+ \mathbb{P}(x^{t+1}_i = 1|x^t) = f(\theta_i \cdot x^t)
\end{equation}
where $f: \mathbb{R} \rightarrow [0,1]$. There is no a priori rule for the
transition from and to the immune and susceptible state, but the most commonly
studied GLC model, known as the Independent Cascade model, assumes that all
-infected nodes at time step $T$ are immune for all time steps $t \geq T+1$.
-Under this assumption, a cascade ends when no ``infected'' nodes remain.
-
-The designer is also free to decide the initial state of graph $\mathcal{G}$ at
-the beginning of every cascade process. For simplicity, we will suppose that, at
-the start of each cascade, one node is chosen uniformly at random to be
+infected nodes at time step $t$ are immune for all time steps $t' \geq t+1$.
+Under this assumption, a cascade ends when no ``infected'' nodes remain. The
+designer is also free to decide the initial state of graph $\mathcal{G}$ at the
+beginning of every cascade process. For simplicity, we will suppose that, at the
+start of each cascade, one node is chosen uniformly at random to be
``infected'', and that all other nodes are in the ``susceptible'' state.
\bigskip
-Prior work has focused on recovering the edges of the graph, which can be
-formalized as finding the support of the edge weight vector $\theta$. This almost
-always relies on solving the following maximum likelihood program:
-$$ \theta_{i*} \in \arg \max_\theta \log
+The network inference problem can therefore be formalized as finding the support
+of the edge weight vector $\theta$ from the observation of ${(x^c_t)}_{c\in
+\mathcal{C}, t\in \mathcal{T}_C}$, where $\mathcal{C}$ indexes the cascade
+number, and $\mathcal{T}_C$ the time step in that cascade. Prior work has
+observed that the network inference problem is decomposable node-by-node
+i.e.~solving for $\beta_{i} = \{\theta_{i,j} | j \in N\}$ for node $i$. This can
+easily be formulated as the following maximum likelihood program:
+$$ \theta_{i} \in \arg \max_\theta \log
\mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} \sum_{t\in
{\cal T}_i } x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 -
x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big)$$
-where $\mathcal{T}_i = \{t: \text{node i is infected at time step } t\}$
+where $\mathcal{T}_i = \{\text{node i is ``susceptible'' at time step } t\}$.
+The reader will note the above problem boils down to fitting a generalized
+linear model. In particular, if $f$ is the sigmoid function, we are performing
+logistic regression:
+$$
+\begin{cases}
+y_i^* = \theta_i \cdot x^t + \epsilon, \text{~where~} \epsilon\sim
+Logistic(0, 1) \\
+y_i = [y_i^* > 0] \text{~where~} y_i^t = x_i^{t+1}
+\end{cases}
+$$
+
+\subsection*{Objectives}
\end{document}