author    Thibaut Horel <thibaut.horel@gmail.com>  2015-12-08 16:38:24 -0500
committer Thibaut Horel <thibaut.horel@gmail.com>  2015-12-08 16:38:24 -0500
commit    ac70a17ecab6c0efb26166c953b2f14fba2d2b87 (patch)
tree      3a0aa6f283b2185dcabea1bc7cc5f390930a826a /finale/sections/model.tex
parent    3bcc519d2fcfdad6c57d50da13e3227fa5f0c13a (diff)
download  cascades-ac70a17ecab6c0efb26166c953b2f14fba2d2b87.tar.gz
Final report template
Diffstat (limited to 'finale/sections/model.tex')
-rw-r--r--  finale/sections/model.tex  68
1 file changed, 68 insertions, 0 deletions
diff --git a/finale/sections/model.tex b/finale/sections/model.tex
new file mode 100644
index 0000000..cb8b699
--- /dev/null
+++ b/finale/sections/model.tex
@@ -0,0 +1,68 @@
+The GLC model is defined over a directed graph $G = (V, \Theta)$. Denoting by
+$k=|V|$ the number of nodes in the graph, $\Theta\in\R_{+}^{k\times k}$ is the
+matrix of edge weights. Note that $\Theta$ implicitly defines the edge set $E$ of
+the graph through the following equivalence:
+\begin{displaymath}
+ (u,v)\in E\Leftrightarrow \Theta_{u,v} > 0,\quad
+ (u,v)\in V^2
+\end{displaymath}
+
+Time is discretized and indexed by a variable $t\in\N$. Each node can be in
+one of three states: \emph{susceptible}, \emph{infected} or \emph{immune}.
+Let us denote by $S_t$ the set of nodes susceptible at the beginning of time
+step $t\in\N$ and by $I_t$ the set of nodes that become infected at this time
+step. The following dynamics:
+\begin{displaymath}
+ S_0 = V,\quad S_{t+1} = S_t \setminus I_t
+\end{displaymath}
+express that the nodes infected at a given time step are no longer susceptible
+starting from the next time step (they join the set of immune nodes).
+
+The dynamics of $I_t$ are described by a Markov process. Let us
+denote by $x^t$ the indicator vector of $I_t$ at time step $t\in\N$. $x^0$ is
+drawn from a \emph{source distribution} $p_s:\{0,1\}^k\to[0,1]$. For $t\geq 1$,
+we have:
+\begin{equation}
+ \label{eq:markov}
+ \forall i\in S_t,\quad
+ \P\big(x_{i}^{t} = 1\,|\, x^{t-1}\big) = f(\bt_i\cdot x^{t-1})
+\end{equation}
+where $\bt_i$ is the $i$th column of $\Theta$. The function $f:\R\to[0,1]$ can
+be interpreted as the inverse link function of the model. Finally, the
+transitions in \cref{eq:markov} occur independently for each $i$. A cascade
+continues until no infected node remains.
+
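For concreteness, the dynamics above can be simulated directly. The following is an illustrative sketch (not code from this repository); the function name and NumPy-style API are assumptions, with `f` applied elementwise and column `Theta[:, i]` playing the role of $\bt_i$.

```python
import numpy as np

def simulate_cascade(Theta, x0, f, rng=None):
    """Simulate one GLC cascade (hypothetical helper, for illustration).

    Theta : (k, k) nonnegative weight matrix; column i is theta_i.
    x0    : (k,) 0/1 indicator vector of I_0, drawn from the source
            distribution p_s by the caller.
    f     : inverse link function R -> [0, 1], applied elementwise.
    Returns the list [x^0, x^1, ...] of infection indicator vectors.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = Theta.shape[0]
    x = np.asarray(x0, dtype=int)
    susceptible = ~x.astype(bool)          # S_1 = V \ I_0
    cascade = [x]
    while x.any():                         # stop when no infected node remains
        # P(x_i^t = 1 | x^{t-1}) = f(theta_i . x^{t-1}) for susceptible i
        p = f(Theta.T @ x)
        x = ((rng.random(k) < p) & susceptible).astype(int)
        susceptible &= ~x.astype(bool)     # S_{t+1} = S_t \ I_t
        cascade.append(x)
    return cascade
```

Note that the loop appends the final all-zero vector, matching the convention that a cascade runs until no infected node remains.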
+We refer the reader to \cite{pouget} for a more complete description of the
+model and examples of common contagion models which can be interpreted as
+specific instances of the GLC model.
+
+It follows from Section 2 that a source distribution $p_s$ and
+\cref{eq:markov} together completely specify the distribution of a cascade
+$\bx = (x^t)_{t\geq 0}$, whose likelihood under $\Theta$ is:
+\begin{equation}
+ \label{eq:dist}
+ \mathcal{L}_{\Theta}(\bx)
+ = p_s(x^0)\prod_{\substack{t\geq 1 \\ i\in S_t}}
+    f(\bt_i\cdot x^{t-1})^{x^t_i}\big(1-f(\bt_i\cdot x^{t-1})\big)^{1-x_i^t}
+\end{equation}
+
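In log space, this likelihood is straightforward to evaluate on an observed cascade. The sketch below is illustrative (the function name is an assumption, not code from this repository); the source term $\log p_s(x^0)$ is omitted since it does not depend on $\Theta$.

```python
import numpy as np

def cascade_log_likelihood(Theta, cascade, f):
    """Log-likelihood of Theta given one observed cascade [x^0, x^1, ...]
    (hypothetical helper, for illustration).

    Sums log-probabilities over susceptible nodes at each step, i.e. the
    product in the likelihood equation taken in log space.
    """
    ll = 0.0
    susceptible = ~cascade[0].astype(bool)      # S_1 = V \ I_0
    for x_prev, x_t in zip(cascade[:-1], cascade[1:]):
        p = f(Theta.T @ x_prev)                 # p_i = f(theta_i . x^{t-1})
        i = susceptible
        ll += np.sum(np.where(x_t[i] == 1, np.log(p[i]), np.log1p(-p[i])))
        susceptible &= ~x_t.astype(bool)        # S_{t+1} = S_t \ I_t
    return ll
```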
+\paragraph{MLE estimation.}
+It follows from the form of \cref{eq:dist} that the log-likelihood of $\Theta$
+given cascade data can be written as the sum of $k$ terms, each term $i$ only
+depending on $\bt_i$. Hence, each $\bt_i$ can be learnt separately by solving
+a node-specific optimization problem. Specifically, for a given node $i$, if we
+concatenate together all the time steps $t$ where this node was susceptible and
+write $y^t = x^{t+1}_i$, \emph{i.e.,} whether or not the node became infected at
+the next time step, the MLE estimator for $\bt_i$ is obtained by solving the
+following optimization problem:
+\begin{equation}
+ \label{eq:mle}
+ \hat{\theta}\in \argmax_\theta \sum_{t} y^t\log f(\theta\cdot x^t)
+ + (1-y^t) \log \big(1 - f(\theta\cdot x^t)\big)
+\end{equation}
+It is interesting to note that, at the node level, MLE inference for the
+GLC model amounts exactly to fitting a Generalized Linear Model. When $f$ is
+log-concave, as is the case in most examples of GLC models, the above
+optimization problem is convex and can be solved
+exactly and efficiently. The code to perform MLE estimation can be found in the
+appendix, file \textsf{mle.py}.
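As an illustration of the node-level problem, the sketch below fits one $\bt_i$ for a logistic inverse link $f(z) = 1/(1+e^{-z})$, in which case \cref{eq:mle} is logistic-regression MLE. This is a minimal sketch under assumed names, not the \textsf{mle.py} code from the appendix; it uses projected gradient ascent to keep the weights nonnegative, as required by $\Theta\in\R_{+}^{k\times k}$.

```python
import numpy as np

def fit_node_mle(X, y, lr=0.1, n_iter=2000):
    """Per-node MLE sketch for a logistic inverse link (hypothetical helper).

    X : (T, k) array whose rows are the vectors x^t for the time steps
        where the node was susceptible.
    y : (T,)  y^t = x_i^{t+1}, whether the node got infected next step.

    The concave log-likelihood is maximized by projected gradient ascent,
    projecting onto theta >= 0 since the entries of Theta are nonnegative.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))  # f(theta . x^t)
        theta += lr * X.T @ (y - p) / len(y)    # gradient of the log-likelihood
        theta = np.maximum(theta, 0.0)          # keep weights nonnegative
    return theta
```

In practice one would use an off-the-shelf convex solver or GLM routine; the plain gradient loop above only serves to make the reduction to a GLM concrete.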