diff options
| author | Thibaut Horel <thibaut.horel@gmail.com> | 2015-12-08 16:38:24 -0500 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2015-12-08 16:38:24 -0500 |
| commit | ac70a17ecab6c0efb26166c953b2f14fba2d2b87 (patch) | |
| tree | 3a0aa6f283b2185dcabea1bc7cc5f390930a826a /finale/sections/model.tex | |
| parent | 3bcc519d2fcfdad6c57d50da13e3227fa5f0c13a (diff) | |
| download | cascades-ac70a17ecab6c0efb26166c953b2f14fba2d2b87.tar.gz | |
Final report template
Diffstat (limited to 'finale/sections/model.tex')
| -rw-r--r-- | finale/sections/model.tex | 68 |
1 files changed, 68 insertions, 0 deletions
diff --git a/finale/sections/model.tex b/finale/sections/model.tex new file mode 100644 index 0000000..cb8b699 --- /dev/null +++ b/finale/sections/model.tex @@ -0,0 +1,68 @@ +The GLC model is described over a directed graph $G = (V, \Theta)$. Denoting by +$k=|V|$ the number of nodes in the graph, $\Theta\in\R_{+}^{k\times k}$ is the +matrix of edge weights. Note that $\Theta$ implicitly defines the edge set $E$ of +the graph through the following equivalence: +\begin{displaymath} + (u,v)\in E\Leftrightarrow \Theta_{u,v} > 0,\quad + (u,v)\in V^2 +\end{displaymath} + +The time is discretized and indexed by a variable $t\in\N$. The nodes can be in +one of three states: \emph{susceptible}, \emph{infected} or \emph{immune}. +Let us denote by $S_t$ the set of nodes susceptible at the beginning of time +step $t\in\N$ and by $I_t$ the set of nodes who become infected at this time +step. The following dynamics: +\begin{displaymath} + S_0 = V,\quad S_{t+1} = S_t \setminus I_t +\end{displaymath} +expresses that the nodes infected at a time step are no longer susceptible +starting from the next time step (they become part of the immune nodes). + +The dynamics of $I_t$ are described by a random Markovian process. Let us +denote by $x_t$ the indicator vector of $I_t$ at time step $t\in\N$. $x_0$ is +drawn from a \emph{source distribution} $p_s:\{0,1\}^n\to[0,1]$. For $t\geq 1$, +we have: +\begin{equation} + \label{eq:markov} + \forall i\in S_t,\quad + \P\big(x_{i}^{t} = 1\,|\, x^{t-1}\big) = f(\bt_i\cdot x^{t-1}) +\end{equation} +where $\bt_i$ is the $i$th column of $\Theta$. The function $f:\R\to[0,1]$ can +be interpreted as the inverse link function of the model. Finally, the +transitions in \cref{eq:markov} occur independently for each $i$. A cascade +continues until no infected nodes remains. + +We refer the reader to \cite{pouget} for a more complete description of the +model and examples of common contagion models which can be interpreted as +specific instances of the GLC model. + +It follows from Section 2, that a source distribution $p_s$ and +\cref{eq:markov} together completely specify the distribution $p$ of a cascade +$\mathbf{x} = (x_t)_{t\geq 0}$: +\begin{equation} + \label{eq:dist} + \mathcal{L}_{\Theta}(\bx) + = p_s(x^0)\prod_{\substack{t\geq 1 \\ i\in S_t}} + f(\bt_i\cdot x^{t-1})^{x^t_i}\big(1-f(\theta_i\cdot x^{t-1})\big)^{1-x_i^t} +\end{equation} + +\paragraph{MLE estimation.} +It follows from the form of \cref{eq:dist} that the log-likelihood of $\Theta$ +given cascade data can be written as the sum of $k$ terms, each term $i$ only +depending on $\bt_i$. Hence, each $\bt_i$ can be learnt separately by solving +a node-specific optimization problem. Specifically, for a given node $i$, if we +concatenate together all the time steps $t$ where this node was susceptible and +write $y^t = x^{t+1}_i$, \emph{i.e} whether or not the node became infected at +the next time step, the MLE estimator for $\bt_i$ is obtained by solving the +following optimization problem: +\begin{equation} + \label{eq:mle} + \hat{\theta}\in \argmax_\theta \sum_{t} y^t\log f(\theta\cdot x^t) + + (1-y^t) \log \big(1 - f(\theta\cdot x^t)\big) +\end{equation} +It is interesting to note that at the node-level, doing MLE inference for the +GLC model is exactly amounts to fitting a Generalized Linear Model. When $f$ is +log-concave as is the case in most examples of GLC models, then the above +optimization problem becomes a convex optimization problem which can be solved +exactly and efficiently. The code to perform MLE estimation can be found in the +appendix, file \textsf{mle.py}. |
