author     Thibaut Horel <thibaut.horel@gmail.com>  2015-11-06 18:16:17 -0500
committer  Thibaut Horel <thibaut.horel@gmail.com>  2015-11-06 18:16:17 -0500
commit     e482995948fd81f4cdd8cc5f587c223114fbb844 (patch)
tree       4eee4738894c059bc9bebda93a867f3f653329cf /finale
parent     ed0a93f509073e9c4159a21a07a24d022d931099 (diff)
download   cascades-e482995948fd81f4cdd8cc5f587c223114fbb844.tar.gz
Pass over 4.1
Diffstat (limited to 'finale')
-rw-r--r--  finale/mid_report.tex | 43
1 file changed, 31 insertions(+), 12 deletions(-)
diff --git a/finale/mid_report.tex b/finale/mid_report.tex
index b7d9608..67aedd0 100644
--- a/finale/mid_report.tex
+++ b/finale/mid_report.tex
@@ -238,7 +238,7 @@ adversarially.}
\begin{example}[Lack of information]
Let us consider the graph in \cref{fig:ident} and the source distribution
- $p_s$ defined by $p_s\big((0, 1, 0)\big) = 1$. In other words, node $2$ is
+ $p_s$ defined by $p_s\big((0, 1, 0)\big) = 1$. In other words, node $2$ is
always the only infected node at $t=0$. Then, it is clear that all
cascades die at $t=1$, at which point node 3 becomes infected with probability
$f(\theta_{2,3})$. This probability does not depend on $\theta_{1,3}$.
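
To make this concrete, here is a small numerical sketch of the example. It assumes the graph of \cref{fig:ident} has edges $1\to 3$ and $2\to 3$ and takes the illustrative choice $f(z) = 1 - e^{-z}$; the report leaves $f$ abstract.

import numpy as np

rng = np.random.default_rng(0)

def f(z):
    # Illustrative activation function; the report leaves f abstract.
    return 1.0 - np.exp(-z)

def p3_infected(theta13, theta23, x0=(0, 1, 0), n=200_000):
    # Monte Carlo estimate of P(node 3 infected at t=1) when x0 is the
    # infection state at t=0 (here node 2 only, per the example).
    x1, x2, _ = x0
    p = f(theta13 * x1 + theta23 * x2)  # GLC transition for node 3
    return float((rng.random(n) < p).mean())

# Varying theta_{1,3} leaves the cascade distribution unchanged:
for theta13 in (0.1, 1.0, 5.0):
    print(theta13, p3_infected(theta13, theta23=0.5))

All three estimates agree with $f(0.5) \approx 0.39$: the observed cascades carry no information about $\theta_{1,3}$.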
@@ -254,7 +254,7 @@ Let us consider again the graph in \cref{fig:ident} and let us consider the
case where the source distribution is such that $p_s\big((1,1,0)\big) = 1$,
\emph{i.e.}, nodes 1 and 2 are always jointly infected at time $t=0$. Then it is
clear that all cascades die at time $t=1$, when node 3 becomes infected
-with probability $f(\theta_{1,3} + \theta_{2,3})$. This implies that all
+with probability $f(\theta_{1,3} + \theta_{2,3})$. This implies that all
matrices $\Theta$ which are compatible with the graph in \cref{fig:ident}
(defining the same edge set) and for which $\theta_{1,3} + \theta_{2,3} = c$
for some $c\in \R_+$ define the same cascade probability distribution.
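
Under the same illustrative $f(z) = 1 - e^{-z}$, a direct check shows that every split of a fixed total weight $c$ between $\theta_{1,3}$ and $\theta_{2,3}$ yields the same infection probability for node 3:

import numpy as np

def f(z):
    return 1.0 - np.exp(-z)  # illustrative choice, as above

c = 1.2  # a fixed total weight into node 3
for theta13 in (0.0, 0.4, 0.8, 1.2):
    theta23 = c - theta13
    # With source (1,1,0), node 3 is infected w.p. f(theta13 + theta23),
    # which depends only on the sum c, not on the individual weights:
    print(theta13, theta23, f(theta13 + theta23))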
@@ -321,7 +321,7 @@ distributions which satisfy the assumptions of the proposition include a single
source chosen uniformly at random among the vertices of the graph, or a source
where each node in $G$ is included independently with some probability $p>0$.
-Finally, Assumption 1 can be relaxed to a weaker assumption. For this weaker
+Finally, assumption 1 can be relaxed to a weaker assumption. Under this weaker
assumption, Proposition 2 becomes an exact characterization of the
identifiability of the model. Again, we defer the details to the final version
of this report.
@@ -332,16 +332,35 @@ of this report.
\subsection{Maximum-Likelihood Estimation}
\label{MLEsec}
-Because transitions occur independently for each node $i$, the log-likelihood
-is a sum of $k$ terms, each term $i$ only depending on $\bt_i$. Hence, each
-$\bt_i$ can be learnt separately by solving a node-specific optimization
-problem:
+It follows from the form of \cref{eq:dist} that the log-likelihood of $\Theta$
+given cascade data can be written as the sum of $k$ terms, each term $i$ only
+depending on $\bt_i$. Hence, each $\bt_i$ can be learnt separately by solving
+a node-specific optimization problem. Specifically, for a given node $i$, if we
+concatenate all the time steps $t$ where this node was susceptible and
+write $y^t = x^{t+1}_i$, \emph{i.e.}, whether or not the node became infected at
+the next time step, the MLE estimator for $\bt_i$ is obtained by solving the
+following optimization problem:
+\begin{equation}
+ \label{eq:mle}
+ \hat{\theta}\in \argmax_\theta \sum_{t} y^t\log f(\theta\cdot x^t)
+    + (1-y^t) \log \big(1 - f(\theta\cdot x^t)\big)
+\end{equation}
+It is interesting to note that at the node level, MLE inference for the GLC
+model amounts exactly to fitting a Generalized Linear Model. When $f$ is
+log-concave, as is the case in most examples of GLC models, the above
+optimization problem is convex and can be solved exactly and efficiently. The
+code to perform MLE estimation can be found in the appendix, file
+\textsf{mle.py}.
+
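For concreteness, a minimal sketch of the node-level fit \eqref{eq:mle} is given below. It is a hypothetical stand-in, not the \textsf{mle.py} of the appendix, and it again assumes the illustrative log-concave choice $f(z) = 1 - e^{-z}$ together with the constraint $\theta \ge 0$.

import numpy as np
from scipy.optimize import minimize

def f(z):
    # Illustrative log-concave activation; the report leaves f abstract.
    return 1.0 - np.exp(-z)

def neg_log_lik(theta, X, y, eps=1e-12):
    # Negative log-likelihood of eq:mle. Row t of X is the infection
    # state x^t at a step where node i was susceptible; y[t] records
    # whether the node became infected at the next step.
    p = np.clip(f(X @ theta), eps, 1 - eps)
    return -np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))

def fit_node(X, y):
    # theta is constrained to R_+^k; L-BFGS-B handles the box constraint.
    k = X.shape[1]
    res = minimize(neg_log_lik, x0=np.full(k, 0.1), args=(X, y),
                   bounds=[(0.0, None)] * k, method="L-BFGS-B")
    return res.x

# Synthetic sanity check: recover theta from simulated transitions.
rng = np.random.default_rng(0)
theta_true = np.array([0.3, 1.5, 0.0])
X = rng.integers(0, 2, size=(5000, 3)).astype(float)
y = (rng.random(5000) < f(X @ theta_true)).astype(float)
print(fit_node(X, y))  # approaches theta_true as the sample grows

Increasing the sample size drives $\hat\theta$ toward the true parameter, in line with the consistency statement of the proposition below.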
+Furthermore, as is usual for MLE estimators, we obtain that $\hat{\theta}$ is
+asymptotically consistent under mild assumptions.
-We assume $f$ log-concave to have a concave optimization problem. Depending on
-the model, $\theta_{i,j}$ can either be restricted to live in a compact subset
-of $\R_+$. Otherwise, we need to assume that $\lim_{z\to\infty} f(z) = 1$.
-Then, we can apply standard results on consistency and asymptotic normality of
-MLE estimator.
+\begin{proposition}
+ Under the assumptions of Proposition 2, and further assuming that
+ $\lim_{z\to\infty} f(z) = 1$, the optimization program \eqref{eq:mle} has
+ a unique solution $\hat{\theta}$ which is an asymptotically consistent
+ estimator of $\bt_i$.
+\end{proposition}
\subsection{Bayesian Approach}