aboutsummaryrefslogtreecommitdiffstats
path: root/paper/sections/model.tex
diff options
context:
space:
mode:
authorjeanpouget-abadie <jean.pougetabadie@gmail.com>2015-05-07 13:53:36 -0400
committerjeanpouget-abadie <jean.pougetabadie@gmail.com>2015-05-07 13:53:36 -0400
commit814cd0f9d14cfb98f013097d19c210c16c1a195b (patch)
tree53fc1ec1776ab168827bd98a68464c7daf882612 /paper/sections/model.tex
parentdea98c581c95c4a143c8f2fe7a59c902a798024e (diff)
downloadcascades-814cd0f9d14cfb98f013097d19c210c16c1a195b.tar.gz
restructuration
Diffstat (limited to 'paper/sections/model.tex')
-rw-r--r--paper/sections/model.tex57
1 files changed, 43 insertions, 14 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index 4241f93..43de785 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -127,6 +127,22 @@ Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
Therefore, the independent cascade model is a Generalized Linear Cascade model
with inverse link function $f : z \mapsto 1 - e^z$.
+Note that interpreting it in the case of the Independent Cascade Model requires
+one more step. Indeed, to cast it as a generalized linear cascade model, we had
+to perform the transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where $p_{i,j}$
+are the infection probabilities. Fortunately, if we estimate $p_j$ through
+$\hat{p}_{j} = \hat{\theta}_j$, a bound on $\|\hat\theta - \theta^*\|_2$
+directly implies a bound on $\|\hat p - p^*\|_2$:
+
+\begin{lemma}
+ $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
+\end{lemma}
+\begin{proof}
+Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have
+$|\log (1 - p) - \log (1-p')| \geq \max(1 - \frac{1-p}{1-p'},
+1 - \frac{1-p'}{1-p}) \geq \max( p-p', p'-p)$.
+\end{proof}
+
\subsubsection{The Linear Voter Model}
In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}.
@@ -150,29 +166,42 @@ with inverse link function $f: z \mapsto z$.
\subsubsection{Discretization of Continous Model}
-Consider the continuous-time independent cascade model with exponential
-transmission function (CICE) of~\cite{GomezRodriguez:2010, Abrahao:13,
-Daneshmand:2014} where time is binned into intervals of length $\epsilon$. Let
-$X^k$ be the set of nodes `infected' before or during the $k^{th}$ time
-interval. Note that contrary to the discrete-time independent cascade model,
-$X^k_i = 1 \implies X^{k+1}_i = 1$. Let $\exp(p)$ be an
-exponentially-distributed random variable of parameter $p$. By the memoryless
-property of the exponential, if we consider that $\forall i,\Theta_{i,j} = 0 if
-(i,j) \notin E$, then if $X^k_j \neq 1$:
+Anoter motivation for the Generalized Linear Cascade model is that it captures
+the time-discretized formulation of the well-studied continuous-time
+independent cascade model with exponential transmission function (CICE)
+of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}. Suppose that time
+is binned into intervals of length $\epsilon$ and let $X^k$ be the set of nodes
+`infected' before or during the $k^{th}$ time interval. Note that contrary to
+the discrete-time independent cascade model, $X^k_i = 1 \implies X^{k+1}_i =
+1$. Let $\exp(p)$ be an exponentially-distributed random variable of parameter
+$p$. By the memoryless property of the exponential, if $X^k_j \neq 1$:
\begin{align*}
\mathbb{P}(X^{k+1}_j = 1 | X^k) & = \mathbb{P}(\min_{i \in {\cal N}(j)}
\exp(p_{i,j}) \leq \epsilon) \\
& = \mathbb{P}(\exp( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\
& = 1 - e^{- \epsilon \cdot \theta_j \cdot X^t}
\end{align*}
-This formulation is consistent when $X^k_j = 1$ by considering the dummy
-variables $\forall i, \Theta_{i,i} = +\infty$. Therefore, the
-$\epsilon-$binned-process of the continuous-time model with exponential
-transmission function is a Generalized Linear Cascade model with inverse link
-function $f:z\mapsto 1-e^{-\epsilon\cdot z}$.
+Note that here we implicitly let $\Theta_{i,j} = 0$ if $(i,j)$ is not a
+directed edge of the graph. Note that this formulation is also consistent when
+$X^k_j = 1$ by considering the dummy variables $\forall i, \Theta_{i,i} =
++\infty$. Therefore, the $\epsilon-$binned-process of the continuous-time
+model with exponential transmission function is a Generalized Linear Cascade
+model with inverse link function $f:z\mapsto 1-e^{-\epsilon\cdot z}$.
\subsubsection{Logistic Cascades}
+Consider the specific case of ``logistic cascades'' (where $f$ is the logistic
+function). This model captures the idea that there is a threshold after which,
+each additional neighbor's influence contribution, becomes significant. In this
+way, logistic cascades can be thought of approximating the more-commonly
+studied Linear Threshold model TODO: CITE. As we will see later in the
+analysis, the Hessian of our optimization program simplifies to
+$\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the
+observation matrix $[x^1 \ldots x^\mathcal{|T|}]$. The (RE)-condition is then
+equivalent to the assumption made in the Lasso analysis of \cite{bickel:2009}.
+
+
+
% \subsection{The Linear Threshold Model}
% In the Linear Threshold Model, each node $j\in V$ has a threshold $t_j$ from