diff options
Diffstat (limited to 'paper/sections/model.tex')
| -rw-r--r-- | paper/sections/model.tex | 57 |
1 files changed, 43 insertions, 14 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex index 4241f93..43de785 100644 --- a/paper/sections/model.tex +++ b/paper/sections/model.tex @@ -127,6 +127,22 @@ Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as: Therefore, the independent cascade model is a Generalized Linear Cascade model with inverse link function $f : z \mapsto 1 - e^z$. +Note that interpreting it in the case of the Independent Cascade Model requires +one more step. Indeed, to cast it as a generalized linear cascade model, we had +to perform the transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where $p_{i,j}$ +are the infection probabilities. Fortunately, if we estimate $p_j$ through +$\hat{p}_{j} = \hat{\theta}_j$, a bound on $\|\hat\theta - \theta^*\|_2$ +directly implies a bound on $\|\hat p - p^*\|_2$: + +\begin{lemma} + $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$. +\end{lemma} +\begin{proof} +Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have +$|\log (1 - p) - \log (1-p')| \geq \max(1 - \frac{1-p}{1-p'}, +1 - \frac{1-p'}{1-p}) \geq \max( p-p', p'-p)$. +\end{proof} + \subsubsection{The Linear Voter Model} In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}. @@ -150,29 +166,42 @@ with inverse link function $f: z \mapsto z$. \subsubsection{Discretization of Continous Model} -Consider the continuous-time independent cascade model with exponential -transmission function (CICE) of~\cite{GomezRodriguez:2010, Abrahao:13, -Daneshmand:2014} where time is binned into intervals of length $\epsilon$. Let -$X^k$ be the set of nodes `infected' before or during the $k^{th}$ time -interval. Note that contrary to the discrete-time independent cascade model, -$X^k_i = 1 \implies X^{k+1}_i = 1$. Let $\exp(p)$ be an -exponentially-distributed random variable of parameter $p$. By the memoryless -property of the exponential, if we consider that $\forall i,\Theta_{i,j} = 0 if -(i,j) \notin E$, then if $X^k_j \neq 1$: +Anoter motivation for the Generalized Linear Cascade model is that it captures +the time-discretized formulation of the well-studied continuous-time +independent cascade model with exponential transmission function (CICE) +of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}. Suppose that time +is binned into intervals of length $\epsilon$ and let $X^k$ be the set of nodes +`infected' before or during the $k^{th}$ time interval. Note that contrary to +the discrete-time independent cascade model, $X^k_i = 1 \implies X^{k+1}_i = +1$. Let $\exp(p)$ be an exponentially-distributed random variable of parameter +$p$. By the memoryless property of the exponential, if $X^k_j \neq 1$: \begin{align*} \mathbb{P}(X^{k+1}_j = 1 | X^k) & = \mathbb{P}(\min_{i \in {\cal N}(j)} \exp(p_{i,j}) \leq \epsilon) \\ & = \mathbb{P}(\exp( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\ & = 1 - e^{- \epsilon \cdot \theta_j \cdot X^t} \end{align*} -This formulation is consistent when $X^k_j = 1$ by considering the dummy -variables $\forall i, \Theta_{i,i} = +\infty$. Therefore, the -$\epsilon-$binned-process of the continuous-time model with exponential -transmission function is a Generalized Linear Cascade model with inverse link -function $f:z\mapsto 1-e^{-\epsilon\cdot z}$. +Note that here we implicitly let $\Theta_{i,j} = 0$ if $(i,j)$ is not a +directed edge of the graph. Note that this formulation is also consistent when +$X^k_j = 1$ by considering the dummy variables $\forall i, \Theta_{i,i} = ++\infty$. Therefore, the $\epsilon-$binned-process of the continuous-time +model with exponential transmission function is a Generalized Linear Cascade +model with inverse link function $f:z\mapsto 1-e^{-\epsilon\cdot z}$. \subsubsection{Logistic Cascades} +Consider the specific case of ``logistic cascades'' (where $f$ is the logistic +function). This model captures the idea that there is a threshold after which, +each additional neighbor's influence contribution, becomes significant. In this +way, logistic cascades can be thought of approximating the more-commonly +studied Linear Threshold model TODO: CITE. As we will see later in the +analysis, the Hessian of our optimization program simplifies to +$\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the +observation matrix $[x^1 \ldots x^\mathcal{|T|}]$. The (RE)-condition is then +equivalent to the assumption made in the Lasso analysis of \cite{bickel:2009}. + + + % \subsection{The Linear Threshold Model} % In the Linear Threshold Model, each node $j\in V$ has a threshold $t_j$ from |
