diff options
| author | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-05-12 19:46:27 +0200 |
|---|---|---|
| committer | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-05-12 19:46:27 +0200 |
| commit | e794a29f1d777e65f778aaf2ff9c764ce5155f3f (patch) | |
| tree | 1e8584a4285a5b44066e610ec106ee165630a779 /paper/sections/model.tex | |
| parent | 4110319283b3b5b667d215bd44f23f8cd1a7cf46 (diff) | |
| download | cascades-e794a29f1d777e65f778aaf2ff9c764ce5155f3f.tar.gz | |
more cosmetic changes
Diffstat (limited to 'paper/sections/model.tex')
| -rw-r--r-- | paper/sections/model.tex | 63 |
1 files changed, 31 insertions, 32 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex index acec974..b704b9e 100644 --- a/paper/sections/model.tex +++ b/paper/sections/model.tex @@ -107,9 +107,9 @@ In the independent cascade model, nodes can be either susceptible, contagious or immune. At $t=0$, all source nodes are ``contagious'' and all remaining nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is contagious, $i$ attempts to infect $j$ with -probability $p_{i,j}\in(0,1]$; the infection attempts are mutually independent. +probability $p_{i,j}\in[0,1]$; the infection attempts are mutually independent. If $i$ succeeds, $j$ will become contagious at time step $t+1$. Regardless of -$i$'s success, node $i$ will be immune at time $t+1$. In other words, nodes +$i$'s success, node $i$ will be immune at time $t+1$, such that nodes stay contagious for only one time step. The cascade process terminates when no contagious nodes remain. @@ -130,9 +130,10 @@ Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as: Therefore, the independent cascade model is a Generalized Linear Cascade model with inverse link function $f : z \mapsto 1 - e^z$. -In Section~\ref{sec:results}, we show bounds on the parameter of the graph. -Fortunately, a bound on $\|\hat\theta - \theta^*\|_2$ directly implies a bound -on $\|\hat p - p^*\|_2$ as shown by the following lemma: +In Section~\ref{sec:results}, we show bounds on the \emph{transformed} +parameters $\Theta$ of the graph. Fortunately, a bound on $\|\hat\theta - +\theta^*\|_2$ directly implies a bound on $\|\hat p - p^*\|_2$ as shown by the +following lemma: \begin{lemma} $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$. @@ -146,15 +147,14 @@ $|\log (1 - p) - \log (1-p')| \geq \max(1 - \frac{1-p}{1-p'}, \subsubsection{The Linear Voter Model} In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}. -Without loss of generality, we can suppose that the -\emph{blue} nodes are contagious. The parameters of the graph are normalized -such that $\forall i, \ \sum_j \Theta_{i,j} = 1$ and we assume that -$\Theta_{i,i}$ is always non-zero, meaning that all nodes have self-loops. -Each round, every node $j$ independently chooses -one of its neighbors with probability $\Theta_{i,j}$ and adopts their color. The -cascades stops at a fixed horizon time $T$ or if all nodes are of the same color. -If we denote by $X^t$ the indicator variable of the set of blue nodes at time -step $t$, then we have: +Without loss of generality, we can suppose that the \emph{blue} nodes are +contagious. The parameters of the graph are normalized such that $\forall i, \ +\sum_j \Theta_{i,j} = 1$ and we assume that $\Theta_{i,i}$ is always non-zero, +meaning that all nodes have self-loops. Each round, every node $j$ +independently chooses one of its neighbors with probability $\Theta_{i,j}$ and +adopts their color. The cascades stops at a fixed horizon time $T$ or if all +nodes are of the same color. If we denote by $X^t$ the indicator variable of +the set of blue nodes at time step $t$, then we have: \begin{equation} \mathbb{P}\left[X^{t+1}_j = 1 | X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \inprod{\Theta_j}{X^t} @@ -176,11 +176,12 @@ the discrete-time independent cascade model, $X^k_j = 1 \implies X^{k+1}_j = 1$. Let $\exp(p)$ be an exponentially-distributed random variable of parameter $p$ -and let $p_{i,j}$ be the rate of transmission along directed edge $(i,j)$ in the -CICE model. By the memoryless property of the exponential, if $X^k_j \neq 1$: +and let $\Theta_{i,j}$ be the rate of transmission along directed edge $(i,j)$ +in the CICE model. By the memoryless property of the exponential, if $X^k_j +\neq 1$: \begin{align*} \mathbb{P}(X^{k+1}_j = 1 | X^k) & = \mathbb{P}(\min_{i \in {\cal N}(j)} - \exp(p_{i,j}) \leq \epsilon) \\ + \exp(\Theta_{i,j}) \leq \epsilon) \\ & = \mathbb{P}(\exp( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\ & = 1 - e^{- \epsilon \Theta_j \cdot X^t} \end{align*} @@ -193,15 +194,14 @@ Generalized Linear Cascade model with inverse link function $f:z\mapsto \subsubsection{Logistic Cascades} \label{sec:logistic_cascades} -Consider the specific case of ``logistic cascades'' (where $f$ is the logistic -function). This model captures the idea that there is a threshold around which -each additional neighbor's contribution becomes significant: logistic -cascades can be thought of approximating the more-commonly studied Linear -Threshold model~\cite{Kempe:03}. As we will see later in the -analysis, the Hessian of our optimization program simplifies in the case of a -logistic inverse link function to $\nabla^2\mathcal{L}(\theta^*) = -\frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the observation matrix $[x^1 \ldots -x^\mathcal{|T|}]$. +``Logistic cascades'' (where the inverse link function $f$ is the logistic +function) capture the idea that there is a threshold around which each +additional neighbor's contribution becomes significant: they constitue a +differentiable approximation to the more-commonly studied Linear Threshold +model~\cite{Kempe:03}. As we will see later in the analysis, the Hessian of our +optimization program simplifies in the case of a logistic inverse link function +to $\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is +the observation matrix $[x^1 \ldots x^\mathcal{|T|}]$. % \subsection{The Linear Threshold Model} @@ -234,18 +234,17 @@ x^\mathcal{|T|}]$. \caption{Illustration of the sparse-recovery approach. Our objective is to recover the unknown weight vector $\theta_j$ for each node $j$. We observe a Bernoulli realization of the $f$ transform of the matrix-vector product, where -the measurement matrix encodes which nodes are ``contagious''} +the measurement matrix encodes which nodes are ``contagious'' at each time step} \end{figure} Inferring the model parameter $\Theta$ from observed influence cascades is the central question of the present work. Recovering the edges in $E$ from observed -influence cascades is a well-identified problem known as the \emph{Graph +influence cascades is a well-identified problem known as the \emph{Network Inference} problem. However, recovering the influence parameters is no less -important and has been seemingly overlooked so far. In this work we focus on -recovering $\Theta$, noting that the set of edges $E$ can then be recovered -through the following equivalence: $(i,j)\in E\Leftrightarrow \Theta_{i,j} -\neq 0$ +important. In this work we focus on recovering $\Theta$, noting that the set of +edges $E$ can then be recovered through the following equivalence: $(i,j)\in +E\Leftrightarrow \Theta_{i,j} \neq 0$ Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover $\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the |
