diff options
Diffstat (limited to 'paper/sections/model.tex')
| -rw-r--r-- | paper/sections/model.tex | 65 |
1 files changed, 36 insertions, 29 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex index 43de785..b478e94 100644 --- a/paper/sections/model.tex +++ b/paper/sections/model.tex @@ -2,7 +2,7 @@ We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times |V|$ matrix of parameters describing the edge weights of $\mathcal{G}$. Intuitively, $\Theta_{i,j}$ captures the ``influence'' of node $i$ on node $j$. Let $m\defeq |V|$. For each node $j$, let $\theta_{j}$ be the $j^{th}$ column -vector of $\Theta$. A {\color{red} discrete-time} \emph{Cascade model} is a +vector of $\Theta$. A discrete-time \emph{Cascade model} is a Markov process over a finite state space ${\{0, 1, \dots, K-1\}}^V$ with the following properties: \begin{enumerate} @@ -25,16 +25,19 @@ contagious nodes ``influence'' other nodes in the graph to become contagious. An the successive states of the nodes in graph ${\cal G}$. Note that both the ``single source'' assumption made in~\cite{Daneshmand:2014} and \cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption made -in~\cite{Netrapalli:2012} verify condition 3. {\color{red} why is it less -restrictive? explain} +in~\cite{Netrapalli:2012} verify condition 3. Also note that the multiple-source +node assumption does not reduce to the single-source assumption: even +non-overlapping cascades started from two nodes that are far apart cannot in +practice be attributed to either process without knowledge of the newtork +topology. -In the context of Graph Inference,~\cite{Netrapalli:2012} focus +In the context of Network Inference,~\cite{Netrapalli:2012} focus on the well-known discrete-time independent cascade model recalled below, which \cite{Abrahao:13} and~\cite{Daneshmand:2014} generalize to continuous time. We extend the independent cascade model in a different direction by considering a more general class of transition probabilities while staying in the discrete-time setting. We observe that despite their obvious differences, both -the independent cascade and the voter models make the graph inference problem +the independent cascade and the voter models make the network inference problem similar to the standard generalized linear model inference problem. In fact, we define a class of diffusion processes for which this is true: the \emph{Generalized Linear Cascade Models}. The linear threshold model is @@ -121,18 +124,15 @@ Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as: \tag{IC} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i} - = 1 - e^{\inprod{\theta_j}{X^t}} + = 1 - e^{\inprod{\Theta_j}{X^t}} \end{equation} Therefore, the independent cascade model is a Generalized Linear Cascade model with inverse link function $f : z \mapsto 1 - e^z$. -Note that interpreting it in the case of the Independent Cascade Model requires -one more step. Indeed, to cast it as a generalized linear cascade model, we had -to perform the transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where $p_{i,j}$ -are the infection probabilities. Fortunately, if we estimate $p_j$ through -$\hat{p}_{j} = \hat{\theta}_j$, a bound on $\|\hat\theta - \theta^*\|_2$ -directly implies a bound on $\|\hat p - p^*\|_2$: +In Section~\ref{sec:results}, we show bounds on the parameter of the graph. +Fortunately, a bound on $\|\hat\theta - \theta^*\|_2$ directly implies a bound +on $\|\hat p - p^*\|_2$ as shown by the following lemma: \begin{lemma} $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$. @@ -166,39 +166,43 @@ with inverse link function $f: z \mapsto z$. \subsubsection{Discretization of Continous Model} -Anoter motivation for the Generalized Linear Cascade model is that it captures +Another motivation for the Generalized Linear Cascade model is that it captures the time-discretized formulation of the well-studied continuous-time independent cascade model with exponential transmission function (CICE) of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}. Suppose that time is binned into intervals of length $\epsilon$ and let $X^k$ be the set of nodes `infected' before or during the $k^{th}$ time interval. Note that contrary to -the discrete-time independent cascade model, $X^k_i = 1 \implies X^{k+1}_i = -1$. Let $\exp(p)$ be an exponentially-distributed random variable of parameter -$p$. By the memoryless property of the exponential, if $X^k_j \neq 1$: +the discrete-time independent cascade model, $X^k_j = 1 \implies X^{k+1}_j = +1$. + +Let $\exp(p)$ be an exponentially-distributed random variable of parameter $p$ +and let $p_{i,j}$ be the rate of transmission along directed edge $(i,j)$ in the +CICE model. By the memoryless property of the exponential, if $X^k_j \neq 1$: \begin{align*} \mathbb{P}(X^{k+1}_j = 1 | X^k) & = \mathbb{P}(\min_{i \in {\cal N}(j)} \exp(p_{i,j}) \leq \epsilon) \\ & = \mathbb{P}(\exp( \sum_{i=1}^m \Theta_{i,j} X^t_i) \leq \epsilon) \\ - & = 1 - e^{- \epsilon \cdot \theta_j \cdot X^t} + & = 1 - e^{- \epsilon \Theta_j \cdot X^t} \end{align*} Note that here we implicitly let $\Theta_{i,j} = 0$ if $(i,j)$ is not a directed edge of the graph. Note that this formulation is also consistent when $X^k_j = 1$ by considering the dummy variables $\forall i, \Theta_{i,i} = -+\infty$. Therefore, the $\epsilon-$binned-process of the continuous-time -model with exponential transmission function is a Generalized Linear Cascade -model with inverse link function $f:z\mapsto 1-e^{-\epsilon\cdot z}$. ++\infty$. Therefore, the $\epsilon$-time-binned CICE-induced process is a +Generalized Linear Cascade model with inverse link function $f:z\mapsto +1-e^{-\epsilon\cdot z}$. \subsubsection{Logistic Cascades} Consider the specific case of ``logistic cascades'' (where $f$ is the logistic -function). This model captures the idea that there is a threshold after which, -each additional neighbor's influence contribution, becomes significant. In this -way, logistic cascades can be thought of approximating the more-commonly -studied Linear Threshold model TODO: CITE. As we will see later in the -analysis, the Hessian of our optimization program simplifies to -$\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the -observation matrix $[x^1 \ldots x^\mathcal{|T|}]$. The (RE)-condition is then -equivalent to the assumption made in the Lasso analysis of \cite{bickel:2009}. +function). This model captures the idea that there is a threshold around which +each additional neighbor's contribution becomes significant: logistic +cascades can be thought of approximating the more-commonly studied Linear +Threshold model~\cite{Kempe:03}. As we will see later in the +analysis, the Hessian of our optimization program simplifies in the case of a +logistic inverse link function to $\nabla^2\mathcal{L}(\theta^*) = +\frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the observation matrix $[x^1 \ldots +x^\mathcal{|T|}]$. The (RE)-condition we introduce subsequently is then +equivalent to the assumption made in the Lasso analysis of~\cite{bickel:2009}. @@ -230,7 +234,10 @@ equivalent to the assumption made in the Lasso analysis of \cite{bickel:2009}. \begin{figure} \includegraphics[scale=.4]{figures/drawing.pdf} - \caption{Illustration of the sparse-recovery approach} + \caption{Illustration of the sparse-recovery approach: the measurement matrix + is known, as is a Bernoulli realization of the matrix-vector product via the +non-linear transformation induced by $f$. Our objective is to recover the +unknown weight vector $\theta_i$ for each node $i$.} \end{figure} |
