From c91618ab0cf10075f3b64eb6ee8fb5ad42b9c188 Mon Sep 17 00:00:00 2001 From: jeanpouget-abadie Date: Tue, 13 Oct 2015 23:31:04 -0400 Subject: better presentation of network inference pb --- finale/project_proposal.tex | 64 ++++++++++++++++++++++++--------------------- 1 file changed, 34 insertions(+), 30 deletions(-) diff --git a/finale/project_proposal.tex b/finale/project_proposal.tex index 55d44e6..830d63a 100644 --- a/finale/project_proposal.tex +++ b/finale/project_proposal.tex @@ -11,38 +11,42 @@ \subsection*{The Network Inference problem} The network inference problem concerns itself with learning the edges and the -edge weights of an unknown network. Each edge weight is a parameter to be -estimated. The information at our disposal is the result of a cascade process on -the network. Here, we will focus on the Generalized Linear Cascade (GLC) model -introduced in~\cite{}. - -\paragraph{Short description of the GLC model} - -Let $X^t$ be the indicator variable of ``contagious nodes'' at time step $t$. -A \emph{generalized linear cascade model} is a cascade model such that for each -susceptible node $j$ in state $s$ at time step $t$, the probability of $j$ -becoming ``contagious'' at time step $t+1$ conditioned on $X^t$ is a Bernoulli -variable of parameter $f(\theta_j \cdot X^t)$: +edge weights of an unknown network. Each edge weight $\theta_e, e\in E$ is a +parameter to be estimated. The information at our disposal is the result of a +cascade process on the network. Here, we will focus on the Generalized Linear +Cascade (GLC) model introduced in~\cite{} presented below. + +\paragraph{The GLC model} + +Consider a graph $\mathcal{G} = (N, E)$, where nodes $N$ can be one of three +possible states: susceptible, infected, and immune. Let $X^t$ be the indicator +variable of ``infected nodes'' at time step $t$. A \emph{generalized linear +cascade model} is a cascade model such that for each ``susceptible'' node $i$ at +time step $t$, the probability of $i$ becoming ``infected'' at time step $t+1$ +conditioned on $X^t$ is a Bernoulli variable of parameter $f(\theta_i +\cdot X^t)$: \begin{equation} - \label{eq:glm} - \mathbb{P}(X^{t+1}_j = 1|X^t) - = f(\theta_j \cdot X^t) + \mathbb{P}(X^{t+1}_i = 1|X^t) = f(\theta_i \cdot X^t) \end{equation} -where $f: \mathbb{R} \rightarrow [0,1]$ - -\paragraph{Problem statement} -Assume that $X_t \sim \mathcal{D}$, where $D$ is the GLC process defined above. -Identify the parents and estimate the edge weights for each node $i$ in the -network $\mathcal{N}$. This can be solved using maximum likelihood: -$$\log \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} -\sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 - +where $f: \mathbb{R} \rightarrow [0,1]$. There is no a priori rule for the +transition from and to the immune and susceptible state, but the most commonly +studied GLC model, known as the Independent Cascade model, assumes that all +infected nodes at time step $T$ are immune for all time steps $t \geq T+1$. +Under this assumption, a cascade ends when no ``infected'' nodes remain. + +The designer is also free to decide the initial state of graph $\mathcal{G}$ at +the beginning of every cascade process. For simplicity, we will suppose that, at +the start of each cascade, one node is chosen uniformly at random to be +``infected'', and that all other nodes are in the ``susceptible'' state. + +\bigskip +Prior work has focused on recovering the edges of the graph, which can be +formalized as finding the support of the edge weight vector $\theta$. This almost +always relies on solving the following maximum likelihood program: +$$ \theta_{i*} \in \arg \max_\theta \log +\mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} \sum_{t\in +{\cal T}_i } x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 - x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big)$$ - -In particular, it is known that an approximation of the variance for $\hat -\beta$ is given by the inverse of the information matrix, which is given by: -$$blabla$$ - -In the case of logistic regression. In the case of the independent cascade -model. +where $\mathcal{T}_i = \{t: \text{node i is infected at time step } t\}$ \end{document} -- cgit v1.2.3-70-g09d2