diff options
| author | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-10-14 08:20:18 -0400 |
|---|---|---|
| committer | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-10-14 08:20:18 -0400 |
| commit | 4fa0f3257276a515133bd8bc07e575f2b81d8fb5 (patch) | |
| tree | acb5e13ab96b2475830d6d18bb00ca821ee48a21 /finale/project_proposal.tex | |
| parent | c91618ab0cf10075f3b64eb6ee8fb5ad42b9c188 (diff) | |
| download | cascades-4fa0f3257276a515133bd8bc07e575f2b81d8fb5.tar.gz | |
adding reformulation of network inference pb
Diffstat (limited to 'finale/project_proposal.tex')
| -rw-r--r-- | finale/project_proposal.tex | 43 |
1 files changed, 29 insertions, 14 deletions
diff --git a/finale/project_proposal.tex b/finale/project_proposal.tex index 830d63a..c92bb36 100644 --- a/finale/project_proposal.tex +++ b/finale/project_proposal.tex @@ -19,34 +19,49 @@ Cascade (GLC) model introduced in~\cite{} presented below. \paragraph{The GLC model} Consider a graph $\mathcal{G} = (N, E)$, where nodes $N$ can be one of three -possible states: susceptible, infected, and immune. Let $X^t$ be the indicator +possible states: susceptible, infected, and immune. Let $x^t$ be the indicator variable of ``infected nodes'' at time step $t$. A \emph{generalized linear cascade model} is a cascade model such that for each ``susceptible'' node $i$ at time step $t$, the probability of $i$ becoming ``infected'' at time step $t+1$ -conditioned on $X^t$ is a Bernoulli variable of parameter $f(\theta_i +conditioned on $x^t$ is a Bernoulli variable of parameter $f(\theta_i \cdot X^t)$: \begin{equation} - \mathbb{P}(X^{t+1}_i = 1|X^t) = f(\theta_i \cdot X^t) + \mathbb{P}(x^{t+1}_i = 1|x^t) = f(\theta_i \cdot x^t) \end{equation} where $f: \mathbb{R} \rightarrow [0,1]$. There is no a priori rule for the transition from and to the immune and susceptible state, but the most commonly studied GLC model, known as the Independent Cascade model, assumes that all -infected nodes at time step $T$ are immune for all time steps $t \geq T+1$. -Under this assumption, a cascade ends when no ``infected'' nodes remain. - -The designer is also free to decide the initial state of graph $\mathcal{G}$ at -the beginning of every cascade process. For simplicity, we will suppose that, at -the start of each cascade, one node is chosen uniformly at random to be +infected nodes at time step $t$ are immune for all time steps $t' \geq t+1$. +Under this assumption, a cascade ends when no ``infected'' nodes remain. The +designer is also free to decide the initial state of graph $\mathcal{G}$ at the +beginning of every cascade process. For simplicity, we will suppose that, at the +start of each cascade, one node is chosen uniformly at random to be ``infected'', and that all other nodes are in the ``susceptible'' state. \bigskip -Prior work has focused on recovering the edges of the graph, which can be -formalized as finding the support of the edge weight vector $\theta$. This almost -always relies on solving the following maximum likelihood program: -$$ \theta_{i*} \in \arg \max_\theta \log +The network inference problem can therefore be formalized as finding the support +of the edge weight vector $\theta$ from the observation of ${(x^c_t)}_{c\in +\mathcal{C}, t\in \mathcal{T}_C}$, where $\mathcal{C}$ indexes the cascade +number, and $\mathcal{T}_C$ the time step in that cascade. Prior work has +observed that the network inference problem is decomposable node-by-node +i.e.~solving for $\beta_{i} = \{\theta_{i,j} | j \in N\}$ for node $i$. This can +easily be formulated as the following maximum likelihood program: +$$ \theta_{i} \in \arg \max_\theta \log \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} \sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 - x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big)$$ -where $\mathcal{T}_i = \{t: \text{node i is infected at time step } t\}$ +where $\mathcal{T}_i = \{\text{node i is ``susceptible'' at time step } t\}$. +The reader will note the above problem boils down to fitting a generalized +linear model. In particular, if $f$ is the sigmoid function, we are performing +logistic regression: +$$ +\begin{cases} +y_i^* = \theta_i \cdot x^t + \epsilon, \text{~where~} \epsilon\sim +Logistic(0, 1) \\ +y_i = [y_i^* > 0] \text{~where~} y_i^t = x_i^{t+1} +\end{cases} +$$ + +\subsection*{Objectives} \end{document} |
