diff options
| -rw-r--r-- | finale/project_proposal.tex | 58 |
1 files changed, 31 insertions, 27 deletions
diff --git a/finale/project_proposal.tex b/finale/project_proposal.tex index 55d44e6..830d63a 100644 --- a/finale/project_proposal.tex +++ b/finale/project_proposal.tex @@ -11,38 +11,42 @@ \subsection*{The Network Inference problem} The network inference problem concerns itself with learning the edges and the -edge weights of an unknown network. Each edge weight is a parameter to be -estimated. The information at our disposal is the result of a cascade process on -the network. Here, we will focus on the Generalized Linear Cascade (GLC) model -introduced in~\cite{}. +edge weights of an unknown network. Each edge weight $\theta_e, e\in E$ is a +parameter to be estimated. The information at our disposal is the result of a +cascade process on the network. Here, we will focus on the Generalized Linear +Cascade (GLC) model introduced in~\cite{} presented below. -\paragraph{Short description of the GLC model} +\paragraph{The GLC model} -Let $X^t$ be the indicator variable of ``contagious nodes'' at time step $t$. -A \emph{generalized linear cascade model} is a cascade model such that for each -susceptible node $j$ in state $s$ at time step $t$, the probability of $j$ -becoming ``contagious'' at time step $t+1$ conditioned on $X^t$ is a Bernoulli -variable of parameter $f(\theta_j \cdot X^t)$: +Consider a graph $\mathcal{G} = (N, E)$, where nodes $N$ can be one of three +possible states: susceptible, infected, and immune. Let $X^t$ be the indicator +variable of ``infected nodes'' at time step $t$. A \emph{generalized linear +cascade model} is a cascade model such that for each ``susceptible'' node $i$ at +time step $t$, the probability of $i$ becoming ``infected'' at time step $t+1$ +conditioned on $X^t$ is a Bernoulli variable of parameter $f(\theta_i +\cdot X^t)$: \begin{equation} - \label{eq:glm} - \mathbb{P}(X^{t+1}_j = 1|X^t) - = f(\theta_j \cdot X^t) + \mathbb{P}(X^{t+1}_i = 1|X^t) = f(\theta_i \cdot X^t) \end{equation} -where $f: \mathbb{R} \rightarrow [0,1]$ +where $f: \mathbb{R} \rightarrow [0,1]$. There is no a priori rule for the +transition from and to the immune and susceptible state, but the most commonly +studied GLC model, known as the Independent Cascade model, assumes that all +infected nodes at time step $T$ are immune for all time steps $t \geq T+1$. +Under this assumption, a cascade ends when no ``infected'' nodes remain. -\paragraph{Problem statement} -Assume that $X_t \sim \mathcal{D}$, where $D$ is the GLC process defined above. -Identify the parents and estimate the edge weights for each node $i$ in the -network $\mathcal{N}$. This can be solved using maximum likelihood: -$$\log \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} -\sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 - -x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big)$$ - -In particular, it is known that an approximation of the variance for $\hat -\beta$ is given by the inverse of the information matrix, which is given by: -$$blabla$$ +The designer is also free to decide the initial state of graph $\mathcal{G}$ at +the beginning of every cascade process. For simplicity, we will suppose that, at +the start of each cascade, one node is chosen uniformly at random to be +``infected'', and that all other nodes are in the ``susceptible'' state. -In the case of logistic regression. In the case of the independent cascade -model. +\bigskip +Prior work has focused on recovering the edges of the graph, which can be +formalized as finding the support of the edge weight vector $\theta$. This almost +always relies on solving the following maximum likelihood program: +$$ \theta_{i*} \in \arg \max_\theta \log +\mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} \sum_{t\in +{\cal T}_i } x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 - +x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big)$$ +where $\mathcal{T}_i = \{t: \text{node i is infected at time step } t\}$ \end{document} |
