finale/project_proposal.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52

\documentclass[10pt]{article}
\usepackage{fullpage, amsmath, amssymb, amsthm}

\title{Regression Analysis with Network data}
\author{Jean Pouget-Abadie, Thibaut Horel}
\date{}

\begin{document}
\maketitle

\subsection*{The Network Inference problem}

The network inference problem concerns itself with learning the edges and the
edge weights of an unknown network. Each edge weight $\theta_e, e\in E$ is a
parameter to be estimated. The information at our disposal is the result of a
cascade process on the network. Here, we will focus on the Generalized Linear
Cascade (GLC) model introduced in~\cite{} presented below.

\paragraph{The GLC model}

Consider a graph $\mathcal{G} = (N, E)$, where nodes $N$ can be one of three
possible states: susceptible, infected, and immune.  Let $X^t$ be the indicator
variable of ``infected nodes'' at time step $t$.  A \emph{generalized linear
cascade model} is a cascade model such that for each ``susceptible'' node $i$ at
time step $t$, the probability of $i$ becoming ``infected'' at time step $t+1$
conditioned on $X^t$ is a Bernoulli variable of parameter $f(\theta_i
\cdot X^t)$:
\begin{equation}
    \mathbb{P}(X^{t+1}_i = 1|X^t) = f(\theta_i \cdot X^t)
\end{equation}
where $f: \mathbb{R} \rightarrow [0,1]$. There is no a priori rule for the
transition from and to the immune and susceptible state, but the most commonly
studied GLC model, known as the Independent Cascade model, assumes that all
infected nodes at time step $T$ are immune for all time steps $t \geq T+1$.
Under this assumption, a cascade ends when no ``infected'' nodes remain.

The designer is also free to decide the initial state of graph $\mathcal{G}$ at
the beginning of every cascade process. For simplicity, we will suppose that, at
the start of each cascade, one node is chosen uniformly at random to be
``infected'', and that all other nodes are in the ``susceptible'' state.

\bigskip
Prior work has focused on recovering the edges of the graph, which can be
formalized as finding the support of the edge weight vector $\theta$. This almost
always relies on solving the following maximum likelihood program:
$$ \theta_{i*} \in \arg \max_\theta \log
\mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} \sum_{t\in
{\cal T}_i } x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 -
x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big)$$
where $\mathcal{T}_i = \{t: \text{node i is infected at time step } t\}$

\end{document}