\documentclass[10pt]{article}
\usepackage{fullpage, amsmath, amssymb, amsthm}

\title{Regression Analysis with Network Data}
\author{Jean Pouget-Abadie, Thibaut Horel}
\date{}

\begin{document}
\maketitle

\subsection*{The Network Inference Problem}

The network inference problem consists in learning the edges and edge weights of an unknown network, where each edge weight is a parameter to be estimated. The information at our disposal is the outcome of a cascade process on the network. Here, we focus on the Generalized Linear Cascade (GLC) model introduced in~\cite{}.

\paragraph{Short description of the GLC model}
Let $X^t$ be the indicator variable of the ``contagious'' nodes at time step $t$. A \emph{generalized linear cascade model} is a cascade model such that, for each node $j$ susceptible at time step $t$, the probability of $j$ becoming contagious at time step $t+1$ conditioned on $X^t$ is a Bernoulli variable with parameter $f(\theta_j \cdot X^t)$:
\begin{equation}
\label{eq:glm}
\mathbb{P}(X^{t+1}_j = 1 \,|\, X^t) = f(\theta_j \cdot X^t),
\end{equation}
where $f: \mathbb{R} \rightarrow [0,1]$.

\paragraph{Problem statement}
Assume that $X^t \sim \mathcal{D}$, where $\mathcal{D}$ is the GLC process defined above. The goal is to identify the parents and estimate the incoming edge weights of each node $i$ in the network $\mathcal{N}$. This can be solved by maximum likelihood estimation. Writing $\mathcal{T}_i$ for the set of time steps at which node $i$ is susceptible, the (normalized) log-likelihood of $\theta_i$ given the observations $x^1, \ldots, x^n$ is
$$\log \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|\mathcal{T}_i|} \sum_{t\in \mathcal{T}_i} x_i^{t+1}\log f(\theta_i\cdot x^{t}) + (1 - x_i^{t+1})\log\big(1-f(\theta_i \cdot x^t)\big).$$
In particular, it is known that the asymptotic variance of the maximum likelihood estimator $\hat\theta_i$ is approximated by the inverse of the Fisher information matrix:
$$\mathcal{I}(\theta_i) = -\mathbb{E}\left[\nabla^2_{\theta_i} \log \mathcal{L}_i(\theta_i\,|\,X^1,\ldots,X^n)\right].$$
In the case of logistic regression, $f(z) = 1/(1+e^{-z})$, and the information matrix takes the standard form
$$\mathcal{I}(\theta_i) = \frac{1}{|\mathcal{T}_i|}\sum_{t\in \mathcal{T}_i} \mathbb{E}\left[f(\theta_i \cdot X^t)\big(1 - f(\theta_i \cdot X^t)\big)\, X^t (X^t)^\top\right].$$
In the case of the independent cascade model, $f(z) = 1 - e^{-z}$ under the change of variables $\theta_{i,j} = -\log(1 - p_{i,j})$, where $p_{i,j}$ is the infection probability along edge $(j, i)$: indeed, $\mathbb{P}(X^{t+1}_i = 0 \,|\, X^t) = \prod_j (1 - p_{i,j})^{X^t_j} = e^{-\theta_i \cdot X^t}$.
\end{document}