We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times |V|$ matrix of parameters describing the edge weights of $\mathcal{G}$. We define $m\defeq |V|$. A \emph{cascade model} is a Markov process over the finite state space $\{0, 1, \dots, K\}^V$. An \emph{influence cascade} is a realization of the random process described by a cascade model. In practice, we restrict ourselves to discrete-time homogeneous cascade models. At $t=0$, each node in the graph has a constant probability $p_{\text{init}}$ of being in the ``active'' state. The active nodes at $t=0$ are called the source nodes. Our probabilistic model for the source nodes is more realistic and less restrictive than the ``single source'' assumption made in \cite{Daneshmand:2014} and \cite{Abrahao:13}. \subsection{Generalized Linear Cascade Models} Denoting by $X^t$ the state of the cascade at time step $t$, we interpret $X^t_j = 1$ for node $j\in V$ as ``node $j$ is active'', \emph{i.e.}, ``$j$ exhibits the source nodes' behavior at time step $t$''. We draw inspiration from \emph{generalized linear models} (GLM) to define a generalized linear cascade. \begin{definition} \label{def:glcm} Let us denote by $\{\mathcal{F}_t,\ t\in\ints\}$ the natural filtration induced by $\{X^t,\ t\in\ints\}$. A \emph{generalized linear cascade} is characterized by the following equation: \begin{displaymath} \P[X^{t+1}=x\,|\, \mathcal{F}_t] = \prod_{i=1}^m f(\inprod{\theta_i}{X^{t}})^{x_i} \big(1-f(\inprod{\theta_i}{X^{t}})\big)^{1-x_i} \end{displaymath} where $f:\mathbb{R}\to[0,1]$ and $\theta_i$ is the $i$-th column of $\Theta$. \end{definition} It follows immediately from this definition that a generalized linear cascade satisfies the Markov property: \begin{displaymath} \P[X^{t+1}=x\,|\,\mathcal{F}_t] = \P[X^{t+1}=x\,|\, X^t] \end{displaymath} Note also that $\E[X^{t+1}_i\,|\,X^t] = f(\inprod{\theta_i}{X^t})$. As such, $f$ can be interpreted as the inverse link function of our generalized linear cascade model.
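To make Definition~\ref{def:glcm} concrete, the following Python sketch (with arbitrary illustrative parameters, not part of the model specification) draws one transition of a generalized linear cascade: each node $i$ becomes active independently with probability $f(\inprod{\theta_i}{X^t})$.

```python
import numpy as np

def glc_step(Theta, x, f, rng):
    """One transition of a generalized linear cascade.

    Theta : (m, m) parameter matrix; column theta_i governs node i
    x     : (m,) current 0/1 state X^t
    f     : inverse link function mapping R -> [0, 1]
    """
    # P[X^{t+1}_i = 1 | X^t] = f(<theta_i, X^t>) for each node i
    p = f(Theta.T @ x)  # Theta.T @ x computes <theta_i, x> for every i at once
    return (rng.random(len(x)) < p).astype(int)

# Illustrative run with the identity inverse link f(z) = z (voter-style),
# using a column-normalized random Theta so that f stays in [0, 1].
rng = np.random.default_rng(0)
m = 5
Theta = rng.random((m, m))
Theta /= Theta.sum(axis=0)                 # columns sum to 1
x0 = (rng.random(m) < 0.3).astype(int)     # random source nodes at t = 0
x1 = glc_step(Theta, x0, f=lambda z: z, rng=rng)
```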
\subsection{Examples} \label{subsec:examples} In this section, we show that the well-known Independent Cascade Model and the Voter model are Generalized Linear Cascades. The Linear Threshold model will be discussed in Section~\ref{sec:linear_threshold}. \subsubsection{Independent Cascade Model} In the independent cascade model, nodes can be either susceptible, active or inactive. At $t=0$, all source nodes are ``active'' and all remaining nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is active, $i$ attempts to infect $j$ with probability $p_{i,j}\in[0,1]$; the infection attempts are independent of each other. If $i$ succeeds, $j$ will become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be inactive at time $t+1$. In other words, nodes stay active for only one time step. The cascade process terminates when no active nodes remain. If we denote by $X^t$ the indicator variable of the set of active nodes at time step $t$, then if $j$ is susceptible at time step $t+1$, we have: \begin{displaymath} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}. \end{displaymath} Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as: \begin{equation}\label{eq:ic} \tag{IC} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i} = 1 - e^{\inprod{\theta_j}{X^t}} \end{equation} which is a Generalized Linear Cascade model with inverse link function $f(z) = 1 - e^z$. \subsubsection{The Voter Model} In the Voter Model, nodes can be either red or blue, where ``blue'' is the source state. The parameters of the graph are normalized such that $\forall j, \ \sum_i \Theta_{i,j} = 1$. Each round, every node $j$ independently chooses one of its neighbors $i$ with probability $\Theta_{i,j}$ and adopts that neighbor's color. The cascade stops at a fixed horizon time $T$ or when all nodes are of the same color.
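The reparametrization in \eqref{eq:ic} can be checked numerically: with $\Theta_{i,j} = \log(1-p_{i,j})$, the product form $1 - \prod_i (1-p_{i,j})^{X^t_i}$ and the exponential form $1 - e^{\inprod{\theta_j}{X^t}}$ coincide. The sketch below uses arbitrary random edge probabilities purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
P = rng.uniform(0.05, 0.6, size=(m, m))  # illustrative edge probabilities p_{i,j}
Theta = np.log(1.0 - P)                  # Theta_{i,j} = log(1 - p_{i,j})
x = (rng.random(m) < 0.5).astype(int)    # current active set X^t

# Direct form: P[X^{t+1}_j = 1 | X^t] = 1 - prod_i (1 - p_{i,j})^{x_i}
direct = 1.0 - np.prod((1.0 - P) ** x[:, None], axis=0)
# Generalized linear cascade form: 1 - exp(<theta_j, x>)
glc = 1.0 - np.exp(Theta.T @ x)

assert np.allclose(direct, glc)  # the two parametrizations agree
```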
If we denote by $X^t$ the indicator variable of the set of blue nodes at time step $t$, then we have: \begin{equation} \mathbb{P}\left[X^{t+1}_j = 1 \,|\, X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \inprod{\theta_j}{X^t} \tag{V} \end{equation} which is a Generalized Linear Cascade model with inverse link function $f(z) = z$. \subsection{Maximum Likelihood Estimation} Recovering the model parameter $\Theta$ from observed influence cascades is the central question of the present work. Recovering the edges in $E$ from observed influence cascades is a well-identified problem known as the \emph{Graph Inference} problem. However, recovering the influence parameters is no less important and has seemingly been overlooked so far. In this work we focus on recovering $\Theta$, noting that the set of edges $E$ can then be recovered through the following equivalence: \begin{displaymath} (i,j)\in E\Leftrightarrow \Theta_{i,j} \neq 0 \end{displaymath} Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover $\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the log-likelihood function, we consider the following $\ell_1$-regularized MLE problem: \begin{displaymath} \hat{\Theta} \in \argmax_{\Theta} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - \lambda\|\Theta\|_1 \end{displaymath} where $\lambda$ is the regularization factor, which helps prevent overfitting and controls the sparsity of the solution. The generalized linear cascade model is decomposable in the following sense: given Definition~\ref{def:glcm}, the log-likelihood can be written as the sum of $m$ terms, each term $i\in\{1,\ldots,m\}$ only depending on $\theta_i$.
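The decomposability claim can be illustrated directly: summing the per-node likelihood terms over all $m$ columns recovers the full cascade log-likelihood. The sketch below (a simulation with arbitrary parameters, shown only to exhibit the structure) evaluates the per-column contribution for a voter-model cascade; probabilities are clipped for numerical safety.

```python
import numpy as np

def node_loglik(theta_i, X, f, i):
    """Log-likelihood contribution of node i over one cascade X of shape (T+1, m)."""
    ll = 0.0
    for t in range(X.shape[0] - 1):
        # P[X^{t+1}_i = 1 | X^t] = f(<theta_i, X^t>), clipped to avoid log(0)
        p = np.clip(f(X[t] @ theta_i), 1e-12, 1 - 1e-12)
        ll += X[t + 1, i] * np.log(p) + (1 - X[t + 1, i]) * np.log(1 - p)
    return ll

# Simulate a short voter-model cascade: f(z) = z, columns of Theta sum to 1.
rng = np.random.default_rng(2)
m, T = 4, 10
Theta = rng.random((m, m))
Theta /= Theta.sum(axis=0)
X = np.zeros((T + 1, m), dtype=int)
X[0] = (rng.random(m) < 0.5).astype(int)   # random source nodes
for t in range(T):
    X[t + 1] = (rng.random(m) < Theta.T @ X[t]).astype(int)

# Decomposability: the full log-likelihood is the sum of m per-column terms.
total = sum(node_loglik(Theta[:, i], X, lambda z: z, i) for i in range(m))
```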
Since this is equally true for $\|\Theta\|_1$, each column $\theta_i$ of $\Theta$ can be estimated by a separate optimization program: \begin{equation}\label{eq:pre-mle} \hat{\theta}_i \in \argmax_{\theta_i}\frac{1}{n_i}\mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) - \lambda\|\theta_i\|_1 \end{equation} where we denote by $n_i$ the first step at which node $i$ becomes active and where: \begin{multline} \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \sum_{t=1}^{n_i-1}\log\big(1-f(\inprod{\theta_i}{x^t})\big)\\ +\log f(\inprod{\theta_i}{x^{n_i}}) \end{multline} This program is convex whenever $z\mapsto\log f(z)$ and $z\mapsto\log\big(1-f(z)\big)$ are both concave, which holds in particular for the independent cascade model ($f(z) = 1-e^z$) and the voter model ($f(z) = z$).
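One standard way to solve a program of the form \eqref{eq:pre-mle} is proximal gradient ascent, where the $\ell_1$ penalty is handled by soft-thresholding. The sketch below is an illustrative implementation only, specialized to the voter model ($f(z)=z$); it departs from \eqref{eq:pre-mle} in that, for simplicity, it uses every observed transition of node $i$ rather than stopping at the node's first activation, and it projects onto $[0,1]$ to keep the probabilities valid.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fit_column_ista(X, i, lam=0.01, step=0.1, iters=500):
    """Proximal-gradient sketch for one column of the l1-regularized MLE,
    voter model case (f(z) = z). X is a (T+1, m) array of cascade states."""
    m = X.shape[1]
    theta = np.full(m, 1.0 / m)              # feasible starting point
    A, y = X[:-1], X[1:, i]                  # transitions (x^t, x^{t+1}_i)
    for _ in range(iters):
        p = np.clip(A @ theta, 1e-9, 1 - 1e-9)
        # Gradient of the average log-likelihood of node i's transitions
        grad = A.T @ (y / p - (1 - y) / (1 - p)) / len(y)
        theta = soft_threshold(theta + step * grad, step * lam)
        theta = np.clip(theta, 0.0, 1.0)     # keep entries valid probabilities
    return theta

# Illustrative usage on a simulated voter-model cascade.
rng = np.random.default_rng(3)
m, T = 4, 200
Theta = rng.random((m, m))
Theta /= Theta.sum(axis=0)
X = np.zeros((T + 1, m), dtype=int)
X[0] = (rng.random(m) < 0.5).astype(int)
for t in range(T):
    X[t + 1] = (rng.random(m) < Theta.T @ X[t]).astype(int)
theta_hat = fit_column_ista(X, i=0)
```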