We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times |V|$ matrix of parameters describing the edge weights of $\mathcal{G}$. A \emph{cascade model} is a Markov process over the finite state space $\{0, 1, \dots, K\}^V$. An \emph{influence cascade} is a realisation of the random process described by a cascade model. In practice, we restrict ourselves to discrete-time homogeneous cascade models. At $t=0$, each node in the graph has a constant probability $p_{init}$ of being in the ``active state''. The active nodes at $t=0$ are called the source nodes. Our probabilistic model for the source nodes is more realistic and less restrictive than the ``single source'' assumption made in \cite{Daneshmand:2014} and \cite{Abrahao:13}.

\subsection{Generalized Linear Cascade Models}

Denoting by $X^t$ the state of the cascade at time step $t$, we interpret $X^t_j = 1$ for some node $j\in V$ as ``node $j$ is active'', \emph{i.e.}, ``$j$ exhibits the source nodes' behavior at time step $t$''. We draw inspiration from \emph{generalized linear models} (GLM) to define a generalized linear cascade.

\begin{definition}
Let us denote by $\{\mathcal{F}_t, t\in\ints\}$ the natural filtration induced by $\{X^t, t\in\ints\}$. A \emph{generalized linear cascade} is characterized by the following equation:
\begin{displaymath}
\P[X^{t+1}=x\,|\, \mathcal{F}_t] = \prod_{i\in V} f(\inprod{\theta_i}{X^{t}})^{x_i} \big(1-f(\inprod{\theta_i}{X^{t}})\big)^{1-x_i}
\end{displaymath}
where $f:\mathbb{R}\to[0,1]$.
\end{definition}

It follows immediately from this definition that a generalized linear cascade satisfies the Markov property:
\begin{displaymath}
\P[X^{t+1}=x\,|\,\mathcal{F}_t] = \P[X^{t+1}=x\,|\, X^t]
\end{displaymath}
Note also that $\E[X^{t+1}_i\,|\,X^t] = f(\inprod{\theta_i}{X^t})$. As such, $f$ can be interpreted as the inverse link function of our generalized linear cascade model.
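As an illustration, the one-step transition in the definition above can be sampled directly: each node $i$ draws its next state independently with probability $f(\inprod{\theta_i}{X^t})$. The Python sketch below (function and variable names are our own, not from the text) uses the voter-model link $f(z)=z$ with a column-normalized parameter matrix:

```python
import numpy as np

def glc_step(Theta, x, f, rng):
    """One transition of a generalized linear cascade (illustrative sketch).

    Node i activates at time t+1 independently with probability
    f(<theta_i, x^t>), where theta_i is column i of Theta.
    """
    probs = f(Theta.T @ x)                        # f(<theta_i, x>) for every node i
    return (rng.random(len(x)) < probs).astype(int)

# Voter-model example: f(z) = z, columns of Theta sum to 1
rng = np.random.default_rng(0)
Theta = np.array([[0.0, 1.0, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.0, 0.0]])
x0 = np.array([1, 0, 1])                          # nodes 0 and 2 start active
x1 = glc_step(Theta, x0, lambda z: z, rng)
```

Any inverse link $f:\mathbb{R}\to[0,1]$ can be substituted for the identity here without changing the sampling logic.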
\subsection{Examples}
\label{subsec:examples}

In this section, we show that both the Independent Cascade Model and the Voter model are Generalized Linear Cascades. The Linear Threshold model will be discussed in Section~\ref{sec:linear_threshold}.

\subsubsection{Independent Cascade Model}

In the independent cascade model, nodes can be susceptible, active, or inactive. At $t=0$, all source nodes are ``active'' and all remaining nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is active, $i$ attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be inactive at time $t+1$. In other words, nodes stay active for only one time step. The cascade process terminates when no active nodes remain. If we denote by $X^t$ the indicator variable of the set of active nodes at time step $t$, then if $j$ is susceptible at time step $t$, we have:
\begin{displaymath}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}.
\end{displaymath}
Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
\begin{equation}\label{eq:ic} \tag{IC}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i} = 1 - e^{\inprod{\theta_j}{X^t}}
\end{equation}
where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$.

\subsubsection{The Voter Model}

In the Voter Model, nodes can be either red or blue, where ``blue'' is the source state. The parameters of the graph are normalized such that $\forall j, \ \sum_i \Theta_{i,j} = 1$. At each round, every node $j$ chooses one of its neighbors $i$ with probability $\Theta_{i,j}$ and adopts its color. The cascade stops at a fixed horizon time $T$ or when all nodes are of the same color.
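The reparametrization $\Theta_{i,j} \defeq \log(1-p_{i,j})$ behind \eqref{eq:ic} can be checked numerically: the product form and the exponential form agree term by term. A short sketch with made-up infection probabilities:

```python
import numpy as np

# Infection probabilities p_{i,j} from three potential parents i to a node j
p = np.array([0.3, 0.5, 0.2])
x = np.array([1, 0, 1])              # parents 0 and 2 are active at time t

# Direct form: 1 - prod_i (1 - p_{i,j})^{x_i}
direct = 1 - np.prod((1 - p) ** x)

# Reparametrized form: Theta_{i,j} = log(1 - p_{i,j}) gives 1 - e^{<theta_j, x>}
theta_j = np.log(1 - p)
reparam = 1 - np.exp(theta_j @ x)

assert np.isclose(direct, reparam)   # both equal 1 - 0.7 * 0.8 = 0.44
```

The logarithm turns the product over active parents into the inner product $\inprod{\theta_j}{X^t}$, which is what makes the model linear in the parameters.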
If we denote by $X^t$ the indicator variable of the set of blue nodes at time step $t$, then we have:
\begin{equation}
\mathbb{P}\left[X^{t+1}_j = 1 \,|\, X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \inprod{\theta_j}{X^t} \tag{V}
\end{equation}

\subsection{Maximum Likelihood Estimation}

Recovering the intrinsic graph parameters $\Theta$ from observed influence cascades is the central question of the present work and is important for the following two reasons:
\begin{enumerate}
\item Knowing the influence parameters allows for future prediction of influence cascades.
\item When the set of edges $E$ is unknown, it can be recovered from the influence parameters.
\end{enumerate}
We note that recovering the edges in $E$ from observed influence cascades is a well-identified problem known as the \emph{Graph Inference} problem. However, recovering the influence parameters is no less important and has seemingly been overlooked so far. The machinery developed in this work addresses both problems for a large class of well-known cascade models. In both models above, the unknown influence parameters are described by an $m\times m$ matrix $\Theta$. Furthermore, the information of the edges of $\mathcal{G}$ is fully contained in $\Theta$:
\begin{displaymath}
(i,j)\in E\Leftrightarrow \Theta_{i,j} \neq 0
\end{displaymath}
Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover $\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the log-likelihood function, we consider the following $\ell_1$-regularized MLE problem:
\begin{displaymath}
\hat{\Theta} \in \argmax_{\Theta} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - \lambda\|\Theta\|_1
\end{displaymath}
where $\lambda$ is a regularization parameter which helps prevent overfitting and controls the sparsity of the solution.
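The regularized MLE objective can be assembled directly from the transition probabilities of a generalized linear cascade: each observed transition $x^t \to x^{t+1}$ contributes a product of Bernoulli likelihoods. A minimal Python sketch (names and the probability clipping are our own, added for numerical stability):

```python
import numpy as np

def neg_log_likelihood(Theta, states, f, eps=1e-12):
    """Negative log-likelihood of a trajectory x^1, ..., x^n under a
    generalized linear cascade with inverse link f.
    `states` has shape (n, m); rows are consecutive cascade states."""
    nll = 0.0
    for x_t, x_next in zip(states[:-1], states[1:]):
        q = np.clip(f(Theta.T @ x_t), eps, 1 - eps)  # P[X^{t+1}_j = 1 | x^t]
        nll -= np.sum(x_next * np.log(q) + (1 - x_next) * np.log(1 - q))
    return nll

def regularized_objective(Theta, states, f, lam):
    # l1-regularized MLE: minimize -L(Theta) + lam * ||Theta||_1
    return neg_log_likelihood(Theta, states, f) + lam * np.abs(Theta).sum()

# Voter-model example: f is the identity, columns of Theta sum to 1
f = lambda z: z
Theta = np.array([[0.0, 1.0, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.0, 0.0]])
states = np.array([[1, 0, 1],
                   [1, 1, 0]])
nll = neg_log_likelihood(Theta, states, f)
```

Minimizing `regularized_objective` over $\Theta$ (e.g., with a proximal-gradient solver) is equivalent to the maximization problem above, and the zero pattern of the minimizer estimates the edge set $E$.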
Note that both the (IC) and the (V) models are decomposable in the following sense: the state evolution of each node at each time step happens independently of the other nodes, \emph{and} the state evolution of node $j\in V$ only depends on $\theta_j$, the $j$-th column of $\Theta$. Consequently, the log-likelihood function $\mathcal{L}$ can be written as a sum of $m$ terms, each term $j\in\{1,\ldots,m\}$ only involving $\theta_j$. Since this is equally true for $\|\Theta\|_1$, each column $\theta_j$ of $\Theta$ can be estimated by a separate optimization program:
\begin{equation}\label{eq:pre-mle}
\hat{\theta}_j \in \argmax_{\theta_j} \mathcal{L}_j(\theta_j\,|\,x^1,\ldots,x^n) - \lambda\|\theta_j\|_1
\end{equation}
Furthermore, the state evolution of a node $j\in V$ has the same structure in both models: the transition from susceptible to active at time step $t+1$ is controlled by a Bernoulli variable whose parameter can be written $f(\inprod{\theta_j}{x^t})$ for some function $f$. Hence, if we denote by $n_j$ the first step at which $j$ becomes active, we can rewrite the MLE program \eqref{eq:pre-mle} as:
\begin{multline}
\hat{\theta}_j \in \argmax_{\theta_j} \frac{1}{n_j}\bigg(\sum_{t=1}^{n_j-1}\log\big(1-f(\inprod{\theta_j}{x^t})\big)\\
+ \log f(\inprod{\theta_j}{x^{n_j}})\bigg) - \lambda\|\theta_j\|_1
\end{multline}
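The per-node program above translates into a small function: node $j$ contributes a survival term $\log(1-f(\inprod{\theta_j}{x^t}))$ for every step before it activates and a single activation term at step $n_j$. The sketch below (illustrative names; probability clipping added for numerical stability) evaluates the negated per-node objective, to be minimized:

```python
import numpy as np

def node_objective(theta_j, states, n_j, f, lam, eps=1e-12):
    """Negated per-node l1-regularized MLE objective (sketch).

    Node j stays inactive through steps 1..n_j-1 and activates at step n_j;
    states[t-1] holds the cascade state x^t."""
    val = 0.0
    for t in range(n_j - 1):                          # x^1, ..., x^{n_j - 1}
        q = np.clip(f(theta_j @ states[t]), eps, 1 - eps)
        val -= np.log(1 - q)                          # j not yet active
    q = np.clip(f(theta_j @ states[n_j - 1]), eps, 1 - eps)
    val -= np.log(q)                                  # j activates at step n_j
    return val / n_j + lam * np.abs(theta_j).sum()

# IC link: f(z) = 1 - e^z, with Theta_{i,j} = log(1 - p_{i,j})
f = lambda z: 1.0 - np.exp(z)
p = np.array([0.3, 0.5, 0.2])
theta_j = np.log(1 - p)
states = [np.array([1, 0, 1]), np.array([1, 1, 1])]
obj = node_objective(theta_j, states, n_j=2, f=f, lam=0.0)
```

Each column $\hat{\theta}_j$ can then be obtained by running any convex solver on `node_objective` independently, which is exactly the decomposition that makes the MLE tractable.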