We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times |V|$ matrix of parameters describing the edge weights of $\mathcal{G}$. A \emph{cascade model} is a Markov process over the finite state space $\{0, 1, \dots, K\}^V$. An \emph{influence cascade} is a realisation of the random process described by a cascade model. In practice, we restrict ourselves to discrete-time homogeneous cascade models. At $t=0$, each node in the graph has a constant probability $p_{init}$ of being in the ``active state''. The active nodes at $t=0$ are called the source nodes. Our probabilistic model for the source nodes is more realistic and less restrictive than the ``single source'' assumption made in \cite{Daneshmand:2014} and \cite{Abrahao:13}.

\subsection{Generalized Linear Cascade Models}

Denoting by $X^t$ the state of the cascade at time step $t$, we interpret $X^t_j = 1$ for some node $j\in V$ as ``node $j$ is active'', \emph{i.e.}, ``$j$ exhibits the source nodes' behavior at time step $t$''. We draw inspiration from \emph{generalized linear models} (GLM) to define a generalized linear cascade.

\begin{definition}
Let us denote by $\{\mathcal{F}_t, t\in\ints\}$ the natural filtration induced by $\{X^t, t\in\ints\}$. A \emph{generalized linear cascade} is characterized by the following equation:
\begin{displaymath}
\P[X^{t+1}=x\,|\, \mathcal{F}_t] = \prod_{i\in V} f(\inprod{\theta_i}{X^{t}})^{x_i} \big(1-f(\inprod{\theta_i}{X^{t}})\big)^{1-x_i}
\end{displaymath}
where $f:\mathbb{R}\to[0,1]$.
\end{definition}

It follows immediately from this definition that a generalized linear cascade satisfies the Markov property:
\begin{displaymath}
\P[X^{t+1}=x\,|\,\mathcal{F}_t] = \P[X^{t+1}=x\,|\, X^t]
\end{displaymath}
Note also that $\E[X^{t+1}_i\,|\,X^t] = f(\inprod{\theta_i}{X^t})$. As such, $f$ can be interpreted as the inverse link function of our generalized linear cascade model.
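As an illustration, the one-step transition in the definition above can be sampled directly: each node $i$ draws its next state independently with probability $f(\inprod{\theta_i}{X^t})$. The Python sketch below (function and variable names are our own, not from the text) uses the voter-model link $f(z)=z$ with a column-normalized parameter matrix:

```python
import numpy as np

def glc_step(Theta, x, f, rng):
    """One transition of a generalized linear cascade (illustrative sketch).

    Node i activates at time t+1 independently with probability
    f(<theta_i, x^t>), where theta_i is column i of Theta.
    """
    probs = f(Theta.T @ x)                        # f(<theta_i, x>) for every node i
    return (rng.random(len(x)) < probs).astype(int)

# Voter-model example: f(z) = z, columns of Theta sum to 1
rng = np.random.default_rng(0)
Theta = np.array([[0.0, 1.0, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.0, 0.0]])
x0 = np.array([1, 0, 1])                          # nodes 0 and 2 start active
x1 = glc_step(Theta, x0, lambda z: z, rng)
```

Any inverse link $f:\mathbb{R}\to[0,1]$ can be substituted for the identity here without changing the sampling logic.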
\subsection{Examples}
\label{subsec:examples}

In this section, we show that both the Independent Cascade Model and the Voter model are Generalized Linear Cascades. The Linear Threshold model will be discussed in Section~\ref{sec:linear_threshold}.

\subsubsection{Independent Cascade Model}

In the independent cascade model, nodes can be susceptible, active, or inactive. At $t=0$, all source nodes are ``active'' and all remaining nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is active, $i$ attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be inactive at time $t+1$. In other words, nodes stay active for only one time step. The cascade process terminates when no active nodes remain. If we denote by $X^t$ the indicator variable of the set of active nodes at time step $t$, then if $j$ is susceptible at time step $t$, we have:
\begin{displaymath}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}.
\end{displaymath}
Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as:
\begin{equation}\label{eq:ic} \tag{IC}
\P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i} = 1 - e^{\inprod{\theta_j}{X^t}}
\end{equation}
where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$.

\subsubsection{The Voter Model}

In the Voter Model, nodes can be either red or blue, where ``blue'' is the source state. The parameters of the graph are normalized such that $\forall j, \ \sum_i \Theta_{i,j} = 1$. At each round, every node $j$ chooses one of its neighbors $i$ with probability $\Theta_{i,j}$ and adopts its color. The cascade stops at a fixed horizon time $T$ or when all nodes are of the same color.
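The reparametrization $\Theta_{i,j} \defeq \log(1-p_{i,j})$ behind \eqref{eq:ic} can be checked numerically: the product form and the exponential form agree term by term. A short sketch with made-up infection probabilities:

```python
import numpy as np

# Infection probabilities p_{i,j} from three potential parents i to a node j
p = np.array([0.3, 0.5, 0.2])
x = np.array([1, 0, 1])              # parents 0 and 2 are active at time t

# Direct form: 1 - prod_i (1 - p_{i,j})^{x_i}
direct = 1 - np.prod((1 - p) ** x)

# Reparametrized form: Theta_{i,j} = log(1 - p_{i,j}) gives 1 - e^{<theta_j, x>}
theta_j = np.log(1 - p)
reparam = 1 - np.exp(theta_j @ x)

assert np.isclose(direct, reparam)   # both equal 1 - 0.7 * 0.8 = 0.44
```

The logarithm turns the product over active parents into the inner product $\inprod{\theta_j}{X^t}$, which is what makes the model linear in the parameters.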
If we denote by $X^t$ the indicator variable of the set of blue nodes at time step $t$, then we have:
\begin{equation}
\mathbb{P}\left[X^{t+1}_j = 1 \,|\, X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \inprod{\theta_j}{X^t} \tag{V}
\end{equation}

\subsection{Maximum Likelihood Estimation}

Recovering the intrinsic graph parameters $\Theta$ from observed influence cascades is the central question of the present work and is important for the following two reasons:
\begin{enumerate}
\item Knowing the influence parameters allows for future prediction of influence cascades.
\item When the set of edges $E$ is unknown, it can be recovered from the influence parameters.
\end{enumerate}
We note that recovering the edges in $E$ from observed influence cascades is a well-identified problem known as the \emph{Graph Inference} problem. However, recovering the influence parameters is no less important and has seemingly been overlooked so far. The machinery developed in this work addresses both problems for a large class of well-known cascade models. In both models above, the unknown influence parameters are described by an $m\times m$ matrix $\Theta$. Furthermore, the information of the edges of $\mathcal{G}$ is fully contained in $\Theta$:
\begin{displaymath}
(i,j)\in E\Leftrightarrow \Theta_{i,j} \neq 0
\end{displaymath}
Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover $\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the log-likelihood function, we consider the following $\ell_1$-regularized MLE problem:
\begin{displaymath}
\hat{\Theta} \in \argmax_{\Theta} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - \lambda\|\Theta\|_1
\end{displaymath}
where $\lambda$ is a regularization parameter which helps prevent overfitting and controls the sparsity of the solution.
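The regularized MLE objective can be assembled directly from the transition probabilities of a generalized linear cascade: each observed transition $x^t \to x^{t+1}$ contributes a product of Bernoulli likelihoods. A minimal Python sketch (names and the probability clipping are our own, added for numerical stability):

```python
import numpy as np

def neg_log_likelihood(Theta, states, f, eps=1e-12):
    """Negative log-likelihood of a trajectory x^1, ..., x^n under a
    generalized linear cascade with inverse link f.
    `states` has shape (n, m); rows are consecutive cascade states."""
    nll = 0.0
    for x_t, x_next in zip(states[:-1], states[1:]):
        q = np.clip(f(Theta.T @ x_t), eps, 1 - eps)  # P[X^{t+1}_j = 1 | x^t]
        nll -= np.sum(x_next * np.log(q) + (1 - x_next) * np.log(1 - q))
    return nll

def regularized_objective(Theta, states, f, lam):
    # l1-regularized MLE: minimize -L(Theta) + lam * ||Theta||_1
    return neg_log_likelihood(Theta, states, f) + lam * np.abs(Theta).sum()

# Voter-model example: f is the identity, columns of Theta sum to 1
f = lambda z: z
Theta = np.array([[0.0, 1.0, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.0, 0.0]])
states = np.array([[1, 0, 1],
                   [1, 1, 0]])
nll = neg_log_likelihood(Theta, states, f)
```

Minimizing `regularized_objective` over $\Theta$ (e.g., with a proximal-gradient solver) is equivalent to the maximization problem above, and the zero pattern of the minimizer estimates the edge set $E$.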
Note that both the (IC) and the (V) models are decomposable in the following sense: the state evolution of each node at each time step happens independently of the other nodes, \emph{and} the state evolution of node $j\in V$ only depends on $\theta_j$, the $j$-th column of $\Theta$. Consequently, the log-likelihood function $\mathcal{L}$ can be written as a sum of $m$ terms, each term $j\in\{1,\ldots,m\}$ only involving $\theta_j$. Since this is equally true for $\|\Theta\|_1$, each column $\theta_j$ of $\Theta$ can be estimated by a separate optimization program:
\begin{equation}\label{eq:pre-mle}
\hat{\theta}_j \in \argmax_{\theta_j} \mathcal{L}_j(\theta_j\,|\,x^1,\ldots,x^n) - \lambda\|\theta_j\|_1
\end{equation}
Furthermore, the state evolution of a node $j\in V$ has the same structure in both models: the transition from susceptible to active at time step $t+1$ is controlled by a Bernoulli variable whose parameter can be written $f(\inprod{\theta_j}{x^t})$ for some function $f$. Hence, if we denote by $n_j$ the first step at which $j$ becomes active, we can rewrite the MLE program \eqref{eq:pre-mle} as:
\begin{multline}
\hat{\theta}_j \in \argmax_{\theta_j} \frac{1}{n_j}\bigg(\sum_{t=1}^{n_j-1}\log\big(1-f(\inprod{\theta_j}{x^t})\big)\\
+ \log f(\inprod{\theta_j}{x^{n_j}})\bigg) - \lambda\|\theta_j\|_1
\end{multline}
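The per-node program above translates into a small function: node $j$ contributes a survival term $\log(1-f(\inprod{\theta_j}{x^t}))$ for every step before it activates and a single activation term at step $n_j$. The sketch below (illustrative names; probability clipping added for numerical stability) evaluates the negated per-node objective, to be minimized:

```python
import numpy as np

def node_objective(theta_j, states, n_j, f, lam, eps=1e-12):
    """Negated per-node l1-regularized MLE objective (sketch).

    Node j stays inactive through steps 1..n_j-1 and activates at step n_j;
    states[t-1] holds the cascade state x^t."""
    val = 0.0
    for t in range(n_j - 1):                          # x^1, ..., x^{n_j - 1}
        q = np.clip(f(theta_j @ states[t]), eps, 1 - eps)
        val -= np.log(1 - q)                          # j not yet active
    q = np.clip(f(theta_j @ states[n_j - 1]), eps, 1 - eps)
    val -= np.log(q)                                  # j activates at step n_j
    return val / n_j + lam * np.abs(theta_j).sum()

# IC link: f(z) = 1 - e^z, with Theta_{i,j} = log(1 - p_{i,j})
f = lambda z: 1.0 - np.exp(z)
p = np.array([0.3, 0.5, 0.2])
theta_j = np.log(1 - p)
states = [np.array([1, 0, 1]), np.array([1, 1, 1])]
obj = node_objective(theta_j, states, n_j=2, f=f, lam=0.0)
```

Each column $\hat{\theta}_j$ can then be obtained by running any convex solver on `node_objective` independently, which is exactly the decomposition that makes the MLE tractable.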