We consider a graph $G= (V, E)$ and we define $m\defeq |V|$. A \emph{cascade model} is a discrete-time homogeneous Markov process over $\{0,1\}^m$. Denoting by $X^t$ the state of the process at time step $t-1$, we interpret $X^t_j = 1$ for some node $j\in V$ as ``node $j$ is infected at time step $t-1$''. An \emph{influence cascade} is simply a realisation of the random process described by a cascade model. In what follows, we describe two widely used cascade models, the Independent Cascade model (IC) and the Linear Threshold model (LT). In these models, the transition probabilities of the Markov process depend on unknown parameters describing how much the nodes in $G$ influence each other. Recovering these parameters from observed influence cascades is the central question of the present work and is important for the following two reasons: \begin{enumerate} \item knowing the influence parameters allows for future prediction of influence cascades. \item when the set of edges $E$ is unknown, it can be recovered from the influence parameters. \end{enumerate} We note that recovering the edges in $E$ from observed influence casccades is a well-identified problem known as the \emph{Graph Inference} problem. However, recovering the influence parameters is no less important and has been seemingly overlooked so far. The machinery developped in this work addresses both problems. \subsection{Independent Cascade Model} In the independent cascade model, nodes can be either uninfected, active or infected. All nodes start either as uninfected or active. At each time step $t$, for each edge $(i,j)$ where $j$ is uninfected and $i$ is active, $i$ attempts to infect $j$ with probability $p_{i,j}\in[0,1]$. If $i$ succeeds, $j$ will become active at time step $t+1$. Regardless of $i$'s success, node $i$ will be infected at time $t+1$: nodes stay active for only one time step. The cascade continues until no active nodes remain. If we denote by $X^t$ the indicator variable of the set of active nodes at time step $t-1$, then if $j$ is uninfected at time step $t-1$, we have: \begin{displaymath} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m (1 - p_{i,j})^{X^t_i}. \end{displaymath} Defining $\Theta_{i,j} \defeq \log(1-p_{i,j})$, this can be rewritten as: \begin{equation}\label{eq:ic} \tag{IC} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m e^{\Theta_{i,j}X^t_i} = 1 - e^{\inprod{\theta_j}{X^t}} \end{equation} where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$. \subsection{Linear Threshold Model} In the Linear Threshold Model, each node $j\in V$ has a threshold $t_j$ drawn uniformly from the interval $[0,1]$. Furthermore, there is a weight $\Theta_{i,j}\in[0,1]$ for each edge $(i,j)$. We assume that the weights are such that for each node $j\in V$, $\sum_{i=1}^m \Theta_{i,j} \leq 1$. Nodes can be either uninfected or active. At each time step, each uninfected node $j$ becomes active if the sum of the weights $\Theta_{i,j}$ for $i$ an active parent of $j$ is larger than $j$'s threshold $t_j$. Denoting by $X^t$ the indicator variable of the set of active nodes at time step $t-1$, if $j\in V$ is uninfected at time step $t-1$, then: \begin{equation} \label{eq:lt} \tag{LT} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = \sum_{i=1}^m \Theta_{i,j}X^t_i = \inprod{\theta_j}{X^t} \end{equation} where we defined again $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$. \subsection{Maximum Likelihood Estimation} In both models, the unknown influence parameters are described by an $m\times m$ matrix $\Theta$. Furthermore, the information of the edges of $G$ is fully contained in $\Theta$: \begin{displaymath} (i,j)\in E\Leftrightarrow \Theta_{i,j} \neq 0 \end{displaymath} Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover $\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the log-likelihood function, we consider the following $\ell_1$-regularized MLE problem: \begin{displaymath} \hat{\Theta} \in \argmax_{\Theta} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) + \lambda\|\Theta\|_1 \end{displaymath} where $\lambda$ is the regularization factor which helps at preventing overfitting as well as controlling for the sparsity of the solution. Note that both the (IC) and the (LT) models are decomposable in the following sense: the state evolution of each node at each time step happens independently of the following nodes \emph{and} the state evolution of node $j\in V$ only depends on $\theta_j$, the $j$-th column of $\Theta$. Consequently, the log-likelihood function $\mathcal{L}$ can be written as a sum of $m$ terms, each term $j\in\{1,\ldots,m\}$ only involving $\theta_j$. Since this is equally true for $\|\Theta\|_1$, each column $\theta_j$ of $\Theta$ can be estimated by a separate optimization program: \begin{equation}\label{eq:pre-mle} \hat{\theta}_j \in \argmax_{\theta} \mathcal{L}_j(\theta_j\,|\,x^1,\ldots,x^n) + \lambda\|\theta_j\|_1 \end{equation} Furthermore, the state evolution of a node $j\in V$ has the same structure in both models: the transition from uninfected to active at time step $t+1$ is controlled by a Bernoulli variable whose parameter can be written $f(\inprod{\theta_j}{x^t})$ for some function $f$. Hence, if we denote by $n_j$ the first step at which $j$ becomes active, we can rewrite the MLE program \eqref{eq:pre-mle}: \begin{multline} \hat{\theta}_j \in \argmax_{\theta} \sum_{t=1}^{n_j-1}\log\big(1-f(\inprod{\theta_j}{x^t})\big)\\ \log f(\inprod{\theta_j}{x^{n_j}})+ \lambda\|\theta_j\| \end{multline}