We consider a graph ${\cal G}= (V, E, \Theta)$, where $\Theta$ is a $|V|\times |V|$ matrix of parameters describing the edge weights of $\mathcal{G}$. Intuitively, $\Theta_{i,j}$ captures the ``influence'' of node $i$ on node $j$. Let $m\defeq |V|$. For each node $j$, let $\theta_{j}$ be the $j^{th}$ column vector of $\Theta$. A discrete-time \emph{Cascade model} is a Markov process over a finite state space ${\{0, 1, \dots, K-1\}}^V$ with the following properties: \begin{enumerate} \item Conditioned on the previous time step, the transition events between two states in $\{0,1,\dots, K-1\}$ are mutually independent across the nodes $i\in V$. \item Of the $K$ possible states, there exists a \emph{contagious state} such that all transition probabilities of the Markov process can be expressed as a function of the graph parameters $\Theta$ and the set of ``contagious nodes'' at the previous time step. \item The initial probability distribution over ${\{0, 1, \dots, K-1\}}^V$ is such that all nodes can eventually reach the \emph{contagious state} with non-zero probability. The ``contagious'' nodes at $t=0$ are called \emph{source nodes}. \end{enumerate} In other words, a cascade model describes a diffusion process in which a set of contagious nodes ``influence'' other nodes in the graph to become contagious. An \emph{influence cascade} is a realisation of this random process, \emph{i.e.,} the successive states of the nodes in the graph ${\cal G}$. Note that both the ``single source'' assumption made in~\cite{Daneshmand:2014} and \cite{Abrahao:13} and the ``uniformly chosen source set'' assumption made in~\cite{Netrapalli:2012} satisfy condition 3. Also note that the multiple-source assumption does not reduce to the single-source assumption, even when cascades do not overlap.
Imagine for example two cascades starting from two different nodes: since we do not observe which node propagated the contagion to which node, we cannot attribute an infected node to either cascade and treat the problem as two independent cascades. In the context of Network Inference,~\cite{Netrapalli:2012} focus on the well-known discrete-time independent cascade model recalled below, which \cite{Abrahao:13} and~\cite{Daneshmand:2014} generalize to continuous time. We extend the independent cascade model in a different direction by considering a more general class of transition probabilities while remaining in the discrete-time setting. We observe that despite their obvious differences, both the independent cascade and the voter models make the network inference problem similar to the standard generalized linear model inference problem. In fact, we define a class of diffusion processes for which this is true: the \emph{Generalized Linear Cascade Models}. The linear threshold model is a special case and is discussed in Section~\ref{sec:linear_threshold}. \subsection{Generalized Linear Cascade Models} \label{sec:GLC} Let \emph{susceptible} denote any state from which a node can become contagious at the next time step with non-zero probability. We draw inspiration from generalized linear models to introduce Generalized Linear Cascades: \begin{definition} \label{def:glcm} Let $X^t$ be the indicator variable of the ``contagious nodes'' at time step $t$.
A \emph{generalized linear cascade model} is a cascade model such that for each susceptible node $j$ in state $s$ at time step $t$, the probability of $j$ becoming ``contagious'' at time step $t+1$ conditioned on $X^t$ is a Bernoulli variable of parameter $f(\theta_j \cdot X^t)$: \begin{equation} \label{eq:glm} \mathbb{P}(X^{t+1}_j = 1|X^t) = f(\theta_j \cdot X^t) \end{equation} where $f: \reals \rightarrow [0,1]$. \end{definition} In other words, each generalized linear cascade provides, for each node $j \in V$, a series of measurements ${(X^t, X^{t+1}_j)}_{t \in {\cal T}_j}$ sampled from a generalized linear model. Note also that $\E[X^{t+1}_i\,|\,X^t] = f(\inprod{\theta_i}{X^t})$. As such, $f$ can be interpreted as the inverse link function of our generalized linear cascade model.
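To make the definition concrete, a single transition of a generalized linear cascade can be sketched in a few lines of code (a minimal illustration, not part of the model; the function names are ours, and we use the independent-cascade link $f(z) = 1 - e^{-z}$ discussed below as an example):

```python
import numpy as np

def glc_step(Theta, x, f, rng):
    """One transition of a generalized linear cascade.

    Theta : (m, m) weight matrix, Theta[i, j] = influence of node i on node j
    x     : (m,) 0/1 indicator of contagious nodes at time t
    f     : inverse link function mapping the reals to [0, 1]
    Returns a 0/1 indicator drawn from Bernoulli(f(<theta_j, x>)) for each j.
    (Bookkeeping for susceptible/immune states is model-specific and omitted.)
    """
    p = f(Theta.T @ x)                       # p[j] = f(<theta_j, x>)
    return (rng.random(len(x)) < p).astype(int)

# Independent-cascade inverse link f(z) = 1 - exp(-z)
f_ic = lambda z: 1.0 - np.exp(-z)
```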
\subsection{Examples} \subsubsection{Independent Cascade Model} In the independent cascade model, nodes can be either susceptible, contagious or immune. At $t=0$, all source nodes are ``contagious'' and all remaining nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is contagious, $i$ attempts to infect $j$ with probability $p_{i,j}\in[0,1]$; the infection attempts are mutually independent. If $i$ succeeds, $j$ becomes contagious at time step $t+1$. Regardless of $i$'s success, node $i$ becomes immune at time $t+1$, so that nodes stay contagious for only one time step. The cascade process terminates when no contagious nodes remain. If we denote by $X^t$ the indicator variable of the set of contagious nodes at time step $t$, then if $j$ is susceptible at time step $t+1$, we have: \begin{displaymath} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] = 1 - \prod_{i = 1}^m {(1 - p_{i,j})}^{X^t_i}. \end{displaymath} Defining $\Theta_{i,j} \defeq \log\big(\frac{1}{1-p_{i,j}}\big)$, this can be rewritten as: \begin{align*}\label{eq:ic} \tag{IC} \P\big[X^{t+1}_j = 1\,|\, X^{t}\big] &= 1 - \prod_{i = 1}^m e^{-\Theta_{i,j}X^t_i}\\ &= 1 - e^{-\inprod{\theta_j}{X^t}} \end{align*} Therefore, the independent cascade model is a Generalized Linear Cascade model with inverse link function $f : z \mapsto 1 - e^{-z}$. Note that to write the Independent Cascade Model as a Generalized Linear Cascade model, we had to introduce the change of variables $\Theta_{i,j} = \log\big(\frac{1}{1-p_{i,j}}\big)$. The recovery results in Section~\ref{sec:results} pertain to the $\theta_j$ parameters. Fortunately, the following lemma shows that the recovery error on $\theta_j$ is an upper bound on the error on the original $p_j$ parameters. \begin{lemma} \label{lem:transform} $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
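The change of variables can be sanity-checked numerically: under $\Theta_{i,j} = \log\frac{1}{1-p_{i,j}}$, the product form and the exponential form of the infection probability coincide (a small sketch with arbitrary made-up edge probabilities):

```python
import numpy as np

p = np.array([0.2, 0.5, 0.0, 0.9])   # arbitrary edge probabilities p_{i,j}
x = np.array([1, 1, 0, 1])           # contagious indicator X^t

theta = np.log(1.0 / (1.0 - p))      # Theta_{i,j} = log(1 / (1 - p_{i,j}))

prob_product = 1.0 - np.prod((1.0 - p) ** x)   # 1 - prod_i (1 - p_i)^{x_i}
prob_glc     = 1.0 - np.exp(-theta @ x)        # 1 - exp(-<theta_j, X^t>)

assert np.isclose(prob_product, prob_glc)
```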
\end{lemma} \subsubsection{The Linear Voter Model} In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}. Without loss of generality, we can suppose that the \emph{blue} nodes are contagious. The parameters of the graph are normalized such that $\forall j, \ \sum_i \Theta_{i,j} = 1$. In each round, every node $j$ independently chooses one of its neighbors $i$ with probability $\Theta_{i,j}$ and adopts that neighbor's color. The cascade stops at a fixed horizon time $T$ or when all nodes are of the same color. If we denote by $X^t$ the indicator variable of the set of blue nodes at time step $t$, then we have: \begin{equation} \mathbb{P}\left[X^{t+1}_j = 1 | X^t \right] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \inprod{\theta_j}{X^t} \tag{V} \end{equation} Thus, the linear voter model is a Generalized Linear Cascade model with inverse link function $f: z \mapsto z$. \subsubsection{Discretization of a Continuous Model} Another motivation for the Generalized Linear Cascade model is that it captures the time-discretized formulation of the well-studied continuous-time independent cascade model with exponential transmission function (CICE) of~\cite{GomezRodriguez:2010, Abrahao:13, Daneshmand:2014}. Assume that the temporal resolution of the discretization is $\varepsilon$, \emph{i.e.,} all nodes whose (continuous) infection time is within the interval $[k\varepsilon, (k+1)\varepsilon)$ are considered infected at (discrete) time step $k$. Let $X^k$ be the indicator vector of the set of nodes infected before or during the $k^{th}$ time interval. Note that in contrast to the discrete-time independent cascade model, $X^k_j = 1 \implies X^{k+1}_j = 1$, that is, there is no immune state and nodes remain contagious forever. Let $\text{Exp}(p)$ be an exponentially-distributed random variable with parameter $p$ and let $\Theta_{i,j}$ be the rate of transmission along the directed edge $(i,j)$ in the CICE model.
By the memoryless property of the exponential distribution, if $X^k_j = 0$: \begin{multline*} \mathbb{P}(X^{k+1}_j = 1 | X^k) = \mathbb{P}\big(\min_{i : X^k_i = 1} \text{Exp}(\Theta_{i,j}) \leq \varepsilon\big) \\ = \mathbb{P}\Big(\text{Exp}\big( \sum_{i=1}^m \Theta_{i,j} X^k_i\big) \leq \varepsilon\Big) = 1 - e^{- \varepsilon \inprod{\theta_j}{X^k}} \end{multline*} Therefore, the $\varepsilon$-discretized CICE-induced process is a Generalized Linear Cascade model with inverse link function $f:z\mapsto 1-e^{-\varepsilon z}$. \subsubsection{Logistic Cascades} \label{sec:logistic_cascades} ``Logistic cascades'' are the specific case in which the inverse link function is the logistic function $f(z) = 1/(1+e^{-z + t})$. Intuitively, this captures the idea that there is a threshold $t$ such that when the sum of the parameters of the infected parents of a node is larger than the threshold, the probability of becoming infected is close to one. This is a smooth approximation of the hard threshold rule of the Linear Threshold Model~\cite{Kempe:03}. As we will see later in the analysis, for logistic cascades the graph inference problem becomes a linear inverse problem.
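As a sanity check of the discretized CICE link function, the minimum of independent exponentials over the infected neighbors has the same CDF at $\varepsilon$ as a single exponential with the summed rates (a small numerical sketch; $\varepsilon$ and the rates are arbitrary):

```python
import numpy as np

eps = 0.1                                 # discretization resolution epsilon
rates = np.array([0.7, 0.0, 1.3, 0.4])    # transmission rates Theta_{i,j}
x = np.array([1, 1, 0, 1])                # X^k: nodes infected by step k

# P(min over infected i of Exp(Theta_{i,j}) <= eps), via the survival product
p_min = 1.0 - np.prod(np.exp(-eps * rates * x))
# Generalized linear cascade form: f(z) = 1 - exp(-eps * z)
p_glc = 1.0 - np.exp(-eps * (rates @ x))

assert np.isclose(p_min, p_glc)
```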
\subsection{Maximum Likelihood Estimation} \label{sec:mle} \begin{figure} \includegraphics[scale=.4]{figures/drawing.pdf} \caption{Illustration of the sparse-recovery approach. Our objective is to recover the unknown weight vector $\theta_j$ for each node $j$. We observe a Bernoulli realization whose parameters are given by applying $f$ to the matrix-vector product, where the measurement matrix encodes which nodes are ``contagious'' at each time step.} \vspace{-1em} \end{figure} Inferring the model parameters $\Theta$ from observed influence cascades is the central question of the present work. Recovering the edges in $E$ from observed influence cascades is a well-identified problem known as the \emph{Network Inference} problem. However, recovering the influence parameters is no less important. In this work we focus on recovering $\Theta$, noting that the set of edges $E$ can then be recovered through the following equivalence: $(i,j)\in E\Leftrightarrow \Theta_{i,j} \neq 0$. Given observations $(x^1,\ldots,x^n)$ of a cascade model, we can recover $\Theta$ via Maximum Likelihood Estimation (MLE).
Denoting by $\mathcal{L}$ the log-likelihood function, we consider the following $\ell_1$-regularized MLE problem: \begin{displaymath} \hat{\Theta} \in \argmax_{\Theta} \frac{1}{n} \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - \lambda\|\Theta\|_1 \end{displaymath} where $\lambda$ is the regularization factor, which helps prevent overfitting and controls the sparsity of the solution. The generalized linear cascade model is decomposable in the following sense: given Definition~\ref{def:glcm}, the log-likelihood can be written as a sum of $m$ terms, each term $i\in\{1,\ldots,m\}$ depending only on $\theta_i$. Since the same is true of $\|\Theta\|_1$, each column $\theta_i$ of $\Theta$ can be estimated by a separate optimization program: \begin{equation}\label{eq:pre-mle} \hat{\theta}_i \in \argmax_{\theta_i} \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) - \lambda\|\theta_i\|_1 \end{equation} where we denote by ${\cal T}_i$ the time steps at which node $i$ is susceptible and: \begin{multline*} \mathcal{L}_i(\theta_i\,|\,x^1,\ldots,x^n) = \frac{1}{|{\cal T}_i|} \sum_{t\in {\cal T}_i } x_i^{t+1}\log f(\inprod{\theta_i}{x^{t}}) \\ + (1 - x_i^{t+1})\log\big(1-f(\inprod{\theta_i}{x^t})\big) \end{multline*} In the case of the voter model, the measurements include all time steps until we reach the time horizon $T$ or the graph coalesces to a single state. For the independent cascade model, the measurements include all time steps until node $i$ becomes contagious, after which its behavior is deterministic. In contrast to prior work, our results depend on the number of measurements and not on the number of cascades. \paragraph{Regularity assumptions} To solve program~\eqref{eq:pre-mle} efficiently, we would like it to be convex. A sufficient condition is that $\mathcal{L}_i$ be concave, which is the case if $f$ and $(1-f)$ are both log-concave. Recall that a twice-differentiable function $f$ is log-concave if and only if $f''f \leq f'^2$.
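For concreteness, the per-node objective in~\eqref{eq:pre-mle} is only a few lines of code (a minimal sketch; the function names are ours, the inverse link is passed in as a parameter, and maximization would be delegated to any off-the-shelf convex solver):

```python
import numpy as np

def log_likelihood_i(theta, X_prev, x_next, f):
    """Average log-likelihood L_i for a single node i.

    theta  : (m,) candidate column theta_i
    X_prev : (n, m) rows are the cascade states x^t over the steps in T_i
    x_next : (n,)   observed outcomes x_i^{t+1} in {0, 1}
    f      : inverse link function, mapping the reals to [0, 1]
    """
    p = f(X_prev @ theta)
    return np.mean(x_next * np.log(p) + (1 - x_next) * np.log(1 - p))

def objective_i(theta, X_prev, x_next, f, lam):
    """l1-regularized objective of the per-node MLE program."""
    return log_likelihood_i(theta, X_prev, x_next, f) - lam * np.abs(theta).sum()
```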
It is easy to verify this property for $f$ and $(1-f)$ in the Independent Cascade Model and the Voter Model. Furthermore, the data-dependent bounds in Section~\ref{sec:main_theorem} will require the following regularity assumption on the inverse link function $f$: there exists $\alpha\in(0,1)$ such that \begin{equation} \tag{LF} \max \big\{ | (\log f)'(z_x) |, |(\log (1-f))'(z_x) | \big\} \leq \frac{1}{\alpha} \end{equation} for all $z_x\defeq\inprod{\theta^*}{x}$ such that $f(z_x)\notin\{0,1\}$. In the voter model, $\frac{f'(z)}{f(z)} = \frac{1}{z}$ and $\frac{f'(z)}{(1-f)(z)} = \frac{1}{1-z}$. Hence (LF) holds as soon as $\alpha\leq \Theta_{i,j}\leq 1-\alpha$ for all $(i,j)\in E$, which is always satisfied for some $\alpha$ for non-isolated nodes. In the Independent Cascade Model, $\frac{f'(z)}{f(z)} = \frac{1}{e^{z}-1}$ and $\frac{f'(z)}{(1-f)(z)} = 1$. Hence (LF) holds as soon as $p_{i,j}\geq \alpha$ for all $(i,j)\in E$, which is always satisfied for some $\alpha\in(0,1)$. For the data-independent bound of Proposition~\ref{prop:fi}, we will require the following additional regularity assumption: \begin{equation} \tag{LF2} \max \big\{ | (\log f)''(z_x) |, |(\log (1-f))''(z_x) | \big\} \leq \frac{1}{\alpha} \end{equation} for some $\alpha\in(0, 1)$ and for all $z_x\defeq\inprod{\theta^*}{x}$ such that $f(z_x)\notin\{0,1\}$. It is again easy to see that this condition is satisfied for the Independent Cascade Model and the Voter Model for the same $\alpha\in(0,1)$. \paragraph{Convex constraints} The voter model is only defined when $\Theta_{i,j}\in (0,1)$ for all $(i,j)\in E$. Similarly, the independent cascade model is only defined when $\Theta_{i,j}> 0$. Because the likelihood function $\mathcal{L}_i$ is equal to $-\infty$ when the parameters are outside of the domain of definition of the models, these constraints do not need to appear explicitly in the optimization program.
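The derivative computations above for the independent cascade link $f(z) = 1 - e^{-z}$ can be double-checked by numerical differentiation (a quick sketch; the grid and tolerances are arbitrary):

```python
import numpy as np

f = lambda z: 1.0 - np.exp(-z)
z = np.linspace(0.1, 5.0, 50)
h = 1e-6

# Central-difference derivatives of log f and log(1 - f)
d_log_f  = (np.log(f(z + h)) - np.log(f(z - h))) / (2 * h)
d_log_1f = (np.log(1 - f(z + h)) - np.log(1 - f(z - h))) / (2 * h)

assert np.allclose(d_log_f, 1.0 / (np.exp(z) - 1.0), atol=1e-4)  # (log f)' = 1/(e^z - 1)
assert np.allclose(d_log_1f, -1.0, atol=1e-4)                    # (log(1-f))' = -1
```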
In the specific case of the voter model, the constraint $\sum_{j} \Theta_{j,i} = 1$, \emph{i.e.,} that the entries of $\theta_i$ sum to one, will not necessarily be satisfied by the estimator obtained in \eqref{eq:pre-mle}. In some applications, the experimenter might not need this constraint to hold, in which case the results in Section~\ref{sec:results} still give a bound on the recovery error. If the constraint needs to be satisfied, then by Lagrangian duality there exists a $\lambda\in \reals$ such that adding $\lambda\big(\sum_{j}\Theta_{j,i} - 1\big)$ to the objective function of~\eqref{eq:pre-mle} enforces it. It then suffices to apply the results of Section~\ref{sec:results} to the augmented objective to obtain the same recovery guarantees. Note that the added term is linear in $\theta_i$ and easily satisfies all the required regularity assumptions.
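As an illustration of this Lagrangian approach under the voter model (identity link), the augmented objective simply adds a linear term to the regularized per-node program (a minimal sketch; the function names and multipliers are ours):

```python
import numpy as np

def voter_log_likelihood(theta, X_prev, x_next):
    """Average log-likelihood for one node under the voter model, f(z) = z."""
    p = X_prev @ theta
    return np.mean(x_next * np.log(p) + (1 - x_next) * np.log(1 - p))

def augmented_objective(theta, X_prev, x_next, lam_reg, lam_eq):
    """l1-regularized objective plus the Lagrangian term enforcing sum(theta) = 1."""
    return (voter_log_likelihood(theta, X_prev, x_next)
            - lam_reg * np.abs(theta).sum()
            + lam_eq * (theta.sum() - 1.0))
```

When the entries of `theta` already sum to one, the Lagrangian term vanishes and the augmented objective coincides with the original regularized objective.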