In this section, we apply the sparse recovery framework to analyze under which assumptions our program~\eqref{eq:pre-mle} recovers the true parameter $\theta_i$ of the cascade model. Furthermore, if we can estimate $\theta_i$ with sufficient accuracy, the support of $\theta_i$ can then be recovered by simple thresholding, which provides a solution to the standard Graph Inference problem. We first give results in the exactly sparse setting, in which $\theta_i$ has a support of size exactly $s$. We then relax this sparsity constraint and give results in the \emph{stable recovery} setting, where $\theta_i$ is only approximately $s$-sparse.

As mentioned in Section~\ref{sec:mle}, the maximum likelihood estimation program is decomposable. We henceforth focus on a single node $i\in V$ and omit the subscript $i$ in the notation when there is no ambiguity. The recovery problem is now that of estimating a single vector $\theta^*$ from a set $\mathcal{T}$ of observations. We write $n\defeq |\mathcal{T}|$.

\subsection{Main Theorem}
\label{sec:main_theorem}

In this section, we analyze the case where $\theta^*$ is exactly sparse. We write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall that $\theta_i$ is the vector of weights for all edges \emph{directed at} the node we are solving for. In other words, $S$ is the set of all nodes that can influence node $i$, also referred to as its parents. Our main theorem relies on the now standard \emph{restricted eigenvalue condition}.
\begin{definition}
Let $\Sigma\in\mathcal{S}_m(\reals)$ be a real symmetric matrix and let $S$ be a subset of $\{1,\ldots,m\}$. Define $\mathcal{C}(S)\defeq \{X\in\reals^m\,:\,\|X\|_1\leq 1\text{ and } \|X_{S^c}\|_1\leq 3\|X_S\|_1\}$. We say that $\Sigma$ satisfies the $(S,\gamma)$-\emph{restricted eigenvalue condition} iff:
\begin{equation}
\forall X \in {\cal C}(S),\quad X^\top \Sigma X \geq \gamma \|X\|_2^2
\tag{RE} \label{eq:re}
\end{equation}
\end{definition}
A discussion of the $(S,\gamma)$-{\bf(RE)} assumption in the context of generalized linear cascade models can be found in Section~\ref{sec:re}. In our setting, we require this property to hold for the Hessian of the log-likelihood function $\mathcal{L}$; it essentially captures the requirement that the binary vectors of active nodes are not \emph{too} collinear.

We will also need the following assumption on the inverse link function $f$ of the generalized linear cascade model:
\begin{equation}
\tag{LF}
\max\left(\left|\frac{f'}{f}(\inprod{\theta^*}{x})\right|, \left|\frac{f'}{1-f}(\inprod{\theta^*}{x})\right|\right)\leq \frac{1}{\alpha}
\end{equation}
for some $\alpha\in(0, 1)$ and for all $x\in\reals^m$ such that $f(\inprod{\theta^*}{x})\notin\{0,1\}$.

In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{1-f}(z) = \frac{1}{1-z}$. Hence (LF) holds as soon as $\alpha\leq \Theta_{i,j}\leq 1-\alpha$ for all $(i,j)\in E$, which is always satisfied for some $\alpha > 0$. Similarly, in the Independent Cascade Model, where $\Theta_{i,j} = \log(1-p_{i,j})$, we have $\frac{f'}{f}(z) = \frac{1}{1-e^{-z}}$ and $\frac{f'}{1-f}(z) = -1$, so (LF) is not restrictive. These assumptions are reasonable: if an edge has a weight very close to $0$, then an ``infection'' will essentially never occur along that edge in our set of observations and we can never hope to recover it.
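To make the estimation procedure concrete, the following minimal sketch (ours, not part of the paper) solves the $\ell_1$-regularized maximum likelihood program~\eqref{eq:pre-mle} for a single node by proximal gradient descent, specialized to a logistic inverse link $f$; the paper does not prescribe a particular solver, and all function names below are hypothetical.
\begin{verbatim}
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (elementwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def l1_regularized_mle(X, y, lam, n_iter=2000):
    # X: n-by-m binary matrix of active-node indicators x^t (one row per
    #    measurement); y: length-n vector of next-step activations x_i^{t+1}.
    # Minimizes (1/n) sum_t [log(1 + exp(<theta, x^t>)) - y_t <theta, x^t>]
    #           + lam * ||theta||_1   (logistic link), using ISTA.
    n, m = X.shape
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + 1e-12)  # 1 / Lipschitz bound
    theta = np.zeros(m)
    for _ in range(n_iter):
        grad = X.T @ (1.0 / (1.0 + np.exp(-(X @ theta))) - y) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# The regularization level can be set as in the theorem below:
# lam = 2 * np.sqrt(np.log(m) / (alpha * n ** (1 - delta))).
\end{verbatim}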
\begin{theorem}
\label{thm:main}
Assume the Hessian $\nabla^2\mathcal{L}(\theta^*)$ satisfies the $(S,\gamma)$-{\bf(RE)} condition for some $\gamma > 0$ and that {\bf (LF)} holds for some $\alpha > 0$. For any $\delta\in(0,1)$, let $\hat{\theta}$ be the solution of \eqref{eq:pre-mle} with $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^{1 - \delta}}}$. Then:
\begin{equation}
\label{eq:sparse}
\|\hat \theta - \theta^* \|_2 \leq \frac{3}{\gamma} \sqrt{\frac{s \log m}{\alpha n^{1-\delta}}} \quad \text{w.p.}\;1-\frac{1}{e^{n^\delta \log m}}
\end{equation}
\end{theorem}
Note that we have expressed the convergence rate in terms of the number of measurements $n$, which is different from the number of cascades. For example, in the case of the voter model with horizon time $T$ and $N$ cascades, we can expect a number of measurements proportional to $N\times T$.

Before moving to the proof of Theorem~\ref{thm:main}, note that interpreting it in the case of the Independent Cascade Model requires one more step. Indeed, to cast it as a generalized linear cascade model, we had to perform the transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where the $p_{i,j}$ are the infection probabilities. Fortunately, if we estimate $p_j$ through $\hat{p}_{j} = 1 - e^{\hat{\theta}_j}$, a bound on $\|\hat\theta - \theta^*\|_2$ directly implies a bound on $\|\hat p - p^*\|_2$. Indeed we have:
\begin{lemma}
$\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
\end{lemma}
\begin{proof}
Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have, for any $p,p'\in(0,1)$, $|\log (1 - p) - \log (1-p')| \geq \max\left(1 - \frac{1-p}{1-p'}, 1 - \frac{1-p'}{1-p}\right) \geq \max( p-p', p'-p)$, and the claim follows coordinate-wise.
\end{proof}
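This reparametrization is easy to exercise numerically. The following check (ours, purely illustrative) converts estimates back to infection probabilities via $\hat p_j = 1 - e^{\hat\theta_j}$ and verifies the inequality of the lemma on random instances.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
p_true = rng.uniform(0.05, 0.95, size=1000)    # true infection probabilities
p_hat = np.clip(p_true + rng.normal(scale=0.02, size=1000), 0.01, 0.99)

theta_true = np.log(1.0 - p_true)              # Theta = log(1 - p)
theta_hat = np.log(1.0 - p_hat)                # i.e. p_hat = 1 - exp(theta_hat)

# ||theta_hat - theta_true||_2 >= ||p_hat - p_true||_2 holds deterministically.
assert np.linalg.norm(theta_hat - theta_true) >= np.linalg.norm(p_hat - p_true)
\end{verbatim}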
Theorem~\ref{thm:main} is a consequence of Theorem~1 in \cite{Negahban:2009}, which gives a bound on the convergence rate of regularized estimators. We state their theorem in the context of $\ell_1$ regularization in Lemma~\ref{lem:negahban}.
\begin{lemma}
\label{lem:negahban}
Let ${\cal C}(S) \defeq \{ \Delta \in \reals^m\,|\,\|\Delta_{S^c}\|_1 \leq 3 \|\Delta_{S}\|_1 \}$. Suppose that:
\begin{multline}
\label{eq:rc}
\forall \Delta \in {\cal C}(S), \; {\cal L}(\theta^* + \Delta) - {\cal L}(\theta^*)\\
- \inprod{\nabla {\cal L}(\theta^*)}{\Delta} \geq \kappa_{\cal L} \|\Delta\|_2^2 - \tau_{\cal L}^2(\theta^*)
\end{multline}
for some $\kappa_{\cal L} > 0$ and function $\tau_{\cal L}$. Suppose furthermore that $\lambda \geq 2 \|\nabla {\cal L}(\theta^*)\|_{\infty}$. Then if $\hat{\theta}_\lambda$ is the solution of \eqref{eq:pre-mle}:
\begin{displaymath}
\|\hat \theta_\lambda - \theta^* \|_2^2 \leq 9 \frac{\lambda^2 s}{\kappa_{\cal L}^2} + \frac{2\lambda}{\kappa_{\cal L}} \tau^2_{\cal L}(\theta^*)
\end{displaymath}
\end{lemma}
To prove Theorem~\ref{thm:main}, we apply Lemma~\ref{lem:negahban} with $\tau_{\mathcal{L}}=0$. Since $\mathcal{L}$ is twice differentiable and convex, assumption \eqref{eq:rc} is implied by the restricted eigenvalue condition \eqref{eq:re}. The upper bound on the $\ell_{\infty}$ norm of $\nabla\mathcal{L}(\theta^*)$ is given by Lemma~\ref{lem:ub}.
\begin{lemma}
\label{lem:ub}
Assume {\bf(LF)} holds for some $\alpha>0$. For any $\delta\in(0,1)$:
\begin{displaymath}
\|\nabla {\cal L}(\theta^*)\|_{\infty} \leq 2 \sqrt{\frac{\log m}{\alpha n^{1 - \delta}}} \quad \text{w.p.}\; 1-\frac{1}{e^{n^\delta \log m}}
\end{displaymath}
\end{lemma}
\begin{proof}
The gradient of $\mathcal{L}$ at $\theta^*$ is given by:
\begin{multline*}
\nabla \mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in \mathcal{T}}x^t\bigg[ x_i^{t+1}\frac{f'}{f}(\inprod{\theta^*}{x^t})\\
- (1-x_i^{t+1})\frac{f'}{1-f}(\inprod{\theta^*}{x^t})\bigg]
\end{multline*}
Let $\partial_j \mathcal{L}(\theta^*)$ be the $j$-th coordinate of $\nabla\mathcal{L}(\theta^*)$ and write $\partial_j\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}} Y_t$. Since $\E[x_i^{t+1}|x^t]= f(\inprod{\theta^*}{x^t})$, we have $\E[Y_t\,|\,x^t] = 0$. Hence $Z_t \defeq \sum_{k=1}^t Y_k$ is a martingale with respect to the natural filtration. Using assumption (LF), we have almost surely $|Z_{t+1}-Z_t|\leq \frac{1}{\alpha}$ and we can apply Azuma's inequality to $Z_t$:
\begin{displaymath}
\P\big[|Z_{n}|\geq \lambda\big]\leq 2\exp\left(\frac{-\lambda^2\alpha}{2n}\right)
\end{displaymath}
Applying a union bound so that the previous inequality holds simultaneously for all coordinates of $\nabla\mathcal{L}(\theta^*)$ yields:
\begin{align*}
\P\big[\|\nabla\mathcal{L}(\theta^*)\|_{\infty}\geq \lambda \big] &\leq 2m\exp\left(\frac{-\lambda^2n\alpha}{2}\right)
\end{align*}
Choosing $\lambda\defeq 2\sqrt{\frac{\log m}{\alpha n^{1-\delta}}}$ concludes the proof.
\end{proof}
Note how the proof of Lemma~\ref{lem:ub} relies crucially on the Azuma--Hoeffding inequality, which allows us to handle correlated observations.

We now show how to use Theorem~\ref{thm:main} to recover the support of $\theta^*$, that is, to solve the Graph Inference problem.
\begin{corollary}
\label{cor:variable_selection}
Under the same assumptions as Theorem~\ref{thm:main}, for $\eta > 0$, let $\hat {\cal S}_\eta \defeq \{ j \in \{1,\ldots, m\} : \hat{\theta}_j > \eta\}$. For $0 < \epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ j \in \{1,\ldots,m\} :\theta^*_j > \eta +\epsilon \}$ be the set of all true `strong' parents, and let ${\cal S}^*\defeq\text{supp}(\theta^*)$. Suppose the number of measurements satisfies $ n > \frac{9s\log m}{\alpha\gamma^2\epsilon^2}$. Then with probability $1-\frac{1}{m}$, ${\cal S}^*_{\eta + \epsilon} \subseteq \hat {\cal S}_\eta \subseteq {\cal S}^*$. In other words, we recover all `strong' parents and no `false' parents.
\end{corollary}
\begin{proof}
By choosing $\delta = 0$, if $ n > \frac{9s\log m}{\alpha\gamma^2\epsilon^2}$, then $\|\hat \theta-\theta^*\|_2 < \epsilon < \eta$ with probability $1-\frac{1}{m}$. If $\theta^*_j = 0$ and $\hat \theta_j > \eta$, then $\|\hat \theta - \theta^*\|_2 \geq |\hat \theta_j-\theta^*_j| > \eta$, which is a contradiction. Therefore we get no false positives. If $\theta^*_j > \eta + \epsilon$, then $|\hat{\theta}_j- \theta^*_j| < \epsilon$ implies $\hat\theta_j > \eta$. Therefore, we recover all strong parents.
\end{proof}
Assuming we know a lower bound $\alpha$ on $\Theta_{i,j}$ for $(i,j)\in E$, Corollary~\ref{cor:variable_selection} can be applied to the Graph Inference problem in the following manner: pick $\eta = \frac{\alpha}{3}$ and $\epsilon = \frac{\eta}{2}$, then $S_{\eta+\epsilon}^* = S$ provided that $n=\Omega\left(\frac{s\log m}{\alpha^3\gamma^2}\right)$. That is, the support of $\theta^*$ can be found by thresholding $\hat{\theta}$ at the level $\eta$.
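The thresholding rule of Corollary~\ref{cor:variable_selection} is summarized in the following sketch (ours, with hypothetical names); \texttt{theta\_hat} would come, for instance, from a solver such as the one sketched at the beginning of this section.
\begin{verbatim}
import numpy as np

def recover_support(theta_hat, eta):
    # Estimated parent set: coordinates of theta_hat strictly above eta.
    return set(np.flatnonzero(theta_hat > eta))

def required_measurements(s, m, alpha, gamma, eps):
    # Sample-size threshold n > 9 s log(m) / (alpha * gamma^2 * eps^2)
    # from the corollary above.
    return int(np.ceil(9 * s * np.log(m) / (alpha * gamma ** 2 * eps ** 2)))

# Example choice discussed above: eta = alpha / 3 and eps = eta / 2, so that
# every true parent of weight at least alpha is a `strong' parent.
\end{verbatim}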
\subsection{Approximate Sparsity}
\label{sec:relaxing_sparsity}

In practice, exact sparsity rarely holds. For social networks in particular, it is more realistic to assume that each node has few ``strong'' parents and many ``weak'' parents. In other words, even if $\theta^*$ is not exactly $s$-sparse, it can be well approximated by $s$-sparse vectors. Rather than obtaining an impossibility result, we show that the bounds obtained in Section~\ref{sec:main_theorem} degrade gracefully in this setting.

Formally, let $ \theta^*_{\lfloor s \rfloor} \in \argmin_{\|\theta\|_0 \leq s} \|\theta - \theta^*\|_1 $ be the best $s$-sparse approximation to $\theta^*$. We then pay a cost proportional to $\|\theta^* - \theta^*_{\lfloor s\rfloor}\|_1$ for recovering the weights of vectors that are not exactly sparse. This cost is simply the ``tail'' of $\theta^*$: the sum of its $m-s$ smallest coordinates in absolute value. In other words, the closer $\theta^*$ is to being sparse, the smaller the price, and we recover the results of Section~\ref{sec:main_theorem} in the limit of exact sparsity. These results are formalized in the following theorem, which is also a consequence of Theorem 1 in \cite{Negahban:2009}.
\begin{theorem}
\label{thm:approx_sparse}
Suppose the relaxed {\bf(RE)} assumption holds for the Hessian $\nabla^2 \mathcal{L}(\theta^*)$ over the following set:
\begin{align}
\nonumber {\cal C}' \defeq & \{X \in \reals^m : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 + 4 \|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1 \} \\
\nonumber & \cap \{ \|X\|_1 \leq 1 \}
\end{align}
where $S$ now denotes the support of $\theta^*_{\lfloor s \rfloor}$. Then, solving \eqref{eq:pre-mle} with $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^{1 - \delta}}}$, we have:
\begin{align*}
\|\hat \theta - \theta^* \|_2 \leq \frac{3}{\gamma} \sqrt{\frac{s\log m}{\alpha n^{1-\delta}}} + \frac{2}{\gamma} \sqrt[4]{\frac{s\log m}{\alpha n^{1-\delta}}} \|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1
\end{align*}
\end{theorem}
As before, edge recovery is a consequence of upper bounding $\|\hat\theta - \theta^*\|_2$.
\begin{corollary}
Under the same assumptions as Theorem~\ref{thm:approx_sparse}, if the number of measurements satisfies:
\begin{equation}
n > \frac{9}{\alpha\gamma^2\epsilon^2}\left(1+ \frac{4}{\epsilon^2}\| \theta^* - \theta^*_{\lfloor s\rfloor}\|_1\right)s\log m
\end{equation}
then, as before, ${\cal S}^*_{\eta + \epsilon} \subseteq \hat {\cal S}_\eta \subseteq {\cal S}^*$ w.p. at least $1-\frac{1}{m}$.
\end{corollary}
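For concreteness, the best $s$-sparse approximation simply keeps the $s$ largest-magnitude coordinates of $\theta^*$, and the quantity $\|\theta^* - \theta^*_{\lfloor s\rfloor}\|_1$ appearing above is the $\ell_1$ norm of what is left. A small sketch (ours):
\begin{verbatim}
import numpy as np

def best_s_sparse(theta_star, s):
    # Keep the s largest entries in absolute value, zero out the rest.
    keep = np.argsort(np.abs(theta_star))[-s:]
    approx = np.zeros_like(theta_star)
    approx[keep] = theta_star[keep]
    return approx

theta_star = np.array([0.50, 0.30, 0.02, 0.01, 0.005])
tail = np.linalg.norm(theta_star - best_s_sparse(theta_star, 2), ord=1)
# tail == 0.035: the price paid above for non-exact sparsity.
\end{verbatim}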
\subsection{Restricted Eigenvalue Condition}
\label{sec:re}

In this section, we discuss the main assumption of Theorem~\ref{thm:main}, namely the restricted eigenvalue condition. We then compare it to the irrepresentability condition considered in \cite{Daneshmand:2014}.

\paragraph{Interpretation} There exists a large class of sufficient conditions under which sparse recovery is achievable in the context of regularized estimation. A good survey of these different assumptions can be found in \cite{vandegeer:2009}. The restricted eigenvalue condition introduced in \cite{bickel:2009} is one of the weakest such assumptions. It can be interpreted as a restricted form of non-degeneracy. Since we apply it to the Hessian of the log-likelihood function $\nabla^2 \mathcal{L}(\theta)$, it essentially reduces to a form of restricted strong convexity, which Lemma~\ref{lem:negahban} ultimately relies on. In our case we have:
\begin{multline*}
\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}x^t(x^t)^T \bigg[x_i^{t+1}\frac{f''f-f'^2}{f^2}(\inprod{\theta^*}{x^t})\\
-(1-x_i^{t+1})\frac{f''(1-f) + f'^2}{(1-f)^2}(\inprod{\theta^*}{x^t})\bigg]
\end{multline*}
It is interesting to observe that the Hessian of $\mathcal{L}$ can be seen as a re-weighted Gram matrix of the observations. In other words, the restricted eigenvalue condition expresses that the observed sets of active nodes are not too collinear with each other. In the specific case of ``logistic cascades'' (where $f$ is the logistic function), the Hessian simplifies to $\nabla^2\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}XX^T$, where $X$ is the design matrix $[x^1 \ldots x^{|\mathcal{T}|}]$. In this case the restricted eigenvalue condition is equivalent to the assumption made in the Lasso analysis of \cite{bickel:2009}.

\paragraph{(RE) with high probability} The Generalized Linear Cascade model yields a probability distribution over the observed sets of infected nodes $(x^t)_{t\in\mathcal{T}}$. It is then natural to ask whether the restricted eigenvalue condition is likely to hold under this probabilistic model. Several recent papers show that large classes of correlated designs obey the restricted eigenvalue property with high probability \cite{raskutti:10, rudelson:13}. In our case, we can show that if the (RE) condition holds for the expected Hessian matrix $\E[\nabla^2\mathcal{L}(\theta^*)]$, then it holds for the finite-sample Hessian matrix $\nabla^2\mathcal{L}(\theta^*)$ with high probability. Note that the expected Hessian matrix is exactly the Fisher information matrix of the generalized linear cascade model, which captures the amount of information about $\theta$ conveyed by the random observations. Therefore, under an assumption which only involves the probabilistic model and not the actual observations, Proposition~\ref{prop:fi} allows us to draw the same conclusion as in Theorem~\ref{thm:main}.

We will need the following additional assumptions on the inverse link function $f$:
\begin{equation}
\tag{LF2}
\|f'\|_{\infty} \leq M \text{ and } \max\left(\left|\frac{f''}{1-f}\right|, \left|\frac{f''}{f}\right|\right) \leq\frac{1}{\alpha}
\end{equation}
whenever $f(\inprod{\theta^*}{x})\notin\{0,1\}$.
\begin{proposition}
\label{prop:fi}
Suppose $\E[\nabla^2\mathcal{L}(\theta^*)]$ satisfies the $(S,\gamma)$-{\bf (RE)} condition and that {\bf (LF)} and {\bf (LF2)} hold. Then for $\delta> 0$, if $n^{1-\delta}\geq \frac{M+2}{21\gamma\alpha}s^2\log m$, $\nabla^2\mathcal{L}(\theta^*)$ satisfies the $(S,\frac{\gamma}{2})$-(RE) condition w.p.\ at least $1-e^{-n^\delta\log m}$.
\end{proposition}
\begin{proof}
Write $H\defeq \nabla^2\mathcal{L}(\theta^*)$. If $\|\E[H] - H\|_\infty\leq \lambda$ and $\E[H]$ satisfies the $(S,\gamma)$-(RE) condition, then:
\begin{equation}
\label{eq:foo}
\forall \Delta\in \mathcal{C}(S),\; \Delta^\top H\Delta \geq \Delta^\top \E[H]\Delta\,(1-32s\lambda/\gamma)
\end{equation}
Indeed, $ |\Delta^\top(H-\E[H])\Delta| \leq 2\lambda \|\Delta\|_1^2\leq 2\lambda(4\sqrt{s}\|\Delta_S\|_2)^2 $. Writing $\partial^2_{i,j}\mathcal{L}(\theta^*)=\frac{1}{|\mathcal{T}|}\sum_{t\in \mathcal{T}}Y_t$ and using (LF) and (LF2), we have $\big|Y_t - \E[Y_t]\big|\leq \frac{4(M+2)}{\alpha}$. Applying Azuma's inequality as in the proof of Lemma~\ref{lem:ub}, this implies:
\begin{displaymath}
\P\big[\|\E[H]-H\|_{\infty}\geq\lambda\big] \leq 2\exp\left(-\frac{n\alpha\lambda^2}{4(M+2)} + 2\log m\right)
\end{displaymath}
Thus, if we take $\lambda=\sqrt{\frac{12(M+2)\log m}{\alpha n^{1-\delta}}}$, then $\|\E[H]-H\|_{\infty}\leq\lambda$ w.p.\ at least $1-e^{-n^{\delta}\log m}$. When $n^{1-\delta}\geq \frac{M+2}{21\gamma\alpha}s^2\log m$, \eqref{eq:foo} implies $ \forall \Delta\in \mathcal{C}(S),\; \Delta^\top H\Delta \geq \frac{1}{2} \Delta^\top \E[H]\Delta$ w.p.\ at least $1-e^{-n^{\delta}\log m}$, and the conclusion of Proposition~\ref{prop:fi} follows.
\end{proof}
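Since the (RE) condition cannot in general be certified efficiently from a single dataset, a simple heuristic (ours, not from the paper) is to probe it empirically: sample directions in the cone $\mathcal{C}(S)$ and record the smallest Rayleigh quotient of the expected or empirical Hessian. This only yields an upper bound on the best achievable $\gamma$ and is a diagnostic, not a certificate.
\begin{verbatim}
import numpy as np

def probe_restricted_eigenvalue(H, S, n_samples=10000, seed=0):
    # Smallest sampled value of Delta' H Delta / ||Delta||_2^2 over random
    # directions Delta satisfying ||Delta_{S^c}||_1 <= 3 ||Delta_S||_1.
    rng = np.random.default_rng(seed)
    m = H.shape[0]
    S = np.asarray(sorted(S))
    Sc = np.setdiff1d(np.arange(m), S)
    best = np.inf
    for _ in range(n_samples):
        delta = np.zeros(m)
        delta[S] = rng.normal(size=S.size)
        u = rng.normal(size=Sc.size)
        budget = 3.0 * np.abs(delta[S]).sum() * rng.uniform()
        delta[Sc] = u * budget / (np.abs(u).sum() + 1e-12)
        best = min(best, delta @ H @ delta / (delta @ delta))
    return best
\end{verbatim}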
Observe that the number of measurements required in Proposition~\ref{prop:fi} is now quadratic in $s$. If we only keep the first measurement from each cascade, which are mutually independent, we can apply Theorem 1.8 from \cite{rudelson:13}, lowering the number of required cascades to $s\log(s\log^3 m)$.

\paragraph{(RE) vs Irrepresentability Condition} \cite{Daneshmand:2014} rely on an `incoherence' condition on the Hessian of the likelihood function. Their condition is equivalent to what is more commonly called the \emph{$(S,s)$-irrepresentability} condition. It is possible to construct examples where the (RE) condition holds but the irrepresentability condition does not \cite{vandegeer:2009}. Conversely, Theorem 9.1 of the same paper shows that a `strong' irrepresentability condition directly \emph{implies} the {\bf(RE)} condition for $\ell_2$-recovery.
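For comparison, the irrepresentability constant of a Gram or Hessian matrix $Q$ with respect to a support $S$ is commonly written $\max_{\|\tau\|_\infty\leq 1}\|Q_{S^c,S}Q_{S,S}^{-1}\tau\|_\infty$, i.e., the largest row-wise $\ell_1$ norm of $Q_{S^c,S}Q_{S,S}^{-1}$, and the condition requires it to be strictly below $1$. The following snippet (ours) computes it and can be used alongside the (RE) probe above to compare the two assumptions on simulated designs.
\begin{verbatim}
import numpy as np

def irrepresentability_constant(Q, S):
    # max over j in S^c of || Q[j, S] @ inv(Q[S, S]) ||_1
    m = Q.shape[0]
    S = np.asarray(sorted(S))
    Sc = np.setdiff1d(np.arange(m), S)
    A = Q[np.ix_(Sc, S)] @ np.linalg.inv(Q[np.ix_(S, S)])
    return np.abs(A).sum(axis=1).max()
\end{verbatim}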