In this section, we exploit standard techniques from sparse recovery and leverage the simple structure of generalized linear models to address the standard problem of edge detection, as well as the less frequently studied problem of coefficient estimation. We discuss standard diffusion processes and extend our analysis beyond sparse graphs to approximately sparse graphs.

\paragraph{Recovering Edges vs. Recovering Coefficients} Recovering the edges of the graph, or equivalently estimating the parents of a node, amounts to recovering the support (the non-zero coefficients) of $\Theta$, a process known as {\it variable selection}. However, a series of papers has argued that the Lasso is an inappropriate variable selection method (see, e.g., Zou and Hastie, or van de Geer). In fact, the irrepresentability condition is essentially necessary for variable selection, and it rarely holds in practical situations where variables are correlated. We defer an extended analysis of this point to Section~\ref{sec:assumptions}. Our approach is different: rather than performing variable selection by finding $\{j : \theta_j \neq 0\}$, we seek to upper-bound $\|\hat \theta - \theta^* \|_2$. We first apply standard techniques to obtain a bound of order ${\cal O}(\sqrt{\frac{s \log m}{n}})$ in the case of sparse vectors, which is tight in a sense we make precise in Section~???. `Parent selection' is then a direct consequence of this result: by thresholding $\hat \theta$, one recovers all `strong' parents without false positives, as shown in Corollary~\ref{cor:variable_selection}.

\paragraph{Main Theorem} Interestingly, obtaining oracle inequalities relies on the following restricted eigenvalue condition\footnote{We compare this condition to the irrepresentability condition in Section~\ref{sec:assumptions}.}, stated for a symmetric matrix $\Sigma$ and the set ${\cal C} := \{X \in \mathbb{R}^m : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 \} \cap \{ \|X\|_1 \leq 1 \}$:
\begin{equation} \nonumber
\forall X \in {\cal C}, \quad X^\top \Sigma X \geq \gamma_n \|X\|_2^2 \qquad \text{\bf (RE)}
\end{equation}
We restate the following theorem from \cite{Negahban:2009}.
\begin{theorem} \label{thm:neghaban}
Suppose that the true vector $\theta^*$ has support $S$ of size $s$ and that the {\bf (RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$. Then, solving \eqref{eq:pre-mle} with $\lambda_n \geq 2 \|\nabla f(\theta^*)\|_{\infty}$ yields:
\begin{equation}
\|\hat \theta - \theta^* \|_2 \leq 3 \frac{\sqrt{s}\lambda_n}{\gamma_n}
\end{equation}
\end{theorem}
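To make the announced rate concrete, here is a minimal sanity check of how Theorem~\ref{thm:neghaban} produces a ${\cal O}(\sqrt{s \log m / n})$ bound. It is a sketch assuming that the gradient concentrates as $\|\nabla f(\theta^*)\|_{\infty} \leq c \sqrt{n \log m}$ (Lemma~\ref{lem:icc_lambda_upper_bound} below establishes a bound of this form for the Independent Cascade model, up to an $n^{\delta}$ factor) and that {\bf (RE)} holds with $\gamma_n = n \gamma$; here $c$ and $\gamma$ are generic constants introduced for illustration only.
\begin{equation} \nonumber
\lambda_n \defeq 2c\sqrt{n \log m}, \quad \gamma_n = n \gamma
\quad \Longrightarrow \quad
\|\hat \theta - \theta^* \|_2 \leq \frac{3\sqrt{s} \cdot 2c\sqrt{n \log m}}{n \gamma} = \frac{6c}{\gamma} \sqrt{\frac{s \log m}{n}}.
\end{equation}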
\paragraph{Relaxing the Sparsity Constraint} We can relax the sparsity constraint and express the upper bound as a function of the best sparse approximation to $\theta^*$. This result is particularly relevant when $\theta^*$ has few `strong' parents and many `weak' parents, a frequent situation in social networks. The results of Section~\ref{subsec:icc} and Section~\ref{subsec:ltm} below extend easily to the approximately sparse case.
\begin{theorem} \label{thm:approx_sparse}
Let $\theta_s^* \defeq \arg\min_{\|\theta\|_0 \leq s} \|\theta - \theta^*\|_1$ be the best $s$-sparse approximation to the true vector $\theta^*$, and let $S$ be its support. Suppose the {\bf (RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$ and for the following set:
\begin{align} \nonumber
{\cal C}' \defeq & \{X \in \mathbb{R}^m : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 + 4 \|\theta^* - \theta_s^*\|_1 \} \\ \nonumber
& \cap \{ \|X\|_1 \leq 1 \}
\end{align}
Then, solving \eqref{eq:pre-mle} with $\lambda_n \geq 2 \|\nabla f(\theta^*)\|_{\infty}$ yields:
\begin{equation}
\|\hat \theta - \theta^* \|_2 \leq 3 \frac{\sqrt{s}\lambda_n}{\gamma_n} + 2 \sqrt{\frac{\lambda_n}{\gamma_n} \|\theta^* - \theta^*_s\|_1}
\end{equation}
\end{theorem}
In other words, we pay an additional $2 \sqrt{\frac{\lambda_n}{\gamma_n} \|\theta^* - \theta^*_s\|_1}$ for not being exactly $s$-sparse, where the approximation error $\|\theta^* - \theta^*_s\|_1$ is the sum of the $\|\theta^*\|_0 - s$ weakest coefficients of $\theta^*$.

\subsection{Independent Cascade Model}
\label{subsec:icc}
We analyse the previous conditions in the case of the Independent Cascade model. Lemma~\ref{lem:icc_lambda_upper_bound} provides a high-probability upper bound of order $\sqrt{n^{1+\delta} \log m}$ on $\|\nabla f(\theta^*)\|_{\infty}$.
\begin{lemma} \label{lem:icc_lambda_upper_bound}
For any $\delta > 0$, with probability $1-e^{-n^\delta \log m}$, $\|\nabla f(\theta^*)\|_{\infty} \leq 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$.
\end{lemma}
The following corollaries follow immediately from Theorem~\ref{thm:neghaban} and Lemma~\ref{lem:icc_lambda_upper_bound}.
\begin{corollary}
Assume that {\bf (RE)} holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Then for $\lambda_n := 4 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$ and with probability $1-e^{-n^\delta \log m}$:
\begin{equation}
\|\hat \theta - \theta^* \|_2 \leq \frac{12}{\gamma} \sqrt{\frac{s \log m}{p_{\min} n^{1-\delta}}}
\end{equation}
\end{corollary}
The following corollary is a direct consequence and yields the first algorithm to reconstruct general graphs from ${\cal O}(s \log m)$ measurements; we sketch the underlying thresholding argument at the end of this subsection.
\begin{corollary} \label{cor:variable_selection}
Assume that {\bf (RE)} holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Suppose that after solving Eq.~\ref{eq:mle}, we construct the set $\hat {\cal S}_\eta := \{ j \in [1..m] : \hat \theta_j > \eta\}$ for $\eta > 0$, and that the number of measurements satisfies:
\begin{equation}
n^{1-\delta} > \frac{144}{\gamma^2 p_{\min} \eta^2} s \log m
\end{equation}
Then with probability $1-e^{-n^\delta \log m}$, ${\cal S}^*_{2\eta} \subseteq \hat {\cal S}_\eta \subseteq {\cal S}^*_0$, where ${\cal S}^*_{\tau} := \{ j \in [1..m] : \theta^*_j > \tau \}$. In other words, we incur no false positives (no false parents) and we recover all `strong' parents, i.e.\ those such that $\theta^*_j > 2 \eta$.
\end{corollary}
Note that $n$ is the number of measurements, not the number of cascades. This is an improvement over prior work, since we expect several measurements per cascade. We claim that graph recovery is better understood as a function of $n$, the cumulative number of steps across all cascades, than as a function of $N$, the number of individual cascades.
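For completeness, here is the short argument behind Corollary~\ref{cor:variable_selection}, a sketch under the assumptions and with the constants of the previous corollary: the sample-size condition makes the $\ell_2$ bound smaller than $\eta$, and strong parents and non-parents are then separated by the threshold.
\begin{align} \nonumber
\|\hat \theta - \theta^* \|_{\infty} \leq \|\hat \theta - \theta^* \|_2 &\leq \frac{12}{\gamma} \sqrt{\frac{s \log m}{p_{\min} n^{1-\delta}}} < \eta \\ \nonumber
\theta^*_j > 2\eta &\implies \hat \theta_j > \theta^*_j - \eta > \eta \implies j \in \hat {\cal S}_\eta \\ \nonumber
\theta^*_j = 0 &\implies \hat \theta_j < \eta \implies j \notin \hat {\cal S}_\eta
\end{align}

\subsection{Linear Threshold Model}
\label{subsec:ltm}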