Diffstat (limited to 'paper/sections')
-rw-r--r--  paper/sections/assumptions.tex   6
-rw-r--r--  paper/sections/results.tex      23
2 files changed, 26 insertions, 3 deletions
diff --git a/paper/sections/assumptions.tex b/paper/sections/assumptions.tex
new file mode 100644
index 0000000..eccd095
--- /dev/null
+++ b/paper/sections/assumptions.tex
@@ -0,0 +1,6 @@
+\subsection{The Restricted Eigenvalue Condition}
+
+Proving the restricted eigenvalue assumption for correlated measurements is non-trivial. Under reasonable assumptions on the graph parameters, we can show a very crude ${\cal O}(N)$ lower bound for $\gamma_n$ by exploiting only the first set of measurements, in which only the source nodes are active. Note that even though we discard a large amount of information, we obtain asymptotic behavior similar to that of previous work.
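+
+For concreteness, $\gamma_n$ refers to the usual restricted strong-convexity constant; the following is only a sketch of the standard form of the assumption (the precise cone and constants are those fixed in the results section): there exists $\gamma_n > 0$ such that
+\begin{align}
+\nonumber
+\langle \nabla^2 f(\theta^*) \Delta, \Delta \rangle \geq \gamma_n \|\Delta\|_2^2 \quad \text{for all } \Delta \in {\cal C},
+\end{align}
+where ${\cal C}$ is a cone of vectors whose mass concentrates on the support, e.g.\ ${\cal C} = \{\Delta : \|\Delta_{S^c}\|_1 \leq 3 \|\Delta_S\|_1\}$.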
+
+\subsection{The Irrepresentability Condition}
+
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index c26c499..8787241 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -1,7 +1,7 @@
-In this section, we exploit standard techniques in sparse recovery, and leveraging the simple nature of Generalized Linear models to address the standard problem of edge detection as well as the (new?) problem of coefficient estimation. We discuss both standard diffusion processes, and extend our analysis beyond sparse graphs to approximately sparse graphs.
+In this section, we exploit standard techniques in sparse recovery and leverage the simple nature of Generalized Linear Models to address the standard problem of edge detection as well as the less frequently studied problem of coefficient estimation. We discuss standard diffusion processes and extend our analysis beyond sparse graphs to approximately sparse graphs.
\paragraph{Recovering Edges vs. Recovering Coefficients}
-There have been a series of papers arguing that the Lasso is an inappropriate variable selection method (see H.Zou and T.Hastie, Sarah van de Geer ...). In fact, the irrepresentability condition, though essentially necessary for variable selection, rarely holds in practical situations where correlation between variable occurs. We defer an extended analysis of this situation to Section~\ref{sec:assumptions}.
+Recovering the edges of the graph, or equivalently estimating the parents of a node, comes down to recovering the support (the non-zero coefficients) of $\Theta$, a process known as {\it variable selection}. However, a series of papers has argued that the Lasso is an inappropriate variable selection method (see H.Zou and T.Hastie, Sarah van de Geer ...). In fact, the irrepresentability condition (introduced in and discussed in ), though essentially necessary for variable selection, rarely holds in practical situations where correlation between variables occurs. We defer an extended analysis of this situation to Section~\ref{sec:assumptions}.
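+
+For reference, the following is a sketch of the condition's usual form in the linear model, stated with a design covariance $\Sigma$ and notation ($\Sigma_{S^c S}$, $\Sigma_{S S}$, $\eta$) introduced here purely for illustration:
+\begin{align}
+\nonumber
+\left\| \Sigma_{S^c S} \left( \Sigma_{S S} \right)^{-1} \mathrm{sign}(\theta^*_S) \right\|_{\infty} \leq 1 - \eta \quad \text{for some } \eta > 0,
+\end{align}
+which typically fails once the irrelevant variables are sufficiently correlated with those in the support $S$ of $\theta^*$.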
Our approach is different. Rather than trying to perform variable selection by finding $\{j: \theta_j \neq 0\}$, we seek to upper-bound $\|\hat \theta - \theta^* \|_2$. We first apply standard techniques to obtain an ${\cal O}(\sqrt{\frac{s \log m}{n}})$ error bound in the case of sparse vectors, which is tight to a certain extent, as we will show in Section ???.
@@ -19,16 +19,33 @@ We cite the following Theorem from \cite{Negahban:2009}
\begin{theorem}
\label{thm:neghaban}
-Suppose the true vector $\theta^*$ has support S of size s and the {\bf(RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$, then by solving Eq.~\ref{eq:mle} for $\lambda_n \geq 2 \|\nabla f(\theta^*)\|_{\infty}$ we have:
+Suppose the true vector $\theta^*$ has support $S$ of size $s$, and that the {\bf(RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$. Then, by solving \eqref{eq:pre-mle} with $\lambda_n \geq 2 \|\nabla f(\theta^*)\|_{\infty}$, we have:
\begin{equation}
\|\hat \theta - \theta^* \|_2 \leq 3 \frac{\sqrt{s}\lambda_n}{\gamma_n}
\end{equation}
\end{theorem}
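+
+To connect this with the rate announced earlier, note that the ${\cal O}(\sqrt{\frac{s \log m}{n}})$ bound follows whenever $\sqrt{s}\,\lambda_n / \gamma_n = {\cal O}(\sqrt{\frac{s \log m}{n}})$; the following is a sketch under scalings that must be verified for each model below. With a per-sample normalization of $f$, $\lambda_n \asymp \sqrt{\frac{\log m}{n}}$ and $\gamma_n = \Omega(1)$ suffice, while with an unnormalized $f$, $\lambda_n \asymp \sqrt{n \log m}$ and $\gamma_n = \Omega(n)$ give
+\begin{align}
+\nonumber
+\|\hat \theta - \theta^* \|_2 \leq 3 \frac{\sqrt{s}\lambda_n}{\gamma_n} = {\cal O}\left(\sqrt{\frac{s \log m}{n}}\right).
+\end{align}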
+
\paragraph{Relaxing the Sparsity Constraint}
We can relax the sparsity constraint and express the upper bound as a function of the best sparse approximation to $\theta^*$. This result is particularly relevant in cases where $\theta^*$ has a few `strong' parents and many `weak' parents, as is frequent in social networks. The results of Sections~\ref{subsec:icc} and~\ref{subsec:ltm} that follow can easily be extended to the approximately sparse case.
+\begin{theorem}
+\label{thm:approx_sparse}
+Let $S$ denote the indices of the $s$ largest coefficients (in absolute value) of the true vector $\theta^*$, and let $\theta_s^* \defeq \theta^*_{S^c}$ be the residual of its best $s$-sparse approximation, so that $\|\theta_s^*\|_1 = \min_{\|\theta\|_0 \leq s} \|\theta - \theta^*\|_1$. Suppose the {\bf(RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$ over the following set:
+\begin{align}
+\nonumber
+{\cal C}' \defeq & \{X \in \mathbb{R}^p : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 + 4 \|\theta_s^*\|_1 \} \\ \nonumber
+& \cap \{ \|X\|_1 \leq 1 \}
+\end{align}
+Then, by solving \eqref{eq:pre-mle} with $\lambda_n \geq 2 \|\nabla f(\theta^*)\|_{\infty}$, we have:
+\begin{equation}
+\|\hat \theta - \theta^* \|_2 \leq 3 \frac{\sqrt{s}\lambda_n}{\gamma_n} + 2 \sqrt{\frac{\lambda_n}{\gamma_n}} \|\theta^*_s\|_1
+\end{equation}
+\end{theorem}
+
+As we can see, we pay an $\Omega \left(\sqrt{\frac{\lambda_n}{\gamma_n}} \|\theta^*_s\|_1 \right)$ price for not being exactly $s$-sparse, where $\|\theta^*_s\|_1$ is the sum of the absolute values of the $\|\theta^*\|_0 - s$ weakest coefficients of $\theta^*$.
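+
+As a purely illustrative example (the decay profile below is hypothetical, not an assumption made elsewhere in the paper): if the coefficients of $\theta^*$, sorted by magnitude, satisfy $|\theta^*_{(j)}| \leq C j^{-2}$, then the tail obeys
+\begin{align}
+\nonumber
+\|\theta^*_s\|_1 = \sum_{j > s} |\theta^*_{(j)}| \leq C \sum_{j > s} j^{-2} \leq \frac{C}{s},
+\end{align}
+so the price for approximate sparsity decays polynomially in $s$ and can be balanced against the $\sqrt{s}$ growth of the first term.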
+
\subsection{Independent Cascade Model}
\label{subsec:icc}
We analyze the previous conditions in the case of the Independent Cascade model. Lemma 1 provides an ${\cal O}(\sqrt{n})$ upper bound w.h.p. on $\|\nabla f(\theta^*)\|$