Diffstat (limited to 'paper/sections/results.tex')
-rw-r--r--  paper/sections/results.tex  75
1 file changed, 36 insertions, 39 deletions
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index a0d3159..f68ecee 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -42,18 +42,23 @@ by~\cite{bickel2009simultaneous}.
A discussion of the $(S,\gamma)$-{\bf(RE)} assumption in the context of
generalized linear cascade models can be found in Section~\ref{sec:re}. In our
-setting we require that the (RE)-condition holds for the Hessian of the
+setting we require that the {\bf(RE)}-condition holds for the Hessian of the
log-likelihood function $\mathcal{L}$: it essentially captures the fact that the
binary vectors of the set of active nodes are not \emph{too} collinear.
{\color{red} Rewrite the minimal assumptions necessary}
We will also need the following assumption on the inverse link function $f$ of
the generalized linear cascade model:
+%\begin{equation}
+ %\tag{LF}
+ %\max\left(\left|\frac{f'}{f}(\inprod{\theta^*}{x})\right|,
+ %\left|\frac{f'}{1-f}(\inprod{\theta^*}{x})\right|\right)\leq
+ %\frac{1}{\alpha}
+%\end{equation}
\begin{equation}
- \tag{LF}
- \max\left(\left|\frac{f'}{f}(\inprod{\theta^*}{x})\right|,
- \left|\frac{f'}{1-f}(\inprod{\theta^*}{x})\right|\right)\leq
- \frac{1}{\alpha}
+ \tag{LF}
+ \max \left( \| (\log f)' \|_{\infty}, \|(\log (1-f))' \|_{\infty} \right)
+ \leq \frac{1}{\alpha}
\end{equation}
for some $\alpha\in(0, 1)$ and for all $x\in\reals^m$ such that
$f(\inprod{\theta^*}{x})\notin\{0,1\}$.
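For illustration only (the logistic inverse link below is an assumption of this
sketch and is not specified in the diff above): taking $f(z) = (1+e^{-z})^{-1}$,
one gets
\begin{equation*}
  (\log f)'(z) = 1 - f(z) \in (0,1),
  \qquad
  (\log(1-f))'(z) = -f(z) \in (-1,0),
\end{equation*}
so that $\max\left(\|(\log f)'\|_{\infty}, \|(\log(1-f))'\|_{\infty}\right)
\leq 1 \leq \frac{1}{\alpha}$, and the {\bf(LF)} condition holds for every
$\alpha\in(0,1)$.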
@@ -147,13 +152,12 @@ solve the Network Inference problem.
\label{cor:variable_selection}
Under the same assumptions as Theorem~\ref{thm:main}, let $\hat {\cal S}_\eta
\defeq \{ j \in \{1,\ldots, m\} : \hat{\theta}_j > \eta\}$ for $\eta > 0$. For
-$\epsilon>0$ and $\epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{
-i \in \{1,\ldots,m\} :\theta_i^* > \eta +\epsilon \}$ be the set of all true
-`strong' parents. Suppose the number of measurements verifies: $ n >
-\frac{9s\log m}{\alpha\gamma^2\epsilon^2}$. Then with probability
-$1-\frac{1}{m}$, ${\cal S}^*_{\eta + \epsilon} \subseteq \hat {\cal S}_\eta
-\subseteq {\cal S}^*$. In other words we recover all `strong' parents and no
-`false' parents.
+$0< \epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ i \in
+\{1,\ldots,m\} :\theta_i^* > \eta +\epsilon \}$ be the set of all true `strong'
+parents. Suppose the number of measurements satisfies $ n > \frac{9s\log
+m}{\alpha\gamma^2\epsilon^2}$. Then with probability $1-\frac{1}{m}$, ${\cal
+S}^*_{\eta + \epsilon} \subseteq \hat {\cal S}_\eta \subseteq {\cal S}^*$. In
+other words, we recover all `strong' parents and no `false' parents.
\end{corollary}
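As a purely illustrative instantiation (all numerical values below are
hypothetical and not taken from the paper): with $m = 10^3$ candidate parents,
$s = 10$, $\alpha = \gamma = \tfrac{1}{2}$ and a separation $\epsilon = 0.1$,
the corollary asks for
\begin{equation*}
  n > \frac{9 s\log m}{\alpha\gamma^2\epsilon^2}
    = \frac{9\cdot 10\cdot\log(10^3)}{\tfrac{1}{2}\cdot\tfrac{1}{4}\cdot 10^{-2}}
    \approx 5\times 10^{5}
\end{equation*}
measurements, after which thresholding $\hat\theta$ at any $\eta > \epsilon$
recovers, with probability $1-\tfrac{1}{m}$, every parent whose weight exceeds
$\eta+\epsilon$ and no non-parent.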
\begin{proof}
@@ -191,13 +195,10 @@ $
be the best $s$-approximation to $\theta^*$. Then we pay a cost proportional
to $\|\theta^* - \theta^*_{\lfloor s\rfloor}\|_1$ for recovering the weights of
non-exactly sparse vectors. This cost is simply the ``tail'' of
-$\theta^*$: the sum of the $m-s$ smallest coordinates of $\theta^*$.
-
-In other words, the closer $\theta^*$ is to being sparse, the smaller the
-price, and we recover the results of Section~\ref{sec:main_theorem} in the
-limit of exact sparsity. These results are formalized in the following theorem,
-which is also a consequence of Theorem 1 in \cite{Negahban:2009}.
-
+$\theta^*$: the sum of the $m-s$ smallest coordinates of $\theta^*$. We recover
+the results of Section~\ref{sec:main_theorem} in the limit of exact sparsity.
+These results are formalized in the following theorem, which is also a
+consequence of Theorem 1 in \cite{Negahban:2009}.
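For concreteness (an illustrative computation that does not appear in the diff
above): writing $\theta^*_{(1)},\dots,\theta^*_{(m)}$ for the coordinates of
$\theta^*$ sorted by decreasing magnitude, the tail is
\begin{equation*}
  \|\theta^* - \theta^*_{\lfloor s\rfloor}\|_1 = \sum_{j=s+1}^{m} |\theta^*_{(j)}|,
\end{equation*}
so that if, say, $|\theta^*_{(j)}|\leq C j^{-q}$ for some $q>1$, the tail is at
most $C s^{1-q}/(q-1)$, and it vanishes in the limit of exact sparsity.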
\begin{theorem}
\label{thm:approx_sparse}
Suppose the {\bf(RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$
@@ -238,21 +239,19 @@ then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta
In this section, we discuss the main assumption of Theorem~\ref{thm:main}, namely
the restricted eigenvalue condition introduced by~\cite{bickel2009simultaneous}.
-We then compare to the irrepresentability condition considered in
-\cite{Daneshmand:2014}.
\paragraph{Interpretation}
There exists a large class of sufficient conditions under which sparse recovery
-is achievable in the context of regularized estimation. A good survey of these
-different assumptions can be found in \cite{vandegeer:2009}.
-
+is achievable in the context of regularized estimation~\cite{vandegeer:2009}.
The restricted eigenvalue condition is one of the weakest such assumptions. It
can be interpreted as a restricted form of non-degeneracy. Since we apply it to
the Hessian of the log-likelihood function $\nabla^2 \mathcal{L}(\theta)$, it
essentially reduces to a form of restricted strong convexity, that
-Lemma~\ref{lem:negahban} ultimately relies on. Observe that the Hessian of
-$\mathcal{L}$ can be seen as a re-weighted Gram matrix of the observations:
+Lemma~\ref{lem:negahban} ultimately relies on.
+
+Observe that the Hessian of $\mathcal{L}$ can be seen as a re-weighted
+\emph{Gram matrix} of the observations:
\begin{multline*}
\nabla^2\mathcal{L}(\theta^*)
= \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}x^t(x^t)^T
@@ -269,8 +268,7 @@ and the $(S, \gamma)$-({\bf RE}) condition on the Hessian in
Theorem~\ref{thm:main} and Theorem~\ref{thm:approx_sparse} reduces to a
condition on the \emph{gram matrix} of the observations $X^T X =
\frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}x^t(x^t)^T$ with $\gamma \leftarrow
-\gamma\cdot c$. The (RE)-condition is then equivalent to the assumption made in
-the Lasso analysis of~\cite{bickel:2009}.
+\gamma\cdot c$.
\paragraph{(RE) with high probability}
@@ -281,19 +279,18 @@ this probabilistic model. Several recent papers show that large classes of
correlated designs obey the restricted eigenvalue property with high
probability \cite{raskutti:10, rudelson:13}.
-The (RE)-condition is well-behaved in the following sense: if it holds for the
-expected Hessian matrix $\E[\nabla^2\mathcal{L}(\theta^*)]$, then it holds for
-the finite sample Hessian matrix $\nabla^2\mathcal{L}(\theta^*)$ with high
+The {\bf(RE)}-condition is well-behaved in the following sense: if it holds for
+the expected Hessian matrix $\E[\nabla^2\mathcal{L}(\theta^*)]$, then it holds
+for the finite sample Hessian matrix $\nabla^2\mathcal{L}(\theta^*)$ with high
probability.
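Schematically, and with all constants deliberately left unspecified (this is
only meant to illustrate the shape of the transfer results in
\cite{raskutti:10, rudelson:13}, not to restate them):
\begin{equation*}
  \E[\nabla^2\mathcal{L}(\theta^*)] \text{ satisfies } (S,\gamma)\text{-}{\bf(RE)}
  \;\Longrightarrow\;
  \nabla^2\mathcal{L}(\theta^*) \text{ satisfies } (S,\tfrac{\gamma}{2})\text{-}{\bf(RE)}
  \text{ w.h.p.},
\end{equation*}
provided the number of observations exceeds a constant multiple of $s\log m$.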
-Note that the expected Hessian matrix is exactly the Fisher
-Information matrix of the generalized linear cascade model which captures the
-amount of information about $\theta^*$ conveyed by the random observations.
-The (RE)-condition has an even more natural interpretation when $f$ and $1-f$
-are strictly log-convex, since the expected \emph{gram matrix} $A \equiv
-\mathbb{E}[X^T X]$ is a matrix whose entry $a_{i,j}$ is the average number of
-times node $i$ and node $j$ are infected at the same time in the cascade
-process, and $a_{i,i}$ is the average number of times node $i$ is infected.
+If $f$ and $1-f$ are strictly log-convex, then the previous observation yields
+the following natural interpretation of the {\bf(RE)}-condition: the expected
+\emph{Gram matrix} $A \equiv \mathbb{E}[X^T X]$ is a matrix whose entry
+$a_{i,j}$ is the average fraction of times node $i$ and node $j$ are infected
+at the same time during several runs of the cascade process. Note that the
+diagonal terms $a_{i,i}$ are simply the average fraction of times the
+corresponding node $i$ is infected.
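Concretely, since each $x^t$ is the binary indicator vector of the set of
active nodes at time $t$, this interpretation is just an explicit rewriting of
the definition of $A$ (no additional claim is made here):
\begin{equation*}
  a_{i,j} = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}\E\left[x^t_i x^t_j\right]
          = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}
            \Pr\left[x^t_i = 1 \text{ and } x^t_j = 1\right],
\end{equation*}
with the diagonal case $j=i$ giving the average fraction of times node $i$ is
active.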
Therefore, under an assumption which only involves the probabilistic model and
not the actual observations, Proposition~\ref{prop:fi} allows us to draw the