Diffstat (limited to 'paper/sections/results.tex')
| -rw-r--r-- | paper/sections/results.tex | 75 |
1 file changed, 36 insertions, 39 deletions
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index a0d3159..f68ecee 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -42,18 +42,23 @@
 by~\cite{bickel2009simultaneous}. A discussion of the $(S,\gamma)$-{\bf(RE)}
 assumption in the context of generalized linear cascade models can be found
 in Section~\ref{sec:re}. In our
-setting we require that the (RE)-condition holds for the Hessian of the
+setting we require that the {\bf(RE)}-condition holds for the Hessian of the
 log-likelihood function $\mathcal{L}$: it essentially captures the fact that
 the binary vectors of the set of active nodes are not \emph{too} collinear.
 
 {\color{red} Rewrite the minimal assumptions necessary}
 We will also need the following assumption on the inverse link function $f$
 of the generalized linear cascade model:
+%\begin{equation}
+  %\tag{LF}
+  %\max\left(\left|\frac{f'}{f}(\inprod{\theta^*}{x})\right|,
+  %\left|\frac{f'}{1-f}(\inprod{\theta^*}{x})\right|\right)\leq
+  %\frac{1}{\alpha}
+%\end{equation}
 \begin{equation}
-  \tag{LF}
-  \max\left(\left|\frac{f'}{f}(\inprod{\theta^*}{x})\right|,
-  \left|\frac{f'}{1-f}(\inprod{\theta^*}{x})\right|\right)\leq
-  \frac{1}{\alpha}
+  \tag{LF}
+  \max \left( \| (\log f)' \|_{\infty}, \| (\log (1-f))' \|_{\infty} \right)
+  \leq \frac{1}{\alpha}
 \end{equation}
 for some $\alpha\in(0, 1)$ and for all $x\in\reals^m$ such that
 $f(\inprod{\theta^*}{x})\notin\{0,1\}$.
@@ -147,13 +152,12 @@ solve the Network Inference problem.
 \label{cor:variable_selection}
 Under the same assumptions as Theorem~\ref{thm:main}, let $\hat {\cal S}_\eta
 \defeq \{ j \in \{1,\ldots, m\} : \hat{\theta}_j > \eta\}$ for $\eta > 0$. For
-$\epsilon>0$ and $\epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{
-i \in \{1,\ldots,m\} :\theta_i^* > \eta +\epsilon \}$ be the set of all true
-`strong' parents. Suppose the number of measurements verifies: $ n >
-\frac{9s\log m}{\alpha\gamma^2\epsilon^2}$. Then with probability
-$1-\frac{1}{m}$, ${\cal S}^*_{\eta + \epsilon} \subseteq \hat {\cal S}_\eta
-\subseteq {\cal S}^*$. In other words we recover all `strong' parents and no
-`false' parents.
+$0 < \epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ i \in
+\{1,\ldots,m\} : \theta_i^* > \eta + \epsilon \}$ be the set of all true `strong'
+parents. Suppose the number of measurements satisfies $n > \frac{9s\log
+m}{\alpha\gamma^2\epsilon^2}$. Then with probability $1-\frac{1}{m}$, ${\cal
+S}^*_{\eta + \epsilon} \subseteq \hat {\cal S}_\eta \subseteq {\cal S}^*$. In
+other words, we recover all `strong' parents and no `false' parents.
 \end{corollary}
 
 \begin{proof}
@@ -191,13 +195,10 @@
 $ be the best $s$-approximation to $\theta^*$. Then we pay a cost
 proportional to $\|\theta^* - \theta^*_{\lfloor s\rfloor}\|_1$ for recovering
 the weights of non-exactly sparse vectors. This cost is simply the ``tail'' of
-$\theta^*$: the sum of the $m-s$ smallest coordinates of $\theta^*$.
-
-In other words, the closer $\theta^*$ is to being sparse, the smaller the
-price, and we recover the results of Section~\ref{sec:main_theorem} in the
-limit of exact sparsity. These results are formalized in the following theorem,
-which is also a consequence of Theorem 1 in \cite{Negahban:2009}.
-
+$\theta^*$: the sum of the $m-s$ smallest coordinates of $\theta^*$. We recover
+the results of Section~\ref{sec:main_theorem} in the limit of exact sparsity.
+These results are formalized in the following theorem, which is also a
+consequence of Theorem 1 in \cite{Negahban:2009}.
 \begin{theorem}
 \label{thm:approx_sparse}
 Suppose the {\bf(RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$
@@ -238,21 +239,19 @@ then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta
 In this section, we discuss the main assumption of Theorem~\ref{thm:main},
 namely the restricted eigenvalue condition introduced
 by~\cite{bickel2009simultaneous}.
-We then compare to the irrepresentability condition considered in
-\cite{Daneshmand:2014}.
 
 \paragraph{Interpretation}
 
 There exists a large class of sufficient conditions under which sparse recovery
-is achievable in the context of regularized estimation. A good survey of these
-different assumptions can be found in \cite{vandegeer:2009}.
-
+is achievable in the context of regularized estimation~\cite{vandegeer:2009}.
 The restricted eigenvalue condition is one of the weakest such assumptions.
 It can be interpreted as a restricted form of non-degeneracy. Since we apply it
 to the Hessian of the log-likelihood function $\nabla^2 \mathcal{L}(\theta)$,
 it essentially reduces to a form of restricted strong convexity, which
-Lemma~\ref{lem:negahban} ultimately relies on. Observe that the Hessian of
-$\mathcal{L}$ can be seen as a re-weighted Gram matrix of the observations:
+Lemma~\ref{lem:negahban} ultimately relies on.
+
+Observe that the Hessian of $\mathcal{L}$ can be seen as a re-weighted
+\emph{Gram matrix} of the observations:
 \begin{multline*}
 \nabla^2\mathcal{L}(\theta^*) =
 \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}x^t(x^t)^T
@@ -269,8 +268,7 @@
 and the $(S, \gamma)$-({\bf RE}) condition on the Hessian in
 Theorem~\ref{thm:main} and Theorem~\ref{thm:approx_sparse} reduces to a
 condition on the \emph{Gram matrix} of the observations $X^T X =
 \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}x^t(x^t)^T$ with $\gamma \leftarrow
-\gamma\cdot c$. The (RE)-condition is then equivalent to the assumption made in
-the Lasso analysis of~\cite{bickel:2009}.
+\gamma\cdot c$.
 
 \paragraph{(RE) with high probability}
 
@@ -281,19 +279,18 @@
 this probabilistic model. Several recent papers show that large classes of
 correlated designs obey the restricted eigenvalue property with high
 probability \cite{raskutti:10, rudelson:13}.
 
-The (RE)-condition is well-behaved in the following sense: if it holds for the
-expected Hessian matrix $\E[\nabla^2\mathcal{L}(\theta^*)]$, then it holds for
-the finite sample Hessian matrix $\nabla^2\mathcal{L}(\theta^*)$ with high
+The {\bf(RE)}-condition is well-behaved in the following sense: if it holds for
+the expected Hessian matrix $\E[\nabla^2\mathcal{L}(\theta^*)]$, then it holds
+for the finite sample Hessian matrix $\nabla^2\mathcal{L}(\theta^*)$ with high
 probability.
-Note that the expected Hessian matrix is exactly the Fisher
-Information matrix of the generalized linear cascade model which captures the
-amount of information about $\theta^*$ conveyed by the random observations.
-The (RE)-condition has an even more natural interpretation when $f$ and $1-f$
-are strictly log-convex, since the expected \emph{gram matrix} $A \equiv
-\mathbb{E}[X^T X]$ is a matrix whose entry $a_{i,j}$ is the average number of
-times node $i$ and node $j$ are infected at the same time in the cascade
-process, and $a_{i,i}$ is the average number of times node $i$ is infected.
+If $f$ and $1-f$ are strictly log-convex, then the previous observation yields
+the following natural interpretation of the {\bf(RE)}-condition: the expected
+\emph{Gram matrix} $A \equiv \mathbb{E}[X^T X]$ is a matrix whose entry
+$a_{i,j}$ is the average fraction of times node $i$ and node $j$ are infected
+at the same time during several runs of the cascade process. Note that the
+diagonal terms $a_{i,i}$ are simply the average fraction of times the
+corresponding node $i$ is infected.
 
 Therefore, under an assumption which only involves the probabilistic model and
 not the actual observations, Proposition~\ref{prop:fi} allows us to draw the
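As a sanity check on the (LF) condition in the first hunk, here is a minimal worked example. The choice of the logistic inverse link $f(z) = (1+e^{-z})^{-1}$ is an assumption made for illustration; the hunk itself does not fix a particular $f$. Since $f' = f(1-f)$,

\begin{align*}
  (\log f)'(z)    &= \frac{f'(z)}{f(z)}    = 1 - f(z) \in (0,1), \\
  (\log(1-f))'(z) &= -\frac{f'(z)}{1-f(z)} = -f(z)    \in (-1,0),
\end{align*}

so $\max\left(\|(\log f)'\|_{\infty}, \|(\log(1-f))'\|_{\infty}\right) \le 1 \le 1/\alpha$, and (LF) holds for every $\alpha \in (0,1)$.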
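To make the sample bound in the variable-selection corollary (second hunk) concrete, here is a purely illustrative instantiation; the values $s = 5$, $m = 10^4$, $\alpha = 1/2$, $\gamma = 1/2$, $\epsilon = 0.1$ are assumptions, not numbers from the paper:

\[
  n \;>\; \frac{9 s \log m}{\alpha \gamma^2 \epsilon^2}
    \;=\; \frac{9 \cdot 5 \cdot \log(10^4)}{\frac{1}{2} \cdot \frac{1}{4} \cdot 10^{-2}}
    \;\approx\; \frac{414.5}{1.25 \times 10^{-3}}
    \;\approx\; 3.3 \times 10^{5}.
\]

With this many measurements, the thresholded estimate $\hat{\cal S}_\eta$ contains every true parent of weight above $\eta + 0.1$ and no false parents, with probability $1 - 1/m$.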
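Finally, the Gram-matrix interpretation in the last hunk can be written entrywise. Assuming the activation vectors are binary, $x^t \in \{0,1\}^m$ (consistent with the "binary vectors of the set of active nodes" earlier in the diff):

\[
  a_{i,j} \;=\; \mathbb{E}\!\left[\frac{1}{|\mathcal{T}|}
    \sum_{t\in\mathcal{T}} x_i^t x_j^t\right],
\]

which is precisely the expected fraction of observations in which nodes $i$ and $j$ are active at the same time; the diagonal entry $a_{i,i}$ is the expected fraction in which node $i$ is active.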
