-rw-r--r--   paper/sections/assumptions.tex   19
-rw-r--r--   paper/sections/results.tex        6
2 files changed, 10 insertions, 15 deletions
diff --git a/paper/sections/assumptions.tex b/paper/sections/assumptions.tex
index 5e412a4..89113b4 100644
--- a/paper/sections/assumptions.tex
+++ b/paper/sections/assumptions.tex
@@ -1,23 +1,20 @@
-%There have been a series of papers arguing that the standard Lasso is an inappropriate exact variable selection method \cite{Zou:2006}, \cite{vandegeer:2011}, since it relies on the essentially necessary irrepresentability condition, introduced in \cite{Zhao:2006}. This is the condition on which the analysis of \cite{Daneshmand:2014} relies. It has been noted this condition is rather stringent and rarely holds in practical situations where correlation between variables occurs. Several alternatives have been suggested, including the adaptive and thresholded lasso, which relax this assumption. We defer an extended analysis of the irrepresentability assumption to Section~\ref{sec:assumptions}.
-
-In this section, we compare the main assumption of \cite{Daneshmand:2014}, commonly known as the {\it irrepresentability condition}, to the restricted eigenvalue condition. We argue that the restricted eigenvalue is more likely to hold in practical situations.
+In this section, we discuss the main assumption of Theorem~\ref{thm:neghaban}, namely the restricted eigenvalue condition. We begin by comparing it to the irrepresentability condition considered in \cite{Daneshmand:2014}.
 
 \subsection{The Irrepresentability Condition}
 \cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the likelihood function. It is in fact easy to see that their condition is equivalent to the more commonly called {\it (S,s)-irrepresentability} condition:
 
 \begin{definition}
-Following similar notation, let $Q^* \defeq \nabla^2 f(\theta^*)$. Let $S$ and $S^c$ be the set of indices of all the parents and non-parents respectively and $Q_{S,S}$, $Q_{S^c,S}$, $Q_{S, S^c}$, and $Q_{S^c, S^c}$ the induced sub-matrices.
-
-
-The {\it (S,s)-irrepresentability} condition is defined as:
+Following similar notation, let $Q^* \defeq \nabla^2 f(\theta^*)$. Let $S$ and $S^c$ be the sets of indices of the parents and non-parents respectively, and $Q_{S,S}$, $Q_{S^c,S}$, $Q_{S, S^c}$, and $Q_{S^c, S^c}$ the induced sub-matrices. Consider the following constant:
 \begin{equation}
 \nu_{\text{irrepresentable}}(S) \defeq \max_{\tau \in \mathbb{R}^p \ :\ \| \tau \|_{\infty} \leq 1} \|Q_{S^c, S} Q_{S, S}^{-1} \tau\|_{\infty}
 \end{equation}
+The {\it (S,s)-irrepresentability} condition holds if $\nu_{\text{irrepresentable}}(S) < 1 - \epsilon$ for some $\epsilon > 0$.
 \end{definition}
 
-If our objective is to recover the support of the graph exactly, the irrepresentability condition has been shown to be essentially necessary \cite{Zhao:2006}. However, several recent papers \cite{vandegeer:2011}, \cite{Zou:2006}, argue this condition is unrealistic in situations where there is a correlation between variables. Consider the following simplified example from \cite{vandegeer:2011}:
+This condition has been shown to be essentially necessary for exact variable selection \cite{Zhao:2006}. Several recent papers \cite{vandegeer:2011}, \cite{Zou:2006} argue that this condition is unrealistic in situations where variables are correlated. Consider the following simplified example from \cite{vandegeer:2011}:
 
 \begin{equation}
+\nonumber
 \left(
 \begin{array}{cccc}
 I_s & \rho J \\
@@ -27,7 +24,7 @@ I_s & \rho J \\
 \end{equation}
 where $I_s$ is the $s \times s$ identity matrix, $J$ is the all-ones matrix and $\rho \in \mathbb{R}^+$.
 It is easy to see that $\nu_{\text{irrepresentable}}(S) = \rho s$ and $\lambda_{\min}(Q) \geq 1 - \rho$, such that for any $\rho > \frac{1}{s}$ and $\rho < 1$, the restricted eigenvalue holds trivially but the (S,s)-irrepresentability does not hold.
-As mentioned previously, it is intuitive that the irrepresentability condition is stronger than our suggested {\bf(RE)} assumption. In fact, by adapting slightly a result from \cite{vandegeer:2009}, we can show that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell2$-recovery:
+It is intuitive that the irrepresentability condition is stronger than the {\bf(RE)} assumption. In fact, a slightly modified result from \cite{vandegeer:2009} shows that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell_2$-recovery:
 
 \begin{proposition}
 \label{prop:irrepresentability}
@@ -37,9 +34,7 @@ If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then th
 
 \subsection{The Restricted Eigenvalue Condition}
 
-Yet, assuming we only wish to recover all edges above a certain threshold, bounding the $\ell2$-error allows us to recover all edges with weights above a certain minimum threshold under an intuitively weaker {\bf(RE)} condition. In practical scenarios, such as in social networks, where one seeks to recover significant edges, this is a reasonable assumption.
-
-Expressing the restricted eigenvalue assumption for correlated measurements as parameters of the graph and the cascade diffusion process is non-trivial. The restricted eigenvalue property is however well behaved in the following sense:
+In practical scenarios, such as in social networks, recovering only the `significant' edges is a reasonable objective. This can be done under the weaker {\bf(RE)} assumption. Expressing $\gamma$ as a function of the cascade model parameters is non-trivial. The restricted eigenvalue property is, however, well behaved in the following sense:
 
 \begin{lemma}
 \label{lem:expected_hessian}
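For reference, the value of $\nu_{\text{irrepresentable}}(S)$ claimed in the example above can be computed in one line. A sketch, assuming as in the example that $Q_{S,S} = I_s$ and $Q_{S^c,S} = \rho J$, with $\tau$ ranging over the unit $\ell_\infty$-ball of the appropriate dimension $s$:

\begin{equation*}
\nu_{\text{irrepresentable}}(S)
  = \max_{\|\tau\|_{\infty} \leq 1} \|\rho J \tau\|_{\infty}                   % Q_{S,S}^{-1} = I_s, so only the \rho J block remains
  = \rho \max_{\|\tau\|_{\infty} \leq 1} \Big| \sum_{j \in S} \tau_j \Big|     % every row of J takes the same sum of the coordinates of \tau
  = \rho s                                                                     % the maximum is attained at \tau = \mathbf{1}_s
\end{equation*}

The condition therefore fails ($\nu_{\text{irrepresentable}}(S) \geq 1$) exactly when $\rho \geq \frac{1}{s}$, matching the claim that for $\frac{1}{s} < \rho < 1$ the (S,s)-irrepresentability does not hold.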
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index 2831a51..0c2c7e5 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -51,7 +51,7 @@ Then with probability $1-\frac{1}{m}$, ${\cal S}^*_{\eta + \epsilon} \subset \ha
 Suppose $\theta$ is exactly s-sparse. By choosing $\delta = 0$, if $n>\frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, then $\|\hat \theta-\theta\|_2 < \epsilon < \eta$ with probability $1-\frac{1}{m}$. If $\theta_i = 0$ and $\hat \theta > \eta$, then $\|\hat \theta - \theta\|_2 \geq |\hat \theta_i-\theta_j| > \eta$, which is a contradiction. Therefore we get no false positives. If $\theta_i = \eta + \epsilon$, then $|\hat \theta_i - (\eta+\epsilon)| < \epsilon/2 \implies \theta_j > \eta + \epsilon/2$. Therefore, we get all strong parents.
 \end{proof}
 
-Note that in order to allow for \emph{perfect recovery}, we must estimate $p_{\min}$ within $\epsilon$ and set $\eta \defeq p_{\min} - \epsilon$.
+Note that if we want \emph{perfect recovery} of all edges, we must estimate $p_{\min}$ within $\epsilon'$ and set $\eta \defeq p_{\min} - \epsilon'$.
 
 \subsection{Relaxing the Sparsity Constraint}
 
@@ -91,10 +91,10 @@ then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset
 We begin our analysis with the following simple lemma:
 \begin{lemma}
 \label{lem:theta_p_upperbound}
-$\|\theta - \theta^* \|_2 \geq \|p - p^*\|_2$
+$\|\theta - \theta' \|_2 \geq \|p - p'\|_2$ and $\|\theta_{\lfloor s \rfloor}\|_1 \geq \| p_{\lfloor s \rfloor} \|_1$
 \end{lemma}
 \begin{proof}
-Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have $|\log (1 - p) - \log (1-p')| \geq \max(1 - \frac{1-p}{1-p'}, 1 - \frac{1-p'}{1-p}) \geq \max( p-p', p'-p)$. The result follows easily.
+Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have $|\log (1 - p) - \log (1-p')| \geq \max(1 - \frac{1-p}{1-p'}, 1 - \frac{1-p'}{1-p}) \geq \max( p-p', p'-p)$. The second result follows easily from the monotonicity of $\log$ and the fact that $\forall x \in (0,1), \; |\log(1-x)| \geq x$.
 \end{proof}
 
 In other words, finding an upper-bound for the estimation error of the `effective' parameters $\theta_{i,j} \defeq \log(1-p_{i,j})$ provides immediately an upper-bound for the estimation error of the true parameters $(p_{i,j})_{i,j}$.
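The coordinatewise step behind the first inequality of Lemma~\ref{lem:theta_p_upperbound} can be spelled out in a few lines. A sketch, assuming without loss of generality that $p > p'$ and writing $\theta = \log(1-p)$, $\theta' = \log(1-p')$:

\begin{align*}
|\theta - \theta'|
  &= \log \frac{1-p'}{1-p}     \\  % positive, since p > p' implies 1-p < 1-p'
  &\geq 1 - \frac{1-p}{1-p'}   \\  % apply \log x \geq 1 - 1/x with x = (1-p')/(1-p)
  &= \frac{p - p'}{1 - p'}
   \geq p - p' = |p - p'|          % since 0 < 1 - p' \leq 1
\end{align*}

Summing the squares of these coordinatewise bounds over all coordinates yields $\|\theta - \theta'\|_2 \geq \|p - p'\|_2$.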
