 paper/paper.tex                 |   6
 paper/sections/assumptions.tex  |  46
 paper/sections/results.tex      | 138
 3 files changed, 108 insertions(+), 82 deletions(-)
diff --git a/paper/paper.tex b/paper/paper.tex
index 4627890..4a25484 100644
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -109,9 +109,9 @@
 \label{sec:lowerbound}
 \input{sections/lowerbound.tex}
 
-\section{Assumptions}
-\label{sec:assumptions}
-\input{sections/assumptions.tex}
+%\section{Assumptions}
+%\label{sec:assumptions}
+%\input{sections/assumptions.tex}
 
 \section{Linear Threshold Model}
 \label{sec:linear_threshold}
diff --git a/paper/sections/assumptions.tex b/paper/sections/assumptions.tex
index 33d8e69..e69de29 100644
--- a/paper/sections/assumptions.tex
+++ b/paper/sections/assumptions.tex
@@ -1,46 +0,0 @@
-In this section, we discuss the main assumption of Theorem~\ref{thm:neghaban}, namely the restricted eigenvalue condition. We then compare it to the irrepresentability condition considered in \cite{Daneshmand:2014}.
-
-\subsection{The Restricted Eigenvalue Condition}
-
-The restricted eigenvalue condition, introduced in \cite{bickel:2009}, is one of the weakest sufficient conditions on the design matrix for successful sparse recovery \cite{vandegeer:2009}. Several recent papers show that large classes of correlated designs obey the restricted eigenvalue property with high probability \cite{raskutti:10,rudelson:13}.
-
-Expressing the minimum restricted eigenvalue $\gamma$ as a function of the cascade model parameters is highly non-trivial. The restricted eigenvalue property is, however, well behaved in the following sense: under reasonable assumptions, if the population Hessian $\mathbb{E}\left[\nabla^2 {\cal L}(\theta)\right]$, which is the \emph{Fisher information matrix} of the cascade model as a function of $\Theta$, satisfies the restricted eigenvalue property, then the finite-sample Hessian also satisfies it with overwhelming probability. It is straightforward to show that this holds when $n \geq C s^2 \log m$ \cite{vandegeer:2009}, where $C$ is an absolute constant. By adapting Theorem~8 of \cite{rudelson:13}\footnote{this result still needs to be confirmed!}, this can be reduced to:
-
-$$n \geq C s \log m \log^3 \left( \frac{s \log m}{C'} \right)$$
-
-where $C, C'$ are constants not depending on $(s, m, n)$.
-
-\subsection{The Irrepresentability Condition}
-
-\cite{Daneshmand:2014} rely on an `incoherence' condition on the Hessian of the likelihood function. It is in fact easy to see that their condition is equivalent to what is more commonly called the {\it (S,s)-irrepresentability} condition:
-
-\begin{definition}
-Following similar notation, let $Q^* \defeq \nabla^2 {\cal L}(\theta^*)$. Let $S$ and $S^c$ be the sets of indices of all the parents and non-parents respectively, and $Q_{S,S}$, $Q_{S^c,S}$, $Q_{S,S^c}$, and $Q_{S^c,S^c}$ the induced sub-matrices. Consider the following constant:
-\begin{equation}
-\nu_{\text{irrepresentable}}(S) \defeq \max_{\tau \in \mathbb{R}^p\ :\ \|\tau\|_{\infty} \leq 1} \|Q_{S^c,S} Q_{S,S}^{-1} \tau\|_{\infty}
-\end{equation}
-The (S,s)-irrepresentability condition holds if $\nu_{\text{irrepresentable}}(S) < 1 - \epsilon$ for some $\epsilon > 0$.
-\end{definition}
-
-This condition has been shown to be essentially necessary for variable selection \cite{Zhao:2006}. Several recent papers \cite{vandegeer:2011,Zou:2006} argue that this condition is unrealistic in situations where the variables are correlated.
-Consider the following simplified example from \cite{vandegeer:2011}:
-\begin{equation}
-\nonumber
-\left(
-\begin{array}{cc}
-I_s & \rho J \\
-\rho J & I_s \\
-\end{array}
-\right)
-\end{equation}
-where $I_s$ is the $s \times s$ identity matrix, $J$ is the all-ones matrix and $\rho \in \mathbb{R}^+$. It is easy to see that $\nu_{\text{irrepresentable}}(S) = \rho s$ and $\lambda_{\min}(Q) \geq 1 - \rho$, so that for any $\frac{1}{s} < \rho < 1$, the restricted eigenvalue condition holds trivially but the (S,s)-irrepresentability does not.
-
-It is intuitive that the irrepresentability condition is stronger than the {\bf(RE)} assumption. In fact, a slightly modified result from \cite{vandegeer:2009} shows that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell_2$-recovery:
-
-\begin{proposition}
-\label{prop:irrepresentability}
-If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant $\gamma_n \geq \frac{(1 - 3(1-\epsilon))^2 \lambda_{\min}^2}{4s} n$, where $\lambda_{\min} > 0$ is the smallest eigenvalue of $Q^*_{S,S}$, on which the results of \cite{Daneshmand:2014} also depend.
-\end{proposition}
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index a948b20..6e67058 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -16,21 +16,6 @@
 omit the subscript $i$ in the notations when there is no ambiguity. The
 recovery problem is now the one of estimating a single vector $\theta^*$ from
 a set $\mathcal{T}$ of observations. We will write $n\defeq |\mathcal{T}|$.
 
-\begin{comment}
-As mentioned previously, our objective is twofold:
-\begin{enumerate}
-  \item We seek to recover the edges of the graph. In other words, for each
-  node $j$, we wish to identify the set of indices $\{i: \Theta_{ij} \neq
-  0\}$.
-  \item We seek to recover the edge weights $\Theta$ of ${\cal G}$, i.e.\
-  upper-bound the error $\|\hat \Theta - \Theta \|_2$.
-\end{enumerate}
-It is easy to see that solving the second objective allows us to solve the
-first. If $\|\hat \Theta - \Theta \|_2$ is sufficiently small, we can recover
-all large coordinates of $\Theta$ by keeping only the coordinates above
-a certain threshold.
-\end{comment}
-
 \subsection{Main Theorem}
 \label{sec:main_theorem}
@@ -52,7 +37,7 @@ on the now standard \emph{restricted eigenvalue condition}.
 \end{definition}
 
 A discussion of this assumption in the context of generalized linear cascade
-models can be found in Section~\ref{sec:}.
+models can be found in Section~\ref{sec:re}.
 
 We will also need the following assumption on the inverse link function $f$ of
 the generalized linear cascade model:
@@ -62,11 +47,15 @@
 \left|\frac{f'}{1-f}(\inprod{\theta^*}{x})\right|\right)\leq \frac{1}{\alpha}
 \end{equation}
 
-for all $x\in\reals^m$ such that $f(\inprod{\theta^*}{x})\notin\{0,1\}$.
+for some $\alpha\in(0, 1)$ and for all $x\in\reals^m$ such that
+$f(\inprod{\theta^*}{x})\notin\{0,1\}$.
 
-It is easy to verify that this will be true if for all
-$(i,j)\in E$, $\delta\leq p_{i,j}\leq 1-\delta$ in the Voter model and when
-$p_{i,j}\leq 1-\delta$ in the Independent Cascade model.
+In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{1-f}(z)
+= \frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha\leq \Theta_{i,j}\leq
+1-\alpha$ for all $(i,j)\in E$.
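
As a quick sanity check of the voter-model claim above, here is the worked bound (a sketch: write $z \defeq \inprod{\theta^*}{x}$ and assume $z \in [\alpha, 1-\alpha]$, as when a single parent with $\alpha \leq \Theta_{i,j} \leq 1-\alpha$ is infected; the identity link $f(z) = z$ gives $f' = 1$):
\begin{displaymath}
\max\left(\left|\frac{f'}{f}(z)\right|, \left|\frac{f'}{1-f}(z)\right|\right)
= \max\left(\frac{1}{z}, \frac{1}{1-z}\right)
= \frac{1}{\min(z, 1-z)}
\leq \frac{1}{\alpha}.
\end{displaymath}
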
+Similarly, in the Independent Cascade Model,
+$\frac{f'}{f}(z) = \frac{1}{1-e^{-z}}$ and $\frac{f'}{1-f}(z) = -1$, and (LF)
+holds if $p_{i,j} \geq \alpha$ for all $(i,j)\in E$. Remember that in this
+case we have $\Theta_{i,j} = \log(1-p_{i,j})$.
 
 \begin{theorem}
 \label{thm:main}
@@ -84,9 +73,27 @@ of \eqref{eq:pre-mle} with $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^{1
 \end{equation}
 \end{theorem}
 
-This theorem is a consequence of Theorem~1 in \cite{Negahban:2009} which gives
-a bound on the convergence rate of regularized estimators. We state their
-theorem in the context of $\ell_1$ regularization in Lemma~\ref{lem:negahban}.
+Before moving to the proof of Theorem~\ref{thm:main}, note that interpreting
+it in the case of the Independent Cascade Model requires one more step.
+Indeed, to cast it as a generalized linear cascade model, we had to perform
+the transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where the $p_{i,j}$ are
+the infection probabilities. Fortunately, if we estimate $p_j$ through
+$\hat{p}_j = 1 - e^{\hat{\theta}_j}$, a bound on $\|\hat\theta - \theta^*\|_2$
+directly implies a bound on $\|\hat p - p^*\|_2$. Indeed, we have:
+
+\begin{lemma}
+$\|\hat{\theta} - \theta^*\|_2 \geq \|\hat{p} - p^*\|_2$.
+\end{lemma}
+\begin{proof}
+Using the inequality $\forall x>0,\; \log x \geq 1 - \frac{1}{x}$, we have
+$|\log(1-p) - \log(1-p')| \geq \max\left(1 - \frac{1-p}{1-p'},
+1 - \frac{1-p'}{1-p}\right) \geq \max(p-p', p'-p)$. Applying this inequality
+coordinate-wise with $p = \hat{p}_j$ and $p' = p^*_j$ and summing the squares
+yields the claim.
+\end{proof}
+
+Theorem~\ref{thm:main} is a consequence of Theorem~1 in \cite{Negahban:2009},
+which gives a bound on the convergence rate of regularized estimators. We
+state their theorem in the context of $\ell_1$ regularization in
+Lemma~\ref{lem:negahban}.
 
 \begin{lemma}
 \label{lem:negahban}
@@ -208,21 +215,86 @@ then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset
 \end{corollary}
 
-\subsection{The Independent Cascade Model}
-Recall that to cast the Independent Cascade model as a Generalized Linear Cascade, we performed the following change of variables: $\forall i,j\ \Theta_{i,j} = \log(1 - p_{i,j})$. The previous results hold \emph{a priori} for the ``effective'' parameter $\Theta$. The following lemma shows they also hold for the original infection probabilities $p_{i,j}$:
-\begin{lemma}
-\label{lem:theta_p_upperbound}
-$\|\theta - \theta' \|_2 \geq \|p - p^*\|_2$ and $\|\theta_{\lfloor s \rfloor}\|_1 \geq \|p_{\lfloor s \rfloor}\|_1$
-\end{lemma}
-\begin{proof}
-Using the inequality $\forall x>0,\; \log x \geq 1 - \frac{1}{x}$, we have $|\log(1-p) - \log(1-p')| \geq \max(1 - \frac{1-p}{1-p'}, 1 - \frac{1-p'}{1-p}) \geq \max(p-p', p'-p)$. The second result follows easily from the monotonicity of $\log$ and $\forall x > 0, |\log(1-x)| \geq x$.
-\end{proof}
+\subsection{Restricted Eigenvalue Condition}
+\label{sec:re}
+
+In this section, we discuss the main assumption of Theorem~\ref{thm:main},
+namely the restricted eigenvalue condition. We then compare it to the
+irrepresentability condition considered in \cite{Daneshmand:2014}.
+
+\paragraph{Interpretation}
+
+The restricted eigenvalue condition, introduced in \cite{bickel:2009}, is one
+of the weakest sufficient conditions on the design matrix for successful
+sparse recovery \cite{vandegeer:2009}. Several recent papers show that large
+classes of correlated designs obey the restricted eigenvalue property with
+high probability \cite{raskutti:10,rudelson:13}.
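
For concreteness, the condition discussed here can be written in the following cone form (a hedged restatement in the standard form of \cite{bickel:2009}; the paper's own Definition may use a different cone constant than the $3$ below):
\begin{displaymath}
\Delta^{\top} \nabla^2 {\cal L}(\theta^*)\, \Delta \;\geq\; \gamma\, \|\Delta\|_2^2
\qquad \text{for all } \Delta \neq 0 \text{ such that } \|\Delta_{S^c}\|_1 \leq 3\, \|\Delta_S\|_1,
\end{displaymath}
where $S$ is the support of $\theta^*$ and $\gamma > 0$ is the restricted eigenvalue.
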
+
+
+\paragraph{(RE) with high probability}
 
-The results of sections~\ref{sec:main_theorem} and \ref{sec:relaxing_sparsity}
-therefore hold for the original transition probabilities $p$.
+Expressing the minimum restricted eigenvalue $\gamma$ as a function of the
+cascade model parameters is highly non-trivial. The restricted eigenvalue
+property is, however, well behaved in the following sense: under reasonable
+assumptions, if the population Hessian $\mathbb{E}\left[\nabla^2
+{\cal L}(\theta)\right]$, which is the \emph{Fisher information matrix} of
+the cascade model as a function of $\Theta$, satisfies the restricted
+eigenvalue property, then the finite-sample Hessian also satisfies it with
+overwhelming probability. It is straightforward to show that this holds when
+$n \geq C s^2 \log m$ \cite{vandegeer:2009}, where $C$ is an absolute
+constant. By adapting Theorem~8 of \cite{rudelson:13}\footnote{this result
+still needs to be confirmed!}, this can be reduced to:
+\begin{displaymath}
+n \geq C s \log m \log^3 \left( \frac{s \log m}{C'} \right)
+\end{displaymath}
+where $C, C'$ are constants not depending on $(s, m, n)$.
+
+\paragraph{(RE) vs Irrepresentability Condition}
+
+\cite{Daneshmand:2014} rely on an `incoherence' condition on the Hessian of
+the likelihood function. It is in fact easy to see that their condition is
+equivalent to what is more commonly called the {\it (S,s)-irrepresentability}
+condition:
+
+\begin{definition}
+Following similar notation, let $Q^* \defeq \nabla^2 {\cal L}(\theta^*)$. Let
+$S$ and $S^c$ be the sets of indices of all the parents and non-parents
+respectively, and $Q_{S,S}$, $Q_{S^c,S}$, $Q_{S,S^c}$, and $Q_{S^c,S^c}$ the
+induced sub-matrices. Consider the following constant:
+\begin{equation}
+\nu_{\text{irrepresentable}}(S) \defeq \max_{\tau \in \mathbb{R}^p\ :\ \|\tau\|_{\infty} \leq 1} \|Q_{S^c,S} Q_{S,S}^{-1} \tau\|_{\infty}
+\end{equation}
+The (S,s)-irrepresentability condition holds if
+$\nu_{\text{irrepresentable}}(S) < 1 - \epsilon$ for some $\epsilon > 0$.
+\end{definition}
+
+This condition has been shown to be essentially necessary for variable
+selection \cite{Zhao:2006}. Several recent papers
+\cite{vandegeer:2011,Zou:2006} argue that this condition is unrealistic in
+situations where the variables are correlated. Consider the following
+simplified example from \cite{vandegeer:2011}:
+\begin{equation}
+\nonumber
+\left(
+\begin{array}{cc}
+I_s & \rho J \\
+\rho J & I_s \\
+\end{array}
+\right)
+\end{equation}
+where $I_s$ is the $s \times s$ identity matrix, $J$ is the all-ones matrix
+and $\rho \in \mathbb{R}^+$. It is easy to see that
+$\nu_{\text{irrepresentable}}(S) = \rho s$ and $\lambda_{\min}(Q) \geq
+1 - \rho$, so that for any $\frac{1}{s} < \rho < 1$, the restricted
+eigenvalue condition holds trivially but the (S,s)-irrepresentability does
+not.
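
To see where $\nu_{\text{irrepresentable}}(S) = \rho s$ comes from in this example, here is the one-line computation (a sketch; it uses only $Q_{S,S} = I_s$ and $Q_{S^c,S} = \rho J$ from the display above):
\begin{displaymath}
\|Q_{S^c,S} Q_{S,S}^{-1} \tau\|_{\infty}
= \rho\, \|J \tau\|_{\infty}
= \rho \left|\sum_{j=1}^{s} \tau_j\right|
\leq \rho s,
\end{displaymath}
with equality at $\tau = \mathbf{1}_s$; hence (S,s)-irrepresentability indeed fails whenever $\rho \geq \frac{1}{s}$.
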
+
+It is intuitive that the irrepresentability condition is stronger than the
+{\bf(RE)} assumption. In fact, a slightly modified result from
+\cite{vandegeer:2009} shows that a `strong' irrepresentability condition
+directly {\it implies} the {\bf(RE)} condition for $\ell_2$-recovery:
+
+\begin{proposition}
+\label{prop:irrepresentability}
+If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then
+the restricted eigenvalue condition holds with constant $\gamma_n \geq
+\frac{(1 - 3(1-\epsilon))^2 \lambda_{\min}^2}{4s} n$, where
+$\lambda_{\min} > 0$ is the smallest eigenvalue of $Q^*_{S,S}$, on which the
+results of \cite{Daneshmand:2014} also depend.
+\end{proposition}
 
 \begin{comment}
 \begin{lemma}
 Let ${\cal C}({\cal M}, \bar{\cal M}^\perp, \theta^*) \defeq \{\Delta \in \mathbb{R}^p \mid {\cal R}(\Delta_{\bar{\cal M}^\perp}) \leq 3{\cal R}(\Delta_{\bar{\cal M}}) + 4{\cal R}(\theta^*_{{\cal M}^\perp})\}$, where $\cal R$ is a \emph{decomposable} regularizer with respect to $({\cal M}, \bar{\cal M}^\perp)$, and $({\cal M}, \bar{\cal M})$ are two subspaces such that ${\cal M} \subseteq \bar{\cal M}$. Suppose that $\exists \kappa_{\cal L} > 0,\; \exists \tau_{\cal L},\; \forall \Delta \in {\cal C},\; {\cal L}(\theta^* + \Delta) - {\cal L}(\theta^*) - \langle \nabla {\cal L}(\theta^*), \Delta \rangle \geq \kappa_{\cal L} \|\Delta\|^2 - \tau_{\cal L}^2(\theta^*)$. Let $\Psi({\cal M}) \defeq \sup_{u \in {\cal M} \backslash \{0\}} \frac{{\cal R}(u)}{\|u\|}$. Finally, suppose that $\lambda \geq 2 {\cal R}^*(\nabla {\cal L}(\theta^*))$, where ${\cal R}^*$ is the conjugate of ${\cal R}$. Then:
 $$\|\hat \theta_\lambda - \theta^* \|^2 \leq 9 \frac{\lambda^2}{\kappa_{\cal L}}\Psi^2(\bar {\cal M}) + \frac{\lambda}{\kappa_{\cal L}}\left\{2 \tau^2_{\cal L}(\theta^*) + 4 {\cal R}(\theta^*_{{\cal M}^\perp})\right\}$$
 \end{lemma}
+
+\subsection{The Independent Cascade Model}
+
+Recall that to cast the Independent Cascade model as a Generalized Linear
+Cascade, we performed the following change of variables: $\forall i,j,
+\ \Theta_{i,j} = \log(1 - p_{i,j})$. The previous results hold \emph{a priori}
+for the ``effective'' parameter $\Theta$. The following lemma shows they also
+hold for the original infection probabilities $p_{i,j}$:
+
+The results of sections~\ref{sec:main_theorem} and \ref{sec:relaxing_sparsity}
+therefore hold for the original transition probabilities $p$.
 \end{comment}
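
As a concrete numerical instance of Proposition~\ref{prop:irrepresentability} above (a worked example using only the stated constants, not an additional claim): taking $\epsilon = \frac{5}{6} > \frac{2}{3}$ gives $1 - 3(1-\epsilon) = \frac{1}{2}$, so the proposition yields
\begin{displaymath}
\gamma_n \;\geq\; \frac{\left(\frac{1}{2}\right)^2 \lambda_{\min}^2}{4s}\, n \;=\; \frac{\lambda_{\min}^2}{16\, s}\, n.
\end{displaymath}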
