author     Thibaut Horel <thibaut.horel@gmail.com>  2015-02-04 18:02:17 -0500
committer  Thibaut Horel <thibaut.horel@gmail.com>  2015-02-04 18:02:17 -0500
commit     692c93dcd1be86bae445dfbbbcf2ddb263653c52 (patch)
tree       620177bc51d48da221d5dafbc287b3477c0ef3d7
parent     0e6ef8ce1055b3a524e2432ffda76f1acceed3d3 (diff)
download   cascades-692c93dcd1be86bae445dfbbbcf2ddb263653c52.tar.gz
More rewriting of 3.1, move assumptions back in section 3
-rw-r--r--  paper/paper.tex                  |   6
-rw-r--r--  paper/sections/assumptions.tex   |  46
-rw-r--r--  paper/sections/results.tex       | 138
3 files changed, 108 insertions, 82 deletions
diff --git a/paper/paper.tex b/paper/paper.tex
index f922b3b..3110252 100644
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -108,9 +108,9 @@
\label{sec:lowerbound}
\input{sections/lowerbound.tex}
-\section{Assumptions}
-\label{sec:assumptions}
-\input{sections/assumptions.tex}
+%\section{Assumptions}
+%\label{sec:assumptions}
+%\input{sections/assumptions.tex}
\section{Linear Threshold Model}
\label{sec:linear_threshold}
diff --git a/paper/sections/assumptions.tex b/paper/sections/assumptions.tex
index 33d8e69..e69de29 100644
--- a/paper/sections/assumptions.tex
+++ b/paper/sections/assumptions.tex
@@ -1,46 +0,0 @@
-In this section, we discuss the main assumption of Theorem~\ref{thm:neghaban}, namely the restricted eigenvalue condition. We then compare it to the irrepresentability condition considered in \cite{Daneshmand:2014}.
-
-\subsection{The Restricted Eigenvalue Condition}
-
-The restricted eigenvalue condition, introduced in \cite{bickel:2009}, is one of the weakest sufficient conditions on the design matrix for successful sparse recovery \cite{vandegeer:2009}. Several recent papers show that large classes of correlated designs obey the restricted eigenvalue property with high probability \cite{raskutti:10,rudelson:13}.
-
-Expressing the minimum restricted eigenvalue $\gamma$ as a function of the cascade model parameters is highly non-trivial. The restricted eigenvalue property is, however, well behaved in the following sense: under reasonable assumptions, if the population version of the hessian $\mathbb{E} \left[\nabla^2 {\cal L}(\theta) \right]$, corresponding to the \emph{Fisher Information Matrix} of the cascade model as a function of $\Theta$, satisfies the restricted eigenvalue property, then the finite-sample hessian also satisfies it with overwhelming probability. It is straightforward to show that this holds when $n \geq C s^2 \log m$ \cite{vandegeer:2009}, where $C$ is an absolute constant. By adapting Theorem~8 of \cite{rudelson:13}\footnote{this result still needs to be confirmed!}, this can be reduced to:
-
-$$n \geq C s \log m \log^3 \left( \frac{s \log m}{C'} \right)$$
-
-where $C, C'$ are constants not depending on $(s, m, n)$.
-
-
-\subsection{The Irrepresentability Condition}
-
-\cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the likelihood function. It is in fact easy to see that their condition is equivalent to what is more commonly called the {\it (S,s)-irrepresentability} condition:
-
-\begin{definition}
-Following similar notation, let $Q^* \defeq \nabla^2 f(\theta^*)$. Let $S$ and $S^c$ be the sets of indices of the parents and non-parents respectively, and let $Q_{S,S}$, $Q_{S^c,S}$, $Q_{S, S^c}$, and $Q_{S^c, S^c}$ be the induced sub-matrices. Consider the following constant:
-\begin{equation}
-\nu_{\text{irrepresentable}}(S) \defeq \max_{\tau \in \mathbb{R}^s \ :\ \| \tau \|_{\infty} \leq 1} \|Q_{S^c, S} Q_{S, S}^{-1} \tau\|_{\infty}
-\end{equation}
-The (S,s)-irrepresentability condition holds if $\nu_{\text{irrepresentable}}(S) < 1 - \epsilon$ for some $\epsilon > 0$.
-\end{definition}
-
-This condition has been shown to be essentially necessary for variable selection \cite{Zhao:2006}. Several recent papers \cite{vandegeer:2011,Zou:2006} argue that this condition is unrealistic when the variables are correlated. Consider the following simplified example from \cite{vandegeer:2011}:
-\begin{equation}
-\nonumber
-\left(
-\begin{array}{cc}
-I_s & \rho J \\
-\rho J & I_s \\
-\end{array}
-\right)
-\end{equation}
-where $I_s$ is the $s \times s$ identity matrix, $J$ is the all-ones matrix and $\rho \in \mathbb{R}^+$. It is easy to see that $\nu_{\text{irrepresentable}}(S) = \rho s$ and $\lambda_{\min}(Q) \geq 1 - \rho$, so that for any $\frac{1}{s} < \rho < 1$, the restricted eigenvalue condition holds trivially but the (S,s)-irrepresentability condition does not.
-
-It is intuitive that the irrepresentability condition is stronger than the {\bf(RE)} assumption. In fact, a slightly modified result from \cite{vandegeer:2009} shows that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell_2$-recovery:
-
-\begin{proposition}
-\label{prop:irrepresentability}
-If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant $\gamma_n \geq \frac{ (1 - 3(1 -\epsilon))^2 \lambda_{\min}^2}{4s}n$, where $\lambda_{\min} > 0$ is the smallest eigenvalue of $Q^*_{S,S}$, on which the results of \cite{Daneshmand:2014} also depend.
-\end{proposition}
-
-
-
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index a948b20..6e67058 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -16,21 +16,6 @@ omit the subscript $i$ in the notations when there is no ambiguity. The
recovery problem is now the one of estimating a single vector $\theta^*$ from
a set $\mathcal{T}$ of observations. We will write $n\defeq |\mathcal{T}|$.
-\begin{comment}
-As mentioned previously, our objective is twofold:
-\begin{enumerate}
- \item We seek to recover the edges of the graph. In other words, for each
- node $j$, we wish to identify the set of indices $\{i: \Theta_{ij} \neq
- 0\}$ .
- \item We seek to recover the edge weights $\Theta$ of ${\cal G}$ i.e.
- upper-bound the error $\|\hat \Theta - \Theta \|_2$.
-\end{enumerate}
-It is easy to see that solving the second objective allows us to solve the
-first. If $\|\hat \Theta - \Theta \|_2$ is sufficiently small, we can recover
-all large coordinates of $\Theta$ by keeping only the coordinates above
-a certain threshold.
-\end{comment}
-
\subsection{Main Theorem}
\label{sec:main_theorem}
@@ -52,7 +37,7 @@ on the now standard \emph{restricted eigenvalue condition}.
\end{definition}
A discussion of this assumption in the context of generalized linear cascade
-models can be found in Section~\ref{sec:}.
+models can be found in Section~\ref{sec:re}.
We will also need the following assumption on the inverse link function $f$ of
the generalized linear cascade model:
@@ -62,11 +47,15 @@ the generalized linear cascade model:
\left|\frac{f'}{1-f}(\inprod{\theta^*}{x})\right|\right)\leq
\frac{1}{\alpha}
\end{equation}
-for all $x\in\reals^m$ such that $f(\inprod{\theta^*}{x})\notin\{0,1\}$.
+for some $\alpha\in(0, 1)$ and for all $x\in\reals^m$ such that
+$f(\inprod{\theta^*}{x})\notin\{0,1\}$.
-It is easy to verify that this will be true if for all
-$(i,j)\in E$, $\delta\leq p_{i,j}\leq 1-\delta$ in the Voter model and when
-$p_{i,j}\leq 1-\delta$ in the Independent Cascade model.
+In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{1-f}(z)
+= \frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha\leq \Theta_{i,j}\leq
+1-\alpha$ for all $(i,j)\in E$. Similarly, in the Independent Cascade Model,
+$\frac{f'}{f}(z) = \frac{1}{1-e^{-z}}$ and $\frac{f'}{1-f}(z) = -1$, so (LF)
+holds if $p_{i, j}\leq 1-\alpha$ for all $(i, j)\in E$. Recall that in this
+case we have $\Theta_{i,j} = \log(1-p_{i,j})$.
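+
+For concreteness, here is the corresponding one-line check for the voter
+model, under the simplifying assumption that the bounds on $\Theta_{i,j}$
+translate into $\alpha\leq \inprod{\theta^*}{x}\leq 1-\alpha$: for
+$z\in[\alpha, 1-\alpha]$,
+\begin{displaymath}
+\max\left(\left|\frac{f'}{f}(z)\right|,
+\left|\frac{f'}{1-f}(z)\right|\right)
+= \max\left(\frac{1}{z}, \frac{1}{1-z}\right) \leq \frac{1}{\alpha}.
+\end{displaymath}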
\begin{theorem}
\label{thm:main}
@@ -84,9 +73,27 @@ of \eqref{eq:pre-mle} with $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^{1
\end{equation}
\end{theorem}
-This theorem is a consequence of Theorem~1 in \cite{Negahban:2009} which gives
-a bound on the convergence rate of regularized estimators. We state their
-theorem in the context of $\ell_1$ regularization in Lemma~\ref{lem:negahban}.
+Before moving to the proof of Theorem~\ref{thm:main}, note that interpreting
+it in the case of the Independent Cascade Model requires one more step.
+Indeed, to cast it as a generalized linear cascade model, we had to perform
+the transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where $p_{i,j}$ are the
+infection probabilities. Fortunately, if we estimate $p_j$ through
+$\hat{p}_j = 1 - e^{\hat{\theta}_j}$, a bound on $\|\hat\theta - \theta^*\|_2$
+directly implies a bound on $\|\hat p - p^*\|_2$. Indeed we have:
+
+\begin{lemma}
+ $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
+\end{lemma}
+\begin{proof}
+Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have
+$|\log (1 - p) - \log (1-p')| \geq \max(1 - \frac{1-p}{1-p'},
+1 - \frac{1-p'}{1-p}) \geq \max( p-p', p'-p)$. Applying this coordinate-wise
+with $p = \hat{p}_j$ and $p' = p^*_j$, so that the left-hand side equals
+$|\hat{\theta}_j - \theta^*_j|$, and summing the squares yields the result.
+\end{proof}
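+
+As a quick sanity check with illustrative values (not tied to any particular
+cascade), take $p = 0.5$ and $p' = 0.3$: then
+$|\log(1-p) - \log(1-p')| = \log\frac{0.7}{0.5} \approx 0.336
+\geq 0.2 = |p - p'|$, as claimed.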
+
+Theorem~\ref{thm:main} is a consequence of Theorem~1 in \cite{Negahban:2009}
+which gives a bound on the convergence rate of regularized estimators. We state
+their theorem in the context of $\ell_1$ regularization in
+Lemma~\ref{lem:negahban}.
\begin{lemma}
\label{lem:negahban}
@@ -208,21 +215,86 @@ then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset
\end{corollary}
-\subsection{The Independent Cascade Model}
-Recall that to cast the Independent Cascade model as a Generalized Linear Cascade, we performed the following change of variables: $\forall i,j \ \Theta_{i,j} = \log(1 - p_{i,j})$. The previous results hold \emph{a priori} for the ``effective'' parameter $\Theta$. The following lemma shows they also hold for the original infection probabilities $p_{i,j}$:
-\begin{lemma}
-\label{lem:theta_p_upperbound}
-$\|\theta - \theta' \|_2 \geq \|p - p' \|_2$ and $\|\theta_{\lfloor s \rfloor}\|_1 \geq \| p_{\lfloor s \rfloor} \|_1$
-\end{lemma}
-\begin{proof}
-Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have $|\log (1 - p) - \log (1-p')| \geq \max(1 - \frac{1-p}{1-p'}, 1 - \frac{1-p'}{1-p}) \geq \max( p-p', p'-p)$. The second result follows easily from the monotonicity of $\log$ and from $\forall x \in (0,1), \; | \log (1 -x) | \geq x$.
-\end{proof}
+\subsection{Restricted Eigenvalue Condition}
+\label{sec:re}
+
+In this section, we discuss the main assumption of Theorem~\ref{thm:main},
+namely the restricted eigenvalue condition. We then compare it to the
+irrepresentability condition considered in \cite{Daneshmand:2014}.
+
+\paragraph{Interpretation}
+
+The restricted eigenvalue condition, introduced in \cite{bickel:2009}, is one
+of the weakest sufficient conditions on the design matrix for successful
+sparse recovery \cite{vandegeer:2009}. Several recent papers show that large
+classes of correlated designs obey the restricted eigenvalue property with
+high probability \cite{raskutti:10,rudelson:13}.
+
+\paragraph{(RE) with high probability}
-The results of sections~\ref{sec:main_theorem} and \ref{sec:relaxing_sparsity} therefore hold for the original transition probabilities $p$.
+Expressing the minimum restricted eigenvalue $\gamma$ as a function of the
+cascade model parameters is highly non-trivial. The restricted eigenvalue
+property is, however, well behaved in the following sense: under reasonable
+assumptions, if the population version of the hessian $\mathbb{E}
+\left[\nabla^2 {\cal L}(\theta) \right]$, corresponding to the \emph{Fisher
+Information Matrix} of the cascade model as a function of $\Theta$, satisfies
+the restricted eigenvalue property, then the finite-sample hessian also
+satisfies it with overwhelming probability. It is straightforward to show
+that this holds when $n \geq C s^2 \log m$ \cite{vandegeer:2009}, where $C$
+is an absolute constant. By adapting Theorem~8 of
+\cite{rudelson:13}\footnote{this result still needs to be confirmed!}, this
+can be reduced to:
+\begin{displaymath}
+n \geq C s \log m \log^3 \left( \frac{s \log m}{C'} \right)
+\end{displaymath}
+where $C, C'$ are constants not depending on $(s, m, n)$.
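+
+Comparing the two bounds while ignoring the constants $C$ and $C'$, the ratio
+of the quadratic bound to the improved one is
+\begin{displaymath}
+\frac{s^2 \log m}{s \log m \log^3(s \log m)} = \frac{s}{\log^3(s \log m)},
+\end{displaymath}
+so the second bound is an improvement precisely when $s$ is large compared to
+$\log^3(s \log m)$.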
+
+\paragraph{(RE) vs Irrepresentability Condition}
+
+\cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the likelihood function. It is in fact easy to see that their condition is equivalent to what is more commonly called the {\it (S,s)-irrepresentability} condition:
+
+\begin{definition}
+Following similar notation, let $Q^* \defeq \nabla^2 f(\theta^*)$. Let $S$ and $S^c$ be the sets of indices of the parents and non-parents respectively, and let $Q_{S,S}$, $Q_{S^c,S}$, $Q_{S, S^c}$, and $Q_{S^c, S^c}$ be the induced sub-matrices. Consider the following constant:
+\begin{equation}
+\nu_{\text{irrepresentable}}(S) \defeq \max_{\tau \in \mathbb{R}^s \ :\ \| \tau \|_{\infty} \leq 1} \|Q_{S^c, S} Q_{S, S}^{-1} \tau\|_{\infty}
+\end{equation}
+The (S,s)-irrepresentability condition holds if $\nu_{\text{irrepresentable}}(S) < 1 - \epsilon$ for some $\epsilon > 0$.
+\end{definition}
+
+This condition has been shown to be essentially necessary for variable selection \cite{Zhao:2006}. Several recent papers \cite{vandegeer:2011,Zou:2006} argue that this condition is unrealistic when the variables are correlated. Consider the following simplified example from \cite{vandegeer:2011}:
+\begin{equation}
+\nonumber
+\left(
+\begin{array}{cc}
+I_s & \rho J \\
+\rho J & I_s \\
+\end{array}
+\right)
+\end{equation}
+where $I_s$ is the $s \times s$ identity matrix, $J$ is the all-ones matrix and $\rho \in \mathbb{R}^+$. It is easy to see that $\nu_{\text{irrepresentable}}(S) = \rho s$ and $\lambda_{\min}(Q) \geq 1 - \rho$, so that for any $\frac{1}{s} < \rho < 1$, the restricted eigenvalue condition holds trivially but the (S,s)-irrepresentability condition does not.
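+
+To verify the first claim, note that $Q_{S,S} = I_s$, so that for any $\tau$
+with $\|\tau\|_{\infty} \leq 1$,
+\begin{displaymath}
+\|Q_{S^c, S} Q_{S, S}^{-1} \tau\|_{\infty} = \rho \|J \tau\|_{\infty}
+= \rho \Big|\sum_{i=1}^s \tau_i\Big| \leq \rho s,
+\end{displaymath}
+with equality for $\tau = \mathbf{1}$; hence
+$\nu_{\text{irrepresentable}}(S) = \rho s$.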
+
+It is intuitive that the irrepresentability condition is stronger than the {\bf(RE)} assumption. In fact, a slightly modified result from \cite{vandegeer:2009} shows that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell_2$-recovery:
+
+\begin{proposition}
+\label{prop:irrepresentability}
+If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant $\gamma_n \geq \frac{ (1 - 3(1 -\epsilon))^2 \lambda_{\min}^2}{4s}n$, where $\lambda_{\min} > 0$ is the smallest eigenvalue of $Q^*_{S,S}$, on which the results of \cite{Daneshmand:2014} also depend.
+\end{proposition}
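+
+For instance, instantiating Proposition~\ref{prop:irrepresentability} with
+$\epsilon = \frac{5}{6} > \frac{2}{3}$ gives $1 - 3(1-\epsilon) = \frac{1}{2}$
+and hence $\gamma_n \geq \frac{\lambda_{\min}^2}{16 s} n$.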
\begin{comment}
\begin{lemma}
Let ${\cal C}({\cal M}, \bar {\cal M}^\perp, \theta^*) \defeq \{ \Delta \in \mathbb{R}^p | {\cal R}(\Delta_{\bar {\cal M}^\perp}) \leq 3 {\cal R}(\Delta_{\bar {\cal M}}) + 4 {\cal R}(\theta^*_{{\cal M}^\perp}) \}$, where $\cal R$ is a \emph{decomposable} regularizer with respect to $({\cal M}, \bar {\cal M}^\perp)$, and $({\cal M}, \bar {\cal M})$ are two subspaces such that ${\cal M} \subseteq \bar {\cal M}$. Suppose that $\exists \kappa_{\cal L} > 0, \; \exists \tau_{\cal L}, \; \forall \Delta \in {\cal C}, \; {\cal L}(\theta^* + \Delta) - {\cal L}(\theta^*) - \langle \nabla {\cal L}(\theta^*), \Delta \rangle \geq \kappa_{\cal L} \|\Delta\|^2 - \tau_{\cal L}^2(\theta^*)$. Let $\Psi({\cal M}) \defeq \sup_{u \in {\cal M} \backslash \{0\}} \frac{{\cal R}(u)}{\|u\|}$. Finally suppose that $\lambda \geq 2 {\cal R}^*(\nabla {\cal L}(\theta^*))$, where ${\cal R}^*$ is the conjugate of ${\cal R}$. Then: $$\|\hat \theta_\lambda - \theta^* \|^2 \leq 9 \frac{\lambda^2}{\kappa_{\cal L}}\Psi^2(\bar {\cal M}) + \frac{\lambda}{\kappa_{\cal L}}\{2 \tau^2_{\cal L}(\theta^*) + 4 {\cal R}(\theta^*_{{\cal M}^\perp})\}$$
\end{lemma}
+
+\subsection{The Independent Cascade Model}
+
+Recall that to cast the Independent Cascade model as a Generalized Linear
+Cascade, we performed the following change of variables: $\forall i,j
+\ \Theta_{i,j} = \log(1 - p_{i,j})$. The previous results hold \emph{a priori}
+for the ``effective'' parameter $\Theta$. The following lemma shows they also
+hold for the original infection probabilities $p_{i,j}$:
+
+The results of sections~\ref{sec:main_theorem} and \ref{sec:relaxing_sparsity}
+therefore hold for the original transition probabilities $p$.
\end{comment}