| author | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-01-25 15:51:10 -0500 |
|---|---|---|
| committer | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-01-25 15:51:10 -0500 |
| commit | 889412dfb1e698af483cbdd363b9358ce2fd2a4c | |
| tree | 9671aa842ac4c7bfbc3356ec4edeabb8fd68dd4f /paper | |
| parent | 8e6df4714495efc3bb482da5471023283c48b7ef | |
| download | cascades-889412dfb1e698af483cbdd363b9358ce2fd2a4c.tar.gz | |
small push
Diffstat (limited to 'paper')
| -rw-r--r-- | paper/paper.tex | 3 |
| -rw-r--r-- | paper/sections/results.tex | 18 |
2 files changed, 12 insertions, 9 deletions
diff --git a/paper/paper.tex b/paper/paper.tex
index c546ed7..068e5d2 100644
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -100,6 +100,9 @@ blabla
 
 \section{Results}
 \input{sections/results.tex}
 
+\section{Assumptions}
+\label{sec:assumptions}
+\input{sections/assumptions.tex}
 
 \section{Experiments}
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index 7fbc25b..eadc0ba 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -1,6 +1,6 @@
-There have been a series of papers arguing that the Lasso is an inappropriate variable selection method (see H.Zou and T.Hastie, Sarah van de Geer ...). In fact, the irrepresentability condition, though essentially necessary for variable selection, rarely holds in practical situations where correlation between variable occurs. We defer an extended analysis of this situation to section ...
+There have been a series of papers arguing that the Lasso is an inappropriate variable selection method (see H. Zou and T. Hastie, Sara van de Geer ...). In fact, the irrepresentability condition, though essentially necessary for variable selection, rarely holds in practical situations where correlation between variables occurs. We defer an extended analysis of this situation to Section~\ref{sec:assumptions}.
 
-Our approach is different. Rather than trying to perform variable selection by finding $\{j: \theta_j \neq 0\}$, we seek to obtain oracle inequalities by upper-bounding $\|\hat \theta - \theta^* \|_2$. It is easy to see that by thresholding $\hat \theta$, one recovers all `strong' parents with no false positives, as shown in Theorem~\ref{...}. Interestingly, obtaining oracle inequalities depends on a the following restricted eigenvalue condition\footnote{which is less restrictive? Irrepresentability implies compatibility? But do we have comptability and do they have irrepresentability?} for a symetric matrix $\Sigma$ and set ${\cal C} := \{X \in \mathbb{R}^p : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 \}$
+Our approach is different. Rather than trying to perform variable selection by finding $\{j: \theta_j \neq 0\}$, we seek to obtain oracle inequalities by upper-bounding $\|\hat \theta - \theta^* \|_2$. It is easy to see that by thresholding $\hat \theta$, one recovers all `strong' parents without false positives, as shown in Corollary~\ref{cor:variable_selection}. Interestingly, obtaining oracle inequalities depends on the following restricted eigenvalue condition\footnote{which is less restrictive? Irrepresentability implies compatibility? But do we have compatibility and do they have irrepresentability?} for a symmetric matrix $\Sigma$ and set ${\cal C} := \{X \in \mathbb{R}^p : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 \}$
 
 \begin{equation}
 \nonumber
@@ -10,7 +10,7 @@ Our approach is different. Rather than trying to perform variable selection by f
 
 We cite the following Theorem from \cite{Negahban:2009}:
 \begin{theorem}
-\label{theorem:neghaban}
+\label{thm:negahban}
 Suppose the true vector $\theta^*$ has support S of size s and the {\bf(RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$, then by solving Eq.~\ref{...} for $\lambda_n \geq 2 \|\nabla f(\theta^*)\|_{\infty}$ we have:
 \begin{equation}
 \|\hat \theta - \theta^* \|_2 \leq \frac{\sqrt{s}\lambda_n}{\gamma_n}
@@ -21,11 +21,11 @@ Suppose the true vector $\theta^*$ has support S of size s and the {\bf(RE)} ass
 We analyse the previous conditions in the case of the Independent Cascade model. Lemma 1. provides a ${\cal O}(\sqrt{n})$-upper-bound w.h.p.
 on $\|\nabla f(\theta^*)\|$
 \begin{lemma}
-\label{lemma:icc_lambda_upper_bound}
+\label{lem:icc_lambda_upper_bound}
 For any $\delta > 0$, with probability $1-e^{-n^\delta \log m}$, $\|\nabla f(\theta^*)\|_{\infty} \leq 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$
 \end{lemma}
 
-The following corollaries follows immediately from Theorem~\ref{theorem:neghaban} and Lemma~\ref{lemma:icc_lambda_upper_bound}.
+The following corollaries follow immediately from Theorem~\ref{thm:negahban} and Lemma~\ref{lem:icc_lambda_upper_bound}.
 
 \begin{corollary}
 Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Then for $\lambda_n := 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$ and with probability $1-e^{-n^\delta \log p}$
@@ -37,16 +37,16 @@
 The following corollary follows trivially and gives the first $\Omega(s \log p)$ algorithm for graph reconstruction on general graphs.
 
 \begin{corollary}
-Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Suppose that after solving Eq.~\ref{eq:mle}, we construct the set $\hat {\cal S}_\eta := \{ i \in [1..p] : \hat \theta_i > \eta\}$ for $\eta > 0$. Suppose the number of measurements verifies:
+\label{cor:variable_selection}
+Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Suppose that after solving Eq.~\ref{eq:mle}, we construct the set $\hat {\cal S}_\eta := \{ j \in [1..p] : \hat \theta_j > \eta\}$ for $\eta > 0$. Suppose the number of measurements verifies:
 \begin{equation}
 n > \frac{4}{\gamma^2 p_{\min} \eta^2} s \log p
 \end{equation}
-Then $\hat {\cal S}_\eta = {\cal S}^*_{2\eta}$, where ${\cal S}^*_{2\eta} = \{ i \in [1..p] :\theta^*_i > 2 \eta \} $. In other words, we incur no false positives (false parents) and recover all `strong' parents, such that $\theta^*_j > 2 \eta$.
+Then with probability $1-e^{-n^\delta \log p}$, $\hat {\cal S}_\eta = {\cal S}^*_{2\eta}$, where ${\cal S}^*_{2\eta} = \{ j \in [1..p] : \theta^*_j > 2 \eta \}$. In other words, we incur no false positives (false parents) and recover all `strong' parents such that $\theta^*_j > 2 \eta$.
 \end{corollary}
 
-Notice that by choosing $\eta < \frac{p_{\min}}{2}$, we recover all parents exactly. Note that $n$ is the number of measurements and not the number of cascades. This is an important improvement over prior work since we expect several measurements per cascade. We claim that graph recovery is better understood as a function of $n$, the cumulative number of steps in each cascade, rather than as a function $N$, the number of individual cascades.
+Note that $n$ is the number of measurements and not the number of cascades. This is an important improvement over prior work since we expect several measurements per cascade. We claim that graph recovery is better understood as a function of $n$, the cumulative number of steps in each cascade, rather than as a function of $N$, the number of individual cascades.
 
-Unfortunately, proving the {\bf (RE)} assumption for correlated measurements is highly non-trivial. We can show a very crude ${\cal O}(N)$-lower bound for $\gamma_n$ by exploiting only the first set of measurements, where only the source nodes are active. Note that even though we waste a lot of information, we obtain similar asymptotic behavior than previous work.
 
 \subsection{Linear Threshold Model}
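As a sanity check on the new corollary (not part of the commit itself), its sample-complexity threshold can be recovered by chaining the two displayed results: plug the lemma's high-probability bound $\lambda_n = 2\sqrt{n^{\delta+1}\log m / p_{\min}}$ and the assumption $\gamma_n = n\gamma$ into the theorem's oracle inequality, then ask when the $\ell_2$ error falls below $\eta$ (an $\ell_2$ bound dominates each coordinate error, so thresholding at $\eta$ separates parents with $\theta^*_j > 2\eta$ from non-parents). A rough sketch, taking $\delta \to 0$ and conflating $\log m$ with $\log p$ as the diff itself does:

```latex
% Sketch only (not in the commit): chain the lemma's bound on lambda_n
% and the theorem's oracle inequality, with gamma_n = n*gamma.
\[
\|\hat\theta - \theta^*\|_2
  \;\le\; \frac{\sqrt{s}\,\lambda_n}{\gamma_n}
  \;=\; \frac{\sqrt{s}}{n\gamma}\, 2\sqrt{\frac{n^{\delta+1}\log m}{p_{\min}}}
  \;=\; \frac{2}{\gamma}\sqrt{\frac{s\,n^{\delta-1}\log m}{p_{\min}}}
  \;\le\; \eta
\;\iff\;
n^{1-\delta} \;\ge\; \frac{4}{\gamma^2 p_{\min}\,\eta^2}\, s\log m.
\]
```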

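Purely as an illustration of the estimate-then-threshold procedure in the corollary, here is a minimal sketch. This is hypothetical code, not from the cascades repository: an $\ell_1$-penalized logistic regression stands in for the paper's cascade MLE (Eq. eq:mle), and every constant below is made up for the demo.

```python
# Hypothetical sketch of the corollary's estimate-then-threshold recovery.
# Assumptions: l1-penalized logistic regression replaces the paper's cascade
# likelihood, and the regularization strength is chosen loosely rather than
# from lambda_n >= 2 ||grad f(theta*)||_inf as the theorem prescribes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
p, s, n = 50, 3, 5000                  # candidate parents, true parents, measurements
theta_star = np.zeros(p)
theta_star[:s] = 1.0                   # the 'strong' parents of the observed node

X = rng.binomial(1, 0.3, size=(n, p))  # which candidate parents were active
logits = X @ theta_star - 2.0          # offset keeps infections moderately rare
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))  # observed infections

model = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")
model.fit(X, y)
theta_hat = model.coef_.ravel()

eta = 0.5                              # threshold eta from the corollary
S_hat = sorted(j for j in range(p) if theta_hat[j] > eta)
print("recovered parent set:", S_hat)  # ideally [0, 1, 2]
```

With enough measurements the estimated weights on true parents sit near $1$ while the rest stay near $0$, so any $\eta$ strictly between the noise level and half the weakest true weight separates the two groups, mirroring the $\hat{\cal S}_\eta = {\cal S}^*_{2\eta}$ statement.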