Diffstat (limited to 'paper/sections/results.tex')
-rw-r--r--  paper/sections/results.tex  |  84
1 file changed, 58 insertions, 26 deletions
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index 8eda03f..e5aa861 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -35,8 +35,13 @@ write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall, that $\theta_i$ is the
\end{equation}
\end{definition}
-This condition, well-known in the sparse recovery literature, captures the idea that the binary vectors of active nodes are not \emph{too} colinear with each other, an essentially necessary condition for recovering $\theta$. In fact, the $(S,\gamma)$-{\bf(RE)} assumption is closely linked to the \emph{Fisher Information Matrix}, which captures the amount of information carried by an observable random variable (here the set of active nodes) about an unknown parameter $\theta$ on which the random variable depends.
-A discussion of the $(S,\gamma)$-{\bf(RE)} assumption in the context of generalized linear cascade models can be found in Section~\ref{sec:re}. We will also need the following assumption on the inverse link function $f$ of the generalized linear cascade model:
+A discussion of the $(S,\gamma)$-{\bf(RE)} assumption in the context of
+generalized linear cascade models can be found in Section~\ref{sec:re}. In our
+setting, we require this property to hold for the Hessian of the
+log-likelihood function $\mathcal{L}$; it essentially captures that the binary
+vectors of the sets of active nodes are not \emph{too} collinear.
+
+We will also need the following assumption on the inverse link function $f$ of the generalized linear cascade model:
\begin{equation}
\tag{LF}
\max\left(\left|\frac{f'}{f}(\inprod{\theta^*}{x})\right|,
@@ -48,10 +53,16 @@ $f(\inprod{\theta^*}{x})\notin\{0,1\}$.
In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{f}(1-z)
= \frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha\leq \Theta_{i,j}\leq
-1-\alpha$ for all $(i,j)\in E$. Similarly, in the Independent Cascade Model,
-$\frac{f'}{f}(z) = \frac{1}{1-e^z}$ and $\frac{f'}{1-f}(z) = -1$ and (LF) holds
-if $p_{i, j}\geq 1-\alpha$ for all $(i, j)\in E$. Remember that in this case we
-have $\Theta_{i,j} = \log(1-p_{i,j})$. These assumptions are reasonable: if an edge has a weight very close to 0, then the ``infection'' will never happen along that edge for our set of observations and we can never hope to recover it.
+1-\alpha$ for all $(i,j)\in E$, which is always satisfied for some $\alpha>0$.
+Similarly, in the Independent Cascade Model, $\frac{f'}{f}(z)
+= \frac{1}{1-e^{-z}}$ and $\frac{f'}{1-f}(z) = -1$, so that (LF) is not restrictive.
+
+\begin{comment}
+Remember that in this case we have $\Theta_{i,j} = \log(1-p_{i,j})$. These
+assumptions are reasonable: if an edge has a weight very close to 0, then the
+``infection'' will never happen along that edge for our set of observations and
+we can never hope to recover it.
+\end{comment}
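The Independent Cascade identities above can be sanity-checked numerically; a minimal sketch, assuming the parametrization $f(z) = 1 - e^{z}$ with $z = \inprod{\theta^*}{x} \leq 0$ (i.e.\ $\Theta_{i,j} = \log(1-p_{i,j})$, as recalled in the commented-out text):

```python
import math

# Sanity check of the ICM link-function identities, assuming the
# parametrization f(z) = 1 - exp(z) on z <= 0, with Theta_ij = log(1 - p_ij).
f = lambda z: 1 - math.exp(z)
fp = lambda z: -math.exp(z)  # f'(z)

for z in (-0.1, -1.0, -3.0):
    # f'/f(z) = 1 / (1 - e^{-z})
    assert abs(fp(z) / f(z) - 1 / (1 - math.exp(-z))) < 1e-9
    # f'/(1 - f)(z) = -1
    assert abs(fp(z) / (1 - f(z)) + 1) < 1e-9
```

Both quantities in (LF) are thus bounded on any interval bounded away from $z = 0$, i.e.\ away from edge weights $p_{i,j} = 0$.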
\begin{theorem}
\label{thm:main}
@@ -69,7 +80,10 @@ $\alpha > 0$. For any $\delta\in(0,1)$, let $\hat{\theta}$ be the solution of
\end{equation}
\end{theorem}
-Theorem~\ref{thm:main} states that, under the two previous assumptions, the decay rate of the error term is ${\cal O}(1/\sqrt{n})$, which is believed to be optimal. Note how we have expressed the bound in the number of measurements $n$, which is different from the number of cascades! For example, in the case of the voter model with horizon time $T$ and for $N$ cascades, we can expect $N\times T$ measurements.
+Note that we have expressed the convergence rate in the number of measurements
+$n$, which is different from the number of cascades. For example, in the case
+of the voter model with horizon time $T$ and for $N$ cascades, we can expect
+a number of measurements proportional to $N\times T$.
Before moving to the proof of Theorem~\ref{thm:main}, note that interpreting it
in the case of the Independent Cascade Model requires one more step. Indeed, to
@@ -164,7 +178,10 @@ Choosing $\lambda\defeq 2\sqrt{\frac{\log m}{\alpha n^{1-\delta}}}$ concludes th
proof.
\end{proof}
-Note how the proof of Lemma~\ref{lem:ub} relied crucially on Azuma-Hoeffding's inequality, which allows us to handle correlated observations, and obtain bounds on the number of measurements rather than the number of cascades. We now show how to use Theorem~\ref{thm:main} to recover the support of $\theta^*$, that is, to solve the Graph Inference problem.
+Note how the proof of Lemma~\ref{lem:ub} relied crucially on Azuma-Hoeffding's
+inequality, which allows us to handle correlated observations. We now show how
+to use Theorem~\ref{thm:main} to recover the support of $\theta^*$, that is, to
+solve the Graph Inference problem.
\begin{corollary}
\label{cor:variable_selection}
@@ -223,26 +240,32 @@ which is also a consequence of Theorem 1 in \cite{Negahban:2009}.
\begin{theorem}
\label{thm:approx_sparse}
-Let $\theta^*_{\lfloor s \rfloor}$ be the best s-sparse approximation to the true vector $\theta^*$. Suppose the {\bf(RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$ and for the following set
+Suppose the relaxed {\bf(RE)} assumption holds for the Hessian $\nabla^2
+f(\theta^*)$ and for the following set:
\begin{align}
\nonumber
{\cal C}' \defeq & \{X \in \mathbb{R}^p : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 + 4 \|\theta^*_{\lfloor s \rfloor}\|_1 \} \\ \nonumber
& \cap \{ \|X\|_1 \leq 1 \}
\end{align}
-By solving \eqref{eq:pre-mle} for $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^{1 - \delta} p_{\min}}}$ we have:
-\begin{align}
-\|\hat p - p^* \|_2 \leq 3 \frac{\sqrt{s}\lambda_n}{\gamma_n} + 2 \sqrt{\frac{\lambda_n}{\gamma_n}} \|p^*_{\lfloor s \rfloor}\|_1
-\end{align}
+By solving \eqref{eq:pre-mle} for $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^{1 - \delta}}}$ we have:
+\begin{align*}
+ \|\hat \theta - \theta^* \|_2 \leq
+ \frac{3}{\gamma} \sqrt{\frac{s\log m}{\alpha n^{1-\delta}}}
+ + \frac{2}{\gamma} \sqrt[4]{\frac{s\log m}{\alpha n^{1-\delta}}} \|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1
+\end{align*}
\end{theorem}
As before, edge recovery is a consequence of upper-bounding $\|\hat \theta - \theta^* \|_2$.
\begin{corollary}
-If $\theta$ is not exactly s-sparse and the number of measurements verifies:
-\begin{equation}
-n > \frac{36 \| p^*_{\lfloor s\rfloor}\|_1}{p_{\min}\gamma^2 \epsilon^4} s \log m
+ Under the same assumptions as Theorem~\ref{thm:approx_sparse}, if the number
+ of measurements verifies: \begin{equation}
+ n > \frac{9}{\alpha\gamma^2\epsilon^2}\left(1+
+ \frac{4}{\epsilon^2}\| \theta^* - \theta^*_{\lfloor
+ s\rfloor}\|_1\right)s\log m
\end{equation}
-then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset {\cal S}^*$ w.h.p.
+then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta
+\subset {\cal S}^*$ with probability at least $1-\frac{1}{m}$.
\end{corollary}
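To get a feel for how the corollary's bound scales, its right-hand side can be evaluated directly; a quick sketch, with purely illustrative parameter values (none of these numbers come from the paper):

```python
import math

# Evaluate the corollary's sample-complexity bound
#   n > 9/(alpha * gamma^2 * eps^2) * (1 + 4/eps^2 * tail) * s * log(m),
# where tail = ||theta* - theta*_{floor(s)}||_1 (the l1 approximation error
# of the best s-sparse vector). All numeric values below are illustrative.
def required_n(alpha, gamma, eps, tail, s, m):
    return 9.0 / (alpha * gamma**2 * eps**2) * (1.0 + 4.0 / eps**2 * tail) * s * math.log(m)

n_exact = required_n(alpha=0.1, gamma=0.5, eps=0.2, tail=0.0, s=10, m=1000)
n_approx = required_n(alpha=0.1, gamma=0.5, eps=0.2, tail=0.05, s=10, m=1000)
# A heavier approximation tail (or a smaller eps) only increases the requirement.
assert n_approx > n_exact
```

Note the exactly sparse case is recovered by setting the tail term to zero.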
@@ -303,7 +326,7 @@ eigenvalue property, then the finite sample hessian also verifies the
restricted eigenvalue property with overwhelming probability. It is
straightforward to show this holds when $n \geq C s^2 \log m$
\cite{vandegeer:2009}, where $C$ is an absolute constant. By adapting Theorem
-8 \cite{rudelson:13}\footnote{this result still needs to be confirmed!}, this
+8 \cite{rudelson:13}, this
can be reduced to:
\begin{displaymath}
n \geq C s \log m \log^3 \left( \frac{s \log m}{C'} \right)
@@ -312,7 +335,9 @@ where $C, C'$ are constants not depending on $(s, m, n)$.
\paragraph{(RE) vs Irrepresentability Condition}
-\cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the likelihood function. It is in fact easy to see that their condition is equivalent to the more commonly called {\it (S,s)-irrepresentability} condition:
+\cite{Daneshmand:2014} rely on an `incoherence' condition on the Hessian of the
+likelihood function. Their condition is equivalent to what is more commonly
+called the {\it (S,s)-irrepresentability} condition:
\begin{definition}
Following similar notation, let $Q^* \defeq \nabla^2 f(\theta^*)$. Let $S$ and $S^c$ be the set of indices of all the parents and non-parents respectively and $Q_{S,S}$, $Q_{S^c,S}$, $Q_{S, S^c}$, and $Q_{S^c, S^c}$ the induced sub-matrices. Consider the following constant:
@@ -322,7 +347,20 @@ Following similar notation, let $Q^* \defeq \nabla^2 f(\theta^*)$. Let $S$ and $
The (S,s)-irrepresentability condition holds if $\nu_{\text{irrepresentable}}(S) < 1 - \epsilon$ for some $\epsilon > 0$.
\end{definition}
-This condition has been shown to be essentially necessary for variable selection \cite{Zhao:2006}. Several recent papers \cite{vandegeer:2011}, \cite{Zou:2006}, argue this condition is unrealistic in situations where there is a correlation between variables. Consider the following simplified example from \cite{vandegeer:2011}:
+It is intuitive that the irrepresentability condition is stronger than the
+{\bf(RE)} assumption. In fact, a slightly modified result from
+\cite{vandegeer:2009} shows that a `strong' irrepresentability condition
+directly {\it implies} the {\bf(RE)} condition for $\ell_2$-recovery:
+
+\begin{proposition}
+\label{prop:irrepresentability}
+If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant $\gamma_n \geq \frac{ (1 - 3(1 -\epsilon))^2 \lambda_{\min}^2}{4s}n$, where $\lambda_{\min} > 0$ is the smallest eigenvalue of $Q^*_{S,S}$, on which the results of \cite{Daneshmand:2014} also depend.
+\end{proposition}
+
+Furthermore, recent papers \cite{vandegeer:2011,Zou:2006} argue that
+the irrepresentability condition is unrealistic in situations where there is
+a correlation between variables. Consider the following simplified example from
+\cite{vandegeer:2011}:
\begin{equation}
\nonumber
\left(
@@ -334,12 +372,6 @@ I_s & \rho J \\
\end{equation}
where $I_s$ is the $s \times s$ identity matrix, $J$ is the all-ones matrix and $\rho \in \mathbb{R}^+$. It is easy to see that $\nu_{\text{irrepresentable}}(S) = \rho s$ and $\lambda_{\min}(Q) \geq 1 - \rho$, such that for any $\frac{1}{s} < \rho < 1$, the restricted eigenvalue condition holds trivially but the (S,s)-irrepresentability condition does not.
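The claim $\nu_{\text{irrepresentable}}(S) = \rho s$ in this example is easy to verify numerically; a minimal sketch, assuming $\nu$ is the standard maximum row-wise $\ell_1$ norm of $Q_{S^c,S} Q_{S,S}^{-1}$ (the precise constant is defined in the elided display above), with hypothetical dimensions:

```python
# Check nu_irrepresentable(S) = rho * s for the block example,
# assuming nu = max row-wise l1 norm of Q_{S^c,S} Q_{S,S}^{-1}.
# Here Q_{S,S} = I_s, so Q_{S,S}^{-1} = I_s and the product is just rho * J.
s, p_minus_s, rho = 4, 6, 0.3   # hypothetical sizes, for illustration only
Q_ScS = [[rho] * s for _ in range(p_minus_s)]  # the rho * J off-diagonal block
nu = max(sum(abs(v) for v in row) for row in Q_ScS)
assert abs(nu - rho * s) < 1e-12
```

For any $\rho > 1/s$ this exceeds $1$, so irrepresentability fails even though $Q_{S,S}$ is perfectly conditioned.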
-It is intuitive that the irrepresentability condition is stronger than the {\bf(RE)} assumption. In fact, a slightly modified result from \cite{vandegeer:2009} shows that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell2$-recovery:
-
-\begin{proposition}
-\label{prop:irrepresentability}
-If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant $\gamma_n \geq \frac{ (1 - 3(1 -\epsilon))^2 \lambda_{\min}^2}{4s}n$, where $\lambda_{\min} > 0$ is the smallest eigenvalue of $Q^*_{S,S}$, on which the results of \cite{Daneshmand:2014} also depend.
-\end{proposition}
\begin{comment}
\begin{lemma}