Diffstat (limited to 'paper/sections/results.tex')
 -rw-r--r--  paper/sections/results.tex | 84
1 file changed, 58 insertions, 26 deletions
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index 8eda03f..e5aa861 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -35,8 +35,13 @@ write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall, that $\theta_i$ is the
 \end{equation}
 \end{definition}
-This condition, well-known in the sparse recovery literature, captures the idea that the binary vectors of active nodes are not \emph{too} colinear with each other, an essentially necessary condition for recovering $\theta$. In fact, the $(S,\gamma)$-{\bf(RE)} assumption is closely linked to the \emph{Fisher Information Matrix}, which captures the amount of information carried by an observable random variable (here the set of active nodes) about an unknown parameter $\theta$ on which the random variable depends.
-A discussion of the $(S,\gamma)$-{\bf(RE)} assumption in the context of generalized linear cascade models can be found in Section~\ref{sec:re}. We will also need the following assumption on the inverse link function $f$ of the generalized linear cascade model:
+A discussion of the $(S,\gamma)$-{\bf(RE)} assumption in the context of
+generalized linear cascade models can be found in Section~\ref{sec:re}. In our
+setting we will require that this property holds for the Hessian of the
+log-likelihood function $\mathcal{L}$; it essentially captures that the binary
+vectors of the set of active nodes are not \emph{too} collinear.
+
+We will also need the following assumption on the inverse link function $f$ of the generalized linear cascade model:
 \begin{equation}
 \tag{LF}
 \max\left(\left|\frac{f'}{f}(\inprod{\theta^*}{x})\right|,
@@ -48,10 +53,16 @@ $f(\inprod{\theta^*}{x})\notin\{0,1\}$.
 In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{f}(1-z) = \frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha\leq \Theta_{i,j}\leq
-1-\alpha$ for all $(i,j)\in E$. Similarly, in the Independent Cascade Model,
-$\frac{f'}{f}(z) = \frac{1}{1-e^z}$ and $\frac{f'}{1-f}(z) = -1$ and (LF) holds
-if $p_{i, j}\geq 1-\alpha$ for all $(i, j)\in E$. Remember that in this case we
-have $\Theta_{i,j} = \log(1-p_{i,j})$. These assumptions are reasonable: if an edge has a weight very close to 0, then the ``infection'' will never happen along that edge for our set of observations and we can never hope to recover it.
+1-\alpha$ for all $(i,j)\in E$, which is always satisfied for some $\alpha$.
+Similarly, in the Independent Cascade Model, $\frac{f'}{f}(z)
+= \frac{1}{1-e^{-z}}$ and $\frac{f'}{1-f}(z) = -1$ and (LF) is not restrictive.
+
+\begin{comment}
+Remember that in this case we have $\Theta_{i,j} = \log(1-p_{i,j})$. These
+assumptions are reasonable: if an edge has a weight very close to 0, then the
+``infection'' will never happen along that edge for our set of observations and
+we can never hope to recover it.
+\end{comment}
 
 \begin{theorem}
 \label{thm:main}
@@ -69,7 +80,10 @@ $\alpha > 0$. For any $\delta\in(0,1)$, let $\hat{\theta}$ be the solution of
 \end{equation}
 \end{theorem}
 
-Theorem~\ref{thm:main} states that, under the two previous assumptions, the decay rate of the error term is ${\cal O}(1/\sqrt{n})$, which is believed to be optimal. Note how we have expressed the bound in the number of measurements $n$, which is different from the number of cascades! For example, in the case of the voter model with horizon time $T$ and for $N$ cascades, we can expect $N\times T$ measurements.
+Note that we have expressed the convergence rate in the number of measurements
+$n$, which is different from the number of cascades. For example, in the case
+of the voter model with horizon time $T$ and for $N$ cascades, we can expect
+a number of measurements proportional to $N\times T$.
 
 Before moving to the proof of Theorem~\ref{thm:main}, note that interpreting it in the case of the Independent Cascade Model requires one more step.
 Indeed, to
@@ -164,7 +178,10 @@ Choosing $\lambda\defeq 2\sqrt{\frac{\log m}{\alpha n^{1-\delta}}}$ concludes the
 proof.
 \end{proof}
 
-Note how the proof of Lemma~\ref{lem:ub} relied crucially on Azuma-Hoeffding's inequality, which allows us to handle correlated observations, and obtain bounds on the number of measurements rather than the number of cascades. We now show how to use Theorem~\ref{thm:main} to recover the support of $\theta^*$, that is, to solve the Graph Inference problem.
+Note how the proof of Lemma~\ref{lem:ub} relied crucially on Azuma-Hoeffding's
+inequality, which allows us to handle correlated observations. We now show how
+to use Theorem~\ref{thm:main} to recover the support of $\theta^*$, that is, to
+solve the Graph Inference problem.
 
 \begin{corollary}
 \label{cor:variable_selection}
@@ -223,26 +240,32 @@ which is also a consequence of Theorem 1 in \cite{Negahban:2009}.
 
 \begin{theorem}
 \label{thm:approx_sparse}
-Let $\theta^*_{\lfloor s \rfloor}$ be the best s-sparse approximation to the true vector $\theta^*$.
-Suppose the {\bf(RE)} assumption holds for the Hessian $\nabla^2 f(\theta^*)$ and for the following set
+Suppose the relaxed {\bf(RE)} assumption holds for the Hessian $\nabla^2
+f(\theta^*)$ and for the following set:
 \begin{align}
 \nonumber
 {\cal C}' \defeq & \{X \in \mathbb{R}^p : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 + 4 \|\theta^*_{\lfloor s \rfloor}\|_1 \} \\
 \nonumber
 & \cap \{ \|X\|_1 \leq 1 \}
 \end{align}
-By solving \eqref{eq:pre-mle} for $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^{1 - \delta} p_{\min}}}$ we have:
-\begin{align}
-\|\hat p - p^* \|_2 \leq 3 \frac{\sqrt{s}\lambda_n}{\gamma_n} + 2 \sqrt{\frac{\lambda_n}{\gamma_n}} \|p^*_{\lfloor s \rfloor}\|_1
-\end{align}
+By solving \eqref{eq:pre-mle} for $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^{1 - \delta}}}$ we have:
+\begin{align*}
+  \|\hat \theta - \theta^* \|_2 \leq
+  \frac{3}{\gamma} \sqrt{\frac{s\log m}{\alpha n^{1-\delta}}}
+  + \frac{2}{\gamma} \sqrt[4]{\frac{s\log m}{\alpha n^{1-\delta}}} \|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1
+\end{align*}
 \end{theorem}
 
 As before, edge recovery is a consequence of upper-bounding $\|\theta - \hat \theta\|_2$
 
 \begin{corollary}
-If $\theta$ is not exactly s-sparse and the number of measurements verifies:
-\begin{equation}
-n > \frac{36 \| p^*_{\lfloor s\rfloor}\|_1}{p_{\min}\gamma^2 \epsilon^4} s \log m
+  Under the same assumptions as Theorem~\ref{thm:approx_sparse}, if the number
+  of measurements verifies: \begin{equation}
+  n > \frac{9}{\alpha\gamma^2\epsilon^2}\left(1+
+  \frac{4}{\epsilon^2}\| \theta^* - \theta^*_{\lfloor
+  s\rfloor}\|_1\right)s\log m
 \end{equation}
-then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset {\cal S}^*$ w.h.p.
+then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta
+\subset {\cal S}^*$ w.p. at least $1-\frac{1}{m}$.
 \end{corollary}
@@ -303,7 +326,7 @@ eigenvalue property, then the finite sample hessian also verifies the
 restricted eigenvalue property with overwhelming probability.
 It is straightforward to show this holds when $n \geq C s^2 \log
 m$ \cite{vandegeer:2009}, where $C$ is an absolute constant. By adapting Theorem
-8 \cite{rudelson:13}\footnote{this result still needs to be confirmed!}, this
+8 \cite{rudelson:13}, this
 can be reduced to:
 \begin{displaymath}
 n \geq C s \log m \log^3 \left( \frac{s \log m}{C'} \right)
 \end{displaymath}
@@ -312,7 +335,9 @@ where $C, C'$ are constants not depending on $(s, m, n)$.
 
 \paragraph{(RE) vs Irrepresentability Condition}
 
-\cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the likelihood function. It is in fact easy to see that their condition is equivalent to the more commonly called {\it (S,s)-irrepresentability} condition:
+\cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the
+likelihood function. Their condition is equivalent to the more commonly called
+{\it (S,s)-irrepresentability} condition:
 
 \begin{definition}
 Following similar notation, let $Q^* \defeq \nabla^2 f(\theta^*)$. Let $S$ and $S^c$ be the set of indices of all the parents and non-parents respectively and $Q_{S,S}$, $Q_{S^c,S}$, $Q_{S, S^c}$, and $Q_{S^c, S^c}$ the induced sub-matrices. Consider the following constant:
@@ -322,7 +347,20 @@ Following similar notation, let $Q^* \defeq \nabla^2 f(\theta^*)$. Let $S$ and $
 The (S,s)-irrepresentability holds if $\nu_{\text{irrepresentable}}(S) < 1 - \epsilon$ for $\epsilon > 0$
 \end{definition}
 
-This condition has been shown to be essentially necessary for variable selection \cite{Zhao:2006}. Several recent papers \cite{vandegeer:2011}, \cite{Zou:2006}, argue this condition is unrealistic in situations where there is a correlation between variables. Consider the following simplified example from \cite{vandegeer:2011}:
+It is intuitive that the irrepresentability condition is stronger than the
+{\bf(RE)} assumption. In fact, a slightly modified result from
+\cite{vandegeer:2009} shows that a `strong' irrepresentability condition
+directly {\it implies} the {\bf(RE)} condition for $\ell_2$-recovery:
+
+\begin{proposition}
+\label{prop:irrepresentability}
+If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant $\gamma_n \geq \frac{ (1 - 3(1 -\epsilon))^2 \lambda_{\min}^2}{4s}n$, where $\lambda_{\min} > 0$ is the smallest eigenvalue of $Q^*_{S,S}$, on which the results of \cite{Daneshmand:2014} also depend.
+\end{proposition}
+
+Furthermore, recent papers \cite{vandegeer:2011}, \cite{Zou:2006}, argue that
+the irrepresentability condition is unrealistic in situations where there is
+a correlation between variables. Consider the following simplified example from
+\cite{vandegeer:2011}:
 \begin{equation}
 \nonumber
 \left(
@@ -334,12 +372,6 @@ I_s & \rho J \\
 \end{equation}
 where $I_s$ is the $s \times s$ identity matrix, $J$ is the all-ones matrix and $\rho \in \mathbb{R}^+$. It is easy to see that $\nu_{\text{irrepresentable}}(S) = \rho s$ and $\lambda_{\min}(Q) \geq 1 - \rho$, such that for any $\rho > \frac{1}{s}$ and $\rho < 1$, the restricted eigenvalue holds trivially but the (S,s)-irrepresentability does not hold.
 
-It is intuitive that the irrepresentability condition is stronger than the {\bf(RE)} assumption. In fact, a slightly modified result from \cite{vandegeer:2009} shows that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell2$-recovery:
-
-\begin{proposition}
-\label{prop:irrepresentability}
-If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant $\gamma_n \geq \frac{ (1 - 3(1 -\epsilon))^2 \lambda_{\min}^2}{4s}n$, where $\lambda_{\min} > 0$ is the smallest eigenvalue of $Q^*_{S,S}$, on which the results of \cite{Daneshmand:2014} also depend.
-\end{proposition}
 
 \begin{comment}
 \begin{lemma}
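The recovery pipeline this patch documents (solve an $\ell_1$-penalized problem with $\lambda \approx 2\sqrt{\log m / (\alpha n^{1-\delta})}$, then threshold $\hat\theta$ at a level $\eta$ to estimate the support $\hat{\cal S}_\eta$) can be sketched numerically. The sketch below is a hypothetical linear-model analogue solved by ISTA (proximal gradient), not the paper's cascade log-likelihood; the dimensions, noise level, and threshold are illustrative only:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
n, m, s = 400, 50, 3                 # measurements, dimension, sparsity
theta_star = np.zeros(m)
theta_star[:s] = [0.9, -0.7, 0.8]    # ground-truth sparse parameter
X = rng.standard_normal((n, m))
y = X @ theta_star + 0.1 * rng.standard_normal(n)

# Regularization scaled like the paper's choice lambda ~ 2*sqrt(log m / n)
lam = 2 * np.sqrt(np.log(m) / n)

# ISTA on (1/2n)||y - X theta||^2 + lam * ||theta||_1
L = np.linalg.eigvalsh(X.T @ X / n).max()   # Lipschitz constant of the gradient
theta = np.zeros(m)
for _ in range(3000):
    grad = X.T @ (X @ theta - y) / n
    theta = soft(theta - grad / L, lam / L)

# Support recovery: keep coordinates above the threshold eta
eta = 0.3
S_hat = set(np.flatnonzero(np.abs(theta) > eta))
print(S_hat)
```

With a well-conditioned design and signal entries well above $\eta$, the thresholded estimate recovers exactly the true support $\{0, 1, 2\}$, mirroring the inclusion ${\cal S}^*_{\eta+\epsilon} \subset \hat{\cal S}_\eta \subset {\cal S}^*$ in the corollaries.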
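The block-matrix counterexample from \cite{vandegeer:2011} quoted at the end of the diff can also be checked numerically. Assuming the conventional definition $\nu_{\text{irrepresentable}}(S) = \|Q_{S^c,S}\, Q_{S,S}^{-1}\|_\infty$ (maximum absolute row sum), with illustrative block sizes:

```python
import numpy as np

s, t, rho = 4, 6, 0.3   # |S|, |S^c|, and rho > 1/s (all sizes illustrative)

# Q = [[I_s, rho*J], [rho*J.T, I_t]] with J the all-ones s-by-t matrix
J = np.ones((s, t))
Q = np.block([[np.eye(s), rho * J],
              [rho * J.T, np.eye(t)]])

# Irrepresentability constant: max absolute row sum of Q_{S^c,S} Q_{S,S}^{-1}
nu = np.max(np.abs(Q[s:, :s] @ np.linalg.inv(Q[:s, :s])).sum(axis=1))
print(nu)  # equals rho * s (here 1.2 > 1), so irrepresentability fails
```

Because $Q_{S,S} = I_s$, the computation reduces to the row sums of $\rho J^\top$, confirming $\nu_{\text{irrepresentable}}(S) = \rho s$: for any $\rho > 1/s$ the irrepresentability condition fails even though the design is nearly orthogonal on $S$.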
