Diffstat (limited to 'paper')
 paper/paper.tex                |  1
 paper/sections/appendix.tex    | 12
 paper/sections/assumptions.tex | 32
 paper/sections/results.tex     | 26
 4 files changed, 52 insertions(+), 19 deletions(-)
diff --git a/paper/paper.tex b/paper/paper.tex
index d2fdcb9..36eccc7 100644
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -63,6 +63,7 @@
 \newtheorem{corollary}{Corollary}
 \newtheorem{remark}{Remark}
 \newtheorem{proposition}{Proposition}
+\newtheorem{definition}{Definition}
 
 \begin{document}
diff --git a/paper/sections/appendix.tex b/paper/sections/appendix.tex
index 80692d0..65576bb 100644
--- a/paper/sections/appendix.tex
+++ b/paper/sections/appendix.tex
@@ -1,9 +1,9 @@
-\subsection*{Proof of support recovery lemma}
+\subsection{Proof of support recovery lemma}
 
-\subsection*{Upper-bound for $\|\nabla f(\theta^*)\|_\infty$}
+\subsection{Upper-bound for $\|\nabla f(\theta^*)\|_\infty$}
 
-We show an upper-bound for $ 2 \|\nabla f(\theta^*) \|_\infty$. If we only consider the first measurement of every cascade, this can be done easily. Let $N$ be the number of cascades (to distinguish from $n$ number of total measurements). Begin by noting that:
+We show an upper-bound for $2 \|\nabla f(\theta^*)\|_\infty$. If we only consider the first measurement of every cascade, this can be done easily. Let $N$ be the number of cascades (to distinguish from $n$, the total number of measurements). Begin by noting that for a node $\alpha$:
 \begin{align}
 \nabla_k f(\theta) & = \sum^n_{i=1} \frac{b^i x^i_k}{1 - e^{\langle x^i, \theta \rangle}} - \sum^n_{i=1} x^i_k \nonumber \\
@@ -19,7 +19,7 @@ $\nabla f(\theta^*)$ is a $N/p_{\min}$-subgaussian variable, where $p_{\min}$ is
 We show that its expectation is $0$ by conditioning on the seed set $S$:
 \begin{align}
-\mathbb{E} \left( \nabla_k f(\theta) \right) & = \sum_{i=1}^N \mathbb{E} \left[ x^i_k \left( \frac{b^i}{\mathbb{P}(\text{node } \alpha \text{ infected})} - 1\right) \right] \nonumber \\
+\mathbb{E} \left( \nabla_k f(\theta) \right) & = \sum_{i=1}^N \mathbb{E} \left[ x^i_k \left( \frac{b^i}{\mathbb{P}(\alpha \text{ infected})} - 1\right) \right] \nonumber \\
 & = \sum_{i=1}^N \mathbb{E}_S \left[ \mathbb{E}\left[x^i_k \left( \frac{b^i}{\mathbb{P}(\alpha \text{ infected})} - 1\right) \middle| S \right] \right] \nonumber \\
 & = \sum_{i=1}^N \mathbb{E}\left[x^i_k \left( \frac{\mathbb{E}_S \left[ b^i \middle| S \right]}{\mathbb{P}(\alpha \text{ infected})} - 1\right) \right] \nonumber \\
 & = 0 \nonumber
@@ -30,7 +30,7 @@ Therefore, $\nabla f(\theta^*)$ is the sum of zero-mean variables, upper-bounded
 By union bound and characterization of sub-gaussian variables:
 \begin{equation}
-\mathbb{P}(\| \nabla f(\theta) \|_{\infty} > \lambda) \leq 2 \exp \left( -\frac{\lambda^2 p_{\min}}{2n} + \log p \right)
+\mathbb{P}(\| \nabla f(\theta) \|_{\infty} > \lambda) \leq 2 \exp \left( -\frac{\lambda^2 p_{\min}}{2n} + \log m \right)
 \end{equation}
-In conclusion, for $\delta>0$, $\lambda := 2 \sqrt{\frac{n^{\delta + 1} \log p}{p_{\min}}}$ is a valid regularizer with probability $1 - \exp(-n^\delta \log p )$
+In conclusion, for $\delta > 0$, $\lambda := 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$ is a valid regularizer with probability $1 - \exp(-n^\delta \log m)$.
diff --git a/paper/sections/assumptions.tex b/paper/sections/assumptions.tex
index d6fed83..1f88494 100644
--- a/paper/sections/assumptions.tex
+++ b/paper/sections/assumptions.tex
@@ -1,13 +1,33 @@
-\subsection{The Restricted Eigenvalue Condition}
+\subsection{The Irrepresentability Condition}
 
-Proving the restricted eigenvalue assumption for correlated measurements is non-trivial. Under reasonable assumptions on the graph parameters, we can show a very crude ${\cal O}(N)$-lower bound for $\gamma_n$ by exploiting only the first set of measurements, where only the source nodes are active. Note that even though we waste a lot of information, we obtain similar asymptotic behavior than previous work.
+\cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the likelihood function. It is in fact easy to see that their condition is equivalent to what is more commonly called the {\it (S, s)-irrepresentability} condition:
 
-\subsection{The Irrepresentability Condition}
+\begin{definition}
+Following similar notation, let $Q^* := \nabla^2 f(\theta^*)$. Let $Q_{S^C,S} XXX$, the {\it (S, s)-irrepresentability} condition is defined as:
+\begin{equation}
+blabla
+\end{equation}
+\end{definition}
 
-If our objective is to recover only the support of the graph and to do it exactly, then the irrepresentability condition\footnote{{\it incoherence} in the text} used in \cite{Daneshmand:2014} is believed to be necessary.
+If our objective is to recover the support of the graph exactly, the irrepresentability condition has been shown to be essentially necessary \cite{Zhao:2006}. However, several recent papers \cite{vandegeer:2011, Zou:2006} argue that this condition is unrealistic when the variables are correlated.
 
-However, assuming we only wish to recover all edges above a certain threshold, bounding the $\ell2$-error allows us to recover the full graph above a certain minimum threshold. Furthermore, it is intuitive that the irrepresentability condition is stronger than our suggested {\bf(RE)} assumption. In fact, by adapting slightly the following result from \cite{vandegeer:2009}, we can show that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell2$-recovery
+Consider for example the following matrix:
+
+Yet, if we only wish to recover all edges with weights above a certain minimum threshold, bounding the $\ell_2$-error suffices under the intuitively weaker {\bf(RE)} condition. In practical scenarios, such as in social networks, where one seeks to recover the significant edges, this is a reasonable assumption.
+
+As mentioned previously, it is intuitive that the irrepresentability condition is stronger than our suggested {\bf(RE)} assumption. In fact, by slightly adapting the following result from \cite{vandegeer:2009}, we can show that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell_2$-recovery:
 \begin{proposition}
 If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant: $(1 - 3\epsilon)^2 \lambda_{\min}n/s$
-\end{proposition}
\ No newline at end of file
+\end{proposition}
+
+
+\subsection{The Restricted Eigenvalue Condition}
+
+Expressing the restricted eigenvalue assumption in terms of the graph parameters is non-trivial for correlated measurements. Under reasonable assumptions on the graph parameters, we can show a very crude ${\cal O}(N)$-lower bound for $\gamma_n$ by exploiting only the first set of measurements, where only the source nodes are active. Note that even though we waste a lot of information, we obtain similar asymptotic behavior to previous work.
+
+Similarly to the analysis conducted in \cite{Daneshmand:2014}, we can show that if the restricted eigenvalue condition holds for the `expected' hessian, then it also holds for the hessian itself.
+
+\begin{proposition}
+If the result holds for the expected hessian, then it holds for the hessian.
+\end{proposition}
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index 5c2fe5f..449ae02 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -74,25 +74,37 @@ We analyse the previous conditions in the case of the Independent Cascade model.
 For any $\delta > 0$, with probability $1-e^{-n^\delta \log m}$, $\|\nabla f(\theta^*)\|_{\infty} \leq 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$
 \end{lemma}
 
-We include a proof of this result in the Appendix. The following corollaries follow immediately from Theorem~\ref{thm:neghaban} and Lemma~\ref{lem:icc_lambda_upper_bound}.
+We include a proof of this result in the Appendix. The following corollaries follow immediately from Theorem~\ref{thm:neghaban}, Theorem~\ref{thm:approx_sparse} and Lemma~\ref{lem:icc_lambda_upper_bound}:
 
 \begin{corollary}
-Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Then for $\lambda_n := 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$ and with probability $1-e^{n^\delta \log m}$
+Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Then for $\lambda_n \defeq 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$ and with probability $1-e^{-n^\delta \log m}$:
 \begin{equation}
-\|\hat p - p^* \|_2 \leq \frac{2}{\gamma} \sqrt{\frac{s \log m}{p_{\min} n^{1-\delta}}}
+\|\hat p - p^* \|_2 \leq \frac{3}{\gamma} \sqrt{\frac{s \log m}{p_{\min} n^{1-\delta}}} + 2 \sqrt[4]{\frac{\log m}{n^{1-\delta} \gamma^2 p_{\min}}} \| p^*_{\lfloor s \rfloor} \|_1
+\end{equation}
+Note that if $p^*$ is exactly $s$-sparse, $\| p^*_{\lfloor s \rfloor} \|_1 = 0$, so that:
+\begin{equation}
+\|\hat p - p^* \|_2 \leq \frac{3}{\gamma} \sqrt{\frac{s \log m}{p_{\min} n^{1-\delta}}}
 \end{equation}
 \end{corollary}
 
-The following corollary follows trivially and gives the first $\Omega(s \log p)$ algorithm for graph reconstruction on general graphs. The proofs are included in the Appendix.
+The following corollary follows easily and gives the first $\Omega(s \log m)$ algorithm for graph reconstruction on general graphs. The proofs are included in the Appendix.
 
 \begin{corollary}
 \label{cor:variable_selection}
-Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Suppose that after solving for $\hat \theta$, we construct the set $\hat {\cal S}_\eta := \{ j \in [1..p] : \hat p_j > \eta\}$ for $\eta > 0$. Suppose the number of measurements verifies:
+Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$ and that $\theta$ is $s$-sparse. Suppose that after solving for $\hat \theta$, we construct the set $\hat {\cal S}_\eta \defeq \{ j \in [1..p] : \hat p_j > \eta\}$ for $\eta > 0$. For $0 < \epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ j \in [1..p] : p^*_j > \eta + \epsilon \}$ be the set of all true `strong' parents. Suppose the number of measurements verifies:
+\begin{equation}
+n > \frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m
+\end{equation}
+Then with probability $1-\frac{1}{m}$, ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset {\cal S}^*$. In other words, we recover all `strong' parents and no `false' parents. Note that if $\theta$ is not exactly $s$-sparse and the number of measurements verifies:
 \begin{equation}
-n > \frac{4}{\gamma^2 p_{\min} \eta^2} s \log m
+n > \frac{36 \| p^*_{\lfloor s\rfloor}\|_1}{p_{\min}\gamma^2 \epsilon^4} s \log m
 \end{equation}
-Then with probability $1-e^{n^\delta \log m}$, $\hat {\cal S}_\eta = {\cal S}^*_{2\eta}$, where ${\cal S}^*_{2\eta} = \{ j \in [1..p] : p^*_j > 2 \eta \}$. In other words, we incur no false positives (false parents) and recover all `strong' parents such that $p^*_j > 2 \eta$.
+then similarly ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset {\cal S}^*$ w.h.p.
 \end{corollary}
 
+\begin{proof}
+By choosing $\delta = 0$: if $n > \frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, then $\|\hat p - p^*\|_2 < \epsilon < \eta$ with probability $1-\frac{1}{m}$. If $p^*_j = 0$ and $\hat p_j > \eta$, then $\|\hat p - p^*\|_2 \geq |\hat p_j - p^*_j| > \eta$, which is a contradiction; therefore we incur no false positives. If $p^*_j > \eta + \epsilon$, then $|\hat p_j - p^*_j| < \epsilon$ implies $\hat p_j > \eta$; therefore we recover all `strong' parents.
+\end{proof}
+
 Note that $n$ is the number of measurements and not the number of cascades. This is an improvement over prior work since we expect several measurements per cascade.
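The thresholding rule in the variable-selection corollary above can be sketched numerically. The following Python snippet is an illustrative check only; the edge weights, noise values, and thresholds are made-up assumptions, not taken from the paper, and `select_support` is a hypothetical helper name:

```python
import math

# Hypothetical sketch of the corollary's selection rule: given an estimate
# p_hat with ||p_hat - p*||_2 < epsilon < eta, the selected set
# S_hat = {j : p_hat[j] > eta} contains every `strong' parent
# (p*_j > eta + epsilon) and no false parent (p*_j = 0).

def select_support(p_hat, eta):
    """Indices of estimated parents above the threshold eta."""
    return {j for j, v in enumerate(p_hat) if v > eta}

# Toy values (assumptions for illustration only):
p_star = [0.9, 0.5, 0.0, 0.0, 0.3]        # true edge weights p*
noise = [0.04, -0.03, 0.02, -0.01, 0.03]  # estimation error
p_hat = [a + b for a, b in zip(p_star, noise)]
eta, eps = 0.2, 0.1
# The l2-error is within epsilon, as the corollary requires:
assert math.sqrt(sum(e * e for e in noise)) < eps

s_hat = select_support(p_hat, eta)
strong = {j for j, v in enumerate(p_star) if v > eta + eps}  # `strong' parents
support = {j for j, v in enumerate(p_star) if v > 0}         # true support S*

# Sandwich property from the corollary: S*_{eta+eps} c S_hat c S*
assert strong <= s_hat <= support
```

The point of the sketch is that the sandwich property needs only an $\ell_2$ bound on the estimation error, with no irrepresentability-type condition on the toy data.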

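The sample-complexity bound in the same corollary, $n > \frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, can also be evaluated directly. This is a minimal sketch assuming exactly $s$-sparse parameters; the numeric inputs below are illustrative, not values from the paper:

```python
import math

# Minimal sketch of the corollary's sample-complexity bound for exactly
# s-sparse parameters: n > 36 s log(m) / (p_min * gamma^2 * eps^2).
def required_measurements(p_min, gamma, eps, s, m):
    """Smallest integer n strictly exceeding the bound."""
    bound = 36.0 * s * math.log(m) / (p_min * gamma ** 2 * eps ** 2)
    return math.floor(bound) + 1

# Illustrative inputs (assumptions): p_min = 0.1, gamma = 0.5, eps = 0.2,
# sparsity s = 5, m = 1000 candidate parents.
n = required_measurements(p_min=0.1, gamma=0.5, eps=0.2, s=5, m=1000)
```

As the corollary's closing remark notes, $n$ counts measurements rather than cascades, so the bound grows only linearly in $s$ and logarithmically in $m$.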