Diffstat (limited to 'paper')
 paper/paper.tex                |  1
 paper/sections/appendix.tex    | 12
 paper/sections/assumptions.tex | 32
 paper/sections/results.tex     | 26
 4 files changed, 52 insertions(+), 19 deletions(-)
diff --git a/paper/paper.tex b/paper/paper.tex
index d2fdcb9..36eccc7 100644
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -63,6 +63,7 @@
 \newtheorem{corollary}{Corollary}
 \newtheorem{remark}{Remark}
 \newtheorem{proposition}{Proposition}
+\newtheorem{definition}{Definition}
 
 \begin{document}
diff --git a/paper/sections/appendix.tex b/paper/sections/appendix.tex
index 80692d0..65576bb 100644
--- a/paper/sections/appendix.tex
+++ b/paper/sections/appendix.tex
@@ -1,9 +1,9 @@
-\subsection*{Proof of support recovery lemma}
+\subsection{Proof of support recovery lemma}
 
-\subsection*{Upper-bound for $\|\nabla f(\theta^*)\|_\infty$}
+\subsection{Upper-bound for $\|\nabla f(\theta^*)\|_\infty$}
 
-We show an upper-bound for $ 2 \|\nabla f(\theta^*) \|_\infty$. If we only consider the first measurement of every cascade, this can be done easily. Let $N$ be the number of cascades (to distinguish from $n$ number of total measurements). Begin by noting that:
+We show an upper-bound for $2 \|\nabla f(\theta^*)\|_\infty$. If we only consider the first measurement of every cascade, this can be done easily. Let $N$ be the number of cascades (to distinguish from $n$, the total number of measurements). Begin by noting that for a node $\alpha$:
 \begin{align}
 \nabla_k f(\theta) & = \sum^n_{i=1} \frac{b^i x^i_k}{1 - e^{\langle x^i, \theta \rangle}} - \sum^n_{i=1} x^i_k \nonumber \\
@@ -19,7 +19,7 @@ $\nabla f(\theta^*)$ is a $N/p_{\min}$-subgaussian variable, where $p_{\min}$ is
 We show that its expectation is $0$ by conditioning on the seed set $S$:
 \begin{align}
-\mathbb{E} \left( \nabla_k f(\theta) \right) & = \sum_{i=1}^N \mathbb{E} \left[ x^i_k \left( \frac{b^i}{\mathbb{P}(\text{node } \alpha \text{ infected})} - 1\right) \right] \nonumber \\
+\mathbb{E} \left( \nabla_k f(\theta) \right) & = \sum_{i=1}^N \mathbb{E} \left[ x^i_k \left( \frac{b^i}{\mathbb{P}(\alpha \text{ infected})} - 1\right) \right] \nonumber \\
 & = \sum_{i=1}^N \mathbb{E}_S \left[ \mathbb{E}\left[x^i_k \left( \frac{b^i}{\mathbb{P}(\alpha \text{ infected})} - 1\right) \middle| S \right] \right] \nonumber \\
 & = \sum_{i=1}^N \mathbb{E}\left[x^i_k \left( \frac{\mathbb{E}_S \left[ b^i \middle| S \right]}{\mathbb{P}(\alpha \text{ infected})} - 1\right) \right] \nonumber \\
 & = 0 \nonumber
@@ -30,7 +30,7 @@ Therefore, $\nabla f(\theta^*)$ is the sum of zero-mean variables, upper-bounded
 By union bound and characterization of sub-gaussian variables:
 \begin{equation}
-\mathbb{P}(\| \nabla f(\theta) \|_{\infty} > \lambda) \leq 2 \exp \left( -\frac{\lambda^2 p_{\min}}{2n} + \log p \right)
+\mathbb{P}(\| \nabla f(\theta) \|_{\infty} > \lambda) \leq 2 \exp \left( -\frac{\lambda^2 p_{\min}}{2n} + \log m \right)
 \end{equation}
-In conclusion, for $\delta>0$, $\lambda := 2 \sqrt{\frac{n^{\delta + 1} \log p}{p_{\min}}}$ is a valid regularizer with probability $1 - \exp(-n^\delta \log p )$
+In conclusion, for $\delta > 0$, $\lambda := 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$ is a valid regularizer with probability $1 - \exp(-n^\delta \log m)$.
diff --git a/paper/sections/assumptions.tex b/paper/sections/assumptions.tex
index d6fed83..1f88494 100644
--- a/paper/sections/assumptions.tex
+++ b/paper/sections/assumptions.tex
@@ -1,13 +1,33 @@
-\subsection{The Restricted Eigenvalue Condition}
+\subsection{The Irrepresentability Condition}
 
-Proving the restricted eigenvalue assumption for correlated measurements is non-trivial. Under reasonable assumptions on the graph parameters, we can show a very crude ${\cal O}(N)$-lower bound for $\gamma_n$ by exploiting only the first set of measurements, where only the source nodes are active. Note that even though we waste a lot of information, we obtain similar asymptotic behavior than previous work.
+\cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the likelihood function. It is in fact easy to see that their condition is equivalent to what is more commonly called the {\it (S, s)-irrepresentability} condition:
 
-\subsection{The Irrepresentability Condition}
+\begin{definition}
+Following similar notation, let $Q^* := \nabla^2 f(\theta^*)$. Let $Q_{S^C,S} XXX$, the {\it (S, s)-irrepresentability} condition is defined as:
+\begin{equation}
+blabla
+\end{equation}
+\end{definition}
 
-If our objective is to recover only the support of the graph and to do it exactly, then the irrepresentability condition\footnote{{\it incoherence} in the text} used in \cite{Daneshmand:2014} is believed to be necessary.
+If our objective is to recover the support of the graph exactly, the irrepresentability condition has been shown to be essentially necessary \cite{Zhao:2006}. However, several recent papers \cite{vandegeer:2011, Zou:2006} argue that this condition is unrealistic when the variables are correlated.
 
-However, assuming we only wish to recover all edges above a certain threshold, bounding the $\ell2$-error allows us to recover the full graph above a certain minimum threshold. Furthermore, it is intuitive that the irrepresentability condition is stronger than our suggested {\bf(RE)} assumption. In fact, by adapting slightly the following result from \cite{vandegeer:2009}, we can show that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell2$-recovery
+Consider for example the following matrix:
+
+Yet, if we only wish to recover all edges with weights above a certain minimum threshold, bounding the $\ell_2$-error suffices under the intuitively weaker {\bf(RE)} condition. In practical scenarios, such as in social networks, where one seeks to recover the significant edges, this is a reasonable assumption.
+
+As mentioned previously, it is intuitive that the irrepresentability condition is stronger than our suggested {\bf(RE)} assumption. In fact, by slightly adapting the following result from \cite{vandegeer:2009}, we can show that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell_2$-recovery:
 \begin{proposition}
 If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant: $(1 - 3\epsilon)^2 \lambda_{\min}n/s$
-\end{proposition}
\ No newline at end of file
+\end{proposition}
+
+
+\subsection{The Restricted Eigenvalue Condition}
+
+Expressing the restricted eigenvalue assumption in terms of the graph parameters is non-trivial for correlated measurements. Under reasonable assumptions on the graph parameters, we can show a very crude ${\cal O}(N)$-lower bound for $\gamma_n$ by exploiting only the first set of measurements, where only the source nodes are active. Note that even though we waste a lot of information, we obtain similar asymptotic behavior to previous work.
+
+Similarly to the analysis conducted in \cite{Daneshmand:2014}, we can show that if the restricted eigenvalue condition holds for the `expected' hessian, then it also holds for the hessian itself.
+
+\begin{proposition}
+If the result holds for the expected hessian, then it holds for the hessian.
+\end{proposition}
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index 5c2fe5f..449ae02 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -74,25 +74,37 @@ We analyse the previous conditions in the case of the Independent Cascade model.
 For any $\delta > 0$, with probability $1-e^{-n^\delta \log m}$, $\|\nabla f(\theta^*)\|_{\infty} \leq 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$
 \end{lemma}
 
-We include a proof of this result in the Appendix. The following corollaries follow immediately from Theorem~\ref{thm:neghaban} and Lemma~\ref{lem:icc_lambda_upper_bound}.
+We include a proof of this result in the Appendix. The following corollaries follow immediately from Theorem~\ref{thm:neghaban}, Theorem~\ref{thm:approx_sparse} and Lemma~\ref{lem:icc_lambda_upper_bound}:
 
 \begin{corollary}
-Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Then for $\lambda_n := 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$ and with probability $1-e^{n^\delta \log m}$
+Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Then for $\lambda_n \defeq 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$ and with probability $1-e^{-n^\delta \log m}$:
 \begin{equation}
-\|\hat p - p^* \|_2 \leq \frac{2}{\gamma} \sqrt{\frac{s \log m}{p_{\min} n^{1-\delta}}}
+\|\hat p - p^* \|_2 \leq \frac{3}{\gamma} \sqrt{\frac{s \log m}{p_{\min} n^{1-\delta}}} + 2 \sqrt[4]{\frac{\log m}{n^{1-\delta} \gamma^2 p_{\min}}} \| p^*_{\lfloor s \rfloor} \|_1
+\end{equation}
+Note that if $p^*$ is exactly $s$-sparse, $\| p^*_{\lfloor s \rfloor} \|_1 = 0$, so that:
+\begin{equation}
+\|\hat p - p^* \|_2 \leq \frac{3}{\gamma} \sqrt{\frac{s \log m}{p_{\min} n^{1-\delta}}}
 \end{equation}
 \end{corollary}
 
-The following corollary follows trivially and gives the first $\Omega(s \log p)$ algorithm for graph reconstruction on general graphs. The proofs are included in the Appendix.
+The following corollary follows easily and gives the first $\Omega(s \log m)$ algorithm for graph reconstruction on general graphs. The proofs are included in the Appendix.
 
 \begin{corollary}
 \label{cor:variable_selection}
-Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$. Suppose that after solving for $\hat \theta$, we construct the set $\hat {\cal S}_\eta := \{ j \in [1..p] : \hat p_j > \eta\}$ for $\eta > 0$. Suppose the number of measurements verifies:
+Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$ and that $\theta$ is $s$-sparse. Suppose that after solving for $\hat \theta$, we construct the set $\hat {\cal S}_\eta \defeq \{ j \in [1..p] : \hat p_j > \eta\}$ for $\eta > 0$. For $0 < \epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ j \in [1..p] : p^*_j > \eta + \epsilon \}$ be the set of all true `strong' parents. Suppose the number of measurements verifies:
+\begin{equation}
+n > \frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m
+\end{equation}
+Then with probability $1-\frac{1}{m}$, ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset {\cal S}^*$. In other words, we recover all `strong' parents and no `false' parents. Note that if $\theta$ is not exactly $s$-sparse and the number of measurements verifies:
 \begin{equation}
-n > \frac{4}{\gamma^2 p_{\min} \eta^2} s \log m
+n > \frac{36 \| p^*_{\lfloor s\rfloor}\|_1}{p_{\min}\gamma^2 \epsilon^4} s \log m
 \end{equation}
-Then with probability $1-e^{n^\delta \log m}$, $\hat {\cal S}_\eta = {\cal S}^*_{2\eta}$, where ${\cal S}^*_{2\eta} = \{ j \in [1..p] : p^*_j > 2 \eta \}$. In other words, we incur no false positives (false parents) and recover all `strong' parents such that $p^*_j > 2 \eta$.
+then similarly ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset {\cal S}^*$ w.h.p.
 \end{corollary}
 
+\begin{proof}
+By choosing $\delta = 0$: if $n > \frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, then $\|\hat p - p^*\|_2 < \epsilon < \eta$ with probability $1-\frac{1}{m}$. If $p^*_j = 0$ and $\hat p_j > \eta$, then $\|\hat p - p^*\|_2 \geq |\hat p_j - p^*_j| > \eta$, which is a contradiction; therefore we incur no false positives. If $p^*_j > \eta + \epsilon$, then $|\hat p_j - p^*_j| < \epsilon$ implies $\hat p_j > \eta$; therefore we recover all `strong' parents.
+\end{proof}
+
 Note that $n$ is the number of measurements and not the number of cascades. This is an improvement over prior work since we expect several measurements per cascade.
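The thresholding rule in the variable-selection corollary above can be sketched numerically. The following Python snippet is an illustrative check only; the edge weights, noise values, and thresholds are made-up assumptions, not taken from the paper, and `select_support` is a hypothetical helper name:

```python
import math

# Hypothetical sketch of the corollary's selection rule: given an estimate
# p_hat with ||p_hat - p*||_2 < epsilon < eta, the selected set
# S_hat = {j : p_hat[j] > eta} contains every `strong' parent
# (p*_j > eta + epsilon) and no false parent (p*_j = 0).

def select_support(p_hat, eta):
    """Indices of estimated parents above the threshold eta."""
    return {j for j, v in enumerate(p_hat) if v > eta}

# Toy values (assumptions for illustration only):
p_star = [0.9, 0.5, 0.0, 0.0, 0.3]        # true edge weights p*
noise = [0.04, -0.03, 0.02, -0.01, 0.03]  # estimation error
p_hat = [a + b for a, b in zip(p_star, noise)]
eta, eps = 0.2, 0.1
# The l2-error is within epsilon, as the corollary requires:
assert math.sqrt(sum(e * e for e in noise)) < eps

s_hat = select_support(p_hat, eta)
strong = {j for j, v in enumerate(p_star) if v > eta + eps}  # `strong' parents
support = {j for j, v in enumerate(p_star) if v > 0}         # true support S*

# Sandwich property from the corollary: S*_{eta+eps} c S_hat c S*
assert strong <= s_hat <= support
```

The point of the sketch is that the sandwich property needs only an $\ell_2$ bound on the estimation error, with no irrepresentability-type condition on the toy data.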

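The sample-complexity bound in the same corollary, $n > \frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, can also be evaluated directly. This is a minimal sketch assuming exactly $s$-sparse parameters; the numeric inputs below are illustrative, not values from the paper:

```python
import math

# Minimal sketch of the corollary's sample-complexity bound for exactly
# s-sparse parameters: n > 36 s log(m) / (p_min * gamma^2 * eps^2).
def required_measurements(p_min, gamma, eps, s, m):
    """Smallest integer n strictly exceeding the bound."""
    bound = 36.0 * s * math.log(m) / (p_min * gamma ** 2 * eps ** 2)
    return math.floor(bound) + 1

# Illustrative inputs (assumptions): p_min = 0.1, gamma = 0.5, eps = 0.2,
# sparsity s = 5, m = 1000 candidate parents.
n = required_measurements(p_min=0.1, gamma=0.5, eps=0.2, s=5, m=1000)
```

As the corollary's closing remark notes, $n$ counts measurements rather than cascades, so the bound grows only linearly in $s$ and logarithmically in $m$.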