 paper/paper.tex                |  2 +-
 paper/sections/appendix.tex    | 14 +++++++++++---
 paper/sections/assumptions.tex |  8 +++++---
 paper/sections/intro.tex       |  4 ++--
 paper/sparse.bib               |  2 +-
 5 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/paper/paper.tex b/paper/paper.tex
index 36eccc7..b172a5d 100644
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -40,7 +40,7 @@
 % been accepted, when creating the final version. This will set the
 % note in the first column to ``Proceedings of the...''
 %\usepackage[accepted]{icml2015}
-
+\usepackage[utf8]{inputenc}
 
 % The \icmltitle you define below is probably too long as a header.
 % Therefore, a short form for the running title is supplied here:
diff --git a/paper/sections/appendix.tex b/paper/sections/appendix.tex
index 65576bb..eef3f45 100644
--- a/paper/sections/appendix.tex
+++ b/paper/sections/appendix.tex
@@ -1,6 +1,3 @@
-\subsection{Proof of support recovery lemma}
-
-
 \subsection{Upper-bound for $\|\nabla f(\theta^*)\|_\infty$}
 We show an upper-bound for $2\|\nabla f(\theta^*)\|_\infty$. If we only consider the first measurement of every cascade, this can be done easily. Let $N$ be the number of cascades (to distinguish from $n$, the total number of measurements).
 Begin by noting that for a node $\alpha$:
@@ -34,3 +31,14 @@ By union bound and characterization of sub-gaussian variables:
 \end{equation}
 In conclusion, for $\delta > 0$, $\lambda := 2 \sqrt{\frac{n^{\delta + 1} \log m}{p_{\min}}}$ is a valid regularizer with probability $1 - \exp(-n^\delta \log m)$.
+
+\subsection{Proposition~\ref{prop:irrepresentability}}
+In the words and notation of Theorem 9.1 in \cite{vandegeer:2009}:
+\begin{lemma}
+\label{lemm:irrepresentability_proof}
+Let $\phi^2_{\text{compatible}}(L,S) \defeq \min \{ \frac{s \|f_\beta\|^2_2}{\|\beta_S\|^2_1} \ : \ \beta \in {\cal R}(L, S) \}$, where $\|f_\beta\|^2_2 \defeq \beta^T \Sigma \beta$ and ${\cal R}(L,S) \defeq \{\beta : \|\beta_{S^c}\|_1 \leq L \|\beta_S\|_1 \neq 0\}$. If $\nu_{\text{irrepresentable}}(S,s) < 1/L$, then $\phi^2_{\text{compatible}}(L,S) \geq (1 - L \nu_{\text{irrepresentable}}(S,s))^2 \lambda_{\min}^2$.
+\end{lemma}
+
+Since ${\cal R}(3, S) = {\cal C}$, $\|\beta_S\|_1 \geq \|\beta_S\|_2$, and $\|\beta_S\|_1 \geq \frac{1}{3} \|\beta_{S^c}\|_1$, it is easy to see that $\|\beta_S\|_1 \geq \frac{1}{4} \|\beta\|_2$, and therefore that $\gamma_n \geq \frac{n}{4s}\phi^2_{\text{compatible}}(3,S)$.
+
+Consequently, if $\epsilon > \frac{2}{3}$, then $\nu_{\text{irrepresentable}}(S,s) < 1/3$ and the conditions of Lemma~\ref{lemm:irrepresentability_proof} hold.
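The step the appendix hunk above glosses as `easy to see' follows from one short chain of inequalities; a minimal verification, using only the definition of ${\cal R}(3,S)$ and the fact that $\|\beta\|_2 \leq \|\beta\|_1$:

\begin{equation*}
\|\beta\|_2 \;\leq\; \|\beta\|_1 \;=\; \|\beta_S\|_1 + \|\beta_{S^c}\|_1 \;\leq\; \|\beta_S\|_1 + 3\,\|\beta_S\|_1 \;=\; 4\,\|\beta_S\|_1,
\end{equation*}

hence $\|\beta_S\|_1 \geq \frac{1}{4}\|\beta\|_2$ for every $\beta \in {\cal R}(3,S)$.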
diff --git a/paper/sections/assumptions.tex b/paper/sections/assumptions.tex
index 1f88494..14b16d4 100644
--- a/paper/sections/assumptions.tex
+++ b/paper/sections/assumptions.tex
@@ -15,19 +15,21 @@ Consider for example the following matrix:
 Yet, assuming we only wish to recover all edges above a certain threshold, bounding the $\ell2$-error allows us to recover all edges with weights above that minimum threshold under an intuitively weaker {\bf(RE)} condition. In practical scenarios, such as in social networks, where one seeks to recover the significant edges, this is a reasonable assumption.
 
-As mentioned previously, it is intuitive that the irrepresentability condition is stronger than our suggested {\bf(RE)} assumption. In fact, by adapting slightly the following result from \cite{vandegeer:2009}, we can show that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell2$-recovery
+As mentioned previously, it is intuitive that the irrepresentability condition is stronger than our suggested {\bf(RE)} assumption. In fact, by slightly adapting a result from \cite{vandegeer:2009}, we can show that a `strong' irrepresentability condition directly {\it implies} the {\bf(RE)} condition for $\ell2$-recovery:
 
 \begin{proposition}
-If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant: $(1 - 3\epsilon)^2 \lambda_{\min}n/s$
+\label{prop:irrepresentability}
+If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then the restricted eigenvalue condition holds with constant $\gamma_n \geq (1 - 3(1 - \epsilon))^2 \lambda_{\min}^2 n/(4s)$, where $\lambda_{\min} > 0$ is the smallest eigenvalue of $Q^*_{S,S}$, on which the results of \cite{Daneshmand:2014} also depend.
 \end{proposition}
 
 \subsection{The Restricted Eigenvalue Condition}
-Expressing the restricted eigenvalue assumption for correlated measurements is non-trivial as parameters of the graph is non-trivial. Under reasonable assumptions on the graph parameters, we can show a very crude ${\cal O}(N)$-lower bound for $\gamma_n$ by exploiting only the first set of measurements, where only the source nodes are active. Note that even though we waste a lot of information, we obtain similar asymptotic behavior than previous work.
+Expressing the restricted eigenvalue assumption for correlated measurements as parameters of the graph and the cascade diffusion process is non-trivial. Under reasonable assumptions on the graph parameters, we can show a very crude ${\cal O}(N)$ lower bound for $\gamma_n$ by exploiting only the first set of measurements, where only the source nodes are active. Note that even though we waste a lot of information, we obtain asymptotic behavior similar to that of previous work.
 
 Similarly to the analysis conducted in \cite{Daneshmand:2014}, we can show that if the eigenvalue condition holds for the `expected' hessian, then it holds for the hessian itself.
 
 \begin{proposition}
+\label{prop:expected_hessian}
 If the result holds for the expected hessian, then it holds for the hessian itself.
 \end{proposition}
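For a concrete feel of Proposition~\ref{prop:irrepresentability}, here is an illustrative instantiation (the value $\epsilon = 5/6$ is chosen for this example only, not taken from the paper):

\begin{equation*}
\gamma_n \;\geq\; \left(1 - 3\left(1 - \tfrac{5}{6}\right)\right)^2 \lambda_{\min}^2\, \frac{n}{4s} \;=\; \left(\tfrac{1}{2}\right)^2 \lambda_{\min}^2\, \frac{n}{4s} \;=\; \frac{\lambda_{\min}^2\, n}{16\, s},
\end{equation*}

so the bound degrades gracefully as $\epsilon$ decreases towards $\frac{2}{3}$, where the constant vanishes and the proposition becomes vacuous.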
diff --git a/paper/sections/intro.tex b/paper/sections/intro.tex
index 5f507d9..8edf341 100644
--- a/paper/sections/intro.tex
+++ b/paper/sections/intro.tex
@@ -14,7 +14,7 @@ The study of edge prediction in graph has been an active field of research for o
 The work of \cite{Abrahao:13} studies the same continuous-model framework as \cite{GomezRodriguez:2010} and obtains a ${\cal O}(\Delta^9 \log^2 \Delta \log m)$ support recovery algorithm. Their work also studies the information leveraged by different `parts' of the cascade, showing that a surprisingly large amount of information can be gleaned from the first `infections' of the cascade. We will reach a similar conclusion in section~\ref{sec:assumptions}. However, their model assumes a single source, selected at random, which is less realistic in practice. Oftentimes, `patient 0' is unknown to us, and a multi-source model intuitively captures the more common situation of a delayed observation of the cascade.
 
-The recent work of \cite{Daneshmand:2014} is very similar to our own, in fact we only grew aware of their contribution at a late stage in the writing of this paper. Similarly to our approach, they consider a $\ell1$-norm penalty to their objective function, adapting standard results from sparse recovery to obtain a ${\cal O}(\Delta^3 \log m)$ algorithm under an irrepresentability condition. With stronger assumptions, they match the \cite{Netrapalli:2012} bound of ${\cal O}(\Delta^2 \log m)$, by exploiting a similar proof technique based around the KKT conditions of the objective function. Their work has the merit of studying a general framework of continuous functions. However, they also place themselves in the restrictive single-source context.
+The recent work of \cite{Daneshmand:2014} is very similar to our own: they consider an $\ell1$-norm penalty on their objective function, adapting standard results from sparse recovery to obtain an ${\cal O}(\Delta^3 \log m)$ algorithm under an irrepresentability condition. With stronger assumptions, they match the \cite{Netrapalli:2012} bound of ${\cal O}(\Delta^2 \log m)$ by exploiting a similar proof technique based around the KKT conditions of the objective function. Their work has the merit of studying a general framework of continuous functions. Similarly to \cite{Abrahao:13}, they place themselves in the restrictive single-source context.
 
 \paragraph{Our contributions}
-Though our work follows closely the spirit of \cite{Netrapalli:2012} and \cite{Daneshmand:2014}, we believe we provide several significant improvements to their work. We study sparse recovery under less restrictive assumptions and obtain the first, to the best of the authors' knowledge, ${\cal O}(\Delta \log m)$ algorithm for graph inference from cascades, which is tight to a certain extent. We also study the seemingly overlooked problem of weight recovery as well as the setting of the relaxed sparsity setting. We study the case of the two best known diffusion processes for simplicity as outlined in \cite{Kempe:03}, but many arguments can be extended to more general diffusion processes.
+Though our work closely follows the spirit of \cite{Netrapalli:2012} and \cite{Daneshmand:2014}, we believe we provide several significant improvements to their work. We study sparse recovery under less restrictive assumptions and obtain the first ${\cal O}(\Delta \log m)$ algorithm for graph inference from cascades. We also study the seemingly overlooked problem of weight recovery, as well as the relaxed sparsity setting. Finally, we show these results are almost tight by proving in section~\ref{sec:lowerbound} the first lower bound on the number of observations required to recover the edges and the edge weights of a graph in the general case. For simplicity, we study the two best-known diffusion processes, as outlined in \cite{Kempe:03}, but many arguments can be extended to more general diffusion processes.
diff --git a/paper/sparse.bib b/paper/sparse.bib
index c67cd20..11209ee 100644
--- a/paper/sparse.bib
+++ b/paper/sparse.bib
@@ -122,7 +122,7 @@ year = {2006},
 @article{vandegeer:2009,
-author = "van de Geer, Sara A. and Bühlmann, Peter",
+author = "van de Geer, Sara A. and B{\"u}hlmann, Peter",
 doi = "10.1214/09-EJS506",
 fjournal = "Electronic Journal of Statistics",
 journal = "Electron. J. Statist.",
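To make the gap between the sample-complexity bounds cited in the introduction tangible, a back-of-the-envelope comparison (the values $m = 10^6$ and $\Delta = 100$ are assumptions for illustration, with $\log$ taken as the natural logarithm, so $\log m \approx 13.8$):

\begin{align*}
\Delta^3 \log m &\approx 1.4 \times 10^{7} && \text{\cite{Daneshmand:2014}}\\
\Delta^2 \log m &\approx 1.4 \times 10^{5} && \text{\cite{Netrapalli:2012}}\\
\Delta \log m &\approx 1.4 \times 10^{3} && \text{(this paper)}
\end{align*}

Each power of $\Delta$ saved is a factor of one hundred fewer observations at this scale.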
