\subsection{Proof for different lemmas} \subsubsection{Bounded gradient} \begin{proof} The gradient of $\mathcal{L}$ is given by: \begin{multline*} \nabla \mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in \mathcal{T}}x^t\bigg[ x_i^{t+1}\frac{f'}{f}(\inprod{\theta^*}{x^t})\\ - (1-x_i^{t+1})\frac{f'}{1-f}(\inprod{\theta^*}{x^t})\bigg] \end{multline*} Let $\partial_j \mathcal{L}(\theta)$ be the $j$-th coordinate of $\nabla\mathcal{L}(\theta^*)$. Writing $\partial_j\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}} Y_t$ and since $\E[x_i^{t+1}|x^t]= f(\inprod{\theta^*}{x^t})$, we have that $\E[Y_{t+1}|Y_t] = 0$. Hence $Z_t = \sum_{k=1}^t Y_k$ is a martingale. Using assumption (LF), we have almost surely $|Z_{t+1}-Z_t|\leq \frac{1}{\alpha}$ and we can apply Azuma's inequality to $Z_t$: \begin{displaymath} \P\big[|Z_{\mathcal{T}}|\geq \lambda\big]\leq 2\exp\left(\frac{-\lambda^2\alpha}{2n}\right) \end{displaymath} Applying a union bound to have the previous inequality hold for all coordinates of $\nabla\mathcal{L}(\theta)$ implies: \begin{align*} \P\big[\|\nabla\mathcal{L}(\theta^*)\|_{\infty}\geq \lambda \big] &\leq 2m\exp\left(\frac{-\lambda^2n\alpha}{2}\right) \end{align*} Choosing $\lambda\defeq 2\sqrt{\frac{\log m}{\alpha n^{1-\delta}}}$ concludes the proof. \end{proof} \subsubsection{Approximate sparsity proof} \subsubsection{RE with high probability} \subsection{Other continuous time processes binned to ours: proportional hazards model} \subsection{Irrepresentability vs. Restricted Eigenvalue Condition} In the words and notation of Theorem 9.1 in \cite{vandegeer:2009}: \begin{lemma} \label{lemm:irrepresentability_proof} Let $\phi^2_{\text{compatible}}(L,S) \defeq \min \{ \frac{s \|f_\beta\|^2_2}{\|\beta_S\|^2_1} \ : \ \beta \in {\cal R}(L, S) \}$, where $\|f_\beta\|^2_2 \defeq \{ \beta^T \Sigma \beta \}$ and ${\cal R}(L,S) \defeq \{\beta : \|\beta_{S^c}\|_1 \leq L \|\beta_S\|_1 \neq 0\}$. If $\nu_{\text{irrepresentable}(S,s)} < 1/L$, then $\phi^2_{\text{compatible}}(L,S) \geq (1 - L \nu_{\text{irrepresentable}(S,s)})^2 \lambda_{\min}^2$. \end{lemma} Since ${\cal R}(3, S) = {\cal C}$, $\|\beta_S\|_1 \geq \|\beta_S\|_2$, and $\|\beta_S\|_1 \geq \frac{1}{3} \|\beta_{S^c}\|_1$ it is easy to see that $\|\beta_S\|_1 \geq \frac{1}{4} \|\beta\|_2$ and therefore that: $\gamma_n \geq \frac{n}{4s}\phi^2_{\text{compatible}}(3,S)$ Consequently, if $\epsilon > \frac{2}{3}$, then $\nu_{\text{irrepresentable}(S,s)} < 1/3$ and the conditions of Lemma~\ref{lemm:irrepresentability_proof} hold. \subsection{Lower bound for restricted eigenvalues (expected hessian) for different graphs} \subsection{Better asymptotic w.r.t expected hessian} \subsection{Confidence intervals?} \subsection{Active learning}