| author | Thibaut Horel <thibaut.horel@gmail.com> | 2015-02-06 18:27:55 -0500 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2015-02-06 18:27:55 -0500 |
| commit | 85d2de4ff3588b8059b2bb45ec9444097b996aa7 (patch) | |
| tree | 06c50a6d889e68a3d7920130a3441bfb4a15d4bf /paper | |
| parent | 8d7b0d99cfadc38a15b0f63d28d0edd024e8c5f0 (diff) | |
| download | cascades-85d2de4ff3588b8059b2bb45ec9444097b996aa7.tar.gz | |
Typos
Diffstat (limited to 'paper')
| mode | path | lines changed |
|---|---|---|
| -rw-r--r-- | paper/sections/experiments.tex | 4 |
| -rw-r--r-- | paper/sections/intro.tex | 10 |
| -rw-r--r-- | paper/sections/lowerbound.tex | 2 |
| -rw-r--r-- | paper/sections/model.tex | 19 |
| -rw-r--r-- | paper/sections/results.tex | 62 |
5 files changed, 47 insertions, 50 deletions
```diff
diff --git a/paper/sections/experiments.tex b/paper/sections/experiments.tex
index 80f95fa..9122e7d 100644
--- a/paper/sections/experiments.tex
+++ b/paper/sections/experiments.tex
@@ -28,7 +28,7 @@ $\ell_2$-norm $\|\hat \Theta - \Theta\|_2$ for a Kronecker graph which is: (d) e
 \label{fig:four_figs}
 \end{table*}
-In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for for varying levels of sparsity and different initializations of parameters ($n$, $m$, $\lambda$, $p_{\text{init}}$), where $p_{\text{init}}$ is the initial probability of a node being a source node. We compare our algorithm to two different state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}. As an extra benchmark, we also introduce a new algorithm \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm.
+In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for varying levels of sparsity and different initializations of parameters ($n$, $m$, $\lambda$, $p_{\text{init}}$), where $p_{\text{init}}$ is the initial probability of a node being a source node. We compare our algorithm to two different state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}. As an extra benchmark, we also introduce a new algorithm \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm.
 \paragraph{Experimental setup} We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges) \cite{watts:1998}, a Barabasi-Albert graph ($300$ nodes, $16200$ edges) \cite{barabasi:2001}, a Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by doubling the edges.
@@ -38,7 +38,7 @@ from the (IC) model for $n \in \{100, 500, 1000, 2000, 5000\}$. We compare for e
 \paragraph{Benchmarks}
-We compare our \textsc{sparse mle} algorithm to 3 benchmarks: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012} and \textsc{lasso}. The \textsc{mle} algorithm is a maximum-likelihood estimator without $\ell1$-norm penalization. \textsc{greedy} is an iterative algorithm. We introduced the \textsc{lasso} algorithm in our experiments to achieve faster computation time:
+We compare our \textsc{sparse mle} algorithm to 3 benchmarks: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012} and \textsc{lasso}. The \textsc{mle} algorithm is a maximum-likelihood estimator without $\ell_1$-norm penalization. \textsc{greedy} is an iterative algorithm. We introduced the \textsc{lasso} algorithm in our experiments to achieve faster computation time:
 $$\hat \theta_i \in \arg \min_{\theta} \sum_{t \in {\cal T}} |f(\theta_i\cdot x^t) - x_i^{t+1}|^2 + \lambda \|\theta_i\|_1$$
 \textsc{Lasso} has the merit of being both easier and faster to optimize numerically than the other convex-optimization based algorithms. It approximates the $\textsc{sparse mle}$ algorithm by making the assumption that the observations $x_i^{t+1}$ are of the form: $x_i^{t+1} = f(\theta_i\cdot x^t) + \epsilon$, where $\epsilon$ is random white noise. This is not valid in theory since $\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$, however the approximation is validated in practice.
```
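The \textsc{lasso} objective quoted in this hunk is an $\ell_1$-penalized least-squares fit of the inverse link. A minimal numpy sketch of how such a benchmark could be solved by proximal gradient descent (ISTA); this is an illustration, not the authors' implementation: the helper names, step size, and iteration count are assumptions, and the sign constraint implied by the (IC) reparametrization is ignored.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: coordinate-wise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_benchmark(X, y, f, f_prime, lam, step=1e-2, iters=2000):
    # Minimize sum_t (f(<theta, x^t>) - y_t)^2 + lam * ||theta||_1 by ISTA.
    # X has one row per observation x^t; y stacks the indicators x_i^{t+1}
    # for the single node i being solved for.
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        u = X @ theta
        grad = X.T @ (2.0 * (f(u) - y) * f_prime(u))
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta
```

For the voter model the inverse link is the identity, so this reduces to the standard Lasso and off-the-shelf solvers apply directly, which is consistent with the claim that \textsc{lasso} is the easiest of the benchmarks to optimize numerically.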
```diff
diff --git a/paper/sections/intro.tex b/paper/sections/intro.tex
index d52da66..39f5089 100644
--- a/paper/sections/intro.tex
+++ b/paper/sections/intro.tex
@@ -46,7 +46,7 @@ The contributions of this paper are the following:
   \item we formulate the Graph Inference problem in the context of
     discrete-time influence cascades as a sparse recovery problem for
     a specific type of Generalized Linear Model. This formulation notably
-    encompases the well-studied Independent Cascade Model and Voter Model.
+    encompasses the well-studied Independent Cascade Model and Voter Model.
   \item we give an algorithm which recovers the graph's edges using
     $\O(s\log m)$ cascades. Furthermore, we show that our algorithm is also
     able to recover the edge weights (the parameters of the influence model),
@@ -55,7 +55,7 @@ The contributions of this paper are the following:
     recover is approximately $s$-sparse by proving guarantees in the
     \emph{stable recovery} setting.
   \item we provide an almost tight lower bound of $\Omega(s\log \frac{m}{s})$
-    observations.
+    observations required for sparse recovery.
 \end{itemize}
 The organization of the paper is as follows: we conclude the introduction by
@@ -69,7 +69,7 @@ Section~\ref{sec:experiments}.
 \paragraph{Related Work}
 The study of edge prediction in graphs has been an active field of research for
-over a decade \cite{Nowell08} \cite{Leskovec07} \cite{AdarA05}.
+over a decade \cite{Nowell08, Leskovec07, AdarA05}.
 \cite{GomezRodriguez:2010} introduced the {\scshape Netinf} algorithm, which
 approximates the likelihood of cascades represented as a continuous process.
 The algorithm was improved in later work \cite{gomezbalduzzi:2011}, but is not
@@ -82,7 +82,7 @@ analysis depends on a {\it correlation decay} assumption, which limits the
 number of new infections at every step. In this setting, they show a lower
 bound of the number of cascades needed for support recovery with constant
 probability of the order $\Omega(s \log (m/s))$. They also suggest a {\scshape
-Greedy} algorithm, which achieves an ${\cal O}(s \log m)$ guarantee in the case
+Greedy} algorithm, which achieves a ${\cal O}(s \log m)$ guarantee in the case
 of tree graphs. The work of \cite{Abrahao:13} studies the same continuous-model
 framework as \cite{GomezRodriguez:2010} and obtains an ${\cal O}(s^9 \log^2 s
 \log m)$ support recovery algorithm, without the \emph{correlation decay} assumption.
@@ -99,7 +99,7 @@ m)$, by exploiting similar properties of the convex program's KKT conditions.
 In contrast, our work studies discrete-time diffusion processes including the
 Independent Cascade model under weaker assumptions. Furthermore, we analyze
 both the recovery of the graph's edges and the estimation of the model's
-parameters, and achieve closer to optimal bounds.
+parameters, and achieve close to optimal bounds.
 \begin{comment}
 Their work has the merit of studying a generalization of the
 discrete-time independent cascade model to continuous functions. Similarly to \cite{Abrahao:13}, they place themselves in
```
```diff
diff --git a/paper/sections/lowerbound.tex b/paper/sections/lowerbound.tex
index becd13f..a7ae21c 100644
--- a/paper/sections/lowerbound.tex
+++ b/paper/sections/lowerbound.tex
@@ -4,7 +4,7 @@ support recovery with constant probability under a \emph{correlation decay}
 assumption. In this section, we will consider the stable sparse recovery
 setting of Section~\ref{sec:relaxing_sparsity}.
 Our goal is to obtain an information-theoretic
 lower bound on the number of measurements necessary to
-approximately recover the parameter $\theta$ of a cascade model from observed
+approximately recover the parameter $\theta^*$ of a cascade model from observed
 cascades. Similar lower bounds were obtained for sparse \emph{linear} inverse
 problems in \cite{pw11, pw12, bipw11}.
```

```diff
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index d8706bc..01f816b 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -18,9 +18,7 @@ state space $\{0, 1, \dots, K-1\}^V$ with the following properties:
 \end{enumerate}
 In other words, a cascade model describes a diffusion process where a set of
-contagious nodes ``influence'' other nodes in the graph to become in turn
-contagious. An \emph{influence cascade} is a realisation of this random
-process: the successive states of the nodes in graph ${\cal G}$. Note that both
+contagious nodes ``influence'' other nodes in the graph to become contagious. An \emph{influence cascade} is a realisation of this random process, \emph{i.e.} the successive states of the nodes in graph ${\cal G}$. Note that both
 the ``single source'' assumption made in \cite{Daneshmand:2014} and
 \cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption
 made in \cite{Netrapalli:2012} verify condition 3.
@@ -33,7 +31,7 @@ a more general class of transition probabilities while staying in the
 discrete-time setting. We observe that despite their obvious differences, both
 the independent cascade and the voter models make the graph inference problem
 similar to the standard generalized linear model inference problem. In fact, we
-define a class of diffusion processes for which this true: the
+define a class of diffusion processes for which this is true: the
 \emph{Generalized Linear Cascade Models}. The linear threshold model is
 a special case and is discussed in Section~\ref{sec:linear_threshold}.
@@ -101,7 +99,7 @@ In the independent cascade model, nodes can be either susceptible, contagious
 or immune. At $t=0$, all source nodes are ``contagious'' and all remaining
 nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where
 $j$ is susceptible and $i$ is contagious, $i$ attempts to infect $j$ with
-probability $p_{i,j}\in[0,1]$, the infection attempts are mutually independent.
+probability $p_{i,j}\in(0,1]$; the infection attempts are mutually independent.
 If $i$ succeeds, $j$ will become contagious at time step $t+1$. Regardless of
 $i$'s success, node $i$ will be immune at time $t+1$. In other words, nodes
 stay contagious for only one time step. The cascade process terminates when no
@@ -126,8 +124,7 @@ with inverse link function $f : z \mapsto 1 - e^z$.
 \subsubsection{The Linear Voter Model}
-In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}. Both
-states are symmetric, but without loss of generality, we can suppose that the
+In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}. Without loss of generality, we can suppose that the
 \emph{blue} nodes are contagious. The parameters of the graph are normalized
 such that $\forall i, \ \sum_j \Theta_{i,j} = 1$ and we assume that
 $\Theta_{i,i}$ is always non-zero, meaning that all nodes have self-loops.
@@ -188,8 +185,8 @@ problem:
 \hat{\Theta} \in \argmax_{\Theta} \frac{1}{n}
 \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - \lambda\|\Theta\|_1
 \end{displaymath}
-where $\lambda$ is the regularization factor which helps at preventing
-overfitting as well as controlling for the sparsity of the solution.
+where $\lambda$ is the regularization factor which helps prevent
+overfitting and controls the sparsity of the solution.
 The generalized linear cascade model is decomposable in the following sense:
 given Definition~\ref{def:glcm}, the log-likelihood can be written as the sum
@@ -209,8 +206,8 @@ where we denote by ${\cal T}_i$ the time steps at which node $i$ is susceptible
 In the case of the voter model, the measurements include all time steps until
 we reach the time horizon $T$ or the graph coalesces to a single state. In the
 case of the independent cascade model, the measurements include all time steps
-until node $i$ becomes contagious after which its behavior is deterministic.
-Contrary to prior work, we express our results in terms of the number of
+until node $i$ becomes contagious, after which its behavior is deterministic.
+Contrary to prior work, our results depend on the number of
 measurements and not the number of cascades.
 A sufficient condition for program~\eqref{eq:pre-mle} to be a convex program is
```
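To see why one step of the independent cascade model is a generalized linear cascade measurement with inverse link $f : z \mapsto 1 - e^z$, as the hunk above states, here is a small simulation sketch. The helper name is hypothetical; it assumes $p_{i,j} < 1$ so the logarithm is finite, and it omits the susceptible/immune bookkeeping of the full process.

```python
import numpy as np

rng = np.random.default_rng(0)

def ic_step(x, P):
    # One (IC) transition. x is the 0/1 vector of contagious nodes and
    # P[i, j] = p_{i,j}. Node j is infected with probability
    #   1 - prod_{i contagious} (1 - p_{i,j})
    #     = 1 - exp(sum_i x_i log(1 - p_{i,j})) = f(<theta_j, x>),
    # i.e. the generalized linear form with theta_{i,j} = log(1 - p_{i,j}).
    theta = np.log(1.0 - P)
    prob = 1.0 - np.exp(x @ theta)
    return (rng.random(P.shape[0]) < prob).astype(int)
```

The reparametrization $\Theta_{i,j} = \log(1-p_{i,j})$ in the comment is exactly the transformation invoked in the results section below when converting bounds on $\hat\theta$ into bounds on the infection probabilities $\hat p$.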
```diff
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index c83e4d6..f3c5366 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -20,7 +20,7 @@ a set $\mathcal{T}$ of observations. We will write $n\defeq |\mathcal{T}|$.
 \label{sec:main_theorem}
 In this section, we analyze the case where $\theta^*$ is exactly sparse. We
-write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall, that $\theta_i$ is the vector of weights for all edges \emph{directed at} the node we are solving for. In other words, $S$ is the set of all nodes susceptible to influence our node $i$, also referred to as its parents. Our main theorem will rely on the now standard \emph{restricted eigenvalue condition}.
+write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall that $\theta_i$ is the vector of weights for all edges \emph{directed at} the node we are solving for. In other words, $S$ is the set of all nodes susceptible to influence node $i$, also referred to as its parents. Our main theorem will rely on the now standard \emph{restricted eigenvalue condition}.
 \begin{definition}
 Let $\Sigma\in\mathcal{S}_m(\reals)$ be a real symmetric matrix and $S$ be
@@ -37,8 +37,8 @@ write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall, that $\theta_i$ is the
 A discussion of the $(S,\gamma)$-{\bf(RE)} assumption in the context of
 generalized linear cascade models can be found in Section~\ref{sec:re}. In our
-setting we will require that this property holds for the Hessian of the
-log-likelihood function $\mathcal{L}$ and essentially captures that the binary
+setting we require that the (RE)-condition holds for the Hessian of the
+log-likelihood function $\mathcal{L}$: it essentially captures the fact that the binary
 vectors of the set of active nodes are not \emph{too} collinear. We will also
 need the following assumption on the inverse link function $f$ of the
 generalized linear cascade model:
@@ -53,8 +53,8 @@ $f(\inprod{\theta^*}{x})\notin\{0,1\}$.
 In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{f}(1-z) =
 \frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha\leq \Theta_{i,j}\leq
-1-\alpha$ for all $(i,j)\in E$ which is always satisfied for some $\alpha$.
-Similarly, in the Independent Cascade Model, $\frac{f'}{f}(z)
+1-\alpha$ for all $(i,j)\in E$ which is always satisfied for some $\alpha$ for non-isolated nodes.
+In the Independent Cascade Model, $\frac{f'}{f}(z)
 = \frac{1}{1-e^{-z}}$ and $\frac{f'}{1-f}(z) = -1$ and (LF) is not
 restrictive.
 \begin{comment}
```
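The (RE) and (LF) assumptions discussed above concern program \eqref{eq:pre-mle}, which decomposes into one $\ell_1$-penalized likelihood per node. A sketch of that per-node estimator, reusing the `soft_threshold` helper and the `X`, `y`, `f`, `f_prime` conventions from the \textsc{lasso} sketch earlier; the step size and clipping constant are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sparse_mle(X, y, f, f_prime, lam, step=1e-2, iters=5000, eps=1e-12):
    # Maximize (1/n) sum_t [ y_t log f(u_t) + (1 - y_t) log(1 - f(u_t)) ]
    #          - lam * ||theta||_1,   with u_t = <theta, x^t>,
    # the per-node form of program (eq:pre-mle), by proximal gradient ascent.
    n, m = X.shape
    theta = np.zeros(m)
    for _ in range(iters):
        u = X @ theta
        fu = np.clip(f(u), eps, 1.0 - eps)  # keep the logs finite
        # Gradient of the log-likelihood: note the f'/f and f'/(1-f)
        # factors, the same quantities that assumption (LF) controls.
        grad = X.T @ (f_prime(u) * (y / fu - (1.0 - y) / (1.0 - fu))) / n
        theta = soft_threshold(theta + step * grad, step * lam)
    return theta
```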
```diff
@@ -74,7 +74,7 @@ $\alpha > 0$. For any $\delta\in(0,1)$, let $\hat{\theta}$ be the solution of
 \begin{equation}
 \label{eq:sparse}
 \|\hat \theta - \theta^* \|_2
-  \leq \frac{3}{\gamma} \sqrt{\frac{s \log m}{\alpha n^{1-\delta}}}
+  \leq \frac{6}{\gamma} \sqrt{\frac{s \log m}{\alpha n^{1-\delta}}}
 \quad \text{w.p.}\;1-\frac{1}{e^{n^\delta \log m}}
 \end{equation}
@@ -91,10 +91,10 @@ cast it as a generalized linear cascade model, we had to perform the
 transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where $p_{i,j}$ are the
 infection probabilities. Fortunately, if we estimate $p_j$ through
 $\hat{p}_{j} = \hat{\theta}_j$, a bound on $\|\hat\theta - \theta^*\|_2$ directly implies
-a bound on $\|\hat p - p^*\|$. Indeed we have:
+a bound on $\|\hat p - p^*\|_2$:
 \begin{lemma}
-  $\|\hat{\theta} - \theta' \|_2 \geq \|\hat{p} - p^*\|_2$.
+  $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
 \end{lemma}
 \begin{proof}
 Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have
@@ -115,7 +115,7 @@ Let ${\cal C}(S) \defeq \{ \Delta \in \mathbb{R}^m\,|\,\|\Delta_S\|_1 \leq
 \label{eq:rc}
 \forall \Delta \in {\cal C}(S), \;
 {\cal L}(\theta^* + \Delta) - {\cal L}(\theta^*)\\
-  - \inprod{\Delta {\cal L}(\theta^*)}{\Delta}
+  - \inprod{\nabla {\cal L}(\theta^*)}{\Delta}
 \geq \kappa_{\cal L} \|\Delta\|_2^2 - \tau_{\cal L}^2(\theta^*)
 \end{multline}
 for some $\kappa_{\cal L} > 0$ and function $\tau_{\cal L}$. Finally suppose
@@ -131,7 +131,7 @@ $\hat{\theta}_\lambda$ is the solution of \eqref{eq:pre-mle}:
 To prove Theorem~\ref{thm:main}, we apply Lemma~\ref{lem:negahban} with
 $\tau_{\mathcal{L}}=0$. Since $\mathcal{L}$ is twice differentiable and convex,
 assumption \eqref{eq:rc} with $\kappa_{\mathcal{L}}=\frac{\gamma}{2}$ is
-implied by the (RE) condition \eqref{eq:re}. The upper bound
+implied by the (RE)-condition. The upper bound
 on the $\ell_{\infty}$ norm of $\nabla\mathcal{L}(\theta^*)$ is given by
 Lemma~\ref{lem:ub}.
@@ -156,7 +156,7 @@ The gradient of $\mathcal{L}$ is given by:
 \end{multline*}
 Let $\partial_j \mathcal{L}(\theta)$ be the $j$-th coordinate of
-$\nabla\mathcal{L}(\theta^*)$. Writing:
+$\nabla\mathcal{L}(\theta^*)$. Writing
 $\partial_j\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}
 Y_t$ and since $\E[x_i^{t+1}|x^t]= f(\inprod{\theta^*}{x^t})$, we have that
 $\E[Y_{t+1}|Y_t]
@@ -172,7 +172,7 @@ and we can apply Azuma's inequality to $Z_t$:
 Applying a union bound to have the previous inequality hold for all
 coordinates of $\nabla\mathcal{L}(\theta)$ implies:
 \begin{align*}
-  \P\big[\|\nabla\mathcal{L}(\theta)\|_{\infty}\geq \lambda \big]
+  \P\big[\|\nabla\mathcal{L}(\theta^*)\|_{\infty}\geq \lambda \big]
   &\leq 2m\exp\left(\frac{-\lambda^2n\alpha}{2}\right)
 \end{align*}
 Choosing $\lambda\defeq 2\sqrt{\frac{\log m}{\alpha n^{1-\delta}}}$ concludes the
@@ -186,10 +186,10 @@ solve the Graph Inference problem.
 \begin{corollary}
 \label{cor:variable_selection}
-Under the same assumptions of Theorem~\ref{thm:main}, let $\hat {\cal S}_\eta
+Under the same assumptions as Theorem~\ref{thm:main}, let $\hat {\cal S}_\eta
 \defeq \{ j \in \{1,\ldots, m\} : \hat{\theta}_j > \eta\}$ for $\eta > 0$.
 For $\epsilon>0$ and $\epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{
-i \in \{1,\ldots,m\} :\theta_i > \eta +\epsilon \}$ be the set of all true `strong'
+i \in \{1,\ldots,m\} :\theta_i^* > \eta +\epsilon \}$ be the set of all true `strong'
 parents. Suppose the number of measurements verifies: $ n > \frac{9s\log
 m}{\alpha\gamma^2\epsilon^2}$. Then with probability $1-\frac{1}{m}$,
 ${\cal S}^*_{\eta + \epsilon} \subseteq
```
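Corollary~\ref{cor:variable_selection} in the hunk above turns the $\ell_2$ bound into support recovery by simple thresholding. A sketch with hypothetical helper names; the second function assumes the (IC) reparametrization $\Theta_{i,j} = \log(1-p_{i,j})$ quoted earlier, and the literal threshold $\hat\theta_j > \eta$ follows the corollary as stated (for the (IC) parametrization, where the weights are nonpositive, thresholding $|\hat\theta_j|$ would be the natural variant, which is our assumption, not the paper's statement).

```python
import numpy as np

def estimated_parents(theta_hat, eta):
    # \hat{S}_eta = { j : theta_hat_j > eta }: keep the coordinates whose
    # estimated weight clears the threshold eta.
    return set(np.flatnonzero(theta_hat > eta))

def infection_probabilities(theta_hat):
    # Undo Theta_{i,j} = log(1 - p_{i,j}) to read off edge probabilities.
    return 1.0 - np.exp(theta_hat)
```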
```diff
@@ -199,11 +199,11 @@ parents and no `false' parents.
 \begin{proof}
 By choosing $\delta = 0$, if $ n > \frac{9s\log
-m}{\alpha\gamma^2\epsilon^2}$, then $\|\hat \theta-\theta\|_2 < \epsilon < \eta$
-with probability $1-\frac{1}{m}$. If $\theta_i = 0$ and $\hat \theta > \eta$,
-then $\|\hat \theta - \theta\|_2 \geq |\hat \theta_i-\theta_j| > \eta$, which
-is a contradiction. Therefore we get no false positives. If $\theta_i \leq \eta
-+ \epsilon$, then $|\hat{\theta}_i- \theta_i| < \epsilon \implies \theta_j
+m}{\alpha\gamma^2\epsilon^2}$, then $\|\hat \theta-\theta^*\|_2 < \epsilon < \eta$
+with probability $1-\frac{1}{m}$. If $\theta_i^* = 0$ and $\hat \theta_i > \eta$,
+then $\|\hat \theta - \theta^*\|_2 \geq |\hat \theta_i-\theta_i^*| > \eta$, which
+is a contradiction. Therefore we get no false positives. If $\theta_i^* > \eta
++ \epsilon$, then $|\hat{\theta}_i- \theta_i^*| < \epsilon \implies \hat{\theta}_i
 > \eta$ and we get all strong parents.
 \end{proof}
@@ -259,10 +259,10 @@ by solving \eqref{eq:pre-mle} for $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^
 \end{align*}
 \end{theorem}
-As before, edge recovery is a consequence of upper-bounding $\|\theta - \hat \theta\|_2$
+As before, edge recovery is a consequence of upper-bounding $\|\theta^* - \hat \theta\|_2$.
 \begin{corollary}
-  Under the same assumptions as Theorem~\ref{thm:approx_sparse} the number of
+  Under the same assumptions as Theorem~\ref{thm:approx_sparse}, if the number of
   measurements verifies:
   \begin{equation}
   n > \frac{9}{\alpha\gamma^2\epsilon^2}\left(1+ \frac{16}{\epsilon^2}\| \theta^* - \theta^*_{\lfloor
@@ -302,13 +302,13 @@ In our case we have:
 \end{multline*}
 Observe that the Hessian of $\mathcal{L}$ can be seen as
 a re-weighted Gram matrix of the observations. In other words, the restricted
-eigenvalue condition expresses that the observed set of active nodes are not
+eigenvalue condition states that the observed set of active nodes are not
 too collinear with each other.
 In the specific case of ``logistic cascades'' (when $f$ is the logistic
 function), the Hessian simplifies to $\nabla^2\mathcal{L}(\theta^*)
-= \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the design matrix $[x^1 \ldots
-x^\mathcal{|T|}]$. The (RE) condition is then equivalent
+= \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the observation matrix $[x^1 \ldots
+x^{|\mathcal{T}|}]$. The (RE)-condition is then equivalent
 to the assumption made in the Lasso analysis of \cite{bickel:2009}.
 \paragraph{(RE) with high probability}
@@ -325,7 +325,7 @@ matrix $\E[\nabla^2\mathcal{L}(\theta^*)]$, then it holds for the finite sample
 Hessian matrix $\nabla^2\mathcal{L}(\theta^*)$ with high probability. Note that
 the expected Hessian matrix is exactly the Fisher Information matrix of the
 generalized linear cascade model which captures the amount of information about
-$\theta$ conveyed by the random observations. Therefore, under an assumption
+$\theta^*$ conveyed by the random observations. Therefore, under an assumption
 which only involves the probabilistic model and not the actual observations,
 Proposition~\ref{prop:fi} allows us to draw the same conclusion as in
 Theorem~\ref{thm:main}.
```
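For logistic cascades, where the hunk above notes the Hessian is the rescaled Gram matrix $\frac{1}{|\mathcal{T}|}XX^T$, the (RE)-condition can be probed numerically. A rough random-search sketch, illustrative only: it samples directions in the cone $\|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1$ commonly used in Lasso analyses (the constant 3 is our assumption, since the definition of ${\cal C}(S)$ is truncated above), so it yields an upper bound on the attainable $\gamma$, not a certificate.

```python
import numpy as np

def probe_re_constant(H, S, n_trials=20000, seed=0):
    # Estimate min <d, H d> / ||d||_2^2 over random directions d restricted
    # to the cone ||d_{S^c}||_1 <= 3 ||d_S||_1. S is an array of support
    # indices; H is the (m x m) Hessian.
    rng = np.random.default_rng(seed)
    m = H.shape[0]
    Sc = np.setdiff1d(np.arange(m), S)
    best = np.inf
    for _ in range(n_trials):
        d = rng.standard_normal(m)
        l1_S, l1_Sc = np.abs(d[S]).sum(), np.abs(d[Sc]).sum()
        if l1_Sc > 3 * l1_S:              # rescale the tail into the cone
            d[Sc] *= 3 * l1_S / l1_Sc
        best = min(best, d @ H @ d / (d @ d))
    return best

# Logistic-cascade Hessian from observations stacked as rows of D (n x m):
# H = D.T @ D / D.shape[0]
```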
```diff
@@ -344,10 +344,10 @@ again non restrictive in the (IC) model and (V) model.
 \begin{proposition}
 \label{prop:fi}
-  If $\E[\nabla^2\mathcal{L}(\theta^*)]$ verifies the $(S,\gamma)$-{\bf (RE)}
-  condition and assuming {\bf (LF)} and {\bf (LF2)}, then for $\delta> 0$, if $n^{1-\delta}\geq
+  Suppose $\E[\nabla^2\mathcal{L}(\theta^*)]$ verifies the $(S,\gamma)$-{\bf (RE)}
+  condition and assume {\bf (LF)} and {\bf (LF2)}. For $\delta> 0$, if $n^{1-\delta}\geq
   \frac{M+2}{21\gamma\alpha}s^2\log m
-  $, $\nabla^2\mathcal{L}(\theta^*)$ verifies the $(S,\frac{\gamma}{2})$-(RE)
+  $, then $\nabla^2\mathcal{L}(\theta^*)$ verifies the $(S,\frac{\gamma}{2})$-(RE)
   condition, w.p $\geq 1-e^{-n^\delta\log m}$.
 \end{proposition}
@@ -390,13 +390,13 @@ again non restrictive in the (IC) model and (V) model.
 Observe that the number of measurements required in
 Proposition~\ref{prop:fi} is now quadratic in $s$. If we only keep the first
 measurement from each cascade which are independent, we can apply Theorem 1.8 from
-\cite{rudelson:13}, lowering the number of required cascades to $s\log(s\log^3
-m)$.
+\cite{rudelson:13}, lowering the number of required cascades to $s\log m \log^3(
+s\log m)$.
 \paragraph{(RE) vs Irrepresentability Condition}
 \cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the
-likelihood function also known as the {\it (S,s)-irrepresentability} condition:
+likelihood function also known as the {\it (S,s)-irrepresentability} condition.
 \begin{comment}
 \begin{definition}
```
