| author | Thibaut Horel <thibaut.horel@gmail.com> | 2015-02-06 18:27:55 -0500 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2015-02-06 18:27:55 -0500 |
| commit | 85d2de4ff3588b8059b2bb45ec9444097b996aa7 (patch) | |
| tree | 06c50a6d889e68a3d7920130a3441bfb4a15d4bf /paper | |
| parent | 8d7b0d99cfadc38a15b0f63d28d0edd024e8c5f0 (diff) | |
| download | cascades-85d2de4ff3588b8059b2bb45ec9444097b996aa7.tar.gz | |
Typos
Diffstat (limited to 'paper')
| mode | path | lines changed |
|---|---|---|
| -rw-r--r-- | paper/sections/experiments.tex | 4 |
| -rw-r--r-- | paper/sections/intro.tex | 10 |
| -rw-r--r-- | paper/sections/lowerbound.tex | 2 |
| -rw-r--r-- | paper/sections/model.tex | 19 |
| -rw-r--r-- | paper/sections/results.tex | 62 |
5 files changed, 47 insertions, 50 deletions
```diff
diff --git a/paper/sections/experiments.tex b/paper/sections/experiments.tex
index 80f95fa..9122e7d 100644
--- a/paper/sections/experiments.tex
+++ b/paper/sections/experiments.tex
@@ -28,7 +28,7 @@ $\ell_2$-norm $\|\hat \Theta - \Theta\|_2$ for a Kronecker graph which is: (d) e
 \label{fig:four_figs}
 \end{table*}
-In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for for varying levels of sparsity and different initializations of parameters ($n$, $m$, $\lambda$, $p_{\text{init}}$), where $p_{\text{init}}$ is the initial probability of a node being a source node. We compare our algorithm to two different state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}. As an extra benchmark, we also introduce a new algorithm \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm.
+In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for varying levels of sparsity and different initializations of parameters ($n$, $m$, $\lambda$, $p_{\text{init}}$), where $p_{\text{init}}$ is the initial probability of a node being a source node. We compare our algorithm to two different state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}. As an extra benchmark, we also introduce a new algorithm \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm.
 \paragraph{Experimental setup} We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges) \cite{watts:1998}, a Barabasi-Albert graph ($300$ nodes, $16200$ edges) \cite{barabasi:2001}, a Holme-Kim power law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by doubling the edges.
@@ -38,7 +38,7 @@ from the (IC) model for $n \in \{100, 500, 1000, 2000, 5000\}$. We compare for e
 \paragraph{Benchmarks}
-We compare our \textsc{sparse mle} algorithm to 3 benchmarks: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012} and \textsc{lasso}. The \textsc{mle} algorithm is a maximum-likelihood estimator without $\ell1$-norm penalization. \textsc{greedy} is an iterative algorithm. We introduced the \textsc{lasso} algorithm in our experiments to achieve faster computation time:
+We compare our \textsc{sparse mle} algorithm to 3 benchmarks: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012} and \textsc{lasso}. The \textsc{mle} algorithm is a maximum-likelihood estimator without $\ell_1$-norm penalization. \textsc{greedy} is an iterative algorithm. We introduced the \textsc{lasso} algorithm in our experiments to achieve faster computation time:
 $$\hat \theta_i \in \arg \min_{\theta} \sum_{t \in {\cal T}} |f(\theta_i\cdot x^t) - x_i^{t+1}|^2 + \lambda \|\theta_i\|_1$$
 \textsc{Lasso} has the merit of being both easier and faster to optimize numerically than the other convex-optimization based algorithms. It approximates the $\textsc{sparse mle}$ algorithm by making the assumption that the observations $x_i^{t+1}$ are of the form: $x_i^{t+1} = f(\theta_i\cdot x^t) + \epsilon$, where $\epsilon$ is random white noise. This is not valid in theory since $\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$, however the approximation is validated in practice.
```
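The \textsc{lasso} objective quoted in this hunk is an $\ell_1$-penalized least-squares fit of the inverse link. A minimal numpy sketch of how such a benchmark could be solved by proximal gradient descent (ISTA); this is an illustration, not the authors' implementation: the helper names, step size, and iteration count are assumptions, and the sign constraint implied by the (IC) reparametrization is ignored.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: coordinate-wise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_benchmark(X, y, f, f_prime, lam, step=1e-2, iters=2000):
    # Minimize sum_t (f(<theta, x^t>) - y_t)^2 + lam * ||theta||_1 by ISTA.
    # X has one row per observation x^t; y stacks the indicators x_i^{t+1}
    # for the single node i being solved for.
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        u = X @ theta
        grad = X.T @ (2.0 * (f(u) - y) * f_prime(u))
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta
```

For the voter model the inverse link is the identity, so this reduces to the standard Lasso and off-the-shelf solvers apply directly, which is consistent with the claim that \textsc{lasso} is the easiest of the benchmarks to optimize numerically.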
```diff
diff --git a/paper/sections/intro.tex b/paper/sections/intro.tex
index d52da66..39f5089 100644
--- a/paper/sections/intro.tex
+++ b/paper/sections/intro.tex
@@ -46,7 +46,7 @@ The contributions of this paper are the following:
   \item we formulate the Graph Inference problem in the context of
     discrete-time influence cascades as a sparse recovery problem for
     a specific type of Generalized Linear Model. This formulation notably
-    encompases the well-studied Independent Cascade Model and Voter Model.
+    encompasses the well-studied Independent Cascade Model and Voter Model.
   \item we give an algorithm which recovers the graph's edges using
     $\O(s\log m)$ cascades. Furthermore, we show that our algorithm is also
     able to recover the edge weights (the parameters of the influence model),
@@ -55,7 +55,7 @@ The contributions of this paper are the following:
     recover is approximately $s$-sparse by proving guarantees in the
     \emph{stable recovery} setting.
   \item we provide an almost tight lower bound of $\Omega(s\log \frac{m}{s})$
-    observations.
+    observations required for sparse recovery.
 \end{itemize}
 The organization of the paper is as follows: we conclude the introduction by
@@ -69,7 +69,7 @@ Section~\ref{sec:experiments}.
 \paragraph{Related Work}
 The study of edge prediction in graphs has been an active field of research for
-over a decade \cite{Nowell08} \cite{Leskovec07} \cite{AdarA05}.
+over a decade \cite{Nowell08, Leskovec07, AdarA05}.
 \cite{GomezRodriguez:2010} introduced the {\scshape Netinf} algorithm, which
 approximates the likelihood of cascades represented as a continuous process.
 The algorithm was improved in later work \cite{gomezbalduzzi:2011}, but is not
@@ -82,7 +82,7 @@ analysis depends on a {\it correlation decay} assumption, which limits the
 number of new infections at every step. In this setting, they show a lower
 bound of the number of cascades needed for support recovery with constant
 probability of the order $\Omega(s \log (m/s))$. They also suggest a {\scshape
-Greedy} algorithm, which achieves an ${\cal O}(s \log m)$ guarantee in the case
+Greedy} algorithm, which achieves a ${\cal O}(s \log m)$ guarantee in the case
 of tree graphs. The work of \cite{Abrahao:13} studies the same continuous-model
 framework as \cite{GomezRodriguez:2010} and obtains an ${\cal O}(s^9 \log^2 s
 \log m)$ support recovery algorithm, without the \emph{correlation decay} assumption.
@@ -99,7 +99,7 @@ m)$, by exploiting similar properties of the convex program's KKT conditions.
 In contrast, our work studies discrete-time diffusion processes including the
 Independent Cascade model under weaker assumptions. Furthermore, we analyze
 both the recovery of the graph's edges and the estimation of the model's
-parameters, and achieve closer to optimal bounds.
+parameters, and achieve close to optimal bounds.
 \begin{comment}
 Their work has the merit of studying a generalization of the
 discrete-time independent cascade model to continuous functions. Similarly to \cite{Abrahao:13}, they place themselves in
```
```diff
diff --git a/paper/sections/lowerbound.tex b/paper/sections/lowerbound.tex
index becd13f..a7ae21c 100644
--- a/paper/sections/lowerbound.tex
+++ b/paper/sections/lowerbound.tex
@@ -4,7 +4,7 @@ support recovery with constant probability under a \emph{correlation decay}
 assumption. In this section, we will consider the stable sparse recovery
 setting of Section~\ref{sec:relaxing_sparsity}.
 Our goal is to obtain an information-theoretic
 lower bound on the number of measurements necessary to
-approximately recover the parameter $\theta$ of a cascade model from observed
+approximately recover the parameter $\theta^*$ of a cascade model from observed
 cascades. Similar lower bounds were obtained for sparse \emph{linear} inverse
 problems in \cite{pw11, pw12, bipw11}.
```

```diff
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index d8706bc..01f816b 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -18,9 +18,7 @@ state space $\{0, 1, \dots, K-1\}^V$ with the following properties:
 \end{enumerate}
 In other words, a cascade model describes a diffusion process where a set of
-contagious nodes ``influence'' other nodes in the graph to become in turn
-contagious. An \emph{influence cascade} is a realisation of this random
-process: the successive states of the nodes in graph ${\cal G}$. Note that both
+contagious nodes ``influence'' other nodes in the graph to become contagious. An \emph{influence cascade} is a realisation of this random process, \emph{i.e.} the successive states of the nodes in graph ${\cal G}$. Note that both
 the ``single source'' assumption made in \cite{Daneshmand:2014} and
 \cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption
 made in \cite{Netrapalli:2012} verify condition 3.
@@ -33,7 +31,7 @@ a more general class of transition probabilities while staying in the
 discrete-time setting. We observe that despite their obvious differences, both
 the independent cascade and the voter models make the graph inference problem
 similar to the standard generalized linear model inference problem. In fact, we
-define a class of diffusion processes for which this true: the
+define a class of diffusion processes for which this is true: the
 \emph{Generalized Linear Cascade Models}. The linear threshold model is
 a special case and is discussed in Section~\ref{sec:linear_threshold}.
@@ -101,7 +99,7 @@ In the independent cascade model, nodes can be either susceptible, contagious
 or immune. At $t=0$, all source nodes are ``contagious'' and all remaining
 nodes are ``susceptible''. At each time step $t$, for each edge $(i,j)$ where
 $j$ is susceptible and $i$ is contagious, $i$ attempts to infect $j$ with
-probability $p_{i,j}\in[0,1]$, the infection attempts are mutually independent.
+probability $p_{i,j}\in(0,1]$; the infection attempts are mutually independent.
 If $i$ succeeds, $j$ will become contagious at time step $t+1$. Regardless of
 $i$'s success, node $i$ will be immune at time $t+1$. In other words, nodes
 stay contagious for only one time step. The cascade process terminates when no
@@ -126,8 +124,7 @@ with inverse link function $f : z \mapsto 1 - e^z$.
 \subsubsection{The Linear Voter Model}
-In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}. Both
-states are symmetric, but without loss of generality, we can suppose that the
+In the Linear Voter Model, nodes can be either \emph{red} or \emph{blue}. Without loss of generality, we can suppose that the
 \emph{blue} nodes are contagious. The parameters of the graph are normalized
 such that $\forall i, \ \sum_j \Theta_{i,j} = 1$ and we assume that
 $\Theta_{i,i}$ is always non-zero, meaning that all nodes have self-loops.
@@ -188,8 +185,8 @@ problem:
 \hat{\Theta} \in \argmax_{\Theta} \frac{1}{n}
 \mathcal{L}(\Theta\,|\,x^1,\ldots,x^n) - \lambda\|\Theta\|_1
 \end{displaymath}
-where $\lambda$ is the regularization factor which helps at preventing
-overfitting as well as controlling for the sparsity of the solution.
+where $\lambda$ is the regularization factor which helps prevent
+overfitting and controls the sparsity of the solution.
 The generalized linear cascade model is decomposable in the following sense:
 given Definition~\ref{def:glcm}, the log-likelihood can be written as the sum
@@ -209,8 +206,8 @@ where we denote by ${\cal T}_i$ the time steps at which node $i$ is susceptible
 In the case of the voter model, the measurements include all time steps until
 we reach the time horizon $T$ or the graph coalesces to a single state. In the
 case of the independent cascade model, the measurements include all time steps
-until node $i$ becomes contagious after which its behavior is deterministic.
-Contrary to prior work, we express our results in terms of the number of
+until node $i$ becomes contagious, after which its behavior is deterministic.
+Contrary to prior work, our results depend on the number of
 measurements and not the number of cascades.
 A sufficient condition for program~\eqref{eq:pre-mle} to be a convex program is
```
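To see why one step of the independent cascade model is a generalized linear cascade measurement with inverse link $f : z \mapsto 1 - e^z$, as the hunk above states, here is a small simulation sketch. The helper name is hypothetical; it assumes $p_{i,j} < 1$ so the logarithm is finite, and it omits the susceptible/immune bookkeeping of the full process.

```python
import numpy as np

rng = np.random.default_rng(0)

def ic_step(x, P):
    # One (IC) transition. x is the 0/1 vector of contagious nodes and
    # P[i, j] = p_{i,j}. Node j is infected with probability
    #   1 - prod_{i contagious} (1 - p_{i,j})
    #     = 1 - exp(sum_i x_i log(1 - p_{i,j})) = f(<theta_j, x>),
    # i.e. the generalized linear form with theta_{i,j} = log(1 - p_{i,j}).
    theta = np.log(1.0 - P)
    prob = 1.0 - np.exp(x @ theta)
    return (rng.random(P.shape[0]) < prob).astype(int)
```

The reparametrization $\Theta_{i,j} = \log(1-p_{i,j})$ in the comment is exactly the transformation invoked in the results section below when converting bounds on $\hat\theta$ into bounds on the infection probabilities $\hat p$.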
```diff
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index c83e4d6..f3c5366 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -20,7 +20,7 @@ a set $\mathcal{T}$ of observations. We will write $n\defeq |\mathcal{T}|$.
 \label{sec:main_theorem}
 In this section, we analyze the case where $\theta^*$ is exactly sparse. We
-write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall, that $\theta_i$ is the vector of weights for all edges \emph{directed at} the node we are solving for. In other words, $S$ is the set of all nodes susceptible to influence our node $i$, also referred to as its parents. Our main theorem will rely on the now standard \emph{restricted eigenvalue condition}.
+write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall that $\theta_i$ is the vector of weights for all edges \emph{directed at} the node we are solving for. In other words, $S$ is the set of all nodes susceptible to influence node $i$, also referred to as its parents. Our main theorem will rely on the now standard \emph{restricted eigenvalue condition}.
 \begin{definition}
 Let $\Sigma\in\mathcal{S}_m(\reals)$ be a real symmetric matrix and $S$ be
@@ -37,8 +37,8 @@ write $S\defeq\text{supp}(\theta^*)$ and $s=|S|$. Recall, that $\theta_i$ is the
 A discussion of the $(S,\gamma)$-{\bf(RE)} assumption in the context of
 generalized linear cascade models can be found in Section~\ref{sec:re}. In our
-setting we will require that this property holds for the Hessian of the
-log-likelihood function $\mathcal{L}$ and essentially captures that the binary
+setting we require that the (RE)-condition holds for the Hessian of the
+log-likelihood function $\mathcal{L}$: it essentially captures the fact that the binary
 vectors of the set of active nodes are not \emph{too} collinear. We will also
 need the following assumption on the inverse link function $f$ of the
 generalized linear cascade model:
@@ -53,8 +53,8 @@ $f(\inprod{\theta^*}{x})\notin\{0,1\}$.
 In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{f}(1-z) =
 \frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha\leq \Theta_{i,j}\leq
-1-\alpha$ for all $(i,j)\in E$ which is always satisfied for some $\alpha$.
-Similarly, in the Independent Cascade Model, $\frac{f'}{f}(z)
+1-\alpha$ for all $(i,j)\in E$ which is always satisfied for some $\alpha$ for non-isolated nodes.
+In the Independent Cascade Model, $\frac{f'}{f}(z)
 = \frac{1}{1-e^{-z}}$ and $\frac{f'}{1-f}(z) = -1$ and (LF) is not
 restrictive.
 \begin{comment}
```
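The (RE) and (LF) assumptions discussed above concern program \eqref{eq:pre-mle}, which decomposes into one $\ell_1$-penalized likelihood per node. A sketch of that per-node estimator, reusing the `soft_threshold` helper and the `X`, `y`, `f`, `f_prime` conventions from the \textsc{lasso} sketch earlier; the step size and clipping constant are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sparse_mle(X, y, f, f_prime, lam, step=1e-2, iters=5000, eps=1e-12):
    # Maximize (1/n) sum_t [ y_t log f(u_t) + (1 - y_t) log(1 - f(u_t)) ]
    #          - lam * ||theta||_1,   with u_t = <theta, x^t>,
    # the per-node form of program (eq:pre-mle), by proximal gradient ascent.
    n, m = X.shape
    theta = np.zeros(m)
    for _ in range(iters):
        u = X @ theta
        fu = np.clip(f(u), eps, 1.0 - eps)  # keep the logs finite
        # Gradient of the log-likelihood: note the f'/f and f'/(1-f)
        # factors, the same quantities that assumption (LF) controls.
        grad = X.T @ (f_prime(u) * (y / fu - (1.0 - y) / (1.0 - fu))) / n
        theta = soft_threshold(theta + step * grad, step * lam)
    return theta
```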
```diff
@@ -74,7 +74,7 @@ $\alpha > 0$. For any $\delta\in(0,1)$, let $\hat{\theta}$ be the solution of
 \begin{equation}
 \label{eq:sparse}
 \|\hat \theta - \theta^* \|_2
-  \leq \frac{3}{\gamma} \sqrt{\frac{s \log m}{\alpha n^{1-\delta}}}
+  \leq \frac{6}{\gamma} \sqrt{\frac{s \log m}{\alpha n^{1-\delta}}}
 \quad \text{w.p.}\;1-\frac{1}{e^{n^\delta \log m}}
 \end{equation}
@@ -91,10 +91,10 @@ cast it as a generalized linear cascade model, we had to perform the
 transformation $\Theta_{i,j} = \log(1-p_{i,j})$, where $p_{i,j}$ are the
 infection probabilities. Fortunately, if we estimate $p_j$ through
 $\hat{p}_{j} = \hat{\theta}_j$, a bound on $\|\hat\theta - \theta^*\|_2$ directly implies
-a bound on $\|\hat p - p^*\|$. Indeed we have:
+a bound on $\|\hat p - p^*\|_2$:
 \begin{lemma}
-  $\|\hat{\theta} - \theta' \|_2 \geq \|\hat{p} - p^*\|_2$.
+  $\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
 \end{lemma}
 \begin{proof}
 Using the inequality $\forall x>0, \; \log x \geq 1 - \frac{1}{x}$, we have
@@ -115,7 +115,7 @@ Let ${\cal C}(S) \defeq \{ \Delta \in \mathbb{R}^m\,|\,\|\Delta_S\|_1 \leq
 \label{eq:rc}
 \forall \Delta \in {\cal C}(S), \;
 {\cal L}(\theta^* + \Delta) - {\cal L}(\theta^*)\\
-  - \inprod{\Delta {\cal L}(\theta^*)}{\Delta}
+  - \inprod{\nabla {\cal L}(\theta^*)}{\Delta}
 \geq \kappa_{\cal L} \|\Delta\|_2^2 - \tau_{\cal L}^2(\theta^*)
 \end{multline}
 for some $\kappa_{\cal L} > 0$ and function $\tau_{\cal L}$. Finally suppose
@@ -131,7 +131,7 @@ $\hat{\theta}_\lambda$ is the solution of \eqref{eq:pre-mle}:
 To prove Theorem~\ref{thm:main}, we apply Lemma~\ref{lem:negahban} with
 $\tau_{\mathcal{L}}=0$. Since $\mathcal{L}$ is twice differentiable and convex,
 assumption \eqref{eq:rc} with $\kappa_{\mathcal{L}}=\frac{\gamma}{2}$ is
-implied by the (RE) condition \eqref{eq:re}. The upper bound
+implied by the (RE)-condition. The upper bound
 on the $\ell_{\infty}$ norm of $\nabla\mathcal{L}(\theta^*)$ is given by
 Lemma~\ref{lem:ub}.
@@ -156,7 +156,7 @@ The gradient of $\mathcal{L}$ is given by:
 \end{multline*}
 Let $\partial_j \mathcal{L}(\theta)$ be the $j$-th coordinate of
-$\nabla\mathcal{L}(\theta^*)$. Writing:
+$\nabla\mathcal{L}(\theta^*)$. Writing
 $\partial_j\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}
 Y_t$ and since $\E[x_i^{t+1}|x^t]= f(\inprod{\theta^*}{x^t})$, we have that
 $\E[Y_{t+1}|Y_t]
@@ -172,7 +172,7 @@ and we can apply Azuma's inequality to $Z_t$:
 Applying a union bound to have the previous inequality hold for all
 coordinates of $\nabla\mathcal{L}(\theta)$ implies:
 \begin{align*}
-  \P\big[\|\nabla\mathcal{L}(\theta)\|_{\infty}\geq \lambda \big]
+  \P\big[\|\nabla\mathcal{L}(\theta^*)\|_{\infty}\geq \lambda \big]
   &\leq 2m\exp\left(\frac{-\lambda^2n\alpha}{2}\right)
 \end{align*}
 Choosing $\lambda\defeq 2\sqrt{\frac{\log m}{\alpha n^{1-\delta}}}$ concludes the
@@ -186,10 +186,10 @@ solve the Graph Inference problem.
 \begin{corollary}
 \label{cor:variable_selection}
-Under the same assumptions of Theorem~\ref{thm:main}, let $\hat {\cal S}_\eta
+Under the same assumptions as Theorem~\ref{thm:main}, let $\hat {\cal S}_\eta
 \defeq \{ j \in \{1,\ldots, m\} : \hat{\theta}_j > \eta\}$ for $\eta > 0$.
 For $\epsilon>0$ and $\epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{
-i \in \{1,\ldots,m\} :\theta_i > \eta +\epsilon \}$ be the set of all true `strong'
+i \in \{1,\ldots,m\} :\theta_i^* > \eta +\epsilon \}$ be the set of all true `strong'
 parents. Suppose the number of measurements verifies: $ n > \frac{9s\log
 m}{\alpha\gamma^2\epsilon^2}$. Then with probability $1-\frac{1}{m}$,
 ${\cal S}^*_{\eta + \epsilon} \subseteq
```
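Corollary~\ref{cor:variable_selection} in the hunk above turns the $\ell_2$ bound into support recovery by simple thresholding. A sketch with hypothetical helper names; the second function assumes the (IC) reparametrization $\Theta_{i,j} = \log(1-p_{i,j})$ quoted earlier, and the literal threshold $\hat\theta_j > \eta$ follows the corollary as stated (for the (IC) parametrization, where the weights are nonpositive, thresholding $|\hat\theta_j|$ would be the natural variant, which is our assumption, not the paper's statement).

```python
import numpy as np

def estimated_parents(theta_hat, eta):
    # \hat{S}_eta = { j : theta_hat_j > eta }: keep the coordinates whose
    # estimated weight clears the threshold eta.
    return set(np.flatnonzero(theta_hat > eta))

def infection_probabilities(theta_hat):
    # Undo Theta_{i,j} = log(1 - p_{i,j}) to read off edge probabilities.
    return 1.0 - np.exp(theta_hat)
```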
```diff
@@ -199,11 +199,11 @@ parents and no `false' parents.
 \begin{proof}
 By choosing $\delta = 0$, if $ n > \frac{9s\log
-m}{\alpha\gamma^2\epsilon^2}$, then $\|\hat \theta-\theta\|_2 < \epsilon < \eta$
-with probability $1-\frac{1}{m}$. If $\theta_i = 0$ and $\hat \theta > \eta$,
-then $\|\hat \theta - \theta\|_2 \geq |\hat \theta_i-\theta_j| > \eta$, which
-is a contradiction. Therefore we get no false positives. If $\theta_i \leq \eta
-+ \epsilon$, then $|\hat{\theta}_i- \theta_i| < \epsilon \implies \theta_j
+m}{\alpha\gamma^2\epsilon^2}$, then $\|\hat \theta-\theta^*\|_2 < \epsilon < \eta$
+with probability $1-\frac{1}{m}$. If $\theta_i^* = 0$ and $\hat \theta_i > \eta$,
+then $\|\hat \theta - \theta^*\|_2 \geq |\hat \theta_i-\theta_i^*| > \eta$, which
+is a contradiction. Therefore we get no false positives. If $\theta_i^* > \eta
++ \epsilon$, then $|\hat{\theta}_i- \theta_i^*| < \epsilon \implies \hat{\theta}_i
 > \eta$ and we get all strong parents.
 \end{proof}
@@ -259,10 +259,10 @@ by solving \eqref{eq:pre-mle} for $\lambda \defeq 2\sqrt{\frac{\log m}{\alpha n^
 \end{align*}
 \end{theorem}
-As before, edge recovery is a consequence of upper-bounding $\|\theta - \hat \theta\|_2$
+As before, edge recovery is a consequence of upper-bounding $\|\theta^* - \hat \theta\|_2$.
 \begin{corollary}
-  Under the same assumptions as Theorem~\ref{thm:approx_sparse} the number of
+  Under the same assumptions as Theorem~\ref{thm:approx_sparse}, if the number of
   measurements verifies:
   \begin{equation}
   n > \frac{9}{\alpha\gamma^2\epsilon^2}\left(1+ \frac{16}{\epsilon^2}\| \theta^* - \theta^*_{\lfloor
@@ -302,13 +302,13 @@ In our case we have:
 \end{multline*}
 Observe that the Hessian of $\mathcal{L}$ can be seen as
 a re-weighted Gram matrix of the observations. In other words, the restricted
-eigenvalue condition expresses that the observed set of active nodes are not
+eigenvalue condition states that the observed set of active nodes are not
 too collinear with each other.
 In the specific case of ``logistic cascades'' (when $f$ is the logistic
 function), the Hessian simplifies to $\nabla^2\mathcal{L}(\theta^*)
-= \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the design matrix $[x^1 \ldots
-x^\mathcal{|T|}]$. The (RE) condition is then equivalent
+= \frac{1}{|\mathcal{T}|}XX^T$ where $X$ is the observation matrix $[x^1 \ldots
+x^{|\mathcal{T}|}]$. The (RE)-condition is then equivalent
 to the assumption made in the Lasso analysis of \cite{bickel:2009}.
 \paragraph{(RE) with high probability}
@@ -325,7 +325,7 @@ matrix $\E[\nabla^2\mathcal{L}(\theta^*)]$, then it holds for the finite sample
 Hessian matrix $\nabla^2\mathcal{L}(\theta^*)$ with high probability. Note that
 the expected Hessian matrix is exactly the Fisher Information matrix of the
 generalized linear cascade model which captures the amount of information about
-$\theta$ conveyed by the random observations. Therefore, under an assumption
+$\theta^*$ conveyed by the random observations. Therefore, under an assumption
 which only involves the probabilistic model and not the actual observations,
 Proposition~\ref{prop:fi} allows us to draw the same conclusion as in
 Theorem~\ref{thm:main}.
```
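For logistic cascades, where the hunk above notes the Hessian is the rescaled Gram matrix $\frac{1}{|\mathcal{T}|}XX^T$, the (RE)-condition can be probed numerically. A rough random-search sketch, illustrative only: it samples directions in the cone $\|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1$ commonly used in Lasso analyses (the constant 3 is our assumption, since the definition of ${\cal C}(S)$ is truncated above), so it yields an upper bound on the attainable $\gamma$, not a certificate.

```python
import numpy as np

def probe_re_constant(H, S, n_trials=20000, seed=0):
    # Estimate min <d, H d> / ||d||_2^2 over random directions d restricted
    # to the cone ||d_{S^c}||_1 <= 3 ||d_S||_1. S is an array of support
    # indices; H is the (m x m) Hessian.
    rng = np.random.default_rng(seed)
    m = H.shape[0]
    Sc = np.setdiff1d(np.arange(m), S)
    best = np.inf
    for _ in range(n_trials):
        d = rng.standard_normal(m)
        l1_S, l1_Sc = np.abs(d[S]).sum(), np.abs(d[Sc]).sum()
        if l1_Sc > 3 * l1_S:              # rescale the tail into the cone
            d[Sc] *= 3 * l1_S / l1_Sc
        best = min(best, d @ H @ d / (d @ d))
    return best

# Logistic-cascade Hessian from observations stacked as rows of D (n x m):
# H = D.T @ D / D.shape[0]
```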
```diff
@@ -344,10 +344,10 @@ again non restrictive in the (IC) model and (V) model.
 \begin{proposition}
 \label{prop:fi}
-  If $\E[\nabla^2\mathcal{L}(\theta^*)]$ verifies the $(S,\gamma)$-{\bf (RE)}
-  condition and assuming {\bf (LF)} and {\bf (LF2)}, then for $\delta> 0$, if $n^{1-\delta}\geq
+  Suppose $\E[\nabla^2\mathcal{L}(\theta^*)]$ verifies the $(S,\gamma)$-{\bf (RE)}
+  condition and assume {\bf (LF)} and {\bf (LF2)}. For $\delta> 0$, if $n^{1-\delta}\geq
   \frac{M+2}{21\gamma\alpha}s^2\log m
-  $, $\nabla^2\mathcal{L}(\theta^*)$ verifies the $(S,\frac{\gamma}{2})$-(RE)
+  $, then $\nabla^2\mathcal{L}(\theta^*)$ verifies the $(S,\frac{\gamma}{2})$-(RE)
   condition, w.p $\geq 1-e^{-n^\delta\log m}$.
 \end{proposition}
@@ -390,13 +390,13 @@ again non restrictive in the (IC) model and (V) model.
 Observe that the number of measurements required in
 Proposition~\ref{prop:fi} is now quadratic in $s$. If we only keep the first
 measurement from each cascade which are independent, we can apply Theorem 1.8 from
-\cite{rudelson:13}, lowering the number of required cascades to $s\log(s\log^3
-m)$.
+\cite{rudelson:13}, lowering the number of required cascades to $s\log m \log^3(
+s\log m)$.
 \paragraph{(RE) vs Irrepresentability Condition}
 \cite{Daneshmand:2014} rely on an `incoherence' condition on the hessian of the
-likelihood function also known as the {\it (S,s)-irrepresentability} condition:
+likelihood function also known as the {\it (S,s)-irrepresentability} condition.
 \begin{comment}
 \begin{definition}
```
