Diffstat (limited to 'paper/sections/results.tex')
| -rw-r--r-- | paper/sections/results.tex | 17 |
1 files changed, 8 insertions, 9 deletions
diff --git a/paper/sections/results.tex b/paper/sections/results.tex index 965447b..042a06d 100644 --- a/paper/sections/results.tex +++ b/paper/sections/results.tex @@ -1,13 +1,13 @@ -In this section, we exploit standard techniques in sparse recovery and leverage the simple nature of generalized linear models (GLMs) to address the standard problem of edge detection. We extend prior work by that edge weights of the graph can also be recovered. We further relax the sparsity constraint: it is more realistic to assume that the graph will have few `strong' edges, characterized by weights closer to 1, and many `weak' edges, characterized by weights closer to 0. +In this section, we exploit standard techniques in sparse recovery and leverage the simple nature of generalized linear models (GLMs) to address the standard problem of edge detection. We extend prior work by showing that edge weights of the graph can be recovered under similar assumptions, as well as by considering the non-exactly sparse case. \subsection{Recovering Edges and Edge weights} -Recovering the edges of the graph can be formalized as recovering the support of $\Theta$, a problem known as {\it variable selection}. As we have seen above, we can optimize Eq.~\ref{eq:pre-mle} node by node. Our objective is to recover the parents of each node, i.e the non-zero coefficients of $\theta_i \ \forall i$. For the rest of the analysis, we suppose that we consider a single node $i$. For ease of presentation, the index $i$ will be implied: $p_{i,j} = p_j$, $\theta_{i,J} = \theta_j$... +Recovering the edges of the graph can be formalized as recovering the support of the edge weights $\Theta$. This problem is known as {\it variable selection}. Since Eq.~\ref{eq:pre-mle} can be solved node by node, the objective is also known as `parent' recovery. For each node $i$, we seek to find the indices $\{j: \theta_{ij} \neq 0\}$. For the rest of the analysis, we suppose that we consider a single node $i$. 
For ease of presentation, the index $i$ will be implied. -There have been a series of papers arguing that the standard Lasso is an inappropriate exact variable selection method \cite{Zou:2006}, \cite{vandegeer:2011}, since it relies on the essentially necessary irrepresentability condition, introduced in \cite{Zhao:2006}. However, this condition, on which the analysis of \cite{Daneshmand:2014} relies on, rarely holds in practical situations where correlation between variables occurs, and several alternatives have been suggested (the adaptive lasso, thresholded lasso...) We defer an extended analysis of the irrepresentability assumption to Section~\ref{sec:assumptions}. +There have been a series of papers arguing that the standard Lasso is an inappropriate exact variable selection method \cite{Zou:2006}, \cite{vandegeer:2011}, since it relies on the essentially necessary irrepresentability condition, introduced in \cite{Zhao:2006}. This is the condition on which the analysis of \cite{Daneshmand:2014} relies. It has been noted this condition is rather stringent and rarely holds in practical situations where correlation between variables occurs. Several alternatives have been suggested, including the adaptive and thresholded lasso, which relax this assumption. We defer an extended analysis of the irrepresentability assumption to Section~\ref{sec:assumptions}. -Our approach is different. Rather than trying to perform variable selection directly by finding $\{j: \theta_j \neq 0\}$, we seek to upper-bound $\|\hat \theta - \theta^* \|_2$. It is easy to see that recovering all `strong' edges of the graph is a direct consequence of this analysis: by thresholding all weak $\hat \theta$, one recovers all `strong' parents without false positives, as shown in corollary~\ref{cor:variable_selection}. +Our approach is different. Rather than trying to focus on exact variable selection, we seek to upper-bound the $\ell2$-norm $\|\hat \theta - \theta^* \|_2$. 
It is easy to see that recovering all `strong' edges of the graph follows directly from this analysis. By thresholding all weak $\hat \theta$, one recovers all `strong' parents without false positives, as shown in corollary~\ref{cor:variable_selection}. -We will first apply standard techniques to obtain a ${\cal O}(\sqrt{\frac{s \log m}{n}})$ $\ell2$-norm upper-bound in the case of sparse vectors. We will then extend this analysis to non-sparse vectors. In section~\ref{sec:lowerbound}, we show that our results are almost tight. +We will first apply standard techniques to obtain a ${\cal O}(\sqrt{\frac{s \log m}{n}})$ $\ell2$-norm upper-bound in the case of sparse vectors. We then extend this analysis to non-sparse vectors. In section~\ref{sec:lowerbound}, we show that our results are almost tight. \subsection{Main Theorem} @@ -42,10 +42,9 @@ Suppose the true vector $\theta^*$ has support S of size s and the {\bf(RE)} ass In section~\ref{subsec:icc}, we find a ${\cal O}(\sqrt{n})$ upper-bound for valid $\lambda_n$. It is also reasonable to assume $\gamma_n = \Omega(n)$, as discussed in section~\ref{sec:assumptions}, yielding a ${\cal O}(1/\sqrt{n})$ decay rate per measurement. The authors believe it is more natural to express these results as the number of measurements $N$, i.e. cumulative number of steps in each cascades, rather the number of cascades $n$. - \subsection{Relaxing the Sparsity Constraint} -In many situations however, and for social networks in particular, the graph is not exactly $s$-sparse. A more realistic situation is one where each nodes has few strong `parents' and many `weaker' parents. Rather than obtaining an impossibility result in this situation, we show that we pay a small price for relaxing the sparsity constraint. If we let $\theta^*_{\lfloor s \rfloor}$ be the best s-sparse approximation to $\theta^*$ defined as +In practice, exact sparsity is rarely verified. 
For social networks in particular, it is more realistic to assume that each node has few strong `parents' and many `weaker' parents. Rather than obtaining an impossibility result in this situation, we show that we pay a small price for relaxing the sparsity constraint. If we let $\theta^*_{\lfloor s \rfloor}$ be the best s-sparse approximation to $\theta^*$ defined as $$\theta^*_{\lfloor s \rfloor} \defeq \min_{\|\theta\|_0 \leq s} \|\theta - \theta^*\|_1$$ then we pay ${\cal O} \left(\sqrt{\frac{\lambda_n}{\gamma_n}} \|\theta^*_s\|_1 \right)$ for recovering the weights of non-exactly sparse vectors. Since $\|\theta^*_{\lfloor s \rfloor}\|_1$ is the sum of the $\|\theta^*\|_0 -s$ weakest coefficients of $\theta^*$, the closer $\theta^*$ is to being sparse, the smaller the price. These results are formalized in the following theorem: @@ -92,7 +91,7 @@ The following corollary follows easily and gives the first $\Omega(s \log p)$ al \begin{corollary} \label{cor:variable_selection} -Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$ and that $\theta$ is s-sparse. Suppose that after solving for $\hat \theta$, we construct the set $\hat {\cal S}_\eta \defeq \{ j \in [1..p] : \hat p_j > \eta\}$ for $\eta > 0$. For $\epsilon>0$ and $\epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ j \in [1..p] :p^*_j > \eta +\epsilon \}$ be the set of all true `strong' parents. Suppose the number of measurements verifies: +Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$ and that $p$ is s-sparse. Suppose that after solving for $\hat p$, we construct the set $\hat {\cal S}_\eta \defeq \{ j \in [1..p] : \hat p_j > \eta\}$ for $\eta > 0$. For $\epsilon>0$ and $\epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ j \in [1..p] :p^*_j > \eta +\epsilon \}$ be the set of all true `strong' parents. 
Suppose the number of measurements verifies: \begin{equation} n > \frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m \end{equation} @@ -104,7 +103,7 @@ then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset \end{corollary} \begin{proof} -By choosing $\delta = 0$, if $n>\frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, then $\|p-p^*\|_2 < \epsilon < \eta$ with probability $1-\frac{1}{m}$. If $p^*_j = 0$ and $\hat p > \eta$, then $\|p - p^*\|_2 \geq |\hat p_j-p^*_j| > \eta$, which is a contradiction. Therefore we get no false positives. If $p^*_j = \eta + \epsilon$, then $|\hat p_j - (\eta+\epsilon)| < \epsilon/2 \implies p_j > \eta + \epsilon/2$. Therefore, we get all strong parents. +Suppose $p$ is exactly s-sparse. By choosing $\delta = 0$, if $n>\frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, then $\|p-p^*\|_2 < \epsilon < \eta$ with probability $1-\frac{1}{m}$. If $p^*_j = 0$ and $\hat p > \eta$, then $\|p - p^*\|_2 \geq |\hat p_j-p^*_j| > \eta$, which is a contradiction. Therefore we get no false positives. If $p^*_j = \eta + \epsilon$, then $|\hat p_j - (\eta+\epsilon)| < \epsilon/2 \implies p_j > \eta + \epsilon/2$. Therefore, we get all strong parents. The analysis in the non-sparse case is identical. \end{proof} Note that $n$ is the number of measurements and not the number of cascades. This is an improvement over prior work since we expect several measurements per cascade.
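The thresholding step used in the corollary's proof can be illustrated with a small sketch. It is not part of the patch: the helper name `threshold_set` and all numeric values are hypothetical, chosen only so that the estimate lies within $\epsilon$ of the true vector in $\ell_2$-norm, as the corollary assumes.

```python
import math

def threshold_set(p, eta):
    """Return the indices j with p[j] > eta (hypothetical helper)."""
    return {j for j, value in enumerate(p) if value > eta}

# Hypothetical true probabilities and a nearby estimate (illustration only).
p_star = [0.9, 0.0, 0.7, 0.0, 0.0]
p_hat = [0.93, 0.02, 0.66, 0.01, 0.02]
eta, eps = 0.3, 0.1

# l2 estimation error; the corollary assumes it is below eps < eta.
l2_error = math.sqrt(sum((a - b) ** 2 for a, b in zip(p_hat, p_star)))

estimated_parents = threshold_set(p_hat, eta)       # \hat S_eta
strong_parents = threshold_set(p_star, eta + eps)   # S^*_{eta + eps}
true_support = threshold_set(p_star, 0.0)           # support of p^*

# With l2_error < eps: every strong parent is kept, and no zero entry passes.
all_strong_recovered = strong_parents <= estimated_parents
no_false_positives = estimated_parents <= true_support
```

Because each coordinate error is bounded by the $\ell_2$ error, a zero entry cannot exceed $\eta$ and an entry above $\eta + \epsilon$ cannot fall below $\eta$; the sketch only checks this on one made-up instance.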
