author    jeanpouget-abadie <jean.pougetabadie@gmail.com>    2015-01-30 00:00:15 -0500
committer jeanpouget-abadie <jean.pougetabadie@gmail.com>    2015-01-30 00:00:15 -0500
commit    6491c0bccc42818cd9d71c1c6fd4bd9af953db2e (patch)
tree      631dd29fb0f3c2d5ba315373f663c1ce897fa1e2
parent    75b01a49b0310b753198007455a79f39822bd3af (diff)
download  cascades-6491c0bccc42818cd9d71c1c6fd4bd9af953db2e.tar.gz
reworded beginning of intro
-rw-r--r--  paper/sections/assumptions.tex       | 11
-rw-r--r--  paper/sections/intro.tex             |  8
-rw-r--r--  paper/sections/linear_threshold.tex  |  4
-rw-r--r--  paper/sections/model.tex             |  2
-rw-r--r--  paper/sections/results.tex           | 17
5 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/paper/sections/assumptions.tex b/paper/sections/assumptions.tex
index 8b7e3ee..a697d38 100644
--- a/paper/sections/assumptions.tex
+++ b/paper/sections/assumptions.tex
@@ -25,11 +25,18 @@ If the irrepresentability condition holds with $\epsilon > \frac{2}{3}$, then th
\subsection{The Restricted Eigenvalue Condition}
-Expressing the restricted eigenvalue assumption for correlated measurements as parameters of the graph and the cascade diffusion process is non-trivial. Under reasonable assumptions on the graph parameters, we can show a very crude ${\cal O}(N)$-lower bound for $\gamma_n$ by exploiting only the first set of measurements, where only the source nodes are active. Note that even though we waste a lot of information, we obtain similar asymptotic behavior than previous work.
+Expressing the restricted eigenvalue assumption for correlated measurements as parameters of the graph and the cascade diffusion process is non-trivial. The restricted eigenvalue property is, however, well behaved in the following sense:
-Similarly to the analysis conducted in \cite{Daneshmand:2014}, we can show that if the eigenvalue can be showed to hold for the `expected' hessian, it can be showed to hold for the hessian itself.
+\begin{lemma}
+\label{lem:expected_hessian}
+Expected hessian analysis!
+\end{lemma}
+
+This result is easily proved by slightly adapting a result from \cite{vandegeer:2009} XXX. Similarly to the analysis conducted in \cite{Daneshmand:2014}, we can show that if the restricted eigenvalue condition can be shown to hold for the `expected' hessian, then it can be shown to hold for the hessian itself. It is easy to see that:
\begin{proposition}
\label{prop:expected_hessian}
If the result holds for the expected hessian, then it holds for the hessian!
\end{proposition}
+
+It is most likely possible to remove this extra $s$ factor. See the sub-gaussian paper by ..., but the calculations are more involved.
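For reference, the {\bf (RE)} condition discussed above is typically stated along the following lines (a sketch in the spirit of standard Lasso analyses, e.g. \cite{vandegeer:2009}; the notation $\mathcal{L}_n$ for the negative log-likelihood over the $n$ measurements is ours, and the exact cone constant may differ from the paper's):

\begin{equation*}
\theta^\top \nabla^2 \mathcal{L}_n(\theta^*)\, \theta \;\geq\; \gamma_n \|\theta\|_2^2
\qquad \text{for all } \theta \neq 0 \text{ such that } \|\theta_{S^c}\|_1 \leq 3\, \|\theta_S\|_1,
\end{equation*}

where $S$ is the true support. The discussion above concerns lower-bounding $\gamma_n$ in terms of the graph and cascade-process parameters.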
diff --git a/paper/sections/intro.tex b/paper/sections/intro.tex
index aa65b5d..d2db035 100644
--- a/paper/sections/intro.tex
+++ b/paper/sections/intro.tex
@@ -1,8 +1,10 @@
-Many real-world phenomena can be modeled as complex diffusion processes on networks: from behavior adoption, sharing of internet memes, citations of articles, and the spread of infectious diseases. Oftentimes, the exact network is unknown to us: we observe only the behavior of the nodes through time, without knowledge of who `influenced' whom. The spread of a particular behavior through a network is known as an {\it Influence Cascade}. In this context, the {\it Graph Inference} problem is to recover the edges of the graph from the observation of few influence cascades.
+A recent line of research has focused on applying advances in sparse recovery to graph analysis. A graph can be interpreted as a signal that one seeks to `compress' or `sketch' and then `recover'. However, we can also consider the situation where the graph is unknown to us and we have only a few measurements from which to recover the signal. What kinds of measurements are available in practice?
-Recent research efforts have focused on constructing an effective algorithm which recovers a large majority of edges correctly from very few cascades. It has shown that the graph inference problem can be solved, under various assumptions, in ${\cal O}(poly(\Delta) \log m)$ number of observed cascades, where $\Delta$ is the maximum degree oand $m$ the number of nodes in the graph. In other words, the dependence of the number of cascades required to reconstruct the graph is (almost miraculously) logarithmic in the number of nodes of the graph.
+A diffusion process on a graph, which depends on the edges and edge weights of the graph, provides valuable information. By observing the sequence of nodes which become `infected' over time, without knowledge of who has `influenced' whom, can we recover the graph on which this process takes place? The spread of a particular behavior through a network is known as an {\it Influence Cascade}. In this context, the {\it Graph Inference} problem is to recover the edges of the graph from the observation of a few influence cascades. We propose to show how sparse recovery can be applied to solve this recently introduced graph inference problem.
-Since the first few papers on link prediction in networks, the research community has made good progress in defining the Graph Inference problem more clearly and suggesting effective algorithms. We show that these results can be improved to ${\cal O}(\Delta \log m)$, which is tight to a certain extent. We also that the edge weights themselves can be estimated under the same assumptions.
+Recent research efforts in solving the graph inference problem have focused on constructing an effective algorithm which recovers a large majority of edges correctly from very few cascades. It has been shown that the graph inference problem can be solved with ${\cal O}(poly(s) \log m)$ observed cascades, where $s$ is the maximum degree and $m$ the number of nodes in the graph. In other words, the number of cascades required to reconstruct the graph depends (almost miraculously) only logarithmically on the number of nodes of the graph. However, results in the sparse recovery literature lead us to believe that ${\cal O}(s \log \frac{m}{s})$ cascades should be sufficient to recover the graph. In fact, we prove this lower bound in a very general sense. We also suggest an ${\cal O}(s \log m)$-algorithm, which is almost tight. We also show that the edge weights themselves can be estimated under the same assumptions.
+
+Since the first few papers on link prediction in networks, the research community has made good progress in defining the Graph Inference problem more clearly and suggesting effective algorithms.
Throughout this paper, we focus on the two main diffusion processes, outlined in the seminal work \cite{Kempe:03}: the independent cascade model (IC) and the linear threshold model.
diff --git a/paper/sections/linear_threshold.tex b/paper/sections/linear_threshold.tex
index 8ff64b3..2c0fb62 100644
--- a/paper/sections/linear_threshold.tex
+++ b/paper/sections/linear_threshold.tex
@@ -13,9 +13,9 @@ where we defined again $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$ and $
\begin{equation}
\label{eq:lt}
\tag{LT}
- y_{t} = \text{sign} \left(\inprod{\theta_j}{X^t} - t_j \right)
+ y^t_j = \text{sign} \left(\inprod{\theta_j}{X^t} - t_j \right)
\end{equation}
The linear threshold model can therefore be cast as a variant of the recently introduced 1-bit compressed sensing model \cite{Boufounos:2008}. Several recent papers study the theoretical guarantees obtained for 1-bit compressed sensing with specific measurements \cite{Gupta:2010}, \cite{Plan:2014}. Others study adaptive mechanisms \cite{Jacques:2013}.
-Support recovery results in the 1-bit compressed sensing model rely on the fact that well-chosen random hyperplanes segment a compact set into small cells. Knowing which side of the hyperplanes you are on situates you in one of these cells of bounded size. Whilst the obtained bounds are also of the order ${\cal O}(n \log \frac{n}{s}$), no current theory exists for recovering positive bounded signal from bernoulli hyperplanes. We leave this research direction to future work. \ No newline at end of file
+Support recovery results in the 1-bit compressed sensing model rely on the fact that well-chosen random hyperplanes segment a compact set into small cells. Knowing which side of each hyperplane you are on situates you in one of these cells of bounded size. Whilst the obtained bounds are also of the order ${\cal O}(n \log \frac{n}{s})$, no current theory exists for recovering positive bounded signals from Bernoulli hyperplanes. We leave this research direction to future work.
\ No newline at end of file
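To make the 1-bit compressed sensing reading of Eq.~\ref{eq:lt} concrete, here is a minimal numpy sketch (illustrative only: the Bernoulli choice of active sets, the weight range, and all variable names are our assumptions, not the paper's setup) generating the sign measurements $y^t_j = \text{sign}(\inprod{\theta_j}{X^t} - t_j)$ for a single node $j$:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, s = 50, 200, 5               # nodes, time steps, true parents of node j

    # sparse non-negative parent weights theta_j and a threshold t_j
    theta_j = np.zeros(m)
    theta_j[rng.choice(m, size=s, replace=False)] = rng.uniform(0.2, 1.0, size=s)
    t_j = 0.5

    # X^t: indicator vector of nodes active at step t (Bernoulli, for illustration)
    X = rng.binomial(1, 0.3, size=(n, m))

    # one-bit measurements: the sign of <theta_j, X^t> - t_j at each step
    y = np.sign(X @ theta_j - t_j)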
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index 25a0a47..5a0a96f 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -50,6 +50,8 @@ as:
\end{equation}
where we defined $\theta_j\defeq (\Theta_{1,j},\ldots,\Theta_{m,j})$.
+\subsection{The Voter Model}
+
\subsection{Generalized Linear Models}
\subsection{Maximum Likelihood Estimation}
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index 965447b..042a06d 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -1,13 +1,13 @@
-In this section, we exploit standard techniques in sparse recovery and leverage the simple nature of generalized linear models (GLMs) to address the standard problem of edge detection. We extend prior work by that edge weights of the graph can also be recovered. We further relax the sparsity constraint: it is more realistic to assume that the graph will have few `strong' edges, characterized by weights closer to 1, and many `weak' edges, characterized by weights closer to 0.
+In this section, we exploit standard techniques in sparse recovery and leverage the simple nature of generalized linear models (GLMs) to address the standard problem of edge detection. We extend prior work by showing that edge weights of the graph can be recovered under similar assumptions, as well as by considering the non-exactly sparse case.
\subsection{Recovering Edges and Edge weights}
-Recovering the edges of the graph can be formalized as recovering the support of $\Theta$, a problem known as {\it variable selection}. As we have seen above, we can optimize Eq.~\ref{eq:pre-mle} node by node. Our objective is to recover the parents of each node, i.e the non-zero coefficients of $\theta_i \ \forall i$. For the rest of the analysis, we suppose that we consider a single node $i$. For ease of presentation, the index $i$ will be implied: $p_{i,j} = p_j$, $\theta_{i,J} = \theta_j$...
+Recovering the edges of the graph can be formalized as recovering the support of the edge weights $\Theta$. This problem is known as {\it variable selection}. Since Eq.~\ref{eq:pre-mle} can be solved node by node, the objective is also known as `parent' recovery: for each node $i$, we seek to find the indices $\{j: \theta_{ij} \neq 0\}$. For the rest of the analysis, we consider a single fixed node $i$; for ease of presentation, the index $i$ will be implied.
-There have been a series of papers arguing that the standard Lasso is an inappropriate exact variable selection method \cite{Zou:2006}, \cite{vandegeer:2011}, since it relies on the essentially necessary irrepresentability condition, introduced in \cite{Zhao:2006}. However, this condition, on which the analysis of \cite{Daneshmand:2014} relies on, rarely holds in practical situations where correlation between variables occurs, and several alternatives have been suggested (the adaptive lasso, thresholded lasso...) We defer an extended analysis of the irrepresentability assumption to Section~\ref{sec:assumptions}.
+A series of papers has argued that the standard Lasso is an inappropriate exact variable selection method \cite{Zou:2006}, \cite{vandegeer:2011}, since it relies on the essentially necessary irrepresentability condition, introduced in \cite{Zhao:2006}. This is the condition on which the analysis of \cite{Daneshmand:2014} relies. It has been noted that this condition is rather stringent and rarely holds in practical situations where correlation between variables occurs. Several alternatives have been suggested, including the adaptive and thresholded lasso, which relax this assumption. We defer an extended analysis of the irrepresentability assumption to Section~\ref{sec:assumptions}.
-Our approach is different. Rather than trying to perform variable selection directly by finding $\{j: \theta_j \neq 0\}$, we seek to upper-bound $\|\hat \theta - \theta^* \|_2$. It is easy to see that recovering all `strong' edges of the graph is a direct consequence of this analysis: by thresholding all weak $\hat \theta$, one recovers all `strong' parents without false positives, as shown in corollary~\ref{cor:variable_selection}.
+Our approach is different. Rather than focusing on exact variable selection, we seek to upper-bound the $\ell_2$-norm $\|\hat \theta - \theta^* \|_2$. It is easy to see that recovering all `strong' edges of the graph follows directly from this analysis. By thresholding all weak coefficients of $\hat \theta$, one recovers all `strong' parents without false positives, as shown in corollary~\ref{cor:variable_selection}.
-We will first apply standard techniques to obtain a ${\cal O}(\sqrt{\frac{s \log m}{n}})$ $\ell2$-norm upper-bound in the case of sparse vectors. We will then extend this analysis to non-sparse vectors. In section~\ref{sec:lowerbound}, we show that our results are almost tight.
+We will first apply standard techniques to obtain a ${\cal O}(\sqrt{\frac{s \log m}{n}})$ $\ell_2$-norm upper-bound in the case of sparse vectors. We then extend this analysis to non-sparse vectors. In section~\ref{sec:lowerbound}, we show that our results are almost tight.
\subsection{Main Theorem}
@@ -42,10 +42,9 @@ Suppose the true vector $\theta^*$ has support S of size s and the {\bf(RE)} ass
In section~\ref{subsec:icc}, we find a ${\cal O}(\sqrt{n})$ upper-bound for valid $\lambda_n$. It is also reasonable to assume $\gamma_n = \Omega(n)$, as discussed in section~\ref{sec:assumptions}, yielding a ${\cal O}(1/\sqrt{n})$ decay rate per measurement. The authors believe it is more natural to express these results in terms of the number of measurements $N$, i.e. the cumulative number of steps in each cascade, rather than the number of cascades $n$.
-
\subsection{Relaxing the Sparsity Constraint}
-In many situations however, and for social networks in particular, the graph is not exactly $s$-sparse. A more realistic situation is one where each nodes has few strong `parents' and many `weaker' parents. Rather than obtaining an impossibility result in this situation, we show that we pay a small price for relaxing the sparsity constraint. If we let $\theta^*_{\lfloor s \rfloor}$ be the best s-sparse approximation to $\theta^*$ defined as
+In practice, exact sparsity rarely holds. For social networks in particular, it is more realistic to assume that each node has a few strong `parents' and many `weaker' parents. Rather than obtaining an impossibility result in this situation, we show that we pay a small price for relaxing the sparsity constraint. If we let $\theta^*_{\lfloor s \rfloor}$ be the best $s$-sparse approximation to $\theta^*$, defined as
$$\theta^*_{\lfloor s \rfloor} \defeq \arg\min_{\|\theta\|_0 \leq s} \|\theta - \theta^*\|_1,$$
then we pay an extra ${\cal O} \left(\sqrt{\frac{\lambda_n}{\gamma_n}} \|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1 \right)$ for recovering the weights of non-exactly sparse vectors. Since $\|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1$ is the sum of the $\|\theta^*\|_0 - s$ weakest coefficients of $\theta^*$, the closer $\theta^*$ is to being sparse, the smaller the price. These results are formalized in the following theorem:
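As a small illustration (a numpy sketch, not taken from the paper), the best $s$-sparse approximation simply keeps the $s$ largest-magnitude entries of $\theta^*$, and the $\ell_1$ norm of what is discarded is the sum of the weakest coefficients referred to above:

    import numpy as np

    def best_s_sparse(theta, s):
        """Keep the s largest-magnitude entries of theta; zero out the rest."""
        approx = np.zeros_like(theta)
        keep = np.argsort(np.abs(theta))[-s:]
        approx[keep] = theta[keep]
        return approx

    theta_star = np.array([0.9, 0.7, 0.05, 0.02, 0.01])
    approx = best_s_sparse(theta_star, s=2)
    tail_l1 = np.abs(theta_star - approx).sum()   # 0.08: sum of the 3 weakest coefficients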
@@ -92,7 +91,7 @@ The following corollary follows easily and gives the first $\Omega(s \log p)$ al
\begin{corollary}
\label{cor:variable_selection}
-Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$ and that $\theta$ is s-sparse. Suppose that after solving for $\hat \theta$, we construct the set $\hat {\cal S}_\eta \defeq \{ j \in [1..p] : \hat p_j > \eta\}$ for $\eta > 0$. For $\epsilon>0$ and $\epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ j \in [1..p] :p^*_j > \eta +\epsilon \}$ be the set of all true `strong' parents. Suppose the number of measurements verifies:
+Assume that ${\bf (RE)}$ holds with $\gamma_n = n \gamma$ for $\gamma > 0$ and that $p^*$ is $s$-sparse. Suppose that after solving for $\hat p$, we construct the set $\hat {\cal S}_\eta \defeq \{ j \in [1..p] : \hat p_j > \eta\}$ for $\eta > 0$. For $\epsilon>0$ and $\epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ j \in [1..p] : p^*_j > \eta +\epsilon \}$ be the set of all true `strong' parents. Suppose the number of measurements satisfies:
\begin{equation}
n > \frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m
\end{equation}
@@ -104,7 +103,7 @@ then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset
\end{corollary}
\begin{proof}
-By choosing $\delta = 0$, if $n>\frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, then $\|p-p^*\|_2 < \epsilon < \eta$ with probability $1-\frac{1}{m}$. If $p^*_j = 0$ and $\hat p > \eta$, then $\|p - p^*\|_2 \geq |\hat p_j-p^*_j| > \eta$, which is a contradiction. Therefore we get no false positives. If $p^*_j = \eta + \epsilon$, then $|\hat p_j - (\eta+\epsilon)| < \epsilon/2 \implies p_j > \eta + \epsilon/2$. Therefore, we get all strong parents.
+Suppose $p^*$ is exactly $s$-sparse. By choosing $\delta = 0$, if $n>\frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, then $\|\hat p-p^*\|_2 < \epsilon < \eta$ with probability $1-\frac{1}{m}$. If $p^*_j = 0$ and $\hat p_j > \eta$, then $\|\hat p - p^*\|_2 \geq |\hat p_j-p^*_j| > \eta$, which is a contradiction. Therefore we get no false positives. If $p^*_j > \eta + \epsilon$, then $|\hat p_j - p^*_j| < \epsilon \implies \hat p_j > \eta$. Therefore, we get all strong parents. The analysis in the non-sparse case is identical.
\end{proof}
Note that $n$ is the number of measurements and not the number of cascades. This is an improvement over prior work since we expect several measurements per cascade.
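To summarize the pipeline described in this section, the following is a hedged end-to-end sketch (scikit-learn's $\ell_1$-penalized logistic regression is used as a stand-in for the penalized MLE of Eq.~\ref{eq:pre-mle}; the data-generation step and all parameter values are our illustrative assumptions, not the paper's): estimate the parent coefficients of a single node from its activation measurements, then threshold at $\eta$ to recover the `strong' parents as in corollary~\ref{cor:variable_selection}.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    m, n, s, eta = 100, 500, 5, 0.15       # nodes, measurements, true parents, threshold

    # Illustrative measurements for one target node: X^t are active-node indicators,
    # y^t records whether the target activates at step t, driven by its s true parents.
    true_parents = rng.choice(m, size=s, replace=False)
    X = rng.binomial(1, 0.2, size=(n, m)).astype(float)
    logits = X[:, true_parents].sum(axis=1) - 1.0
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

    # l1-penalized fit (stand-in for the penalized MLE), then thresholding at eta
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    model.fit(X, y)
    p_hat = model.coef_.ravel()
    S_hat = np.flatnonzero(p_hat > eta)    # estimated set of `strong' parents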