In this section, we exploit standard techniques from sparse recovery and leverage the simple structure of generalized linear cascades to address the problem of edge detection. We extend prior work by showing that the edge weights of the graph can be recovered under similar assumptions, and we further relax the requirement that these weights be exactly sparse.
As mentioned previously, our objective is twofold:
\begin{enumerate}
\item We seek to recover the edges of the graph. In other words, for each node $j$, we wish to identify the set of indices $\{i: \Theta_{ij} \neq 0\}$.
\item We seek to recover the edge weights $\Theta$ of ${\cal G}$, i.e.\ to upper-bound the estimation error $\|\hat \Theta - \Theta \|_2$.
\end{enumerate}
It is easy to see that solving the second objective also solves the first: if $\|\hat \Theta - \Theta \|_2$ is sufficiently small, we can recover all large coordinates of $\Theta$ by keeping only the coordinates of $\hat \Theta$ above a suitable threshold.
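For concreteness, the following Python sketch (ours, not part of the original algorithm; \texttt{theta\_hat} and \texttt{eta} are placeholder names) illustrates this thresholding step:
\begin{verbatim}
import numpy as np

def recover_support(theta_hat, eta):
    """Return the indices whose estimated weight exceeds eta.

    If ||theta_hat - theta||_2 < eta, every reported index is a
    true parent, and every parent with weight > 2 * eta is reported.
    """
    return np.flatnonzero(np.asarray(theta_hat) > eta)
\end{verbatim}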
\subsection{Main Theorem}
\label{sec:main_theorem}
For the remainder of this section, we consider a single node $j$. For ease of presentation, the index $j$ is made implicit: $\theta_i \leftarrow \Theta_{ij}$ and $\theta \leftarrow \Theta_{\cdot j}$. We begin our analysis with the following lemma:
\begin{lemma}
\label{lem:icc_lambda_upper_bound}
For any $\delta > 0$, with probability at least $1-e^{-n^\delta \log m}$, we have $\|\nabla {\cal L}(\theta^*)\|_{\infty} \leq 2 \sqrt{\frac{\log m}{n^{1 - \delta} p_{\min}}}$.
\end{lemma}
Consider the well-known \emph{restricted eigenvalue condition} for a symmetric matrix $\Sigma$ and the set ${\cal C} \defeq \{X \in \mathbb{R}^m : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 \} \cap \{ \|X\|_1 \leq 1 \}$:
\begin{equation}
\nonumber
\forall X \in {\cal C}, \quad X^\top \Sigma X \geq \gamma \|X\|_2^2
\tag{RE}
\end{equation}
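Certifying {\bf(RE)} is hard in general; the following Monte Carlo sketch (our illustration, not from the original work) samples directions from ${\cal C}$ and reports the smallest observed quotient $X^\top \Sigma X / \|X\|_2^2$, which gives an \emph{upper} bound on, rather than a certificate for, the restricted eigenvalue $\gamma$:
\begin{verbatim}
import numpy as np

def re_constant_estimate(Sigma, S, n_samples=10000, seed=0):
    """Monte Carlo upper bound on the restricted eigenvalue of Sigma.

    S is the (non-empty) array of support indices. Samples X with
    ||X_{S^c}||_1 <= 3 ||X_S||_1 and ||X||_1 <= 1, and returns the
    smallest observed X^T Sigma X / ||X||_2^2.
    """
    rng = np.random.default_rng(seed)
    m = Sigma.shape[0]
    Sc = np.setdiff1d(np.arange(m), S)
    best = np.inf
    for _ in range(n_samples):
        X = rng.standard_normal(m)
        # Shrink the off-support part to enforce the cone constraint.
        ratio = np.abs(X[Sc]).sum() / (3 * np.abs(X[S]).sum())
        if ratio > 1:
            X[Sc] /= ratio
        X /= max(np.abs(X).sum(), 1.0)  # enforce ||X||_1 <= 1
        best = min(best, X @ Sigma @ X / (X @ X))
    return best
\end{verbatim}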
The following theorem is a consequence of Theorem 1 in \cite{Negahban:2009} and Lemma~\ref{lem:icc_lambda_upper_bound}.
\begin{theorem}
\label{thm:neghaban}
Suppose the true vector $\theta^*$ has support $S$ of size $s$, and that the {\bf(RE)} assumption holds for the Hessian $\nabla^2 {\cal L}(\theta)$ with $\gamma > 0$. Then for any $\delta > 0$, solving \eqref{eq:pre-mle} with $\lambda \defeq 2\sqrt{\frac{\log m}{n^{1 - \delta} p_{\min}}}$ yields, with probability $1-\exp(-n^\delta \log m)$:
\begin{equation}
\|\hat \theta - \theta^* \|_2 \leq \frac{3}{\gamma} \sqrt{\frac{s \log m}{p_{\min} n^{1-\delta}}}
\end{equation}
\end{theorem}
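In practice, \eqref{eq:pre-mle} is an $\ell_1$-regularized convex program and can be solved by standard proximal methods. The sketch below is ours; \texttt{grad\_loss} stands for the gradient of the cascade negative log-likelihood ${\cal L}$, whose exact form depends on the link function. It implements plain proximal gradient descent (ISTA) with the regularization level of Theorem~\ref{thm:neghaban}:
\begin{verbatim}
import numpy as np

def soft_threshold(v, t):
    """Prox operator of t * ||.||_1: coordinatewise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(grad_loss, m, lam, step=0.1, n_iter=1000):
    """Minimize L(theta) + lam * ||theta||_1 by proximal gradient."""
    theta = np.zeros(m)
    for _ in range(n_iter):
        theta = soft_threshold(theta - step * grad_loss(theta), step * lam)
    return theta

# Regularization level from the theorem above, taking delta = 0:
#   lam = 2 * np.sqrt(np.log(m) / (n * p_min))
\end{verbatim}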
The proof of Theorem~\ref{thm:neghaban} is included in the Appendix. Note that as $n$ goes to infinity, the Hessian $\nabla^2 {\cal L}(\theta)$ converges to the \emph{Fisher information matrix} of the cascade model as a function of $\theta$. The assumptions of Theorem~\ref{thm:neghaban} are discussed in Section~\ref{sec:assumptions}. The following corollary follows easily and yields the first graph reconstruction algorithm for general graphs with ${\cal O}(s \log m)$ sample complexity.
\begin{corollary}
\label{cor:variable_selection}
Assume that ${\bf (RE)}$ holds with $\gamma > 0$ and that $\theta$ is $s$-sparse with support ${\cal S}^*$. Consider the set $\hat {\cal S}_\eta \defeq \{ i \in [1..m] : \hat \theta_i > \eta\}$ for $\eta > 0$. For $0 < \epsilon < \eta$, let ${\cal S}^*_{\eta + \epsilon} \defeq \{ i \in [1..m] : \theta_i > \eta +\epsilon \}$ be the set of all true `strong' parents. Suppose the number of measurements satisfies:
\begin{equation}
n > \frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m
\end{equation}
Then with probability $1-\frac{1}{m}$, ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset {\cal S}^*$. In other words, we recover all `strong' parents and no `false' parents.
\end{corollary}
\begin{proof}
Suppose $\theta$ is exactly $s$-sparse. Choosing $\delta = 0$ in Theorem~\ref{thm:neghaban}, if $n>\frac{36}{p_{\min}\gamma^2 \epsilon^2} s \log m$, then $\|\hat \theta-\theta\|_2 < \epsilon/2 < \eta$ with probability $1-\frac{1}{m}$. If $\theta_i = 0$ and $\hat \theta_i > \eta$, then $\|\hat \theta - \theta\|_2 \geq |\hat \theta_i-\theta_i| > \eta$, which is a contradiction; therefore we get no false positives. If $\theta_i > \eta + \epsilon$, then $|\hat \theta_i - \theta_i| < \epsilon/2$ implies $\hat \theta_i > \eta + \epsilon/2 > \eta$; therefore, we recover all strong parents.
\end{proof}
Note that if we want \emph{perfect recovery} of all edges, we must estimate $p_{\min}$ to within some $\epsilon'$ and set $\eta \defeq p_{\min} - \epsilon'$.
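As a quick numerical illustration (all values below are made up), the measurement bound of Corollary~\ref{cor:variable_selection} can be evaluated directly:
\begin{verbatim}
import numpy as np

def sample_complexity(s, m, gamma, p_min, eps):
    """Measurement bound of the variable-selection corollary."""
    return 36.0 * s * np.log(m) / (p_min * gamma**2 * eps**2)

# Hypothetical values: s = 10 parents, m = 10^4 nodes,
# gamma = 0.5, p_min = 0.1, eps = 0.05.
print(sample_complexity(10, 1e4, 0.5, 0.1, 0.05))  # ~5.3e7
\end{verbatim}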
\subsection{Relaxing the Sparsity Constraint}
\label{sec:relaxing_sparsity}
In practice, exact sparsity is rarely verified. For social networks in particular, it is more realistic to assume that each node has a few strong `parents' and many weak ones. Rather than obtaining an impossibility result in this setting, we show that we only pay a small price for relaxing the sparsity constraint. Let $\theta^*_{\lfloor s \rfloor}$ be the best $s$-sparse approximation to $\theta^*$, defined as
$$\theta^*_{\lfloor s \rfloor} \defeq \arg\min_{\|\theta\|_0 \leq s} \|\theta - \theta^*\|_1$$
Then we pay a price proportional to $\|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1$ for recovering the weights of non-exactly sparse vectors, i.e.\ proportional to the sum of the coefficients of $\theta^*$ save for the $s$ largest. In other words, the closer $\theta^*$ is to being sparse, the smaller the price. These results are formalized in Theorem~\ref{thm:approx_sparse} below, which is also a consequence of Theorem 1 in \cite{Negahban:2009} and Lemma~\ref{lem:icc_lambda_upper_bound}.
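Note that in $\ell_1$ norm, the best $s$-sparse approximation is obtained simply by keeping the $s$ largest coordinates in absolute value; the following sketch (ours) makes this and the associated tail $\|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1$ explicit:
\begin{verbatim}
import numpy as np

def best_s_sparse(theta, s):
    """Best s-sparse approximation in l1: keep the s largest |entries|."""
    theta = np.asarray(theta, dtype=float)
    approx = np.zeros_like(theta)
    top = np.argsort(np.abs(theta))[-s:]  # indices of the s largest
    approx[top] = theta[top]
    return approx

theta = np.array([0.5, 0.02, 0.3, 0.01, 0.04])
print(best_s_sparse(theta, 2))                        # [0.5 0. 0.3 0. 0.]
print(np.abs(theta - best_s_sparse(theta, 2)).sum())  # tail l1 norm: 0.07
\end{verbatim}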
\begin{theorem}
\label{thm:approx_sparse}
Let $\theta^*_{\lfloor s \rfloor}$ be the best $s$-sparse approximation to the true vector $\theta^*$ and let $S$ denote its support. Suppose the {\bf(RE)} assumption holds with $\gamma > 0$ for the Hessian $\nabla^2 {\cal L}(\theta^*)$ over the following set:
\begin{align}
\nonumber
{\cal C}' \defeq & \{X \in \mathbb{R}^m : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1 + 4 \|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1 \} \\ \nonumber
& \cap \{ \|X\|_1 \leq 1 \}
\end{align}
For any $\delta > 0$, by solving \eqref{eq:pre-mle} with $\lambda \defeq 2\sqrt{\frac{\log m}{n^{1 - \delta} p_{\min}}}$ we have, with probability $1-\exp(-n^\delta \log m)$:
\begin{align}
\|\hat \theta - \theta^* \|_2 \leq \frac{3\sqrt{s}\,\lambda}{\gamma} + 2 \sqrt{\frac{\lambda}{\gamma} \|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1}
\end{align}
\end{theorem}
As before, edge recovery is a consequence of upper-bounding $\|\hat \theta - \theta^*\|_2$.
\begin{corollary}
If $\theta$ is not exactly $s$-sparse and the number of measurements satisfies:
\begin{equation}
n > \frac{36 \| \theta^* - \theta^*_{\lfloor s\rfloor}\|_1}{p_{\min}\gamma^2 \epsilon^4} s \log m
\end{equation}
then similarly: ${\cal S}^*_{\eta + \epsilon} \subset \hat {\cal S}_\eta \subset {\cal S}^*$ w.h.p.
\end{corollary}
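Numerically (again with made-up values), the relaxed bound can be compared with the exactly sparse one; the $\epsilon^4$ dependence is the main additional cost:
\begin{verbatim}
import numpy as np

def sample_complexity_relaxed(s, m, gamma, p_min, eps, tail_l1):
    """Measurement bound of the relaxed-sparsity corollary.

    tail_l1 is ||theta* - theta*_{|s|}||_1, the total weight
    of the `weak' parents.
    """
    return 36.0 * tail_l1 * s * np.log(m) / (p_min * gamma**2 * eps**4)

# Same hypothetical values as before, with a small tail:
print(sample_complexity_relaxed(10, 1e4, 0.5, 0.1, 0.05, 0.05))
\end{verbatim}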
\subsection{The Independent Cascade Model}
Recall that to cast the Independent Cascade model as a Generalized Linear Cascade, we performed the change of variables $\Theta_{i,j} = -\log(1 - p_{i,j})$ for all $i,j$. The previous results hold \emph{a priori} for the ``effective'' parameter $\Theta$. The following lemma shows that they also hold for the original infection probabilities $p_{i,j}$:
\begin{lemma}
\label{lem:theta_p_upperbound}
$\|\theta - \theta' \|_2 \geq \|p - p'\|_2$ and $\|\theta - \theta_{\lfloor s \rfloor}\|_1 \geq \| p - p_{\lfloor s \rfloor} \|_1$, where $\theta = -\log(1-p)$ and $\theta' = -\log(1-p')$ coordinate-wise.
\end{lemma}
\begin{proof}
Using the inequality $\log x \geq 1 - \frac{1}{x}$, valid for all $x>0$, we have $|\log (1 - p) - \log (1-p')| \geq \max\left(1 - \frac{1-p}{1-p'}, 1 - \frac{1-p'}{1-p}\right) \geq \max( p-p', p'-p)$, which proves the first inequality coordinate-wise. The second result follows from the monotonicity of $x \mapsto -\log(1-x)$ and the inequality $|\log (1 -x)| \geq x$ for all $x \in [0,1)$.
\end{proof}
The results of Sections~\ref{sec:main_theorem} and~\ref{sec:relaxing_sparsity} therefore hold for the original transition probabilities $p$.
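Concretely, once $\hat \theta$ has been computed, the estimated infection probabilities are recovered by inverting the change of variables; a minimal sketch, assuming the parametrization $\Theta_{i,j} = -\log(1 - p_{i,j})$ above:
\begin{verbatim}
import numpy as np

def theta_to_p(theta_hat):
    """Invert the IC change of variables theta = -log(1 - p)."""
    return 1.0 - np.exp(-np.asarray(theta_hat))

# By the lemma above, the l2 error on p is no larger than the
# l2 error on theta, so the recovery guarantees transfer.
print(theta_to_p(np.array([0.105, 0.0, 0.357])))  # ~[0.1, 0.0, 0.3]
\end{verbatim}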