jmlr-20-1419.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109

\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1.2in, vmargin=1.4in]{geometry}

\title{Review of \emph{Contextual Stochastic Block Model:\\Sharp Thresholds and Contiguity}}
\date{}
\begin{document}

\maketitle

\paragraph{Summary.} This paper considers a generalization of the stochastic
block model in which additional covariates correlated with the community
assignments are observed at each vertex. Specifically, the generative model for
a graph with $n$ vertices is as follows:
\begin{itemize}
	\item each vertex $i$ is assigned independently and uniformly at random to
		one of two community, $\sigma_i\in\{-1, +1\}$.
	\item two vertices are connected by an edge with probability $a/n$ if they
		belong to the same community, and with probability $b/n$ otherwise.
		Define
		$\lambda^2=1/2\cdot(a-b)^2/(a+b)$.
	\item a $p$-dimensional vector of covariates $B_i$ is observed at
		each vertex, where $B_i = \sqrt{\mu/n}\sigma_i u + Z_i$ with $u\sim
		\mathcal{N}(0,I_p)$ and $Z_i\sim\mathcal{N}(0,I_p)$ and 
		$(Z_1,\dots,Z_n)$ mutually independent.
\end{itemize}
The general question studied in this work is whether recovery of the latent
community assignments is possible from the joint observation of the adjacency
matrix of the graph and the covariates $(B_1,\dots,B_n)$, in the asymptotic
regime where $n/p\to\gamma >0$. Note that this problem reduces to the standard
community detection problem when only the adjacency matrix is observed (or
$\mu=0)$, and reduces to a clustering of a Gaussian mixtures when only the
covariates $B_i$ are observed (or $\lambda=0$). The main result of this work is
an exact characterization of the weak recovery threshold: when $\lambda^2
+ \mu^2/\gamma <1$ then weak recovery is impossible, when
$\lambda^2+\mu^2/\gamma>1$, the authors describe an estimator, computable in
quasi-polynomial time, that achieves weak recovery.

\paragraph{Significance of the results.} The question studied in this work is
very natural, since a practitioner would typically observe some covariates
about the graph's vertices in most community detection applications.
Furthermore, the paper completely solves the question it sought to answer by
describing a sharp threshold (at least as far as weak recovery is concerned).
It is conceptually satisfying that the threshold established in this work
generalizes both the recovery threshold for community detection in the
stochastic block model, and the “spike detection” threshold in random matrix
theory. It is also interesting to see that recovery is sometimes possible by
combining information from \emph{both} types of observations (the covariates
and the graph) even in cases where each type of information \emph{separately}
would be insufficient for recovery.

\vspace{1em}\noindent\emph{Model.} However, the specific generative model
considered in this work seems a little restrictive. While the stochastic block
model and the spiked random matrix model are both well-established and
well-studied \emph{separately}, it is not clear that simply “gluing” them
together constitutes a natural instantiation of the intuitive concept of
\emph{stochastic block model with auxiliary information}. The model studied in
this work has only been considered in two previous papers (Yan and Sarkar,
2020; Deshpande et al., 2018) and does not seem to be justified from the
modeling perspective, but rather by the fact that it is amenable to analysis by
combining existing techniques. This could be improved by providing at the very
least a comparison with other variants of stochastic block model with auxiliary
information, such as for example the one discussed in \emph{Structure and
inference in annotated networks} (Newman and Clauset, 2016). On this note,
a more natural modeling approach is to first draw latent covariates for the
graph vertices, and \emph{then} to describe the conditional distribution of
community assignments given this latent covariates. In contrast, the present
work only describes the conditional distribution of the covariates given the
community assignments, which does not feel natural in the context of network
modeling. Finally, I think the paper would also benefit from a discussion of
whether the analysis could generalize to more general models for the covariates
$B_i$ (closer to the general spiked matrix models that are now commonly
studied).

\vspace{1em}\noindent\emph{Technical contribution.} All the theoretical results
in this work closely follow the proof structures and techniques from recent work
on recovery thresholds for the stochastic block model and spiked random matrix
models. The main technical contribution is the introduction of the “cycle
statistics” $Y_{n,k,l}$ similar to the ones studied for example by Mossel et
al. (2015) in the context of community detection, and by Banerjee and Ma (2018)
for spiked random matrices. The recovery algorithm closely follows the approach
from Hopkins and Steurer (2017) by constructing a low-degree polynomial
approximation of the second moment $\sigma\sigma^T$ of the community
assignments.

\paragraph{Other comments.} The paper is well-written overall, but I think the
proofs could be made clearer if written in a more modular manner. As a simple
example, the first part of proof of Theorem 1 in Section 2 is a very general
argument that could be factored out (and it is in fact exactly identical to
Lemma 2.4 in (Perry et al. 2018) that could be invoked directly). Another
example is the proof of Proposition 5 that is particularly long: however the
asymptotic distribution under $H_0$ is simply a specific case of the one under
$H_1$ when $\lambda=\mu=0$, so it is not clear that it is necessary to conduct
both derivations separately. I understand that some parts of the derivation
under $H_0$ are reused under $H_1$, but these could be factored out.

\paragraph{Conclusion.} This paper considers a natural and important question,
and despite the limitations of the model and the relative absence of technical
novelty, the theoretical results constitute an important step towards
understanding the impact of auxiliary vertex information on the community
detection threshold. I recommend acceptance, possibly after consolidating the
discussion of the model and modularizing the proofs as suggested above.


%\bibliographystyle{alpha}
%\bibliography{jmlr-20-208}

\end{document}