JMLR 20-1419

author: Thibaut Horel <thibaut.horel@gmail.com> 2022-06-28 14:10:32 -0400
committer: Thibaut Horel <thibaut.horel@gmail.com> 2022-06-28 14:10:32 -0400
commit: bf9b76abdfee67f10e4c9edc336b18826e797219 (patch)
tree: 3a12dba652bea8450779cc27fa67a3f54ec1791b
parent: cfa796e938b546102028a7fbae77ef1244fd68d3 (diff)
download: reviews-bf9b76abdfee67f10e4c9edc336b18826e797219.tar.gz
1 files changed, 109 insertions, 0 deletions
diff --git a/jmlr-20-1419.tex b/jmlr-20-1419.tex
new file mode 100644
index 0000000..a2acb68
--- /dev/null
+++ b/jmlr-20-1419.tex
@@ -0,0 +1,109 @@
+\documentclass[11pt]{article}
+\usepackage[T1]{fontenc}
+\usepackage[utf8]{inputenc}
+\usepackage[hmargin=1.2in, vmargin=1.4in]{geometry}
+
+\title{Review of \emph{Contextual Stochastic Block Model:\\Sharp Thresholds and Contiguity}}
+\date{}
+\begin{document}
+
+\maketitle
+
+\paragraph{Summary.} This paper considers a generalization of the stochastic
+block model in which additional covariates correlated with the community
+assignments are observed at each vertex. Specifically, the generative model for
+a graph with $n$ vertices is as follows:
+\begin{itemize}
+	\item each vertex $i$ is assigned independently and uniformly at random to
+		one of two community, $\sigma_i\in\{-1, +1\}$.
+	\item two vertices are connected by an edge with probability $a/n$ if they
+		belong to the same community, and with probability $b/n$ otherwise.
+		Define
+		$\lambda^2=1/2\cdot(a-b)^2/(a+b)$.
+	\item a $p$-dimensional vector of covariates $B_i$ is observed at
+		each vertex, where $B_i = \sqrt{\mu/n}\sigma_i u + Z_i$ with $u\sim
+		\mathcal{N}(0,I_p)$ and $Z_i\sim\mathcal{N}(0,I_p)$ and 
+		$(Z_1,\dots,Z_n)$ mutually independent.
+\end{itemize}
+The general question studied in this work is whether recovery of the latent
+community assignments is possible from the joint observation of the adjacency
+matrix of the graph and the covariates $(B_1,\dots,B_n)$, in the asymptotic
+regime where $n/p\to\gamma >0$. Note that this problem reduces to the standard
+community detection problem when only the adjacency matrix is observed (or
+$\mu=0)$, and reduces to a clustering of a Gaussian mixtures when only the
+covariates $B_i$ are observed (or $\lambda=0$). The main result of this work is
+an exact characterization of the weak recovery threshold: when $\lambda^2
++ \mu^2/\gamma <1$ then weak recovery is impossible, when
+$\lambda^2+\mu^2/\gamma>1$, the authors describe an estimator, computable in
+quasi-polynomial time, that achieves weak recovery.
+
+\paragraph{Significance of the results.} The question studied in this work is
+very natural, since a practitioner would typically observe some covariates
+about the graph's vertices in most community detection applications.
+Furthermore, the paper completely solves the question it sought to answer by
+describing a sharp threshold (at least as far as weak recovery is concerned).
+It is conceptually satisfying that the threshold established in this work
+generalizes both the recovery threshold for community detection in the
+stochastic block model, and the “spike detection” threshold in random matrix
+theory. It is also interesting to see that recovery is sometimes possible by
+combining information from \emph{both} types of observations (the covariates
+and the graph) even in cases where each type of information \emph{separately}
+would be insufficient for recovery.
+
+\vspace{1em}\noindent\emph{Model.} However, the specific generative model
+considered in this work seems a little restrictive. While the stochastic block
+model and the spiked random matrix model are both well-established and
+well-studied \emph{separately}, it is not clear that simply “gluing” them
+together constitutes a natural instantiation of the intuitive concept of
+\emph{stochastic block model with auxiliary information}. The model studied in
+this work has only been considered in two previous papers (Yan and Sarkar,
+2020; Deshpande et al., 2018) and does not seem to be justified from the
+modeling perspective, but rather by the fact that it is amenable to analysis by
+combining existing techniques. This could be improved by providing at the very
+least a comparison with other variants of stochastic block model with auxiliary
+information, such as for example the one discussed in \emph{Structure and
+inference in annotated networks} (Newman and Clauset, 2016). On this note,
+a more natural modeling approach is to first draw latent covariates for the
+graph vertices, and \emph{then} to describe the conditional distribution of
+community assignments given this latent covariates. In contrast, the present
+work only describes the conditional distribution of the covariates given the
+community assignments, which does not feel natural in the context of network
+modeling. Finally, I think the paper would also benefit from a discussion of
+whether the analysis could generalize to more general models for the covariates
+$B_i$ (closer to the general spiked matrix models that are now commonly
+studied).
+
+\vspace{1em}\noindent\emph{Technical contribution.} All the theoretical results
+in this work closely follow the proof structures and techniques from recent work
+on recovery thresholds for the stochastic block model and spiked random matrix
+models. The main technical contribution is the introduction of the “cycle
+statistics” $Y_{n,k,l}$ similar to the ones studied for example by Mossel et
+al. (2015) in the context of community detection, and by Banerjee and Ma (2018)
+for spiked random matrices. The recovery algorithm closely follows the approach
+from Hopkins and Steurer (2017) by constructing a low-degree polynomial
+approximation of the second moment $\sigma\sigma^T$ of the community
+assignments.
+
+\paragraph{Other comments.} The paper is well-written overall, but I think the
+proofs could be made clearer if written in a more modular manner. As a simple
+example, the first part of proof of Theorem 1 in Section 2 is a very general
+argument that could be factored out (and it is in fact exactly identical to
+Lemma 2.4 in (Perry et al. 2018) that could be invoked directly). Another
+example is the proof of Proposition 5 that is particularly long: however the
+asymptotic distribution under $H_0$ is simply a specific case of the one under
+$H_1$ when $\lambda=\mu=0$, so it is not clear that it is necessary to conduct
+both derivations separately. I understand that some parts of the derivation
+under $H_0$ are reused under $H_1$, but these could be factored out.
+
+\paragraph{Conclusion.} This paper considers a natural and important question,
+and despite the limitations of the model and the relative absence of technical
+novelty, the theoretical results constitute an important step towards
+understanding the impact of auxiliary vertex information on the community
+detection threshold. I recommend acceptance, possibly after consolidating the
+discussion of the model and modularizing the proofs as suggested above.
+
+
+%\bibliographystyle{alpha}
+%\bibliography{jmlr-20-208}
+
+\end{document}
author	Thibaut Horel <thibaut.horel@gmail.com>	2022-06-28 14:10:32 -0400
committer	Thibaut Horel <thibaut.horel@gmail.com>	2022-06-28 14:10:32 -0400
commit	bf9b76abdfee67f10e4c9edc336b18826e797219 (patch)
tree	3a12dba652bea8450779cc27fa67a3f54ec1791b
parent	cfa796e938b546102028a7fbae77ef1244fd68d3 (diff)
download	reviews-bf9b76abdfee67f10e4c9edc336b18826e797219.tar.gz