diff options
| -rw-r--r-- | jmlr-20-1419.tex | 109 |
1 files changed, 109 insertions, 0 deletions
diff --git a/jmlr-20-1419.tex b/jmlr-20-1419.tex new file mode 100644 index 0000000..a2acb68 --- /dev/null +++ b/jmlr-20-1419.tex @@ -0,0 +1,109 @@ +\documentclass[11pt]{article} +\usepackage[T1]{fontenc} +\usepackage[utf8]{inputenc} +\usepackage[hmargin=1.2in, vmargin=1.4in]{geometry} + +\title{Review of \emph{Contextual Stochastic Block Model:\\Sharp Thresholds and Contiguity}} +\date{} +\begin{document} + +\maketitle + +\paragraph{Summary.} This paper considers a generalization of the stochastic +block model in which additional covariates correlated with the community +assignments are observed at each vertex. Specifically, the generative model for +a graph with $n$ vertices is as follows: +\begin{itemize} + \item each vertex $i$ is assigned independently and uniformly at random to + one of two community, $\sigma_i\in\{-1, +1\}$. + \item two vertices are connected by an edge with probability $a/n$ if they + belong to the same community, and with probability $b/n$ otherwise. + Define + $\lambda^2=1/2\cdot(a-b)^2/(a+b)$. + \item a $p$-dimensional vector of covariates $B_i$ is observed at + each vertex, where $B_i = \sqrt{\mu/n}\sigma_i u + Z_i$ with $u\sim + \mathcal{N}(0,I_p)$ and $Z_i\sim\mathcal{N}(0,I_p)$ and + $(Z_1,\dots,Z_n)$ mutually independent. +\end{itemize} +The general question studied in this work is whether recovery of the latent +community assignments is possible from the joint observation of the adjacency +matrix of the graph and the covariates $(B_1,\dots,B_n)$, in the asymptotic +regime where $n/p\to\gamma >0$. Note that this problem reduces to the standard +community detection problem when only the adjacency matrix is observed (or +$\mu=0)$, and reduces to a clustering of a Gaussian mixtures when only the +covariates $B_i$ are observed (or $\lambda=0$). The main result of this work is +an exact characterization of the weak recovery threshold: when $\lambda^2 ++ \mu^2/\gamma <1$ then weak recovery is impossible, when +$\lambda^2+\mu^2/\gamma>1$, the authors describe an estimator, computable in +quasi-polynomial time, that achieves weak recovery. + +\paragraph{Significance of the results.} The question studied in this work is +very natural, since a practitioner would typically observe some covariates +about the graph's vertices in most community detection applications. +Furthermore, the paper completely solves the question it sought to answer by +describing a sharp threshold (at least as far as weak recovery is concerned). +It is conceptually satisfying that the threshold established in this work +generalizes both the recovery threshold for community detection in the +stochastic block model, and the “spike detection” threshold in random matrix +theory. It is also interesting to see that recovery is sometimes possible by +combining information from \emph{both} types of observations (the covariates +and the graph) even in cases where each type of information \emph{separately} +would be insufficient for recovery. + +\vspace{1em}\noindent\emph{Model.} However, the specific generative model +considered in this work seems a little restrictive. While the stochastic block +model and the spiked random matrix model are both well-established and +well-studied \emph{separately}, it is not clear that simply “gluing” them +together constitutes a natural instantiation of the intuitive concept of +\emph{stochastic block model with auxiliary information}. The model studied in +this work has only been considered in two previous papers (Yan and Sarkar, +2020; Deshpande et al., 2018) and does not seem to be justified from the +modeling perspective, but rather by the fact that it is amenable to analysis by +combining existing techniques. This could be improved by providing at the very +least a comparison with other variants of stochastic block model with auxiliary +information, such as for example the one discussed in \emph{Structure and +inference in annotated networks} (Newman and Clauset, 2016). On this note, +a more natural modeling approach is to first draw latent covariates for the +graph vertices, and \emph{then} to describe the conditional distribution of +community assignments given this latent covariates. In contrast, the present +work only describes the conditional distribution of the covariates given the +community assignments, which does not feel natural in the context of network +modeling. Finally, I think the paper would also benefit from a discussion of +whether the analysis could generalize to more general models for the covariates +$B_i$ (closer to the general spiked matrix models that are now commonly +studied). + +\vspace{1em}\noindent\emph{Technical contribution.} All the theoretical results +in this work closely follow the proof structures and techniques from recent work +on recovery thresholds for the stochastic block model and spiked random matrix +models. The main technical contribution is the introduction of the “cycle +statistics” $Y_{n,k,l}$ similar to the ones studied for example by Mossel et +al. (2015) in the context of community detection, and by Banerjee and Ma (2018) +for spiked random matrices. The recovery algorithm closely follows the approach +from Hopkins and Steurer (2017) by constructing a low-degree polynomial +approximation of the second moment $\sigma\sigma^T$ of the community +assignments. + +\paragraph{Other comments.} The paper is well-written overall, but I think the +proofs could be made clearer if written in a more modular manner. As a simple +example, the first part of proof of Theorem 1 in Section 2 is a very general +argument that could be factored out (and it is in fact exactly identical to +Lemma 2.4 in (Perry et al. 2018) that could be invoked directly). Another +example is the proof of Proposition 5 that is particularly long: however the +asymptotic distribution under $H_0$ is simply a specific case of the one under +$H_1$ when $\lambda=\mu=0$, so it is not clear that it is necessary to conduct +both derivations separately. I understand that some parts of the derivation +under $H_0$ are reused under $H_1$, but these could be factored out. + +\paragraph{Conclusion.} This paper considers a natural and important question, +and despite the limitations of the model and the relative absence of technical +novelty, the theoretical results constitute an important step towards +understanding the impact of auxiliary vertex information on the community +detection threshold. I recommend acceptance, possibly after consolidating the +discussion of the model and modularizing the proofs as suggested above. + + +%\bibliographystyle{alpha} +%\bibliography{jmlr-20-208} + +\end{document} |
