\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1.2in, vmargin=1.1in]{geometry}
\title{Review of \emph{When Less is More: Systematic Analysis of Cascade-based Community Detection}}
\date{}
\begin{document}
\maketitle
\vspace{-3em}

\paragraph{Summary.}
This paper considers the problem of recovering unknown communities in a given graph from observations of ``cascades''---that is, samples from a diffusion process propagating along the edges of the graph. Specifically, it is assumed that only the identities and ``infection'' times of vertices are observed, over multiple cascades, and the goal is to infer the community memberships of the vertices. Four algorithms are described: one based on a likelihood approach, and the remaining three based on a clustering approach. The algorithms are then evaluated and compared on a Twitter dataset (including both real community assignments and real cascades), and on a collection of real networks on which synthetic cascades were generated.

\paragraph{General comments.}
The problem considered in this paper is natural and an interesting twist on two standard problems: the network inference problem (recovering edges from observations of cascades), and the community detection problem (recovering community assignments from edges). By going directly from cascades to community assignments, as is considered in this paper, one could hope for better recovery guarantees than by first inferring the edges of the graph and then applying a standard community detection algorithm. The main contribution of the paper seems to be the experimental evaluation, since all discussed algorithms seem to be either already present in a previous paper by the same authors or simple adaptations of known algorithms. The main takeaway is that among the considered algorithms, the \textsc{Clique} algorithm seems to perform consistently well (either being the best or among the best).

\paragraph{Major points.}
Although the paper claims to be agnostic to specific probabilistic models for the graph structure and cascades, I believe it is important to understand how the proposed methodology would perform if specific models were assumed. In particular, I have the following concerns with the models described in Section 3.2 and how the rest of the paper relates to them:
\begin{enumerate}
\item With the exception of Section 3.2.3, no assumption is made on how the community assignments affect the observations, since only the cascade model is described (there is no discussion of the graph generative model). For all we know, the observations could be independent of the community assignments, making the learning problem considered in this paper ill-defined. Even assuming that the graph structure is generated from a model making use of the community assignments (for example, the Stochastic Block Model), it is very important to emphasize that Sections 3.2.2 and 3.2.3 consider very different learning regimes:
\begin{itemize}
\item In Section 3.2.3, each cascade is generated directly from the community assignments. In this situation, there could be hope of designing an asymptotically consistent estimator of the community assignments as the number of cascades goes to infinity.
\item In Section 3.2.2, the graph is only generated once (possibly from the community assignments, although this is not specified) and then the cascades are generated solely based on the graph structure.
So as the number of cascades goes to infinity, the best one could hope for is an asymptotically consistent estimator of the graph edges. However, \emph{this might still not be sufficient to recover the communities} (recall that in the standard community detection problem, consistency in community inference is only achieved as the size of the graph grows to infinity). This would make the learning task considered in the paper unsolvable even in the limit of infinitely many cascades, a point that requires discussion.
\end{itemize}
Because of this crucial difference, I also believe that the claim made in Section 4.2 that the C-SI-BD model is equivalent to the SI-BD model is wrong (it would be true only if the graph were generated anew for each cascade).
\item Section 4.2: although the approach is based on maximum likelihood estimation, I find the exposition in this section lacking in several respects. Algorithm 1 seems to be inspired by the idea of alternating maximization but stops after a single alternation. At the very least, experimental evidence justifying this early stopping is necessary (are the gains from additional iterations insignificant?). More generally, I believe that a likelihood-based approach could be made to fit very naturally in the expectation-maximization framework (with the graph structure being the latent variable), and I was surprised to see no mention of it in Section 4.2 (was there a specific reason why it was not considered?).
\item Section 4.3: although the approaches described in this section claim to be agnostic to specific generative models, they clearly seem motivated by probabilistic reasoning. In particular, Algorithm 2 is evidently solving a maximum likelihood problem, and equation (6) is implicitly justified as computing the posterior probability of an edge being present conditioned on the observed cascades. I believe a more principled description of these approaches, and of the generative models under which they can be formally justified, is required.
\item Related to item 1, it is unclear in the experimental section whether the ground truth communities in the studied datasets are in fact correlated with the observations, and if they are, to what extent (see also my comment below about the paper lacking an explanation of what constitutes a community in the Twitter dataset). I think this concern could be addressed both by some exploratory analysis showing that the ground truth communities indeed correlate with edge density, and by a purely synthetic experiment in which the graph is generated from the community assignments (for example, using the SBM).
\item It is claimed that the reason why some methods behave poorly is that they are based on specific probabilistic modeling assumptions. Could experiments be conducted to show that these methods indeed perform better when their assumptions are satisfied?
\end{enumerate}

\paragraph{Specific points.}
Some additional concerns follow; these are mostly more localized manifestations of the points described above.
\begin{enumerate}
\item Section 4.2, Step (2): it is not clear what is meant by ``it is sufficient to numerically find only the optimal $\delta$''. For which values of $\alpha_{in}$ and $\alpha_{out}$ is $\delta$ solved for numerically? If the solver is directly optimizing over $\alpha_{in}$ and $\delta$, then equation (4) is unnecessary: $\alpha_{out}$ is simply equal to $\alpha_{in}/(\delta+1)$.
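To make this question concrete, the following minimal sketch (in Python, with a placeholder log-likelihood standing in for the paper's actual objective, which I am not reproducing here) illustrates the reading under which equation (4) would be redundant:
\begin{verbatim}
# Sketch of the "joint optimization" reading of Step (2).
# `log_lik` is a hypothetical placeholder, not the paper's likelihood.
import numpy as np
from scipy.optimize import minimize

def log_lik(alpha_in, alpha_out):
    # Placeholder: any smooth surrogate suffices to illustrate
    # the parametrization.
    return -((alpha_in - 0.3) ** 2 + (alpha_out - 0.1) ** 2)

def neg_log_lik(params):
    alpha_in, delta = params
    # Reparametrization: alpha_out = alpha_in / (delta + 1); a solver
    # working over (alpha_in, delta) never needs equation (4).
    return -log_lik(alpha_in, alpha_in / (delta + 1))

res = minimize(neg_log_lik, x0=np.array([0.5, 1.0]),
               bounds=[(1e-6, 1.0), (1e-6, None)])
alpha_in_hat, delta_hat = res.x
alpha_out_hat = alpha_in_hat / (delta_hat + 1)
\end{verbatim}
If instead $\delta$ alone is solved for numerically while the other parameters are held fixed, the fixed values should be stated explicitly.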
\item Section 4.2, Step (3): since the authors adapted the Louvain algorithm, it would be useful to have pseudocode of the exact adaptation they used, as the given description is too informal to know exactly what was done.
\item Section 4.2, Step (3): the Louvain algorithm is only a heuristic, so the proposed method does not exactly solve the optimization problem described in Algorithm 1, Step (3). It would be useful to have experimental results showing how far from optimal the algorithm is and, in particular, how the adaptation chosen by the authors differs from the original algorithm.
\item Section 4.3: all the algorithms in this section use the Louvain clustering algorithm. It is not clear whether the original algorithm or the (loosely specified) adaptation from Section 4.2 was used. A justification for this choice (whether or not to use the adaptation) would also be helpful.
\item Section 5.1: the description of the Twitter dataset does not explain what the communities are in Twitter terms (as a Twitter user, I was not aware that there are community-like entities on Twitter).
\item Section 5.3: I find the use of the term ``unbiased'' problematic, since the paper has no notion of a ground-truth distribution or generative model (see also my previous comments). In particular, the authors claim that the NMI criterion favors smaller communities, but perhaps this is not a problem if the generative model also produces smaller communities?
\item Section 5.3: is there any reason why the ``agreement'' metric commonly used in community detection, i.e., the fraction of correctly classified vertices maximized over relabelings of the communities, was not considered? (See, for example, Definition 3 in \cite{A18}.)
\end{enumerate}

\paragraph{Conclusion.}
Although the problem considered in this paper is interesting and worthy of attention, I believe that both the methodology and the exposition suffer from important gaps. Justifying the proposed methods under appropriate probabilistic assumptions, and then evaluating them in situations where the assumptions do and do not hold, would provide useful insight into the applicability of the methods. In its current form, it is very difficult to understand why the methods perform the way they do in the experiments, and to attach a level of confidence to the main takeaway message.

\bibliographystyle{alpha}
\bibliography{jmlr-20-208}
\end{document}