\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb}
\usepackage[hmargin=1.2in, vmargin=1.1in]{geometry}

\title{Review of \emph{When Less is More: Systematic Analysis of Cascade-based Community Detection}}
\date{}
\begin{document}

\maketitle

\vspace{-3em}

\paragraph{Summary.} This paper considers the problem of recovering unknown
communities in a given graph from observations of ``cascades''---that is,
samples from a diffusion process propagating along the edges of the graph.
Specifically, it is assumed that only the identities and ``infection'' times of
vertices are observed, over multiple cascades, and the goal is to infer the
community memberships of the vertices. Four algorithms are described: one
based on a likelihood approach, and the remaining three based on a clustering
approach. The algorithms are then evaluated and compared on a Twitter dataset
(including both real community assignments and real cascades), and on a
collection of real networks on which synthetic cascades were generated.

\paragraph{General comments.} The problem considered in this paper is natural
and an interesting twist on two standard problems: the network inference
problem (recovering edges from observations of cascades) and the community
detection problem (recovering community assignments from edges). By going
directly from cascades to community assignments, as is considered in this
paper, one could hope for better recovery guarantees than by first inferring
the edges of the graph and then applying a standard community detection
algorithm.

The main contribution of the paper seems to be the experimental evaluation,
since all discussed algorithms seem to be either already present in a previous
paper by the same authors or simple adaptations of known algorithms. The main
takeaway is that among the considered algorithms, the \textsc{Clique}
algorithm performs consistently well (it is either the best or among the
best).
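To fix notation for the comments below (this formalization is mine, not the
paper's): write $G=(V,E)$ for the unobserved graph, $\sigma \colon V \to
\{1,\dots,K\}$ for the community assignment, and $c_1,\dots,c_M$ for the
observed cascades, where each cascade is a set of (vertex, infection time)
pairs. The two pipelines contrasted above are
\[
  c_1,\dots,c_M \;\longrightarrow\; \widehat{E} \;\longrightarrow\;
  \widehat{\sigma}
  \qquad \text{versus} \qquad
  c_1,\dots,c_M \;\longrightarrow\; \widehat{\sigma},
\]
and the hope is that the direct estimator of $\sigma$ can succeed in regimes
(for example, with few cascades) where the intermediate edge-recovery step
would already fail.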
\paragraph{Major points.} Although the paper claims to be agnostic to specific
probabilistic models for the graph structure and the cascades, I believe it is
important to understand how the proposed methodology would perform if specific
models were assumed. In particular, I have the following concerns with the
models described in Section 3.2 and how the rest of the paper relates to them:
\begin{enumerate}
    \item With the exception of Section 3.2.3, no assumption is made on how
        the community assignments affect the observations, since only the
        cascade model is described (there is no discussion of the graph
        generative model). For all we know, the observations could be
        independent of the community assignments, making the learning problem
        considered in this paper ill-defined.

        Even assuming that the graph structure is generated from a model
        making use of the community assignments (for example, the Stochastic
        Block Model), it is very important to emphasize that Sections 3.2.2
        and 3.2.3 consider very different learning regimes:
        \begin{itemize}
            \item In Section 3.2.3, each cascade is generated directly from
                the community assignments. In this situation, one could hope
                to design an asymptotically consistent estimator of the
                community assignments as the number of cascades goes to
                infinity.

            \item In Section 3.2.2, the graph is only generated once (possibly
                from the community assignments, although this is not
                specified) and the cascades are then generated solely from the
                graph structure. So as the number of cascades goes to
                infinity, the best one could hope for is an asymptotically
                consistent estimator of the graph edges. However, \emph{this
                might still not be sufficient to recover the communities}
                (recall that in the standard community detection problem,
                consistency of community inference is only achieved as the
                size of the graph grows to infinity). This could make the
                learning task considered in the paper unsolvable even in the
                limit of infinitely many cascades, and it requires a
                discussion.
        \end{itemize}

        Because of this crucial difference, I also believe that the claim made
        in Section 4.2 that the C-SI-BD model is equivalent to the SI-BD model
        is wrong (it would be true only if the graph were generated anew for
        each cascade).

    \item Section 4.2: although the approach is based on maximum likelihood
        estimation, I find the exposition in this section lacking in several
        respects. Algorithm 1 seems to be inspired by the idea of alternating
        maximization but stops after a single alternation. At the very least,
        experimental evidence justifying this early stopping is necessary
        (are the gains from additional iterations insignificant?).

        More generally, I believe that a likelihood-based approach would fit
        very naturally in the expectation-maximization framework, with the
        graph structure as the latent variable (see the sketch after this
        list), and I was surprised to see no mention of it in Section 4.2.
        Was there a specific reason why it was not considered?

    \item Section 4.3: although the approaches described in this section
        claim to be agnostic to specific generative models, they clearly seem
        motivated by probabilistic reasoning. In particular, Algorithm 2 is
        evidently solving a maximum likelihood problem, and equation (6) is
        implicitly justified as computing the posterior probability that an
        edge is present conditioned on the observed cascades. I believe a
        more principled description of these approaches, and of the
        generative models under which they can be formally justified, is
        required.

    \item Related to item 1, it is unclear in the experimental section
        whether the ground truth communities in the studied datasets are in
        fact correlated with the observations, and if they are, to what
        extent (see also my comment below that the paper lacks an explanation
        of what constitutes a community in the Twitter dataset). I think this
        concern could be addressed both by an exploratory analysis showing
        that the ground truth communities indeed correlate with edge density,
        and by a purely synthetic experiment in which the graph is generated
        from the community assignments (for example, using the SBM).

    \item It is claimed that the reason some methods behave poorly is that
        they are based on specific probabilistic modeling assumptions. Could
        experiments be conducted to show that these methods indeed perform
        better when their assumptions are satisfied?
\end{enumerate}
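To make the suggestion in item 2 concrete, here is a minimal sketch of the
expectation-maximization formulation I have in mind (the notation is mine,
and the factorization is an assumption rather than something established in
the paper). Treating the graph $G$ as the latent variable and writing
$\theta$ for the remaining unknowns (for example $\alpha_{in}$,
$\alpha_{out}$ and the community assignments), EM would iterate
\begin{align*}
    Q(\theta \mid \theta^{(t)})
        &= \mathbb{E}_{G \sim p(\cdot \mid c_1,\dots,c_M,\, \theta^{(t)})}
           \bigl[ \log p(c_1,\dots,c_M, G \mid \theta) \bigr], \\
    \theta^{(t+1)}
        &= \operatorname*{arg\,max}_{\theta} \; Q(\theta \mid \theta^{(t)}).
\end{align*}
Algorithm 1, as written, looks like a single step of such a scheme;
positioning it explicitly in this framework would clarify both the early
stopping and the relation to alternating maximization.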
\paragraph{Specific points.} Some additional concerns, which are more
localized manifestations of the concerns described above.
\begin{enumerate}
    \item Section 4.2, Step (2): it is not clear what is meant by ``it is
        sufficient to numerically find only the optimal $\delta$''. For which
        values of $\alpha_{in}$ and $\alpha_{out}$ is $\delta$ solved for
        numerically? If the solver directly optimizes over $\alpha_{in}$ and
        $\delta$, then equation (4) is unnecessary: $\alpha_{out}$ is simply
        equal to $\alpha_{in}/(\delta+1)$.

    \item Section 4.2, Step (3): since the authors adapted the Louvain
        algorithm, it would be useful to have pseudocode of the exact
        adaptation they used; the given description is too informal to know
        exactly what was done.

    \item Section 4.2, Step (3): the Louvain algorithm is only a heuristic,
        so the proposed method does not exactly solve the optimization
        problem described in Algorithm 1, Step (3). It would be useful to
        have experimental results showing how far from optimal the algorithm
        is, and in particular how the adaptation chosen by the authors
        differs from the original algorithm.

    \item Section 4.3: all the algorithms in this section use the Louvain
        clustering algorithm. It is not clear whether the original algorithm
        or the (loosely specified) adaptation from Section 4.2 was used. A
        justification for this choice (whether or not to use the adaptation)
        would also be helpful.

    \item Section 5.1: the description of the Twitter dataset does not
        explain what the communities are in Twitter terms (as a Twitter user,
        I was not aware that there are community-like entities on Twitter).

    \item Section 5.3: I find the use of the term ``unbiased'' problematic,
        since the paper has no notion of a ground truth distribution or
        generative model (see also the previous comments). In particular, the
        authors claim that the NMI criterion favors smaller communities, but
        perhaps this is not a problem if the generative model also produces
        smaller communities?

    \item Section 5.3: is there a reason why the ``agreement'' metric
        commonly used in community detection was not considered (see, for
        example, Definition 3 in \cite{A18} and the formula below)?
\end{enumerate}
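For reference, the agreement metric I have in mind in the last item is the
standard quantity (up to notation; see Definition 3 in \cite{A18}): for a
ground truth assignment $\sigma$ and an estimate $\widehat{\sigma}$ on
$n = |V|$ vertices,
\[
  \mathrm{agr}(\sigma, \widehat{\sigma})
    = \max_{\pi \in S_K} \frac{1}{n} \sum_{v \in V}
      \mathbf{1}\{ \pi(\widehat{\sigma}(v)) = \sigma(v) \},
\]
that is, the fraction of correctly classified vertices, maximized over
relabelings $\pi$ of the $K$ estimated communities.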
\paragraph{Conclusion.} Although the problem considered in this paper is
interesting and worthy of attention, I believe that both the methodology and
the exposition suffer from important gaps. Justifying the proposed methods
under appropriate probabilistic assumptions, and then evaluating them in
situations where the assumptions do and do not hold, would provide useful
insight into the applicability of the methods. In its current form, it is
very difficult to understand why the methods perform the way they do in the
experiments, and to attach a level of confidence to the main takeaway
message.


\bibliographystyle{alpha}
\bibliography{jmlr-20-208}

\end{document}