\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb}
\usepackage[hmargin=1.2in, vmargin=1.1in]{geometry}

\title{Review of \emph{When Less is More: Systematic Analysis of Cascade-based Community Detection}}
\date{}
\begin{document}

\maketitle

\vspace{-3em}

\paragraph{Summary.} This paper considers the problem of recovering unknown
communities in a given graph from observations of ``cascades''---that is,
samples from a diffusion process propagating along the edges of the graph.
Specifically, it is assumed that only the identities and ``infection'' times of
vertices are observed, over multiple cascades, and the goal is to infer the
community memberships of the vertices. Four algorithms are described: one
based on a likelihood approach, and the remaining three based on a clustering
approach. The algorithms are then evaluated and compared on a Twitter dataset
(including both real community assignments and real cascades), and on a
collection of real networks on which synthetic cascades were generated.

\paragraph{General comments.} The problem considered in this paper is natural
and an interesting twist on two standard problems: the network inference
problem (recovering edges from observations of cascades) and the community
detection problem (recovering community assignments from edges). By going
directly from cascades to community assignments, as is considered in this
paper, one could hope for better recovery guarantees than by first inferring
the edges of the graph and then applying a standard community detection
algorithm.

The main contribution of the paper seems to be the experimental evaluation,
since all discussed algorithms seem to be either already present in a previous
paper by the same authors or simple adaptations of known algorithms. The main
takeaway is that among the considered algorithms, the \textsc{Clique}
algorithm performs consistently well (it is either the best or among the
best).
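To fix notation for the comments below (this formalization is mine, not the
paper's): write $G=(V,E)$ for the unobserved graph, $\sigma \colon V \to
\{1,\dots,K\}$ for the community assignment, and $c_1,\dots,c_M$ for the
observed cascades, where each cascade is a set of (vertex, infection time)
pairs. The two pipelines contrasted above are
\[
  c_1,\dots,c_M \;\longrightarrow\; \widehat{E} \;\longrightarrow\;
  \widehat{\sigma}
  \qquad \text{versus} \qquad
  c_1,\dots,c_M \;\longrightarrow\; \widehat{\sigma},
\]
and the hope is that the direct estimator of $\sigma$ can succeed in regimes
(for example, with few cascades) where the intermediate edge-recovery step
would already fail.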
\paragraph{Major points.} Although the paper claims to be agnostic to specific
probabilistic models for the graph structure and the cascades, I believe it is
important to understand how the proposed methodology would perform if specific
models were assumed. In particular, I have the following concerns with the
models described in Section 3.2 and how the rest of the paper relates to them:
\begin{enumerate}
    \item With the exception of Section 3.2.3, no assumption is made on how
        the community assignments affect the observations, since only the
        cascade model is described (there is no discussion of the graph
        generative model). For all we know, the observations could be
        independent of the community assignments, making the learning problem
        considered in this paper ill-defined.

        Even assuming that the graph structure is generated from a model
        making use of the community assignments (for example, the Stochastic
        Block Model), it is very important to emphasize that Sections 3.2.2
        and 3.2.3 consider very different learning regimes:
        \begin{itemize}
            \item In Section 3.2.3, each cascade is generated directly from
                the community assignments. In this situation, one could hope
                to design an asymptotically consistent estimator of the
                community assignments as the number of cascades goes to
                infinity.

            \item In Section 3.2.2, the graph is only generated once (possibly
                from the community assignments, although this is not
                specified) and the cascades are then generated solely from the
                graph structure. So as the number of cascades goes to
                infinity, the best one could hope for is an asymptotically
                consistent estimator of the graph edges. However, \emph{this
                might still not be sufficient to recover the communities}
                (recall that in the standard community detection problem,
                consistency of community inference is only achieved as the
                size of the graph grows to infinity). This could make the
                learning task considered in the paper unsolvable even in the
                limit of infinitely many cascades, and it requires a
                discussion.
        \end{itemize}

        Because of this crucial difference, I also believe that the claim made
        in Section 4.2 that the C-SI-BD model is equivalent to the SI-BD model
        is wrong (it would be true only if the graph were generated anew for
        each cascade).

    \item Section 4.2: although the approach is based on maximum likelihood
        estimation, I find the exposition in this section lacking in several
        respects. Algorithm 1 seems to be inspired by the idea of alternating
        maximization but stops after a single alternation. At the very least,
        experimental evidence justifying this early stopping is necessary
        (are the gains from additional iterations insignificant?).

        More generally, I believe that a likelihood-based approach would fit
        very naturally in the expectation-maximization framework, with the
        graph structure as the latent variable (see the sketch after this
        list), and I was surprised to see no mention of it in Section 4.2.
        Was there a specific reason why it was not considered?

    \item Section 4.3: although the approaches described in this section
        claim to be agnostic to specific generative models, they clearly seem
        motivated by probabilistic reasoning. In particular, Algorithm 2 is
        evidently solving a maximum likelihood problem, and equation (6) is
        implicitly justified as computing the posterior probability that an
        edge is present conditioned on the observed cascades. I believe a
        more principled description of these approaches, and of the
        generative models under which they can be formally justified, is
        required.

    \item Related to item 1, it is unclear in the experimental section
        whether the ground truth communities in the studied datasets are in
        fact correlated with the observations, and if they are, to what
        extent (see also my comment below that the paper lacks an explanation
        of what constitutes a community in the Twitter dataset). I think this
        concern could be addressed both by an exploratory analysis showing
        that the ground truth communities indeed correlate with edge density,
        and by a purely synthetic experiment in which the graph is generated
        from the community assignments (for example, using the SBM).

    \item It is claimed that the reason some methods behave poorly is that
        they are based on specific probabilistic modeling assumptions. Could
        experiments be conducted to show that these methods indeed perform
        better when their assumptions are satisfied?
\end{enumerate}
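To make the suggestion in item 2 concrete, here is a minimal sketch of the
expectation-maximization formulation I have in mind (the notation is mine,
and the factorization is an assumption rather than something established in
the paper). Treating the graph $G$ as the latent variable and writing
$\theta$ for the remaining unknowns (for example $\alpha_{in}$,
$\alpha_{out}$ and the community assignments), EM would iterate
\begin{align*}
    Q(\theta \mid \theta^{(t)})
        &= \mathbb{E}_{G \sim p(\cdot \mid c_1,\dots,c_M,\, \theta^{(t)})}
           \bigl[ \log p(c_1,\dots,c_M, G \mid \theta) \bigr], \\
    \theta^{(t+1)}
        &= \operatorname*{arg\,max}_{\theta} \; Q(\theta \mid \theta^{(t)}).
\end{align*}
Algorithm 1, as written, looks like a single step of such a scheme;
positioning it explicitly in this framework would clarify both the early
stopping and the relation to alternating maximization.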
\paragraph{Specific points.} Some additional concerns, which are more
localized manifestations of the concerns described above.
\begin{enumerate}
    \item Section 4.2, Step (2): it is not clear what is meant by ``it is
        sufficient to numerically find only the optimal $\delta$''. For which
        values of $\alpha_{in}$ and $\alpha_{out}$ is $\delta$ solved for
        numerically? If the solver directly optimizes over $\alpha_{in}$ and
        $\delta$, then equation (4) is unnecessary: $\alpha_{out}$ is simply
        equal to $\alpha_{in}/(\delta+1)$.

    \item Section 4.2, Step (3): since the authors adapted the Louvain
        algorithm, it would be useful to have pseudocode of the exact
        adaptation they used; the given description is too informal to know
        exactly what was done.

    \item Section 4.2, Step (3): the Louvain algorithm is only a heuristic,
        so the proposed method does not exactly solve the optimization
        problem described in Algorithm 1, Step (3). It would be useful to
        have experimental results showing how far from optimal the algorithm
        is, and in particular how the adaptation chosen by the authors
        differs from the original algorithm.

    \item Section 4.3: all the algorithms in this section use the Louvain
        clustering algorithm. It is not clear whether the original algorithm
        or the (loosely specified) adaptation from Section 4.2 was used. A
        justification for this choice (whether or not to use the adaptation)
        would also be helpful.

    \item Section 5.1: the description of the Twitter dataset does not
        explain what the communities are in Twitter terms (as a Twitter user,
        I was not aware that there are community-like entities on Twitter).

    \item Section 5.3: I find the use of the term ``unbiased'' problematic,
        since the paper has no notion of a ground truth distribution or
        generative model (see also the previous comments). In particular, the
        authors claim that the NMI criterion favors smaller communities, but
        perhaps this is not a problem if the generative model also produces
        smaller communities?

    \item Section 5.3: is there a reason why the ``agreement'' metric
        commonly used in community detection was not considered (see, for
        example, Definition 3 in \cite{A18} and the formula below)?
\end{enumerate}
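For reference, the agreement metric I have in mind in the last item is the
standard quantity (up to notation; see Definition 3 in \cite{A18}): for a
ground truth assignment $\sigma$ and an estimate $\widehat{\sigma}$ on
$n = |V|$ vertices,
\[
  \mathrm{agr}(\sigma, \widehat{\sigma})
    = \max_{\pi \in S_K} \frac{1}{n} \sum_{v \in V}
      \mathbf{1}\{ \pi(\widehat{\sigma}(v)) = \sigma(v) \},
\]
that is, the fraction of correctly classified vertices, maximized over
relabelings $\pi$ of the $K$ estimated communities.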
\paragraph{Conclusion.} Although the problem considered in this paper is
interesting and worthy of attention, I believe that both the methodology and
the exposition suffer from important gaps. Justifying the proposed methods
under appropriate probabilistic assumptions, and then evaluating them in
situations where the assumptions do and do not hold, would provide useful
insight into the applicability of the methods. In its current form, it is
very difficult to understand why the methods perform the way they do in the
experiments, and to attach a level of confidence to the main takeaway
message.


\bibliographystyle{alpha}
\bibliography{jmlr-20-208}

\end{document}