\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1.2in, vmargin=1.1in]{geometry}
\title{Review of \emph{When Less is More: Systematic Analysis of Cascade-based Community Detection}}
\date{}
\begin{document}
\maketitle
\vspace{-3em}
\paragraph{Summary.} This paper considers the problem of recovering unknown
communities in a given graph from observations of ``cascades''---that is,
samples from a diffusion process propagating along the edges of the graph.
Specifically, it is assumed that only the identities and ``infection'' times of
vertices are observed, over multiple cascades, and the goal is to infer the
community memberships of the vertices. Four algorithms are described: one is
based on a likelihood approach, and the remaining three are based on
a clustering approach. The algorithms are then evaluated and compared on
a Twitter dataset (including both real community assignments and real
cascades), and on a collection of real networks on which synthetic cascades
were generated.
\paragraph{General comments.} The problem considered in this paper is natural and an
interesting twist on two standard problems: the network inference problem
(recovering edges from observations of cascades), and the community detection
problem (recovering community assignments from edges). By going directly from
cascades to community assignments, as is considered in this paper, one could
hope for better recovery guarantees than by first inferring the edges of the
graph and then applying a standard community detection algorithm.
The main contribution of the paper seems to be the experimental evaluation,
since all discussed algorithms seem to be either already present in
a previous paper by the same authors, or simple adaptations of known
algorithms. The main takeaway is that among the considered algorithms, the
\textsc{Clique} algorithm seems to perform consistently well (either being
the best or among the best).
\paragraph{Major points.} Although the paper claims to be agnostic to specific
probabilistic models for the graph structure and cascades, I believe it is
important to understand how the proposed methodology would perform if
specific models were assumed. In particular, I have the following concerns with
the models described in Section 3.2 and how the rest of the paper relates to
them:
\begin{enumerate}
\item With the exception of Section 3.2.3, no assumption is made on how the
community assignments affect the observations since only the cascade
model is described (there is no discussion of the graph generative
model). For all we know, the observations could be independent of the
community assignments, making the learning problem considered in this
paper ill-defined.
Even assuming that the graph structure is generated from a model making
use of the community assignments (for example the Stochastic Block
Model), it is very important to emphasize that Sections 3.2.2 and 3.2.3
consider very different learning regimes:
\begin{itemize}
\item In Section 3.2.3, each cascade is generated directly from the
community assignments. In this situation, one could hope
to design an asymptotically consistent estimator of the
community assignments as the number of cascades goes to
infinity.
\item In Section 3.2.2, the graph is only generated once (possibly
from the community assignments, although this is not specified)
and then the cascades are generated solely based on the graph
structure. So as the number of cascades goes to infinity, the
best one could hope for is to have an asymptotically consistent
estimator of the graph edges. However, \emph{this might still
not be sufficient to recover the communities} (recall that in
the standard community detection problem, consistency in
community inference is only achieved as the size of the graph
grows to infinity). This would make the learning task
considered in the paper unsolvable even in the limit of
infinitely many cascades, a point that requires discussion.
\end{itemize}
Because of this crucial difference, I also believe that the claim made
in Section 4.2 that the C-SI-BD model is equivalent to the SI-BD model
is wrong (it would be true only if the graph were generated anew for
each cascade).
\item Section 4.2: although the approach is based on maximum likelihood
      estimation, I find the exposition in this section lacking in several
      respects. Algorithm 1 seems to be inspired by the idea of alternating
      maximization but stops after a single alternation. At the very least,
      experimental evidence to justify this early stopping is necessary
      (are the gains insignificant with additional iterations?).
      More generally, I believe that a likelihood-based approach would fit
      very naturally in the expectation-maximization (EM) framework, with
      the graph structure as the latent variable, and I was surprised to
      see no mention of it in Section 4.2 (was there a specific reason why
      it was not considered?). A schematic sketch of the EM formulation
      I have in mind is given after this list.
\item Section 4.3: although the approaches described in this section are
      claimed to be agnostic to specific generative models, they clearly
      seem motivated by probabilistic reasoning. In particular, Algorithm
      2 is evidently solving a maximum likelihood problem, and equation
      (6) is implicitly justified as computing the posterior probability
      of an edge being present conditioned on the observed cascades
      (a schematic version of this computation is given after this list).
      I believe a more principled description of these approaches, and of
      the generative models under which they can be formally justified,
      is required.
\item Related to item 1, it is unclear in the experimental section whether
      the ground truth communities in the studied datasets are in fact
      correlated with the observations, and if they are, to what extent
      (see also my comment below about the missing explanation of what
      constitutes a community in the Twitter dataset). I think this
      concern could be addressed both by some exploratory analysis showing
      that the ground truth communities indeed correlate with edge
      density, and by a purely synthetic experiment in which the graph is
      generated from the community assignments, for example using the SBM
      (a minimal sketch of such an experiment is given after this list).
\item It is claimed that some methods behave poorly because they are based
      on specific probabilistic modeling assumptions. Could experiments be
      conducted to show that these methods indeed perform better when
      their assumptions are satisfied?
\end{enumerate}
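To make the EM suggestion in major point 2 concrete, here is a schematic
sketch, in my own notation (not the paper's): assume the graph $G$ is drawn
from a community-dependent prior $p(G \mid \sigma)$ and the cascades
$\mathbf{c}$ are drawn from $p(\mathbf{c} \mid G)$. Treating $G$ as the
latent variable, each EM iteration would update the community assignments
via
\[
\hat\sigma^{(t+1)} \in \arg\max_{\sigma} \;
\mathrm{E}_{G \sim p(\,\cdot \mid \mathbf{c},\, \hat\sigma^{(t)})}
\bigl[\log p(\mathbf{c}, G \mid \sigma)\bigr],
\]
where the expectation (E-step) is over the posterior distribution of the
graph given the cascades and the current assignments, and the maximization
(M-step) re-estimates the assignments. Even a brief discussion of why such
a scheme was not pursued would strengthen Section 4.2.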
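Similarly, regarding major point 3, the probabilistic reading of equation
(6) that I have in mind is the following (again schematic, with an
unspecified edge prior $p(e_{uv} = 1)$):
\[
p(e_{uv} = 1 \mid \mathbf{c})
= \frac{p(\mathbf{c} \mid e_{uv} = 1)\, p(e_{uv} = 1)}
       {\sum_{b \in \{0,1\}} p(\mathbf{c} \mid e_{uv} = b)\, p(e_{uv} = b)},
\]
that is, the posterior probability that the edge $(u,v)$ is present given
the observed cascades. Making the prior and the likelihood explicit would
identify the generative model under which equation (6) is formally
justified.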
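Finally, regarding major point 4, the following is a minimal sketch of the
kind of fully synthetic experiment I have in mind (my own construction, not
the paper's; it assumes Python with the \texttt{networkx} library, and the
parameter values are purely illustrative). The graph is drawn from an SBM,
so the cascades are correlated with the planted communities by
construction.
\begin{verbatim}
import random
import networkx as nx

# Graph whose edges depend on the planted communities:
# two communities of 100 vertices each (illustrative parameters).
sizes = [100, 100]
probs = [[0.10, 0.01],
         [0.01, 0.10]]
G = nx.stochastic_block_model(sizes, probs, seed=0)
truth = {v: G.nodes[v]["block"] for v in G}  # ground-truth communities

rng = random.Random(0)

def si_cascade(G, beta=0.3):
    """One discrete-time SI cascade from a uniformly random source;
    returns {vertex: infection time} for the infected vertices."""
    source = rng.choice(list(G))
    times, frontier, t = {source: 0}, [source], 0
    while frontier:
        t += 1
        newly_infected = []
        for u in frontier:
            for v in G[u]:
                if v not in times and rng.random() < beta:
                    times[v] = t
                    newly_infected.append(v)
        frontier = newly_infected
    return times

cascades = [si_cascade(G) for _ in range(500)]
# Run each algorithm on `cascades` alone and compare its output to
# `truth` (e.g., via NMI or the agreement metric).
\end{verbatim}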
\paragraph{Specific points.} Some additional concerns, which are more localized
manifestations of the concerns described above.
\begin{enumerate}
\item Section 4.2, Step (2): it is not clear what is meant by ``it is
sufficient to numerically find only the optimal $\delta$''. For which
values of $\alpha_{in}$ and $\alpha_{out}$ is $\delta$ solved for
numerically? If the solver is directly optimizing over $\alpha_{in}$
and $\delta$, then equation (4) is unnecessary: $\alpha_{out}$ is
simply equal to $\alpha_{in}/(\delta+1)$.
\item Section 4.2, Step (3): since the authors adapted the Louvain
      algorithm, it would be useful to have pseudocode of the exact
      adaptation they used; the given description is too informal to
      determine exactly what was done.
\item Section 4.2, Step (3): the Louvain algorithm is only a heuristic, so
      the proposed method does not exactly solve the optimization problem
      described in Algorithm 1, Step (3). It would be useful to have
      experimental results showing how far from optimal the algorithm is
      and, in particular, how the adaptation chosen by the authors differs
      from the original algorithm.
\item Section 4.3: all the algorithms in this section use the Louvain
clustering algorithm. It is not clear whether the original algorithm or
the (loosely specified) adaptation from Section 4.2 was used.
A justification for this choice (whether or not to use the adaptation)
would also be helpful.
\item Section 5.1: the description of the Twitter dataset does not explain
what the communities are in Twitter terms (as a Twitter user, I was not
aware that there are community-like entities on Twitter).
\item Section 5.3: I find the use of the term ``unbiased'' problematic
      since the paper has no notion of a ground truth distribution or
      generative model (see also my previous comments). In particular, the
authors claim the NMI criterion favors smaller communities, but maybe
this is not a problem if the generative model also produces smaller
communities?
\item Section 5.3: is there any reason why the ``agreement'' metric
      commonly used in community detection was not considered? (See, for
      example, Definition 3 in \cite{A18}; I recall the definition after
      this list.)
\end{enumerate}
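For reference, by ``agreement'' in specific point 7 I mean the standard
quantity (cf.\ Definition 3 in \cite{A18}): the fraction of correctly
classified vertices, maximized over relabelings of the communities,
\[
A(\sigma, \hat\sigma)
= \max_{\pi \in S_k} \frac{1}{n} \sum_{i=1}^{n}
  \mathbf{1}\{\sigma_i = \pi(\hat\sigma_i)\},
\]
where $n$ is the number of vertices and $S_k$ the set of permutations of
the $k$ community labels.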
\paragraph{Conclusion.} Although the problem considered in this paper is
interesting and worthy of attention, I believe that both the methodology and
exposition suffer from important gaps. Justifying the proposed methods under
appropriate probabilistic assumptions and then evaluating them in situations
where the assumptions hold or do not hold would provide useful insight into
the applicability of the methods. In its current form, it is very difficult to
understand why the methods perform the way they do in the experiments and to
attach a level of confidence to the main takeaway message.
\bibliographystyle{alpha}
\bibliography{jmlr-20-208}
\end{document}