\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1.2in, vmargin=1.1in]{geometry}
\title{Review of \emph{When Less is More: Systematic Analysis of Cascade-based Community Detection}}
\date{}
\begin{document}
\maketitle
\vspace{-3em}
\paragraph{Summary.} This paper considers the problem of recovering unknown
communities in a given graph from observations of ``cascades''---that is,
samples from a diffusion process propagating along the edges of the graph.
Specifically, it is assumed that only the identities and ``infection'' times of
vertices are observed, over multiple cascades, and the goal is to infer the
community memberships of the vertices. Four algorithms are described: one is
based on a likelihood approach, and the remaining three are based on
a clustering approach. The algorithms are then evaluated and compared on
a Twitter dataset (including both real community assignments and real
cascades), and on a collection of real networks on which synthetic cascades
were generated.
\paragraph{General comments.} The problem considered in this paper is natural and an
interesting twist on two standard problems: the network inference problem
(recovering edges from observations of cascades), and the community detection
problem (recovering community assignments from edges). By going directly from
cascades to community assignments, as is considered in this paper, one could
hope for better recovery guarantees than by first inferring the edges of the
graph and then applying a standard community detection algorithm.
The main contribution of the paper seems to be the experimental evaluation,
since all discussed algorithms seem to be either already present in
a previous paper by the same authors, or simple adaptations of known
algorithms. The main takeaway is that among the considered algorithms, the
\textsc{Clique} algorithm seems to perform consistently well (either being
the best or among the best).
\paragraph{Major points.} Although the paper claims to be agnostic to specific
probabilistic models for the graph structure and cascades, I believe it is
important to understand how the proposed methodology would perform if
specific models were assumed. In particular, I have the following concerns with
the models described in Section 3.2 and how the rest of the paper relates to
them:
\begin{enumerate}
\item With the exception of Section 3.2.3, no assumption is made on how the
community assignments affect the observations since only the cascade
model is described (there is no discussion of the graph generative
model). For all we know, the observations could be independent of the
community assignments, making the learning problem considered in this
paper ill-defined.
Even assuming that the graph structure is generated from a model making
use of the community assignments (for example the Stochastic Block
Model), it is very important to emphasize that Sections 3.2.2 and 3.2.3
consider very different learning regimes:
\begin{itemize}
\item In Section 3.2.3, each cascade is generated directly from the
community assignments. In this situation, one could hope
to design an asymptotically consistent estimator of the
community assignments as the number of cascades goes to
infinity.
\item In Section 3.2.2, the graph is only generated once (possibly
from the community assignments, although this is not specified)
and then the cascades are generated solely based on the graph
structure. So as the number of cascades goes to infinity, the
best one could hope for is to have an asymptotically consistent
estimator of the graph edges. However, \emph{this might still
not be sufficient to recover the communities} (recall that in
the standard community detection problem, consistency in
community inference is only achieved as the size of the graph
grows to infinity). This would make the learning task
considered in the paper unsolvable even in the limit of
infinitely many cascades, a point that requires discussion.
\end{itemize}
Because of this crucial difference, I also believe that the claim made
in Section 4.2 that the C-SI-BD model is equivalent to the SI-BD model
is wrong (it would be true only if the graph were generated anew for
each cascade).
\item Section 4.2: although the approach is based on maximum likelihood
      estimation, I find the exposition in this section lacking in several
      respects. Algorithm 1 seems to be inspired by the idea of alternating
      maximization but stops after a single alternation. At the very least,
      experimental evidence to justify this early stopping is necessary
      (are the gains insignificant with additional iterations?).
      More generally, I believe that a likelihood-based approach would fit
      very naturally in the expectation-maximization (EM) framework, with
      the graph structure as the latent variable, and I was surprised to
      see no mention of it in Section 4.2 (was there a specific reason why
      it was not considered?). A schematic sketch of the EM formulation
      I have in mind is given after this list.
\item Section 4.3: although the approaches described in this section are
      claimed to be agnostic to specific generative models, they clearly
      seem motivated by probabilistic reasoning. In particular, Algorithm
      2 is evidently solving a maximum likelihood problem, and equation
      (6) is implicitly justified as computing the posterior probability
      of an edge being present conditioned on the observed cascades
      (a schematic version of this computation is given after this list).
      I believe a more principled description of these approaches, and of
      the generative models under which they can be formally justified,
      is required.
\item Related to item 1, it is unclear in the experimental section whether
      the ground truth communities in the studied datasets are in fact
      correlated with the observations, and if they are, to what extent
      (see also my comment below about the missing explanation of what
      constitutes a community in the Twitter dataset). I think this
      concern could be addressed both by some exploratory analysis showing
      that the ground truth communities indeed correlate with edge
      density, and by a purely synthetic experiment in which the graph is
      generated from the community assignments, for example using the SBM
      (a minimal sketch of such an experiment is given after this list).
\item It is claimed that some methods behave poorly because they are based
      on specific probabilistic modeling assumptions. Could experiments be
      conducted to show that these methods indeed perform better when
      their assumptions are satisfied?
\end{enumerate}
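To make the EM suggestion in major point 2 concrete, here is a schematic
sketch, in my own notation (not the paper's): assume the graph $G$ is drawn
from a community-dependent prior $p(G \mid \sigma)$ and the cascades
$\mathbf{c}$ are drawn from $p(\mathbf{c} \mid G)$. Treating $G$ as the
latent variable, each EM iteration would update the community assignments
via
\[
\hat\sigma^{(t+1)} \in \arg\max_{\sigma} \;
\mathrm{E}_{G \sim p(\,\cdot \mid \mathbf{c},\, \hat\sigma^{(t)})}
\bigl[\log p(\mathbf{c}, G \mid \sigma)\bigr],
\]
where the expectation (E-step) is over the posterior distribution of the
graph given the cascades and the current assignments, and the maximization
(M-step) re-estimates the assignments. Even a brief discussion of why such
a scheme was not pursued would strengthen Section 4.2.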
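Similarly, regarding major point 3, the probabilistic reading of equation
(6) that I have in mind is the following (again schematic, with an
unspecified edge prior $p(e_{uv} = 1)$):
\[
p(e_{uv} = 1 \mid \mathbf{c})
= \frac{p(\mathbf{c} \mid e_{uv} = 1)\, p(e_{uv} = 1)}
       {\sum_{b \in \{0,1\}} p(\mathbf{c} \mid e_{uv} = b)\, p(e_{uv} = b)},
\]
that is, the posterior probability that the edge $(u,v)$ is present given
the observed cascades. Making the prior and the likelihood explicit would
identify the generative model under which equation (6) is formally
justified.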
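Finally, regarding major point 4, the following is a minimal sketch of the
kind of fully synthetic experiment I have in mind (my own construction, not
the paper's; it assumes Python with the \texttt{networkx} library, and the
parameter values are purely illustrative). The graph is drawn from an SBM,
so the cascades are correlated with the planted communities by
construction.
\begin{verbatim}
import random
import networkx as nx

# Graph whose edges depend on the planted communities:
# two communities of 100 vertices each (illustrative parameters).
sizes = [100, 100]
probs = [[0.10, 0.01],
         [0.01, 0.10]]
G = nx.stochastic_block_model(sizes, probs, seed=0)
truth = {v: G.nodes[v]["block"] for v in G}  # ground-truth communities

rng = random.Random(0)

def si_cascade(G, beta=0.3):
    """One discrete-time SI cascade from a uniformly random source;
    returns {vertex: infection time} for the infected vertices."""
    source = rng.choice(list(G))
    times, frontier, t = {source: 0}, [source], 0
    while frontier:
        t += 1
        newly_infected = []
        for u in frontier:
            for v in G[u]:
                if v not in times and rng.random() < beta:
                    times[v] = t
                    newly_infected.append(v)
        frontier = newly_infected
    return times

cascades = [si_cascade(G) for _ in range(500)]
# Run each algorithm on `cascades` alone and compare its output to
# `truth` (e.g., via NMI or the agreement metric).
\end{verbatim}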
\paragraph{Specific points.} Some additional concerns, which are more localized
manifestations of the concerns described above.
\begin{enumerate}
\item Section 4.2, Step (2): it is not clear what is meant by ``it is
sufficient to numerically find only the optimal $\delta$''. For which
values of $\alpha_{in}$ and $\alpha_{out}$ is $\delta$ solved for
numerically? If the solver is directly optimizing over $\alpha_{in}$
and $\delta$, then equation (4) is unnecessary: $\alpha_{out}$ is
simply equal to $\alpha_{in}/(\delta+1)$.
\item Section 4.2, Step (3): since the authors adapted the Louvain
      algorithm, it would be useful to have pseudocode of the exact
      adaptation they used; the given description is too informal to
      determine exactly what was done.
\item Section 4.2, Step (3): the Louvain algorithm is only a heuristic, so
      the proposed method does not exactly solve the optimization problem
      described in Algorithm 1, Step (3). It would be useful to have
      experimental results showing how far from optimal the algorithm is
      and, in particular, how the adaptation chosen by the authors differs
      from the original algorithm.
\item Section 4.3: all the algorithms in this section use the Louvain
clustering algorithm. It is not clear whether the original algorithm or
the (loosely specified) adaptation from Section 4.2 was used.
A justification for this choice (whether or not to use the adaptation)
would also be helpful.
\item Section 5.1: the description of the Twitter dataset does not explain
what the communities are in Twitter terms (as a Twitter user, I was not
aware that there are community-like entities on Twitter).
\item Section 5.3: I find the use of the term ``unbiased'' problematic
      since the paper has no notion of a ground truth distribution or
      generative model (see also my previous comments). In particular, the
authors claim the NMI criterion favors smaller communities, but maybe
this is not a problem if the generative model also produces smaller
communities?
\item Section 5.3: is there any reason why the ``agreement'' metric
      commonly used in community detection was not considered? (See, for
      example, Definition 3 in \cite{A18}; I recall the definition after
      this list.)
\end{enumerate}
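For reference, by ``agreement'' in specific point 7 I mean the standard
quantity (cf.\ Definition 3 in \cite{A18}): the fraction of correctly
classified vertices, maximized over relabelings of the communities,
\[
A(\sigma, \hat\sigma)
= \max_{\pi \in S_k} \frac{1}{n} \sum_{i=1}^{n}
  \mathbf{1}\{\sigma_i = \pi(\hat\sigma_i)\},
\]
where $n$ is the number of vertices and $S_k$ the set of permutations of
the $k$ community labels.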
\paragraph{Conclusion.} Although the problem considered in this paper is
interesting and worthy of attention, I believe that both the methodology and
exposition suffer from important gaps. Justifying the proposed methods under
appropriate probabilistic assumptions and then evaluating them in situations
where the assumptions hold or do not hold would provide useful insight into
the applicability of the methods. In its current form, it is very difficult to
understand why the methods perform the way they do in the experiments and to
attach a level of confidence to the main takeaway message.
\bibliographystyle{alpha}
\bibliography{jmlr-20-208}
\end{document}