\documentclass[10pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1.2in, vmargin=1.2in]{geometry}
\usepackage{amsmath,amsfonts}

\title{\large Review of \emph{Real-time Regression Analysis of Streaming Clustered Data with Possible Abnormal Data Batches}}
\date{}

\begin{document}

\maketitle

This is an update on my previous review of the same paper, written after
reading the authors' revision. Overall I would like to thank the authors for
taking my comments and questions into serious consideration and improving the
paper accordingly.

\paragraph{1.}
The main change at the technical level is a clarification of the regime in
which the number of samples is taken to grow to infinity. There are now two
distinct regimes: one where the size of each batch is constant and the number
of batches grows to infinity, and one where the size of the first batch grows
to infinity (with two sub-regimes depending on whether the subsequent batches
may also grow to infinity).
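
For concreteness, writing $n_j$ for the size of batch $j$ and $b$ for the
number of batches (this notation is mine and may not match the paper's), the
regimes can be summarized roughly as
\begin{align*}
\text{(i)}\quad & n_j \ \text{fixed and} \ b \to \infty;\\
\text{(ii)}\quad & n_1 \to \infty, \ \text{with the subsequent batch sizes
either remaining bounded or also growing.}
\end{align*}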

The asymptotic analysis of the estimator in the first of these two regimes was
not covered by the original proof, but in this revision the authors have added
a separate analysis for this case, while also clarifying the proof in the
other case.

Thanks to this improvement, I have now reached a reasonable level of
confidence in the correctness of the stated results, and I believe that the
paper is technically sound.

\paragraph{} A minor suggestion to improve the argument given on line 17, page
43 of the appendix, which lacks rigor as currently written: $n$ has not been
defined; the argument seems to assume that all batches have the same size,
which entails a loss of generality; and it is not clear in which sense the
approximation $\simeq$ is to be understood.

   By definition one has $n_j = N_j - N_{j-1}$; hence, setting $N_0=0$,
   \begin{align*}
	   \sum_{j=1}^{b-1} \frac{n_j}{\sqrt{N_j}} = \sum_{j=1}^{b-1}
	   \frac{N_j-N_{j-1}}{\sqrt{N_j}}
	   \leq
	   \sum_{j=1}^{b-1}
	   \int_{N_{j-1}}^{N_j}
	   \frac{dt}{\sqrt{t}} = \int_{0}^{N_{b-1}}\frac{dt}{\sqrt{t}} = 2\sqrt{N_{b-1}}\,,
	\end{align*}
	where the inequality holds since $t\mapsto 1/\sqrt{t}$ is decreasing, so
	that $1/\sqrt{N_j} \leq 1/\sqrt{t}$ for all $t \in [N_{j-1}, N_j]$.
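
	For what it is worth, a quick numerical sanity check of this bound (the
	batch sizes below are arbitrary and mine, purely for illustration):
\begin{verbatim}
# Check: sum_j n_j / sqrt(N_j) <= 2 * sqrt(N_{b-1}) for random batch sizes.
import numpy as np

rng = np.random.default_rng(0)
n = rng.integers(1, 50, size=200)   # batch sizes n_1, ..., n_{b-1}
N = np.cumsum(n)                    # cumulative sizes N_1, ..., N_{b-1}
lhs = np.sum(n / np.sqrt(N))
rhs = 2 * np.sqrt(N[-1])
assert lhs <= rhs
print(lhs, rhs)
\end{verbatim}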

\paragraph{2.} The authors also clarified the details of how the Newton-Raphson
method is used, in particular the conditions guaranteeing convergence and the
convergence criterion used in the numerical experiments.

While I agree that the numerical experiments clearly show that the NR method
converges quickly in practice, I was not convinced by the authors' explanation
that there is no need to control the residual error in the theoretical
analysis, and in particular to make sure that it does not accumulate over the
iterations of the recursive procedure (see the sketch below). The authors
claim that this is the ``conventional practice in the statistical
literature'', but my impression is that nested procedures (where a subroutine,
like NR here, is used in each iteration) are becoming increasingly common for
online estimation (following a similar trend in the fields of stochastic
optimization and machine learning), and it is now standard to carry out an
end-to-end analysis of the entire procedure, including the error terms accrued
at each iteration.
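
To make the concern concrete, here is a schematic sketch of the nested
structure I have in mind. It is my own toy construction (an intercept-only
logistic model, with an incremental estimating equation in the spirit of the
paper), not the authors' algorithm. The inner NR solver stops at a finite
tolerance, so each batch leaves a residual in the running estimate that all
subsequent updates inherit; an end-to-end analysis would need to bound the sum
of these residuals.
\begin{verbatim}
import numpy as np

def renewable_update(theta_prev, J_prev, y, tol=1e-8, max_iter=20):
    # Approximately solve, by Newton-Raphson,
    #     J_prev * (theta_prev - t) + U_b(t) = 0,
    # where U_b is the score of the *current* batch only (one data
    # pass) and J_prev is the information accumulated so far. The
    # loop stops at tolerance `tol`, so a residual eps_b remains in
    # theta_b and propagates through every later update.
    p = lambda t: 1.0 / (1.0 + np.exp(-t))   # intercept-only logistic
    t = theta_prev
    for _ in range(max_iter):
        U = y.sum() - len(y) * p(t)          # batch score
        I = len(y) * p(t) * (1.0 - p(t))     # batch information
        step = (U + J_prev * (theta_prev - t)) / (J_prev + I)
        t += step
        if abs(step) < tol:
            break
    return t, J_prev + len(y) * p(t) * (1.0 - p(t))

# Outer recursion over the stream: theta_b depends on theta_{b-1},
# hence also on all earlier residuals eps_1, ..., eps_{b-1}.
rng = np.random.default_rng(0)
theta, J = 0.0, 0.0
for _ in range(100):
    theta, J = renewable_update(theta, J, rng.binomial(1, 0.3, size=25))
\end{verbatim}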
\end{document}