author     Thibaut Horel <thibaut.horel@gmail.com>    2020-01-15 13:30:42 -0500
committer  Thibaut Horel <thibaut.horel@gmail.com>    2020-01-15 13:30:42 -0500
commit     ec70496b6c547b22cacf317b11280fed690bae04 (patch)
tree       4f400713bdee54fd4cef1b5872c1fb59a620abe5
parent     7e753eda9fadb1a3f95b6af0dc31707f2ad6309d (diff)
download   reviews-ec70496b6c547b22cacf317b11280fed690bae04.tar.gz
Add JASA review
-rw-r--r--  jasa-2019-0653.tex  140
1 file changed, 140 insertions, 0 deletions
diff --git a/jasa-2019-0653.tex b/jasa-2019-0653.tex
new file mode 100644
index 0000000..0536f51
--- /dev/null
+++ b/jasa-2019-0653.tex
@@ -0,0 +1,140 @@
+\documentclass[10pt]{article}
+\usepackage[T1]{fontenc}
+\usepackage[utf8]{inputenc}
+\usepackage[hmargin=1.2in, vmargin=1.2in]{geometry}
+\usepackage{amsmath,amsfonts}
+
+\title{\large Review of \emph{Real-time Regression Analysis of Streaming Clustered Data with Possible Abnormal Data Batches}}
+\date{}
+
+\begin{document}
+
+\maketitle
+
+\paragraph{Summary.} This paper studies the problem of estimating the
+parameters $\beta\in\mathbb{R}^p$ of a marginal generalized linear model with
+correlated outcomes via the ``quadratic inference function'' (QIF). Minimizing
+the QIF yields an estimator $\hat\beta^\star$ of $\beta$, the ``oracle''
+estimator. Since solving the minimization problem requires processing all the
+available samples at once and can be too expensive, the authors instead focus
+on a streaming setting, where small clusters of data arrive in batches. The
+goal is to obtain a ``renewable'' estimator: given the current estimate, one
+can process a cluster from the stream and update the estimate on the fly. The
+crucial point is that the update does not require revisiting the entire
+dataset and can be performed using only the current state and the data coming
+from the stream. The contributions are:
+\begin{itemize}
+ \item by considering a linearization of the QIF, the authors obtain
+   a renewable estimator $\hat\beta$ for which the update can be performed
+   by solving an equation involving only the fresh data from the stream
+   and a summary of the past data (in the form of an aggregated gradient
+   and variance matrix). This equation can be solved using the
+   Newton--Raphson method (a schematic of this update is sketched after
+   this list).
+ \item a theoretical analysis of the resulting estimator, showing
+   consistency and asymptotic normality. Furthermore, it is shown that in
+   the large-data limit, $\hat\beta$ is equivalent to the oracle estimator
+   $\hat\beta^\star$.
+ \item in the presence of abnormal or corrupted data batches generated with
+   model parameters different from $\beta$, the authors propose
+   a goodness-of-fit test comparing the current batch to the last correct
+   batch (the last one which passed the test), and only data batches
+   passing the test are used in the renewable estimator.
+ \item a discussion of the implementation of the proposed method on the
+   Spark Lambda architecture.
+ \item an evaluation on synthetic data (linear and logistic models) of the
+   proposed estimator and of the monitoring procedure (test of abnormal
+   batches), confirming the theory.
+ \item an application to a real dataset of car crashes.
+\end{itemize}
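+
+To fix notation for some of the comments below, here is a schematic of the
+renewable update as I read it from the first contribution above; the notation
+($U$, $\hat\beta_b$, $\bar g_b$, $\bar C_b$, $\mathcal{D}_b$) is mine and the
+paper's exact equations may differ. After processing batches $1,\dots,b-1$,
+the state consists of the current estimate $\hat\beta_{b-1}$ together with an
+aggregated gradient $\bar g_{b-1}$ and an aggregated variance matrix
+$\bar C_{b-1}$. When a fresh batch $\mathcal{D}_b$ arrives, the new estimate
+$\hat\beta_b$ is obtained by solving, via Newton--Raphson, an estimating
+equation of the generic form
+\[
+    U\bigl(\beta;\, \hat\beta_{b-1},\, \bar g_{b-1},\, \bar C_{b-1},\,
+    \mathcal{D}_b\bigr) = 0,
+\]
+whose evaluation cost depends on the size of $\mathcal{D}_b$ but not on the
+total sample size $N_{b-1}$; the summaries are then updated to
+$(\bar g_b, \bar C_b)$ using only $\mathcal{D}_b$ and $\hat\beta_b$.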
+
+\paragraph{Major comments.}
+
+The problem considered in this paper is important and well-motivated, and in
+light of the ever-increasing amount of available data, the need for efficient
+estimators is self-evident. The proposed method appears sound and its benefits
+are convincingly demonstrated in the experiments. However, I found that the
+theoretical analysis suffers from notable gaps, detailed below:
+\begin{itemize}
+ \item Appendix A.2 does not contain enough details for me to be able to
+   assess the correctness of the proof. While I believe the result to be
+   true (with perhaps slightly stronger assumptions), the provided argument
+   does not constitute a proof in my opinion. In particular:
+ \begin{itemize}
+   \item the overall proof structure (by induction) does not seem
+     sound to me: the induction hypothesis is that $\tilde\beta_j$
+     is consistent for $j=1,\dots,b-1$. However, consistency is an
+     asymptotic property, holding as the amount of data goes to
+     infinity. Yet, for a fixed $j$, $\tilde\beta_j$ is computed from
+     a fixed and finite amount of data and thus cannot be consistent.
+     Note that $\tilde\beta_j$ could be consistent if $n_j$ goes to
+     infinity, but it is only assumed that $N_b=\sum_{j=1}^b
+     n_j\to\infty$. So in its current form, I don't see how the
+     proposed proof structure addresses the case where $n_j=n$ is
+     constant and only $b$ goes to infinity (which still implies
+     $N_b = nb\to\infty$ as $b\to\infty$).
+ \item it is not clear to me why the quantity in equation (a.4) is
+ $O_p(N_b^{-1/2})$ (this relates to the induction hypothesis
+ being ill-defined as per the previous point).
+   \item the argument of lines 35--40 on page 37 seems to use the
+     mean value theorem (although the quantity $\xi_b$ appearing in
+     those lines is never defined and should be). However, the mean
+     value theorem does not hold for vector-valued functions. Since
+     ultimately only an inequality is needed, I suspect an easy
+     workaround is to use the mean value inequality
+     ($\|f(x)-f(y)\|\le\sup_{t\in[0,1]}\|Df(x+t(y-x))\|\,\|x-y\|$ for
+     $f$ differentiable on the segment between $x$ and $y$), but as
+     written, the current argument is not valid.
+   \item the proof relies on $\tilde\beta_j$ being in
+     $\mathbb{N}_\delta$ for all $j$ (page 38, line 19). I don't
+     think this is guaranteed by the procedure; is it an additional
+     assumption that the authors rely on? If so, it should be added
+     to the theorem statement (or maybe the positive-definiteness
+     assumption should be extended to hold everywhere).
+   \item page 38, line 21, positive-definiteness of
+     $C_j(\tilde\beta_j)$ is invoked. However, the assumption in the
+     theorem is only that it is positive-definite \emph{in
+     expectation}. I am not sure how to bridge this gap (should the
+     assumption be strengthened in the theorem statement?).
+ \end{itemize}
+ \item the renewable method requires solving an equation using the
+   Newton--Raphson (NR) method. However, no indication of sufficient
+   conditions for the convergence of the NR method is given; in
+   particular, the rate of convergence should be quantified. The
+   pseudocode given on page 16 only says to run it ``until
+   convergence''. However, in practice, the method is only run for
+   a finite number of steps, inducing a residual error. A formal
+   analysis of this residual error, and conditions under which it can
+   be made small enough not to accumulate across updates and
+   jeopardize consistency, is important (a sketch of the kind of
+   statement I have in mind follows this list). As currently written,
+   the estimator used in the experiments is not the same as the one
+   analyzed theoretically and is not proved to be consistent.
+ \item related to the previous point, I did not find in the experiments
+   any indication of how the NR method was run, for how many steps, or
+   guidelines on how to choose the number of steps.
+\end{itemize}
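+
+To make the Newton--Raphson point more concrete, here is the kind of statement
+I would expect. This is only a sketch, under regularity assumptions that I am
+supplying myself (smoothness and invertibility of the Jacobian near the root),
+and the notation ($F_b$, $\beta^{(k)}$, $K_b$) is mine, not the paper's.
+Writing $F_b$ for the map whose root defines the update at batch $b$ and
+$\beta^{(k)}$ for the $k$-th NR iterate started at
+$\beta^{(0)} = \hat\beta_{b-1}$,
+\[
+    \beta^{(k+1)} = \beta^{(k)} - \bigl[\nabla F_b(\beta^{(k)})\bigr]^{-1}
+    F_b(\beta^{(k)}),
+\]
+standard local convergence theory gives a quadratic contraction
+$\|\beta^{(k+1)} - \hat\beta_b\| \le c\,\|\beta^{(k)} - \hat\beta_b\|^2$ once
+the iterate is close enough to the root $\hat\beta_b$. What seems to be missing
+is (i) an argument that the starting point $\hat\beta_{b-1}$ lies in this basin
+of attraction with high probability, and (ii) a choice of the number of steps
+$K_b$ guaranteeing that the residual $\|\beta^{(K_b)} - \hat\beta_b\|$ is, say,
+$o_p(N_b^{-1/2})$ uniformly over batches, so that the accumulated error does
+not affect the stated asymptotic results.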
+
+A less crucial point, but one that would still be ``nice to have'', is the
+following. Methods such as stochastic gradient descent are deemed insufficient
+in the introduction (page 2, lines 26--29) for the problem considered in this
+paper. However, the problem at hand, with its QIF formulation, seems to fall
+well within the realm of stochastic approximation (where one wishes to minimize
+a function written as an expectation over a distribution, given only samples
+from this distribution). In fact, given that the proposed solution is based on
+a linearization of the QIF, I suspect that the proposed update is very close to
+what stochastic gradient descent applied to this problem would yield. It would
+be nice to give this stochastic gradient descent interpretation of the proposed
+solution if it is correct, or else to explain how it differs, or to give more
+details about why stochastic gradient descent cannot be applied to this problem
+(a sketch of the comparison I have in mind follows this paragraph).
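+
+To illustrate the comparison I have in mind (again, the notation is mine and
+this is only a sketch): if the objective can be written as an expectation
+$Q(\beta) = \mathbb{E}\bigl[q(\beta; \mathcal{D})\bigr]$ over data batches
+$\mathcal{D}$, then upon receiving batch $\mathcal{D}_b$ a stochastic gradient
+step would update
+\[
+    \hat\beta_b = \hat\beta_{b-1} - \gamma_b\,
+    \nabla_\beta\, q(\hat\beta_{b-1};\, \mathcal{D}_b)
+\]
+for some step size $\gamma_b$, while a stochastic Newton-type step would
+additionally premultiply the gradient by the inverse of an aggregated curvature
+matrix. The renewable update, which solves a linearized estimating equation
+involving the aggregated variance matrix, looks structurally close to the
+latter. Stating explicitly how the proposed update differs from, or improves
+upon, such stochastic approximation schemes would strengthen the discussion in
+the introduction.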
+
+\paragraph{Minor comments.}
+\begin{itemize}
+ \item page 8, line 10, in the sum, the second $g_i$ should simply be $g$.
+ \item page 37, line 23, the leading $+\frac{1}{N_b}$ should be
+   $-\frac{1}{N_b}$.
+\end{itemize}
+
+\paragraph{Conclusion.} The problem considered in this paper is important, and
+the proposed method is interesting and well-justified by the experiments.
+However, a significant tightening of the theoretical analysis (along the lines
+of the above points) is necessary.
+
+\end{document}