From 6c02609e94efb051a935f5fe0ea8b39c8b121fab Mon Sep 17 00:00:00 2001
From: Thibaut Horel
Date: Sat, 19 Aug 2017 09:14:12 -0400
Subject: Add DECSUP-00465 review

---
 DECSUP00465.tex | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 DECSUP00465.tex

diff --git a/DECSUP00465.tex b/DECSUP00465.tex
new file mode 100644
index 0000000..ab2e99f
--- /dev/null
+++ b/DECSUP00465.tex
@@ -0,0 +1,156 @@
+\documentclass[12pt]{article}
+\usepackage[T1]{fontenc}
+\usepackage[utf8]{inputenc}
+\usepackage[hmargin=1in, vmargin=1in]{geometry}
+
+\title{Review for \emph{Optimal Pricing in E-Commerce Based on Sparse and Noisy Data}}
+\date{}
+\begin{document}
+\maketitle
+
+\paragraph{Problem and results.} In this paper, the authors study the problem
+of determining the optimal price of products in the context of online stores.
+The goal is to use past data (consisting of prices and the associated mean daily
+revenue/profit) to propose new (approximately) optimal prices. The main
+challenges are that the data is sparse (some products have few
+data points) and noisy (mean daily revenue varies a lot based on exogenous
+factors), and that the function to optimize (revenue as a function of the
+chosen price) is both unknown and complex with possibly multiple local optima.
+The proposed method proceeds as follows:
+\begin{itemize}
+    \item use bootstrap estimation to build an estimator of the probability
+        that an observed price in the dataset is optimal.
+    \item use kernel regression to extrapolate this function to all prices:
+        this step gives a way to compute, for any (possibly unobserved) price,
+        the probability that it is optimal.
+    \item use additional data (\emph{e.g.} products from the same category or
+        competitor prices) to build a prior distribution on the probability
+        that a price is optimal. The model used for this is decision trees.
+    \item use the Metropolis-Hastings algorithm to sample prices from the
+        posterior distribution.
+\end{itemize}
+After describing their method, the authors present experimental results, both
+simulation-based and based on a deployment on a large European E-commerce
+website, and show that the new method performs better than baselines and led
+to significant increases in the revenue of the company it was deployed at.
+
+\paragraph{Comments on the proposed method.} The proposed method uses fairly
+standard ideas from statistics/machine learning which apparently haven't
+been used until now in the context of optimal pricing in E-commerce. I am not
+knowledgeable about the E-commerce literature, so I am basing this observation
+on the related work section. Each step is properly motivated by the authors and
+tailored to the specific challenges and features of the problem at hand.
+However, my main concerns regarding the method are the following:
+\begin{itemize}
+    \item Instead of directly optimizing the (unknown) revenue/profit function,
+        the authors choose to focus on maximizing the probability that a given
+        price is optimal. However, directly estimating the probability that
+        a price is optimal is infeasible, so instead, the method optimizes the
+        probability that a given price leads to a revenue (or a combination of
+        revenue and profit) above some threshold, as defined in Equation (5).
+        It is really unclear what this achieves: in particular, the algorithm
+        could get stuck selecting suboptimal prices about which it is very
+        confident, and fail to explore prices which could be better but for
+        which the confidence is currently very low (see also my comment
+        below about the exploration/exploitation trade-off).
+    \item Related to the previous point, very little indication is given about
+        the choice of the threshold introduced in Equation (5). This is
+        problematic since if it is chosen too low, then the
+        algorithm could get stuck at suboptimal local optima.
The authors indicate
+        that the threshold can be chosen based on previously observed data
+        points, which I think exacerbates the problem of getting stuck before
+        performing sufficient exploration.
+    \item The paper mentions ``dynamic pricing'' several times, and
+        this is indeed a key feature of the problem at hand. However, the
+        proposed method seems highly static to me: the main algorithm considers
+        all data points available so far and computes prices for the next day
+        (or next time period); it needs to be rerun whenever new prices are
+        needed. It would be nice to show how this procedure can be turned into
+        a feedback loop where new data points are efficiently incorporated. For
+        example, is it the case that the probabilities can be efficiently
+        updated when new data points are observed? Is the approach fully
+        Bayesian in the sense that the computed probabilities are used as the
+        prior distribution for the next step, or does the method always start
+        with the prior distribution given by the decision trees?
+    \item Even if each step is fairly standard and theoretically well
+        understood, the entire procedure combines them in a way which is hard
+        to reason about. As mentioned before, the threshold in (5) depends on
+        the observed data and seems to lead to insufficient exploration. This
+        could be counterbalanced by the proposal distribution in the
+        Metropolis-Hastings algorithm, but it is unclear what the
+        overall procedure converges to (see my comment below about the
+        absence of theoretical results).
+\end{itemize}
+
+\paragraph{Absence of theoretical results.} Related to the last point above, it
+is regrettable that the proposed method lacks theoretical guarantees. Even
+though it is known that each individual step converges/is optimal in the limit
+of infinitely many data points, the steps are combined in a way which makes it
+hard to be convinced that the entire procedure converges to a good solution in
+the limit.
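To make the concern about insufficient exploration concrete, here is a minimal Python sketch (with hypothetical function names and made-up data; this is my reading of the counting step behind Algorithm 1 and Equation (5), not the authors' code). The estimated probability is simply the fraction of bootstrap resamples whose mean revenue exceeds the threshold, so a well-observed but mediocre price can permanently dominate a barely-observed price that might be better.

```python
import numpy as np

def prob_above_threshold(revenues, threshold, n_boot=1000, rng=None):
    """Estimate P(mean revenue at this price > threshold) as the fraction
    of bootstrap resamples whose mean exceeds the threshold (my reading
    of the core of Algorithm 1, not the authors' implementation)."""
    rng = np.random.default_rng(0) if rng is None else rng
    revenues = np.asarray(revenues, dtype=float)
    means = np.array([
        rng.choice(revenues, size=len(revenues), replace=True).mean()
        for _ in range(n_boot)
    ])
    return float((means > threshold).mean())

# Hypothetical data: a well-observed price that is consistently mediocre,
# and a barely-observed price that is noisy but possibly much better.
well_observed = [105.0, 103.0, 104.0, 106.0]
barely_observed = [90.0, 140.0]
p_well = prob_above_threshold(well_observed, threshold=100.0)
p_barely = prob_above_threshold(barely_observed, threshold=100.0)
# p_well is 1.0 here (every resample mean exceeds 100), so a method
# maximizing this probability keeps selecting the well-observed price
# and has no incentive to gather more data on the other one.
```

Under this (assumed) reading, nothing in the estimate rewards reducing uncertainty, which is precisely the gap that the exploration mechanisms discussed below are designed to fill.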
+
+\paragraph{Additional remark on the overall approach.} At a high level, the
+problem is an instance of a more general one: optimizing an
+unknown function based on noisy observations of its values. This is
+a well-studied problem whose main challenge is the so-called
+``exploration-exploitation'' trade-off: methods for this problem need to
+balance exploration (to learn enough about the structure of the objective
+function) and exploitation (to make good decisions with respect to what is
+currently known). Important keywords relevant to this problem, which roughly
+cover this area of research, include: bandit algorithms, Bayesian bandits,
+online optimization, regret minimization, Thompson sampling, derivative-free
+optimization under noise, and optimization with bandit feedback. Many solutions
+proposed in this line of work share some similarity with the method proposed in
+this paper, but address the problem in a more principled way. I think the
+authors should at the very least extensively discuss the need to design a new
+ad hoc solution and how it compares to the above line of work. I would also
+recommend trying to frame their problem using one of the above frameworks: for
+example, the framework of regret minimization seems very well suited to
+describe the problem of dynamic pricing where decisions are made over time
+based on partial observations.
+
+\paragraph{Comments on the experimental results.} The experiments seem sound
+and well-designed and the proposed method does seem to perform significantly
+better than baselines. More details should perhaps be given about how, and to
+which values, the various meta-parameters were chosen: in particular, the
+bootstrap parameter $B$, the kernel regression parameter $h$, the threshold in
+Equation (5), the $\lambda$ on page 13, and how the decision trees were trained.
+
+\paragraph{Comments on the writing.} The paper is well written overall with
+a structure which is easy to follow.
A few points which could improve the
+exposition:
+\begin{itemize}
+\item the caption of Figure 1 on page 4 should have one sentence explaining
+    that the curve is obtained via extrapolation with kernel regression (and
+    refer to the later section where it is explained).
+\item section 3.4.1 on bootstrap is heavy and not very useful in my opinion.
+    It should be assumed that readers are somewhat familiar with
+    bootstrap estimation. I would reduce this section to one paragraph with
+    maybe one or two sentences on the general principle of bootstrap
+    estimation and then a clear explanation of the form: ``we apply
+    bootstrap estimation on data X to estimate Y, which will then be used to
+    do Z''.
+\item the description of Algorithm 1 could be significantly simplified by
+    removing all uses of the $\hat{F}_{boot}$ distribution: here the
+    algorithm only uses an empirical distribution to estimate
+    a probability. All that is really done is counting the fraction of
+    bootstrap samples whose mean target value is above the threshold
+    $\tilde{y}$. This can be explained in one sentence in English, but is
+    considerably obfuscated by the way Algorithm 1 is written. Furthermore,
+    I think that line 9 in Algorithm 1 contains a typo: it should
+    read ``Solve $\hat{q}_n = \hat{F}_{boot}(\tilde{y})$''. But again,
+    I would recommend not using $\hat{F}_{boot}$ at all.
+\item section 3.5.2 is similarly heavy: it should be assumed that the reader is
+    familiar with the concept of a prior distribution. For example, I would
+    remove the two paragraphs below Equation (8).
+\end{itemize}
+
+\paragraph{Conclusion.} While the problem is interesting and the proposed
+method works well experimentally, I cannot recommend the paper for publication
+in its current form: the method is not particularly novel and combines fairly
+standard techniques in an ad hoc way which makes it hard to reason about
+convergence, whereas a well-established area of research already contains many
+tools which could lead to a more principled solution to the problem.
+
+\end{document}
-- 
cgit v1.2.3-70-g09d2