From 6c02609e94efb051a935f5fe0ea8b39c8b121fab Mon Sep 17 00:00:00 2001
From: Thibaut Horel
Date: Sat, 19 Aug 2017 09:14:12 -0400
Subject: Add DECSUP-00465 review

---
 DECSUP00465.tex | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 DECSUP00465.tex

diff --git a/DECSUP00465.tex b/DECSUP00465.tex
new file mode 100644
index 0000000..ab2e99f
--- /dev/null
+++ b/DECSUP00465.tex
@@ -0,0 +1,156 @@
+\documentclass[12pt]{article}
+\usepackage[T1]{fontenc}
+\usepackage[utf8]{inputenc}
+\usepackage[hmargin=1in, vmargin=1in]{geometry}
+
+\title{Review for \emph{Optimal Pricing in E-Commerce Based on Sparse and Noisy Data}}
+\date{}
+\begin{document}
+\maketitle
+
+\paragraph{Problem and results.} In this paper, the authors study the problem
+of determining the optimal price of products in the context of online stores.
+The goal is to use past data (consisting of prices and the associated mean daily
+revenue/profit) to propose new (approximately) optimal prices. The main
+challenges are that the data is sparse (some products have few
+data points) and noisy (mean daily revenue varies a lot based on exogenous
+factors), and that the function to optimize (revenue as a function of the
+chosen price) is both unknown and complex with possibly multiple local optima.
+The proposed method proceeds as follows:
+\begin{itemize}
+    \item use bootstrap estimation to build an estimator of the probability
+        that an observed price in the dataset is optimal.
+    \item use kernel regression to extrapolate this function to all prices:
+        this step gives a way to compute, for any (possibly unobserved) price,
+        the probability that it is optimal.
+    \item use additional data (\emph{e.g.} products from the same category or
+        competitor prices) to build a prior distribution on the probability
+        that a price is optimal. The model used for this is decision trees.
+    \item use the Metropolis-Hastings algorithm to sample prices from the
+        posterior distribution.
+\end{itemize}
+After describing their method, the authors present experimental results, both
+simulation-based and based on a deployment on a large European E-commerce
+website, and show that the new method performs better than baselines and led
+to significant increases in the revenue of the company it was deployed at.
+
+\paragraph{Comments on the proposed method.} The proposed method uses fairly
+standard ideas from statistics/machine learning which apparently haven't
+been used until now in the context of optimal pricing in E-commerce. I am not
+knowledgeable about the E-commerce literature, so I am basing this observation
+on the related work section. Each step is properly motivated by the authors and
+tailored to the specific challenges and features of the problem at hand.
+However, my main concerns regarding the method are the following:
+\begin{itemize}
+    \item Instead of directly optimizing the (unknown) revenue/profit function,
+        the authors choose to focus on maximizing the probability that a given
+        price is optimal. However, directly estimating the probability that
+        a price is optimal is infeasible, so instead, the method optimizes the
+        probability that a given price leads to a revenue (or a combination of
+        revenue and profit) above some threshold, as defined in Equation (5).
+        It is really unclear what this achieves: in particular, the algorithm
+        could get stuck selecting suboptimal prices about which it is very
+        confident, and fail to explore prices which could be better but for
+        which the confidence is currently very low (see also my comment
+        below about the exploration/exploitation trade-off).
+    \item Related to the previous point, very little indication is given about
+        the choice of the threshold introduced in Equation (5). This is
+        problematic since if it is chosen too low, then the
+        algorithm could get stuck at suboptimal local optima.
The authors indicate
+        that the threshold can be chosen based on previously observed data
+        points, which I think exacerbates the problem of getting stuck before
+        performing sufficient exploration.
+    \item The paper mentions ``dynamic pricing'' several times, and
+        this is indeed a key feature of the problem at hand. However, the
+        proposed method seems highly static to me: the main algorithm considers
+        all data points available so far and computes prices for the next day
+        (or next time period); it needs to be rerun whenever new prices are
+        needed. It would be nice to show how this procedure can be turned into
+        a feedback loop where new data points are efficiently incorporated. For
+        example, is it the case that the probabilities can be efficiently
+        updated when new data points are observed? Is the approach fully
+        Bayesian in the sense that the computed probabilities are used as the
+        prior distribution for the next step, or does the method always start
+        with the prior distribution given by the decision trees?
+    \item Even if each step is fairly standard and theoretically well
+        understood, the entire procedure combines them in a way which is hard
+        to reason about. As mentioned before, the threshold in (5) depends on
+        the observed data and seems to lead to insufficient exploration. This
+        could be counterbalanced by the proposal distribution in the
+        Metropolis-Hastings algorithm, but it is unclear what the
+        overall procedure converges to (see my comment below about the
+        absence of theoretical results).
+\end{itemize}
+
+\paragraph{Absence of theoretical results.} Related to the last point above, it
+is regrettable that the proposed method lacks theoretical guarantees. Even
+though it is known that each individual step converges/is optimal in the limit
+of infinitely many data points, the steps are combined in a way which makes it
+hard to be convinced that the entire procedure converges to a good solution in
+the limit.
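To make the concern about insufficient exploration concrete, here is a minimal Python sketch (with hypothetical function names and made-up data; this is my reading of the counting step behind Algorithm 1 and Equation (5), not the authors' code). The estimated probability is simply the fraction of bootstrap resamples whose mean revenue exceeds the threshold, so a well-observed but mediocre price can permanently dominate a barely-observed price that might be better.

```python
import numpy as np

def prob_above_threshold(revenues, threshold, n_boot=1000, rng=None):
    """Estimate P(mean revenue at this price > threshold) as the fraction
    of bootstrap resamples whose mean exceeds the threshold (my reading
    of the core of Algorithm 1, not the authors' implementation)."""
    rng = np.random.default_rng(0) if rng is None else rng
    revenues = np.asarray(revenues, dtype=float)
    means = np.array([
        rng.choice(revenues, size=len(revenues), replace=True).mean()
        for _ in range(n_boot)
    ])
    return float((means > threshold).mean())

# Hypothetical data: a well-observed price that is consistently mediocre,
# and a barely-observed price that is noisy but possibly much better.
well_observed = [105.0, 103.0, 104.0, 106.0]
barely_observed = [90.0, 140.0]
p_well = prob_above_threshold(well_observed, threshold=100.0)
p_barely = prob_above_threshold(barely_observed, threshold=100.0)
# p_well is 1.0 here (every resample mean exceeds 100), so a method
# maximizing this probability keeps selecting the well-observed price
# and has no incentive to gather more data on the other one.
```

Under this (assumed) reading, nothing in the estimate rewards reducing uncertainty, which is precisely the gap that the exploration mechanisms discussed below are designed to fill.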
+
+\paragraph{Additional remark on the overall approach.} At a high level, the
+problem is an instance of a more general one: optimizing an
+unknown function based on noisy observations of its values. This is
+a well-studied problem whose main challenge is the so-called
+``exploration-exploitation'' trade-off: methods for this problem need to
+balance exploration (to learn enough about the structure of the objective
+function) and exploitation (to make good decisions with respect to what is
+currently known). Important keywords relevant to this problem, which roughly
+cover this area of research, include: bandit algorithms, Bayesian bandits,
+online optimization, regret minimization, Thompson sampling, derivative-free
+optimization under noise, and optimization with bandit feedback. Many solutions
+proposed in this line of work share some similarity with the method proposed in
+this paper, but address the problem in a more principled way. I think the
+authors should at the very least extensively discuss the need to design a new
+ad hoc solution and how it compares to the above line of work. I would also
+recommend trying to frame their problem using one of the above frameworks: for
+example, the framework of regret minimization seems very well suited to
+describe the problem of dynamic pricing where decisions are made over time
+based on partial observations.
+
+\paragraph{Comments on the experimental results.} The experiments seem sound
+and well-designed and the proposed method does seem to perform significantly
+better than baselines. More details should perhaps be given about how, and to
+which values, the various meta-parameters were chosen: in particular, the
+bootstrap parameter $B$, the kernel regression parameter $h$, the threshold in
+Equation (5), the $\lambda$ on page 13, and how the decision trees were trained.
+
+\paragraph{Comments on the writing.} The paper is well written overall with
+a structure which is easy to follow.
A few points which could improve the
+exposition:
+\begin{itemize}
+\item the caption of Figure 1 on page 4 should have one sentence explaining
+    that the curve is obtained via extrapolation with kernel regression (and
+    refer to the later section where it is explained).
+\item section 3.4.1 on bootstrap is heavy and not very useful in my opinion.
+    It should be assumed that readers are somewhat familiar with
+    bootstrap estimation. I would reduce this section to one paragraph with
+    maybe one or two sentences on the general principle of bootstrap
+    estimation and then a clear explanation of the form: ``we apply
+    bootstrap estimation on data X to estimate Y, which will then be used to
+    do Z''.
+\item the description of Algorithm 1 could be significantly simplified by
+    removing all uses of the $\hat{F}_{boot}$ distribution: here the
+    algorithm only uses an empirical distribution to estimate
+    a probability. All that is really done is counting the fraction of
+    bootstrap samples whose mean target value is above the threshold
+    $\tilde{y}$. This can be explained in one sentence in English, but is
+    considerably obfuscated by the way Algorithm 1 is written. Furthermore,
+    I think that line 9 in Algorithm 1 contains a typo: it should
+    read ``Solve $\hat{q}_n = \hat{F}_{boot}(\tilde{y})$''. But again,
+    I would recommend not using $\hat{F}_{boot}$ at all.
+\item section 3.5.2 is similarly heavy: it should be assumed that the reader is
+    familiar with the concept of a prior distribution. For example, I would
+    remove the two paragraphs below Equation (8).
+\end{itemize}
+
+\paragraph{Conclusion.} While the problem is interesting and the proposed
+method works well experimentally, I cannot recommend the paper for publication
+in its current form: the method is not particularly novel and combines fairly
+standard techniques in an ad hoc way which makes it hard to reason about
+convergence, whereas a well-established area of research already contains many
+tools which could lead to a more principled solution to the problem.
+
+\end{document}
-- 
cgit v1.2.3-70-g09d2