diff options
| author | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2013-02-10 18:08:15 -0800 |
|---|---|---|
| committer | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2013-02-10 18:08:15 -0800 |
| commit | 91db9186fa25585c02a035e99386f73c75ad2601 (patch) | |
| tree | bb4bdae8683c9214400dff7d038d113b209ade0b /intro.tex | |
| parent | 5bd3c3aef9533a2697c4b3660c53c1411fa18c77 (diff) | |
| download | recommendation-91db9186fa25585c02a035e99386f73c75ad2601.tar.gz | |
intro markets and typos
Diffstat (limited to 'intro.tex')
| -rw-r--r-- | intro.tex | 12 |
1 files changed, 6 insertions, 6 deletions
@@ -6,12 +6,12 @@ an {\em experimenter} \E\ has access to a population of $n$ potential experiment subjects. Each subject $i\in \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, known to the experimenter. -\E\ wishes to measures a certain inherent property of the subjects by performing an experiment: the outcome $y_i$ of the experiment on a subject $i$ is unknown to \E\ before the experiment is performed. +\E\ wishes to measure a certain inherent property of the subjects by performing an experiment: the outcome $y_i$ of the experiment on a subject $i$ is unknown to \E\ before the experiment is performed. Typically, \E\ has a hypothesis on the relationship between $x_i$'s and $y_i$'s. Due to its simplicity, as well as its ubiquity in statistical analysis, a large body of work has focused on linear hypotheses: \emph{i.e.}, it is assumed that there exists a $\beta\in\reals^d$ such that $y_i = \T{\beta} x_i+\varepsilon_i$, for all $i\in \{1,\ldots,n\},$ where $\varepsilon_i$ are zero-mean, i.i.d.~random variables. Conducting the experiments and obtaining the measurements $y_i$ lets \E\ estimate $\beta$, \emph{e.g.}, through linear regression. %, \emph{i.e.}, the model underlying the data, and the experimenter's goal is to obtain such an estimate as accurately as possible. %The goal of experimental design amounts to determining which subjects to experiment upon to produce the best possible such estimate. -The above experimental design scenario has many applications, from medical testing to marketing research and others. -There is a rich literature about estimation procedures, as well as for means of quantifying the quality of the produced estimate~\cite{pukelsheim2006optimal}. There is also an extensive theory on how to select subjects +The above experimental design scenario has many applications. Regression over personal data collected through surveys or experimentation is the cornerstone of marketing research, as well as research in a variety of experimental sciences such as medicine and sociology. Crucially, the statistical analysis of user data is also a widely spread practice among Internet companies, which routinely use machine learning techniques over vast records of user data to perform inference and classification tasks integral to their daily operations. +Beyond linear regression, there is a rich literature about estimation procedures, as well as for means of quantifying the quality of the produced estimate~\cite{pukelsheim2006optimal}. There is also an extensive theory on how to select subjects if \E\ can conduct only a limited number of experiments, so the estimation process returns $\beta$ that approximates the true parameter of the underlying population \cite{ginebra2007measure,le1996comparison,chaloner1995bayesian,boyd2004convex}. @@ -20,9 +20,9 @@ In our setting, experiments cannot be manipulated and hence measurements are rel However, there is a cost $c_i$ associated with experimenting on subject $i$ which varies from subject to subject. This may be viewed as the -cost subject $i$ incurs when tested and for which she needs to be reimbursed; or, it might be viewed as the incentive for $i$ to participate in the experiment; or, it might be the inherent value of the data. +cost subject $i$ incurs when tested and for which she needs to be reimbursed; or, it might be viewed as the incentive for $i$ to participate in the experiment; or, it might be the intrinsic value of the data. - This economic aspect has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. However, we are not aware of a principled study of this setting from a strategic point of view. When subjects are strategic, they may have an incentive to misreport their cost, leading to the neeed for a sophisticated choice of experiments and payments. + This economic aspect has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. However, we are not aware of a principled study of this setting from a strategic point of view. When subjects are strategic, they may have an incentive to misreport their cost, leading to the neeed for a sophisticated choice of experiments and payments. Arguably, user incentiviation is of particular pertinence due to the extent of statistical analysis over user data on the Internet. %, which has led to the rise of several different research efforts in studying data markets \cite{...}. Our contributions are as follows. @@ -33,7 +33,7 @@ We formulate the problem of experimental design subject to a given budget, in th In particular, we formulate the {\em Strategic Experimental Design Problem} (\SEDP) as follows: the experimenter \E\ wishes to find set $S$ of subjects to maximize \begin{align}V(S) = \log\det\Big(I_d+\sum_{i\in S}x_i\T{x_i}\Big) \label{obj}\end{align} subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget, as well as the usual constraints of truthfulness and individual rationality. %, and other {\em strategic constraints} we don't list here. -The objective function, which is the key, is formally obtained by optimizing the information gain in $\beta$ when the latter is learned through linear regression, and is known as the $D$-optimality criterion in experimental design literature~\cite{pukelsheim2006optimal,atkinson2007optimum}. +The objective function, which is the key, is formally obtained by optimizing the information gain in $\beta$ when the latter is learned through linear regression, and is related to the so-called $D$-optimality criterion~\cite{pukelsheim2006optimal,atkinson2007optimum}. \item We present the first known polynomial time truthful mechanism for \SEDP{}, yielding a constant factor ($\approx 12.98$) approximation. In contrast to this, we show that no truthful, budget-feasible algorithms are possible for \SEDP{} within a factor 2 approximation. \end{itemize} |
