diff options
Diffstat (limited to 'intro.tex')
| -rw-r--r-- | intro.tex | 11 |
1 files changed, 5 insertions, 6 deletions
@@ -11,9 +11,9 @@ known to the experimenter. Typically, \E\ has a hypothesis on the relationship between $x_i$'s and $y_i$'s. Due to its simplicity, as well as its ubiquity in statistical analysis, a large body of work has focused on linear hypotheses: \emph{i.e.}, it is assumed that there exists a $\beta\in\reals^d$ such that $$y_i = \T{\beta} x_i+\varepsilon_i,$$ for all $i\in \{1,\ldots,n\},$ where $\varepsilon_i$ are zero-mean, i.i.d.~random variables. Conducting the experiments and obtaining the measurements $y_i$ lets \E\ estimate $\beta$, \emph{e.g.}, through linear regression. %, \emph{i.e.}, the model underlying the data, and the experimenter's goal is to obtain such an estimate as accurately as possible. %The goal of experimental design amounts to determining which subjects to experiment upon to produce the best possible such estimate. -The above experimental design scenario has many applications. Regression over personal data collected through surveys or experimentation is the cornerstone of marketing research, as well as research in a variety of experimental sciences such as medicine and sociology. Crucially, the statistical analysis of user data is also a widely spread practice among Internet companies, which routinely use machine learning techniques over vast records of user data to perform inference and classification tasks integral to their daily operations. +The above experimental design scenario has many applications. Regression over personal data collected through surveys or experimentation is the cornerstone of marketing research, as well as research in a variety of experimental sciences such as medicine and sociology. Crucially, statistical analysis of user data is also a widely spread practice among Internet companies, which routinely use machine learning techniques over vast records of user data to perform inference and classification tasks integral to their daily operations. Beyond linear regression, there is a rich literature about estimation procedures, as well as for means of quantifying the quality of the produced estimate~\cite{pukelsheim2006optimal}. There is also an extensive theory on how to select subjects -if \E\ can conduct only a limited number of experiments, so the estimation process returns $\beta$ +if \E\ can conduct only a limited number of experiments, so the estimation process returns a $\beta$ that approximates the true parameter of the underlying population \cite{ginebra2007measure,le1996comparison,chaloner1995bayesian,boyd2004convex}. We depart from this classical setup by viewing experimental design in a strategic setting, and by studying budgeted mechanism design issues. @@ -32,14 +32,13 @@ Our contributions are as follows. We initiate the study of experimental design problem in presence of budgets and strategic subjects. %formulate the problem of experimental design subject to a given budget, in the presence of strategic agents who may lie about their costs. %In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem, in which the objective function %is sophisticated and %is related to the covariance of the $x_i$'s. - In particular, we formulate the {\em Strategic Experimental Design Problem} (\SEDP) as follows: the experimenter \E\ wishes to find set $S$ of subjects to maximize + In particular, we formulate the {\em Experimental Design Problem} (\SEDP) as follows: the experimenter \E\ wishes to find set $S$ of subjects to maximize \begin{align}V(S) = \log\det\Big(I_d+\sum_{i\in S}x_i\T{x_i}\Big) \label{obj}\end{align} -subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget, as well as the usual constraints of truthfulness and individual rationality. -This is naturally viewed as a \emph{budget feasible mechanism design} problem, as introduced by \citeN{singer-mechanisms}. +subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget. When subjects are strategic, the above problem can be naturally approached as a \emph{budget feasible mechanism design} problem, as introduced by \citeN{singer-mechanisms}. %, and other {\em strategic constraints} we don't list here. \smallskip -The objective function, which is the key, is formally obtained by optimizing the information gain in $\beta$ when the latter is learned through linear regression, and is related to the so-called $D$-optimality criterion~\cite{pukelsheim2006optimal,atkinson2007optimum}. +The objective function, which is the key, is formally obtained by optimizing the information gain in $\beta$ when the latter is learned through ridge regression, and is related to the so-called $D$-optimality criterion~\cite{pukelsheim2006optimal,atkinson2007optimum}. \item We present a polynomial time, truthful mechanism for \SEDP{}, yielding a constant factor ($\approx 12.98$) approximation to the optimal value of \eqref{obj}. In contrast to this, we show that no truthful, budget-feasible mechanisms are possible for \SEDP{} within a factor 2 approximation. |
