diff options
| -rw-r--r-- | abstract.tex | 2 | ||||
| -rw-r--r-- | intro.tex | 11 | ||||
| -rw-r--r-- | main.tex | 2 | ||||
| -rw-r--r-- | problem.tex | 2 |
4 files changed, 7 insertions, 10 deletions
diff --git a/abstract.tex b/abstract.tex index 2f33699..bf078d0 100644 --- a/abstract.tex +++ b/abstract.tex @@ -18,7 +18,7 @@ We initiate the study of budgeted mechanisms for experimental design. In this se Each subject $i$ declares an associated cost $c_i >0$ to be part of the experiment, and must be paid at least her cost. In particular, the {\em Experimental Design Problem} (\SEDP) is to find a set $S$ of subjects for the experiment that maximizes $V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i})$ under the constraint $\sum_{i\in S}c_i\leq B$; our objective function corresponds to the information gain in parameter $\beta$ that is learned through linear regression methods, and is related to the so-called $D$-optimality criterion. Further, the subjects are \emph{strategic} and may lie about their costs. Thus, we need to design a mechanism for \SEDP{} with suitable properties. -We present a deterministic, polynomial time, $\delta$-truthful, budget feasible mechanism for \SEDP{}. +We present a deterministic, polynomial time, budget feasible mechanism scheme, that is approximately truthful and yields a constant factor approximation to \EDP. In particular, for any small $\delta>0$ and $\varepsilon>0$, we can construct a $(12.98\,,\varepsilon)$-approximate mechanism that is $\delta$-truthful and runs in polynomial time in both $n$ and $\log\log\frac{B}{\epsilon\delta}$. By applying previous work on budget feasible mechanisms with submodular objective, one could {\em only} have derived either an exponential time deterministic mechanism or a randomized polynomial time mechanism. Our mechanism yields a constant factor ($\approx 12.68$) approximation, and we show that no truthful, budget-feasible algorithms are possible within a factor $2$ approximation. We also show how to generalize our approach to a wide class of learning problems, beyond linear regression. @@ -1,7 +1,3 @@ -%The statistical analysis of user data is a widely spread practice among Internet companies, which routinely use machine learning techniques over vast records of user data to perform inference and classification tasks integral to their daily operations. Statistical analysis of personal data collected through surveys or experimentation is also the cornerstone of marketing research, as well as research in a variety of experimental sciences such as medicine and sociology. - -%This state of affairs has motivated several recent studies of \emph{data markets}, in which an analyst wishes to purchase data from a set of users \cite{...}. Eah user discloses her data to the analyst only if she receives an appropriate compensation. Assuming that the analyst has a limited budget, a natural question to ask is how she should allocate her budget across different users. - In the classic setting of experimental design \cite{pukelsheim2006optimal,atkinson2007optimum}, an {\em experimenter} \E\ has access to a population of $n$ potential experiment subjects. Each subject $i\in \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, @@ -20,8 +16,9 @@ We depart from this classical setup by viewing experimental design in a strategi In our setting, experiments cannot be manipulated and hence measurements are reliable. %\footnote{Thus, the experiments of our interest are statistically significant ones where each experiment provides a reliable outcome.} \E{} has a total budget of $B$ to conduct all the experiments. There is a cost $c_i$ associated with experimenting on -subject $i$ which varies from subject to subject. This cost $c_i$ is determined by the subject $i$: it may be viewed as the -cost $i$ incurs when tested and for which she needs to be reimbursed; or, it might be viewed as the incentive for $i$ to participate in the experiment; or, it might be the intrinsic worth of the data to the subject. The economic aspect of paying subjects has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. Subjects often negotiate better incentives or higher payments. +subject $i$ which varies from subject to subject. This cost $c_i$ is determined by the subject $i$ and reported to \E; subjects are strategic and may misreport these costs. Intuitively, $c_i$ may be viewed as the +cost $i$ incurs when tested and for which she needs to be reimbursed; or, it might be viewed as the incentive for $i$ to participate in the experiment; or, it might be the intrinsic worth of the data to the subject. + The economic aspect of paying subjects has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. Subjects often negotiate better incentives or higher payments. However, we are not aware of a principled study of this setting from a strategic point of view, when subjects declare their costs and therefore determine their payment. Such a setting is increasingly realistic, given the growth of these experiments over the Internet. % and associated data markets. % When subjects are strategic, they may have an incentive to misreport their cost, leading to the need for a sophisticated choice of experiments and payments. Arguably, user incentiviation is of particular pertinence due to the extent of statistical analysis over user data on the Internet. %, which has led to the rise of several different research efforts in studying data markets \cite{...}. @@ -41,7 +38,7 @@ subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budge \smallskip The objective function, which is the key, is formally obtained by optimizing the information gain in $\beta$ when the latter is learned through ridge regression, and is related to the so-called $D$-optimality criterion~\cite{pukelsheim2006optimal,atkinson2007optimum}. \item -We present a polynomial time, $\delta$-truthful mechanism for \SEDP{}, yielding a constant factor ($\approx 12.98$) approximation to the optimal value of \eqref{obj}. In contrast to this, we show that no truthful, budget-feasible mechanisms are possible for \SEDP{} within a factor 2 approximation. +We present a polynomial time mechanism scheme for \SEDP{} that is approximately truthful and yields a constant factor approximation to the optimal value of \eqref{obj}. In particular, for any small $\delta>0$ and $\varepsilon>0$, we can construct a $(12.98\,,\varepsilon)$-approximate mechanism that is $\delta$-truthful and runs in polynomial time in both $n$ and $\log\log\frac{B}{\epsilon\delta}$. In contrast to this, we show that no truthful, budget-feasible mechanisms are possible for \SEDP{} within a factor 2 approximation. \smallskip We note that the objective \eqref{obj} is submodular. Using this fact, applying previous results on budget feasible mechanism design under general submodular objectives~\cite{singer-mechanisms,chen} would yield either a deterministic, truthful, constant-approximation mechanism that requires exponential time, or a non-deterministic, (universally) truthful, poly-time mechanism that yields a constant approximation ratio only \emph{in expectation} (\emph{i.e.}, its approximation guarantee for a given instance may in fact be unbounded). @@ -67,7 +67,7 @@ c_{-i})$ implies $i\in S(c_i', c_{-i})$, and (b) \end{lemma} Lemma~\ref{thm:myerson-variant} allows us to incorporate our relaxation in the above framework, yielding the following theorem: \begin{theorem}\label{thm:main} - For any $\delta>0$, and any $\epsilon>0$, there exists a $\delta$-truthful, individually rational + For any $\delta\in(0,1]$, and any $\epsilon\in (0,1]$, there exists a $\delta$-truthful, individually rational and budget feasible mechanim for \EDP{} that runs in time $O\big(poly(n, d, \log\log\frac{B}{b\varepsilon\delta})\big)$ and returns diff --git a/problem.tex b/problem.tex index 8d5b262..1ac1a38 100644 --- a/problem.tex +++ b/problem.tex @@ -143,7 +143,7 @@ $p:\reals_+^n\to \reals_+^n$, determining the payments $[p_i(c)]_{i\in \mathcal{ We seek mechanisms that are \emph{normalized} (unallocated experiments receive zero payments), \emph{individually rational} (payments for allocated experiments exceed costs), have \emph{no positive transfers} (payments are non-negative), and are \emph{budget feasible} (the sum of payments does not exceed the budget $B$). We relax the notion of truthfulness to \emph{$\delta$-truthfulness}, requiring that reporting one's true cost is an \emph{almost-dominant} strategy: no subject increases their utility by reporting a cost that differs more than $\delta>0$ from their true cost. Under this definition, a mechanism is truthful if $\delta=0$. -In addition, we would like the allocation $S(c)$ to be of maximal value; however, $\delta$-truthfulness, as well as the hardness of \EDP{}, preclude achieving this goal. Hence, we seek mechanisms with that yield a \emph{constant approximation ratio}, \emph{i.e.}, there exists $\alpha\geq 1$ s.t.~$OPT \leq \alpha V(S(c))$, and are \emph{computationally efficient}, in that $S$ and $p$ can be computed in polynomial time. +In addition, we would like the allocation $S(c)$ to be of maximal value; however, $\delta$-truthfulness, as well as the hardness of \EDP{}, preclude achieving this goal. Hence, we seek mechanisms with that are \emph{$(\alpha,\beta)$-approximate}, \emph{i.e.}, there exist $\alpha\geq 1$ and $\beta>0$ s.t.~$OPT \leq \alpha V(S(c))+\beta$, and are \emph{computationally efficient}, in that $S$ and $p$ can be computed in polynomial time. We note that the greedy algorithm \eqref{eq:max-algorithm} breaks truthfulness. Though this is not true for all submodular functions (see, \emph{e.g.}, \cite{singer-mechanisms}), it is true for the objective of \EDP{}: we show this in Appendix~\ref{sec:non-monotonicity}. This motivates our study of more complex mechanisms. |
