-rw-r--r--  intro.tex    13
-rw-r--r--  problem.tex  81
2 files changed, 67 insertions, 27 deletions
@@ -20,9 +20,9 @@ In our setting, experiments cannot be manipulated and hence measurements are rel
 However, there is a cost $c_i$ associated with experimenting on subject $i$ which varies from subject to subject. This may be viewed as the
-cost subject $i$ incurs when tested and for which she needs to be reimbursed; or, it might be viewed as the incentive for $i$ to participate in the experiment; or, it might be the intrinsic value of the data.
+cost subject $i$ incurs when tested and for which she needs to be reimbursed; or, it might be viewed as the incentive for $i$ to participate in the experiment; or, it might be the intrinsic worth of the data to the user.

-This economic aspect has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. However, we are not aware of a principled study of this setting from a strategic point of view. When subjects are strategic, they may have an incentive to misreport their cost, leading to the neeed for a sophisticated choice of experiments and payments. Arguably, user incentiviation is of particular pertinence due to the extent of statistical analysis over user data on the Internet. %, which has led to the rise of several different research efforts in studying data markets \cite{...}.
+This economic aspect has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. However, we are not aware of a principled study of this setting from a strategic point of view. When subjects are strategic, they may have an incentive to misreport their cost, leading to the need for a sophisticated choice of experiments and payments. Arguably, user incentivization is of particular pertinence due to the extent of statistical analysis over user data on the Internet. %, which has led to the rise of several different research efforts in studying data markets \cite{...}.

 Our contributions are as follows.
@@ -36,16 +36,19 @@ subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budge
 The objective function, which is the key ingredient, is formally obtained by optimizing the information gain on $\beta$ when the latter is learned through linear regression, and is related to the so-called $D$-optimality criterion~\cite{pukelsheim2006optimal,atkinson2007optimum}.
 \item We present the first known polynomial time truthful mechanism for \SEDP{}, yielding a constant factor ($\approx 12.98$) approximation. In contrast, we show that no truthful, budget feasible mechanism can approximate \SEDP{} within a factor of 2.
+We note that the objective \eqref{obj} is submodular. Using this fact, applying previous results for budget feasible mechanisms under general submodular objectives would yield either a truthful deterministic mechanism that requires exponential time, or a polynomial-time mechanism that is not deterministic~\cite{singer-mechanisms,chen}.
 \end{itemize}
-We note that the objective \eqref{obj} is submodular. Using this fact, previous work by \citeN{singer-mechanisms} and \citeN{chen} on budget feasible mechanisms for submodular maximization yields a $8.34$-approximate deterministic mechanism for \SEDP{} that is not polynomial time, unless P=NP. Alternatively, previous work by \citeN{chen} on general submodular objectives also yields a randomized, 7.91-approximate polynomial time mechanism for \SEDP{} that is however \emph{universally truthful}, \emph{i.e.}, it is sampled from a distribution among truthful mechanisms. In contrast, our result is the first deterministic constant factor approximation mechanism for \SEDP{} that is both polytime and truthful.
+
+% budget feasible mechanisms for submodular maximization yields a $8.34$-approximate deterministic mechanism for \SEDP{} that is not polynomial time, unless P=NP. Alternatively, previous work by \citeN{chen} on general submodular objectives also yields a randomized, 7.91-approximate polynomial time mechanism for \SEDP{} that is however \emph{universally truthful}, \emph{i.e.}, it is sampled from a distribution among truthful mechanisms. In contrast, our result is the first deterministic constant factor approximation mechanism for \SEDP{} that is both polytime and truthful.
% either a randomized, 7.91-approximate polynomial time mechanism for maximizing a general submodular function that is universally truthful, \emph{i.e.}, it is sampled from a distribution among truthful mechanisms.
%There are several recent results in budget feasible
%mechanisms~\cite{singer-mechanisms,chen,singer-influence,bei2012budget,dobz2011-mechanisms}, and some apply to the submodular optimization in
%\EDP.
%There is a randomized, 7.91-approximate polynomial time mechanism for maximizing a general submodular function that is universally truthful, \emph{i.e.}, it is sampled from a distribution among truthful mechanisms. Also, there is a $8.34$-approximate exponential time deterministic mechanism.
%There are however no known deterministic, truthful, polynomial time mechanisms for general submodular functions.
-Though such mechanisms were known to exist for combinatorial problems with specific submodular objectives such as \textsc{Knapsack} or \textsc{Coverage}~\cite{singer-mechanisms,chen, singer-influence}, these do not readily apply to the more complicated linear-algebraic objective function \eqref{obj} of \SEDP.
+
+%Though such mechanisms were known to exist for combinatorial problems with specific submodular objectives such as \textsc{Knapsack} or \textsc{Coverage}~\cite{singer-mechanisms,chen, singer-influence}, these do not readily apply to the more complicated linear-algebraic objective function \eqref{obj} of \SEDP.
%{\bf S+T: could we verify that the above sentence is correct in its implication?}
From a technical perspective, we present a convex relaxation of \eqref{obj}, and show that it is within a constant factor of the so-called multi-linear relaxation of \eqref{obj}. This allows us to adopt the approach followed in prior work on budget feasible mechanisms by Chen \emph{et al.}~\cite{chen} and Singer~\cite{singer-influence}.
%{\bf FIX the last sentence}
@@ -54,7 +57,7 @@
%Our approach to mechanisms for experimental design --- by
% optimizing the information gain in parameters like $\beta$ which are estimated through the data analysis process --- is general.
 We give examples of this approach beyond linear regression, extending it to a general class that includes logistic regression and learning binary functions, and show that the corresponding budgeted mechanism design problem is also expressed through a submodular optimization. Hence, prior work \cite{chen,singer-mechanisms} immediately applies, and gives randomized, universally truthful, polynomial time, constant factor approximation mechanisms for problems in this class.
Obtaining deterministic, truthful, polynomial time mechanisms with a constant approximation factor for this class, or for specific problems in it, as we did for \EDP, remains an open problem.
-In what follows, we describe related work in Section~\ref{sec:related}. We briefly review experimental design and budget feasible mechanisms in Section~\ref{sec:peel} and define \EDP\ formally. In Section~\ref{sec:main} we present our mechanism for \EDP\ and prove our main results. The present applications of our general framework are presented in Section~\ref{sec:ext}.
+In what follows, we describe related work in Section~\ref{sec:related}. We briefly review experimental design and budget feasible mechanisms in Section~\ref{sec:peel} and define \SEDP\ formally. In Section~\ref{sec:main} we present our mechanism for \SEDP\ and prove our main results. A generalization of our framework to machine learning tasks beyond linear regression is presented in Section~\ref{sec:ext}.

 \junk{
diff --git a/problem.tex b/problem.tex
index 546500d..6027379 100644
--- a/problem.tex
+++ b/problem.tex
@@ -1,50 +1,87 @@
 \label{sec:prel}
-\subsection{Experimental Design}
-
-The theory of experimental design \cite{pukelsheim2006optimal,atkinson2007optimum} studies how an experimenter \E\ should select the parameters of a set of experiments she is about to conduct. In general, the optimality of a particular design depends on the purpose of the experiment, \emph{i.e.}, the quantity \E\ is trying to learn or the hypothesis she is trying to validate. Due to their ubiquity in statistical analysis, a large literature on the subject focuses on learning \emph{linear models}, where \E\ wishes to fit a linear function to the data she has collected.
+\subsection{Linear Regression and Experimental Design}

-More precisely, putting cost considerations aside, suppose that \E\ wishes to conduct $k$ among $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.},
+The theory of experimental design \cite{pukelsheim2006optimal,atkinson2007optimum,chaloner1995bayesian} considers the following formal setting. % studies how an experimenter \E\ should select the parameters of a set of experiments she is about to conduct. In general, the optimality of a particular design depends on the purpose of the experiment, \emph{i.e.}, the quantity \E\ is trying to learn or the hypothesis she is trying to validate. Due to their ubiquity in statistical analysis, a large literature on the subject focuses on learning \emph{linear models}, where \E\ wishes to fit a linear function to the data she has collected.
+Putting cost considerations aside, suppose that an experimenter \E\ wishes to conduct $k$ among $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$.
+Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.},
 \begin{align}\label{model}
 \forall i\in\mathcal{N},\quad y_i = \T{\beta} x_i + \varepsilon_i
 \end{align}
 where $\beta$ is a vector in $\reals^d$, commonly referred to as the \emph{model}, and $\varepsilon_i$ (the \emph{measurement noise}) are independent, normally distributed random variables with mean 0 and variance $\sigma^2$.
-The purpose of these experiments is to allow \E\ to estimate the model $\beta$. In particular, under \eqref{model}, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and
-$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
-\begin{align} \hat{\beta} &=\max_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \nonumber\\
-& = (\T{X_S}X_S)^{-1}X_S^Ty_S\label{leastsquares}\end{align}
+The purpose of these experiments is to allow \E\ to estimate the model $\beta$. In particular,
+assume that the experimenter \E\ has a \emph{prior}
+distribution on $\beta$, \emph{i.e.}, $\beta$ has a multivariate normal prior
+with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d\times d}$ (where $\sigma^2$ is the noise variance).
+Then, \E\ estimates $\beta$ through \emph{maximum a posteriori estimation}, \emph{i.e.}, by finding the parameter that maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following minimization \cite{hastie}:
+\begin{equation}
+  \hat{\beta} = \argmin_{\beta\in\reals^d} \big(\sum_{i\in S} (y_i - \T{\beta}x_i)^2
+  + \T{\beta}R\beta\big) = (R+\T{X_S}X_S)^{-1}X_S^Ty_S \label{ridge}
+\end{equation}
+where $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ is the matrix of experiment features and
+$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements.
+This optimization, commonly known as \emph{ridge regression}, includes an additional quadratic penalty term compared to standard least squares estimation.
+% under \eqref{model}, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and
+%$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
+%\begin{align} \hat{\beta} &=\max_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \nonumber\\
+%& = (\T{X_S}X_S)^{-1}X_S^Ty_S\label{leastsquares}\end{align}
%The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $(X_S\T{X_S})^{-1}$.
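[Editorial note: as a concrete illustration of the ridge/MAP estimator above, here is a minimal numpy sketch on synthetic data. It is our own illustration, not code from the paper; all names (map_estimate, beta_true, etc.) and the data-generation choices are ours.]

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 0.1

# Synthetic experiment features, rescaled so that ||x_i||_2 <= 1.
X = rng.normal(size=(n, d))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)

beta_true = rng.normal(size=d)                   # the unknown model beta
y = X @ beta_true + sigma * rng.normal(size=n)   # measurements y_i = beta^T x_i + eps_i

R = np.eye(d)  # prior covariance sigma^2 * R^{-1}, here with R = I_d

def map_estimate(X_S, y_S, R):
    # Ridge/MAP estimate: beta_hat = (R + X_S^T X_S)^{-1} X_S^T y_S
    return np.linalg.solve(R + X_S.T @ X_S, X_S.T @ y_S)

S = list(range(20))  # a hypothetical selected set of experiments
beta_hat = map_estimate(X[S], y[S], R)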
 Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from
-the noise terms $\varepsilon_i$). In particular, $\hat{\beta}$ has mean $\beta$
-(\emph{i.e.}, it is an \emph{unbiased estimator}) and covariance
-$(\T{X_S}X_S)^{-1}$.
+the noise terms $\varepsilon_i$). In particular, $\hat{\beta}$ has covariance
+$(R+\T{X_S}X_S)^{-1}$.
 Let $V:2^\mathcal{N}\to\reals$ be a \emph{value function}, quantifying how informative a set of experiments $S$ is in estimating $\beta$. The standard optimal experimental design problem amounts to finding a set $S$ that maximizes $V(S)$ subject to the constraint $|S|\leq k$.
-A variety of different value functions are used in experimental design~\cite{pukelsheim2006optimal}; almost all make use of the covariance $(\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. A value function preferred because of its relationship to entropy is the \emph{$D$-optimality criterion}: %which yields the following optimization problem
+A variety of different value functions are used in experimental design~\cite{pukelsheim2006optimal}; almost all make use of the covariance $(R+\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. A value function that has natural advantages is the \emph{information gain}: %\emph{$D$-optimality criterion}: %which yields the following optimization problem
 \begin{align}
-  V(S) &= \frac{1}{2}\log\det \T{X_S}X_S \label{dcrit} %\\
+  V(S) = I(\beta;y_S) = \entropy(\beta)-\entropy(\beta\mid y_S), \label{informationgain}
 \end{align}
-defined as $-\infty$ when $\mathrm{rank}(\T{X_S}X_S)<d$.
-As $\hat{\beta}$ is a multidimensional normal random variable, the
-$D$-optimality criterion is equal (up to a constant) to the negative of the
-entropy of $\hat{\beta}$. Hence, selecting a set of experiments $S$ that
+which is the entropy reduction in $\beta$ after the revelation of $y_S$ (also known as the mutual information between $y_S$ and $\beta$).
+Hence, selecting a set of experiments $S$ that
 maximizes $V(S)$ is equivalent to finding the set of experiments that minimizes
-the uncertainty on $\beta$, as captured by the entropy of its estimator.
+the uncertainty on $\beta$, as captured by the entropy reduction of its estimator.
+Under the linear model \eqref{model} and the Gaussian prior, the information gain takes the form
+\begin{align}
+  V(S) &= \frac{1}{2}\log\det(R+ \T{X_S}X_S), \label{dcrit}
+\end{align}
+which is indeed a function of the covariance matrix $(R+\T{X_S}X_S)^{-1}$: since the posterior covariance of $\beta$ given $y_S$ is $\sigma^2(R+\T{X_S}X_S)^{-1}$, the Gaussian entropies in \eqref{informationgain} give $I(\beta;y_S)=\frac{1}{2}\log\det(R+\T{X_S}X_S)-\frac{1}{2}\log\det R$, \emph{i.e.}, \eqref{dcrit} up to an additive constant that vanishes when $R=I_d$.
+%defined as $-\infty$ when $\mathrm{rank}(\T{X_S}X_S)<d$.
+%As $\hat{\beta}$ is a multidimensional normal random variable, the
+%$D$-optimality criterion is equal (up to a constant) to the negative of the
+%entropy of $\hat{\beta}$. Hence, selecting a set of experiments $S$ that
+%maximizes $V(S)$ is equivalent to finding the set of experiments that minimizes
+%the uncertainty on $\beta$, as captured by the entropy of its estimator.
%As discussed in the next section, in this paper, we work with a modified measure of information function, namely
%\begin{align}
%V(S) & = \frac{1}{2} \log\det \left(I + \T{X_S}X_S\right)
%\end{align}
%There are several reasons
-	Note that \eqref{dcrit} is a submodular set function, \emph{i.e.}, $V(S)+V(T)\geq V(S\cup T)+V(S\cap T)$ for all $S,T\subseteq \mathcal{N}$; it is also monotone, \emph{i.e.}, $V(S)\leq V(T)$ for all $S\subset T$.
 %In addition, the maximization of convex relaxations of this function is a well-studied problem \cite{boyd}.
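[Editorial note: the objective \eqref{dcrit} is straightforward to evaluate numerically. The sketch below, again ours and continuing the synthetic setup above, computes it via a log-determinant and spot-checks the diminishing-returns form of submodularity.]

import numpy as np

def V(X, S, R):
    # Information gain of a set S, per eq. (dcrit): 0.5 * log det(R + X_S^T X_S).
    X_S = X[sorted(S)]
    return 0.5 * np.linalg.slogdet(R + X_S.T @ X_S)[1]

rng = np.random.default_rng(1)
n, d = 30, 4
X = rng.normal(size=(n, d))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
R = np.eye(d)

# Diminishing returns: for S a subset of T and i outside T,
# adding experiment i helps the smaller set at least as much.
S, T, i = {0, 1}, {0, 1, 2, 3}, 7
assert V(X, S | {i}, R) - V(X, S, R) >= V(X, T | {i}, R) - V(X, T, R) - 1e-12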
+Our analysis will focus on the case of an \emph{isotropic} prior, in which the prior covariance is the identity matrix, \emph{i.e.}, $$R=I_d\in \reals^{d\times d}.$$ Intuitively, this corresponds to the simplest case of a prior, in which no direction of $\beta$ is a priori favored; equivalently, it corresponds to solving a ridge regression problem with penalty term $\norm{\beta}_2^2$.
+
+\subsection{Budget-Feasible Experimental Design: Full Information Case}
+We depart from the above classic experimental design setting by assuming that each experiment is associated with a cost $c_i\in\reals_+$. Moreover, the experimenter \E\ is limited by a budget $B\in \reals_+$. In the full-information case, the experiment costs are common knowledge, and the optimization problem that the experimenter wishes to solve is:
+\begin{center}
+\textsc{ExperimentalDesignProblem} (EDP)
+\begin{subequations}
+\begin{align}
+\text{Maximize}\quad V(S) &= \log\det(I_d+\T{X_S}X_S) \label{modified} \\
+\text{subject to}\quad \sum_{i\in S} c_i&\leq B
+\end{align}\label{edp}
+\end{subequations}
+\end{center}
+\EDP{} \emph{and} the corresponding problem with the more general objective \eqref{dcrit} are NP-hard; to see this, note that \textsc{Knapsack} reduces to EDP with dimension $d=1$: map an item with value $v_i$ and weight $w_i$ to an experiment with feature $x_i=\sqrt{v_i}$ and cost $c_i=w_i$; since $\log(1+\sum_{i\in S}v_i)$ is increasing in $\sum_{i\in S}v_i$, the two problems have the same optimal solutions.
+
+Note that \eqref{dcrit} is a submodular set function, \emph{i.e.},
+$V(S)+V(T)\geq V(S\cup T)+V(S\cap T)$ for all $S,T\subseteq \mathcal{N}$; it is also monotone, \emph{i.e.}, $V(S)\leq V(T)$ for all $S\subseteq T$, and non-negative, \emph{i.e.}, $V(S)\geq 0$ for all $S\subseteq\mathcal{N}$. In particular, the objective \eqref{modified} is submodular, monotone, and satisfies $V(\emptyset) = 0$.
+%Finally, we define the strategic version of \EDP{} (denoted by \SEDP) by adding the additional constraints of truthfulness, individual rationality, and normal payments, as described in the previous section.
+
-\subsection{Budget Feasible Reverse Auctions}
-A \emph{budget feasible reverse auction}
+\subsection{Experimental Design Problem: Strategic Case}
+When agents are strategic, the experimental design problem becomes a special case of a \emph{budget feasible reverse auction}
 \cite{singer-mechanisms}. Such an auction comprises a set of items $\mathcal{N}=\{1,\ldots,n\}$ as well as a single buyer. Each item has a cost $c_i\in\reals_+$. Moreover, the buyer has a positive value function $V:2^{\mathcal{N}}\to \reals_+$, as well as a budget $B\in \reals_+$. In the \emph{full information case}, the costs $c_i$ are common knowledge; the objective of the buyer in this context is to select a set $S$ maximizing the value $V(S)$ subject to the constraint $\sum_{i\in S} c_i\leq B$.
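[Editorial note: since the full-information problem \EDP{} maximizes a monotone submodular function under a knapsack constraint, the classical cost-benefit greedy gives a natural baseline. The sketch below is our illustration of that heuristic only; it respects the cost budget but ignores incentives and payments, so it is neither the paper's truthful mechanism nor a mechanism at all.]

import numpy as np

def greedy_edp(X, c, B):
    # Cost-benefit greedy for the full-information problem EDP: repeatedly add
    # the affordable experiment with the largest marginal gain in
    # log det(I_d + X_S^T X_S) per unit cost. A heuristic sketch only.
    n, d = X.shape
    S, spent = [], 0.0
    M = np.eye(d)  # running I_d + X_S^T X_S
    remaining = set(range(n))
    while remaining:
        logdet_M = np.linalg.slogdet(M)[1]
        best, best_ratio = None, 0.0
        for i in remaining:
            if spent + c[i] > B:
                continue  # experiment i is no longer affordable
            gain = np.linalg.slogdet(M + np.outer(X[i], X[i]))[1] - logdet_M
            if gain / c[i] > best_ratio:
                best, best_ratio = i, gain / c[i]
        if best is None:
            break  # nothing affordable with positive marginal gain
        S.append(best)
        spent += c[best]
        M += np.outer(X[best], X[best])
        remaining.remove(best)
    return S

# Example usage on the synthetic data from the sketches above:
# S = greedy_edp(X, c=np.full(len(X), 1.0), B=5.0)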
