diff options
Diffstat (limited to 'problem.tex')
| -rw-r--r-- | problem.tex | 30 |
1 files changed, 22 insertions, 8 deletions
diff --git a/problem.tex b/problem.tex index 81b7656..90203c6 100644 --- a/problem.tex +++ b/problem.tex @@ -13,20 +13,34 @@ $y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements, \begin{align*} \hat{\beta} &=\max_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \\ & = (\T{X_S}X_S)^{-1}X_S^Ty_S\end{align*} %The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $(X_S\T{X_S})^{-1}$. -Note that the estimator $\hat{beta}$ is a linear map of $\hat{y}_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$. In particular, $\hat{\beta}$ has mean $\beta$ (\emph{i.e.}, it is unbianced) and covariance $(\T{X_S}X_S)^{-1}$. +Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from the noise terms $\varepsilon_i$). In particular, $\hat{\beta}$ has mean $\beta$ (\emph{i.e.}, it is an \emph{unbianced estimator}) and covariance $(\T{X_S}X_S)^{-1}$. -The above covariance matrix is used in experimental design to determine how informative a set of experiments $S$ is. Though many different measures of information exist~(see \cite{pukelsheim2006optimal}), the most commonly used is the following: %so-called \emph{D-optimality criterion}: %which yields the following optimization problem -\begin{align*} -\text{Maximize}\quad V_D(S) &= \frac{1}{2}\log\det \T{X_S}X_S \\ -\text{subj. to}\quad |S|&\leq k -\end{align*} -The objective of the above optimization, called the \emph{D-optimality criterion}, is equal (up to a costant) to the negative of the entropy of $\hat{\beta}$. Selecting a set of experiments $S$ that maximizes $V_D(S)$ is equivalent to finding the set of experiments that minimizes the uncertainty on $\beta$, as captured by the entropy of its estimator. +Let $V:2^\mathcal{N}\to\reals$ be a value function, quantifying how informative a set of experiments $S$ is in estimating $\beta$. The standard optimal experimental design problem amounts to finding a set $S$ that maximizes $V(S)$ subject to the constraint $|S|\leq k$. + +There is a variety of different value functions used in experimental design\cite{pukelsheim2006optimal}. Almost all capture this through some property the covariance $(\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. Due to its relationship to entropy, a most commonly used is the \emph{$D$-optimality criterion}: %which yields the following optimization problem +\begin{align} + V_D(S) &= \frac{1}{2}\log\det \T{X_S}X_S \label{dcrit} %\\ +\end{align} +As $\hat{\beta}$ is a multidimensional normal random variable, the $D$-optimality criterion is equal (up to a costant) to the negative of the entropy of $\hat{\beta}$. Hence, selecting a set of experiments $S$ that maximizes $V_D(S)$ is equivalent to finding the set of experiments that minimizes the uncertainty on $\beta$, as captured by the entropy of its estimator. + +%As discussed in the next section, in this paper, we work with a modified measure of information function, namely +%\begin{align} +%V(S) & = \frac{1}{2} \log\det \left(I + \T{X_S}X_S\right) +%\end{align} +%There are several reasons + + +\subsection{Budget Feasible Mechanism Design} +In this paper, we approach the problem of optimal experimental design from the perspective of \emph{a budget feasible reverse auction} \cite{singer-mechanisms}. In particular, we assume that each experiment $i\in \mathcal{N}$ is associated with a cost $c_i$, that the experimenter needs to pay in order to conduct the experiment. The experimenter has a budget $B\in\reals_+$. In the \emph{full information case}, the costs are common knowledge; optimal design in this context amounts to selecting a set $S$ maximizing the value $V(S)$ subject to the constraint $\sum_{i\in S} c_i\leq B$. +As in \cite{singer-mechanisms,chen}, we focus in a \emph{strategic scenario}: experiment $i$ corresponds to a \emph{strategic agent}, whose cost $c_i$ is private. For example, each $i$ may correspond to a human participant; the feature vector $x_i$ may correspond to a normalized vector of her age, weight, gender, income, \emph{etc.}, and the measurement $y_i$ may capture some biometric information (\emph{e.g.}, her red cell blood count, a genetic marker, etc.). The cost $c_i$ is the amount the participant deems sufficient to incentivize her participation in the study. -\subsection{Budgeted Mechanism Design} +A mechanism $\mathcal{M} = (f,p)$ comprises (a) an \emph{allocation function} $f:\reals_+^n \to 2^\mathcal{N}$ and (b) a \emph{payment function} $p:\reals_+^n\to \reals_+^n$. The allocation function determines the set $S\subset \mathcal{N}$ of experiments to be conducted. The payment function returns a vector of payments $[p_i]_{i\in \mathcal{N}}$. As in \cite{singer-mechanisms, chen}, we study mechanisms that are normalized ($i\notin S$ implies $p_i=0$), individually rational ($p_i\geq c_i$) and have no positive transfers $p_i\geq 0$. +TODO: truthful, computationally efficient, budget feasible, approximation to V(S) +TODO: Myerson's Theorem \begin{comment} \subsection{Experimental Design} |
