Diffstat (limited to 'problem.tex')
-rw-r--r--  problem.tex  59
1 file changed, 33 insertions(+), 26 deletions(-)
diff --git a/problem.tex b/problem.tex
index 73647da..798e69a 100644
--- a/problem.tex
+++ b/problem.tex
@@ -6,30 +6,40 @@ The theory of experimental design \cite{pukelsheim2006optimal,atkinson2007optimu
Suppose that an experimenter \E\ wishes to conduct $k$ among $n$ possible
experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is
associated with a set of parameters (or features) $x_i\in \reals^d$, normalized
-so that $b\leq \|x_i\|^2_2\leq 1,$ for some $b>0$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.},
-\begin{align}\label{model}
- \forall i\in\mathcal{N},\quad y_i = \T{\beta} x_i + \varepsilon_i
-\end{align}
-where $\beta$ is a vector in $\reals^d$, commonly referred to as the \emph{model}, and $\varepsilon_i$ (the \emph{measurement noise}) are independent, normally distributed random variables with mean 0 and variance $\sigma^2$.
-
-For example, each $i$ may correspond to a human subject; the
-feature vector $x_i$ may correspond to a normalized vector of her age, weight,
-gender, income, \emph{etc.}, and the measurement $y_i$ may capture some
-biometric information (\emph{e.g.}, her red cell blood count, a genetic marker,
-etc.). The magnitude of the coefficient $\beta_i$ captures the effect that feature $i$ has on the measured variable, and its sign captures whether the correlation is positive or negative.
+so that $b\leq \|x_i\|^2_2\leq 1,$ for some $b>0$. Denote by $S\subseteq
+\mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its
+execution, experiment $i\in S$ reveals an output variable (the ``measurement'')
+$y_i$, related to the experiment features $x_i$ through a linear function,
+\emph{i.e.}, $y_i = \T{\beta} x_i + \varepsilon_i$ where $\beta$ is a vector in
+$\reals^d$, commonly referred to as the \emph{model}, and $\varepsilon_i$ (the
+\emph{measurement noise}) are independent, normally distributed random
+variables with mean 0 and variance $\sigma^2$.
+
+For example, each $i$ may correspond to a human subject; the feature vector
+$x_i$ may correspond to a normalized vector of her age, weight, gender, income,
+\emph{etc.}, and the measurement $y_i$ may capture some biometric information
+(\emph{e.g.}, her red blood cell count, a genetic marker, \emph{etc.}). The
+magnitude of the coefficient $\beta_j$ captures the effect that the $j$-th
+feature has on the measured variable, and its sign captures whether the
+correlation is positive or negative.
+
-The purpose of these experiments is to allow \E\ to estimate the model $\beta$. In particular,
- assume that the experimenter \E\ has a {\em prior}
+The purpose of these experiments is to allow \E\ to estimate the model
+$\beta$. In particular, assume that the experimenter \E\ has a {\em prior}
distribution on $\beta$, \emph{i.e.}, $\beta$ has a multivariate normal prior
-with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d^2}$ (where $\sigma^2$ is the noise variance).
-Then, \E\ estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
+with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d\times d}$ (where
+$\sigma^2$ is the noise variance). Then, \E\ estimates $\beta$ through
+\emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter
+which maximizes the posterior distribution of $\beta$ given the observations
+$y_S$. Under the linearity assumption and the Gaussian prior on $\beta$,
+maximum a posteriori estimation leads to the following maximization
+\cite{hastie}:
\begin{align}
\begin{split}
- \hat{\beta} = \argmax_{\beta\in\reals^d} \prob(\beta\mid y_S) &=\argmin_{\beta\in\reals^d} \big(\sum_{i\in S} (y_i - \T{\beta}x_i)^2
- + \T{\beta}R\beta\big)\\
- & = (R+\T{X_S}X_S)^{-1}X_S^Ty_S \label{ridge}
-\end{split}
+ \hat{\beta} = \argmax_{\beta\in\reals^d} \prob(\beta\mid y_S)
+ &=\argmin_{\beta\in\reals^d} \big(\sum_{i\in S} (y_i - \T{\beta}x_i)^2
+ + \T{\beta}R\beta\big)\\
+ & = (R+\T{X_S}X_S)^{-1}\T{X_S}y_S \label{ridge}
+ \end{split}
\end{align}
where the last equality is obtained by setting the gradient of the quadratic objective to zero and solving the resulting linear system; in \eqref{ridge}, $X_S\defeq[x_i]_{i\in S}\in \reals^{|S|\times d}$ is the matrix of experiment features and
$y_S\defeq[y_i]_{i\in S}\in\reals^{|S|}$ are the observed measurements.
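As a quick sanity check on \eqref{ridge}, the following is a minimal numerical sketch (purely illustrative: the dimensions, the identity prior precision $R$, and all variable names are our own assumptions, not taken from the text) that simulates the linear measurement model and computes the MAP/ridge estimate:

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 5, 20   # number of experiments, features, selected experiments (assumed)
sigma = 0.1            # noise standard deviation (assumed)
R = np.eye(d)          # prior precision: beta ~ N(0, sigma^2 R^{-1}) (assumed identity)

# Features x_i, normalized so that b <= ||x_i||_2^2 <= 1 (here exactly 1).
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Ground-truth model and simulated measurements y_i = beta^T x_i + eps_i on a set S.
beta = rng.multivariate_normal(np.zeros(d), sigma**2 * np.linalg.inv(R))
S = rng.choice(n, size=k, replace=False)
X_S = X[S]
y_S = X_S @ beta + sigma * rng.normal(size=k)

# MAP / ridge estimate (R + X_S^T X_S)^{-1} X_S^T y_S, via a linear solve.
beta_hat = np.linalg.solve(R + X_S.T @ X_S, X_S.T @ y_S)
print("||beta_hat - beta||_2 =", np.linalg.norm(beta_hat - beta))

Solving the linear system directly, rather than forming the matrix inverse, is the standard way to evaluate \eqref{ridge} numerically.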
@@ -44,15 +54,13 @@ This optimization, commonly known as \emph{ridge regression}, includes an additi
Let $V:2^\mathcal{N}\to\reals$ be a \emph{value function}, quantifying how informative a set of experiments $S$ is in estimating $\beta$. The classical experimental design problem amounts to finding a set $S$ that maximizes $V(S)$ subject to the constraint $|S|\leq k$.
A variety of different value functions are used in the literature~\cite{pukelsheim2006optimal,boyd2004convex}; %; almost all make use of the covariance $(R+\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$.
-one that has natural advantages is the \emph{information gain}: %\emph{$D$-optimality criterion}: %which yields the following optimization problem
-\begin{align}
- V(S)= I(\beta;y_S) = \entropy(\beta)-\entropy(\beta\mid y_S). \label{informationgain}
-\end{align}
+one that has natural advantages is the \emph{information gain}, %\emph{$D$-optimality criterion}: %which yields the following optimization problem
+ $V(S)= I(\beta;y_S) = \entropy(\beta)-\entropy(\beta\mid y_S)$,
which is the reduction in the entropy of $\beta$ after the revelation of $y_S$ (also known as the mutual information between $y_S$ and $\beta$).
Hence, selecting a set of experiments $S$ that
maximizes $V(S)$ is equivalent to finding the set of experiments that minimizes
the uncertainty on $\beta$, as captured by the reduction in its posterior entropy.
-Under the linear model \eqref{model}, and the Gaussian prior, the information gain takes the following form (see, \emph{e.g.}, \cite{chaloner1995bayesian}):
+Under the linear model and the Gaussian prior, the information gain takes the following form (see, \emph{e.g.}, \cite{chaloner1995bayesian}):
\begin{align}
I(\beta;y_S)&= \frac{1}{2}\log\det(R+ \T{X_S}X_S) - \frac{1}{2}\log\det R\label{dcrit} %\\
\end{align}
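For intuition about \eqref{dcrit}, here is a small sketch (again illustrative; info_gain and greedy_select are our placeholder names) that evaluates the information gain via log-determinants and greedily grows a set of $k$ experiments. Greedy selection is the textbook heuristic here: $\log\det(R+\T{X_S}X_S)$ is a monotone submodular set function, so under a cardinality constraint the greedy rule carries the classical $(1-1/e)$ approximation guarantee.

import numpy as np

def info_gain(X_S, R):
    # I(beta; y_S) = 1/2 log det(R + X_S^T X_S) - 1/2 log det R, via slogdet for stability.
    _, logdet_post = np.linalg.slogdet(R + X_S.T @ X_S)
    _, logdet_prior = np.linalg.slogdet(R)
    return 0.5 * (logdet_post - logdet_prior)

def greedy_select(X, R, k):
    # Repeatedly add the experiment with the largest marginal gain in I(beta; y_S).
    S = []
    for _ in range(k):
        rest = [i for i in range(X.shape[0]) if i not in S]
        S.append(max(rest, key=lambda i: info_gain(X[S + [i]], R)))
    return S   # e.g. greedy_select(X, np.eye(X.shape[1]), 20)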
@@ -142,8 +150,7 @@ yields an approximation ratio of $\frac{5 e }{e-1}~$\cite{singer-mechanisms}; t
\subsection{Budget-Feasible Experimental Design: Strategic Case}
We study the following \emph{strategic} setting, in which the costs $c_i$ are {\em not} common knowledge and their reporting can be manipulated by the experiment subjects. The latter are strategic and wish to maximize their utility, namely the difference between the payment they receive and their true cost. We note that, though subjects may misreport $c_i$, they cannot lie about $x_i$ (\emph{i.e.}, all public features are verifiable prior to the experiment) nor about $y_i$ (\emph{i.e.}, the subject cannot falsify her measurement).
-In this setting, experimental design reduces to a \emph{budget feasible reverse auction}, as introduced by \citeN{singer-mechanisms}; we review the formal definition in Appendix~\ref{app:budgetfeasible}. In short, given a budget $B$ and a value function $V:2^{\mathcal{N}}\to\reals_+$, a \emph{reverse auction mechanism} $\mathcal{M} = (S,p)$ comprises (a) an \emph{allocation function}\footnote{Note that $S$ would be more aptly termed as a \emph{selection} function, as this is a reverse auction, but we retain the term ``allocation'' to align with the familiar term from standard auctions.}
-$S:\reals_+^n \to 2^\mathcal{N}$, determining the set
+In this setting, experimental design reduces to a \emph{budget feasible reverse auction}, as introduced by \citeN{singer-mechanisms}; we review the formal definition in Appendix~\ref{app:budgetfeasible}. In short, given a budget $B$ and a value function $V:2^{\mathcal{N}}\to\reals_+$, a \emph{reverse auction mechanism} $\mathcal{M} = (S,p)$ comprises (a) an \emph{allocation function} $S:\reals_+^n \to 2^\mathcal{N}$, determining the set
of experiments to be purchased, and (b) a \emph{payment function}
$p:\reals_+^n\to \reals_+^n$, determining the payments $[p_i(c)]_{i\in \mathcal{N}}$ received by experiment subjects.
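To fix ideas, here is a minimal sketch of the interface $\mathcal{M}=(S,p)$ together with a deliberately naive instance (our own toy example, not the mechanism studied in the paper): a posted-price rule that offers each subject a take-it-or-leave-it price of $B/k$. It is budget feasible, since at most $k$ payments of $B/k$ are made, and truthful, since a subject with $c_i \le B/k$ gains nothing by misreporting while one with $c_i > B/k$ can only be selected at a loss; it makes no attempt, however, to maximize $V$.

from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence

@dataclass
class Mechanism:
    # Allocation function S: reported costs c in R_+^n -> set of experiments bought.
    allocate: Callable[[Sequence[float]], FrozenSet[int]]
    # Payment function p: reported costs -> payments to the n subjects.
    pay: Callable[[Sequence[float]], Sequence[float]]

def posted_price(B: float, k: int, n: int) -> Mechanism:
    # Toy instance: buy from the first (up to) k subjects whose reported cost is
    # at most the fixed offer B/k, and pay each of them exactly B/k.
    price = B / k

    def allocate(c):
        return frozenset([i for i in range(n) if c[i] <= price][:k])

    def pay(c):
        selected = allocate(c)
        return [price if i in selected else 0.0 for i in range(n)]

    return Mechanism(allocate, pay)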