From f28514e9eb3ed56091b3d10b8f6efc3b91a0e2b6 Mon Sep 17 00:00:00 2001
From: Stratis Ioannidis
Date: Wed, 3 Jul 2013 07:04:36 -0700
Subject: up to myerson

---
 problem.tex | 72 ++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 48 insertions(+), 24 deletions(-)

diff --git a/problem.tex b/problem.tex
index a5db2ac..c05e20c 100644
--- a/problem.tex
+++ b/problem.tex
@@ -3,7 +3,7 @@ The theory of experimental design \cite{pukelsheim2006optimal,atkinson2007optimum,chaloner1995bayesian} considers the following formal setting. % studies how an experimenter \E\ should select the parameters of a set of experiments she is about to conduct. In general, the optimality of a particular design depends on the purpose of the experiment, \emph{i.e.}, the quantity \E\ is trying to learn or the hypothesis she is trying to validate. Due to their ubiquity in statistical analysis, a large literature on the subject focuses on learning \emph{linear models}, where \E\ wishes to fit a linear function to the data she has collected. %Putting cost considerations aside -Suppose that an experimenter \E\ wishes to conduct $k$ among $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.}, +Suppose that an experimenter \E\ wishes to conduct $k$ among $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $b\leq \|x_i\|_2\leq 1$ for some $b>0$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.}, \begin{align}\label{model} \forall i\in\mathcal{N},\quad y_i = \T{\beta} x_i + \varepsilon_i \end{align} @@ -19,7 +19,7 @@ etc.). The magnitude of the coefficient $\beta_i$ captures the effect that featu The purpose of these experiments is to allow \E\ to estimate the model $\beta$. In particular, assume that the experimenter \E\ has a {\em prior} distribution on $\beta$, \emph{i.e.}, $\beta$ has a multivariate normal prior -with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d^2}$ (where $\sigma^2$ is the noise variance). +with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d\times d}$ (where $\sigma^2$ is the noise variance, which also scales the prior on $\beta$). Then, \E\ estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$.
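The generative setup described above (features with $b\leq\|x_i\|_2\leq 1$, a zero-mean Gaussian prior on $\beta$ with covariance $\sigma^2R^{-1}$, and measurements $y_i=\T{\beta}x_i+\varepsilon_i$) can be summarized in a minimal NumPy sketch. This is an illustration only, not code from the paper; the concrete choices of n, d, sigma, b, and R = I_d are assumptions made for the example.

```python
import numpy as np

# Minimal sketch of the Bayesian linear-model setup; all constants are illustrative.
rng = np.random.default_rng(0)
n, d, sigma = 50, 5, 0.1

# Features x_i in R^d, scaled so that b <= ||x_i||_2 <= 1 (here b = 0.5).
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit norm rows
X *= rng.uniform(0.5, 1.0, size=(n, 1))         # shrink into [b, 1]

# Prior: beta ~ N(0, sigma^2 R^{-1}); R = I_d for simplicity.
R = np.eye(d)
beta = rng.multivariate_normal(np.zeros(d), sigma**2 * np.linalg.inv(R))

# Measurements y_i = beta^T x_i + eps_i with i.i.d. Gaussian noise of variance sigma^2.
y = X @ beta + rng.normal(scale=sigma, size=n)
```

With such synthetic data in hand, the MAP estimate and the value function discussed next can be computed directly.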
Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}: \begin{align} \hat{\beta} = \argmax_{\beta\in\reals^d} \prob(\beta\mid y_S) =\argmin_{\beta\in\reals^d} \big(\sum_{i\in S} (y_i - \T{\beta}x_i)^2 @@ -33,17 +33,12 @@ This optimization, commonly known as \emph{ridge regression}, includes an additi %\begin{align} \hat{\beta} &=\max_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \nonumber\\ %& = (\T{X_S}X_S)^{-1}X_S^Ty_S\label{leastsquares}\end{align} %The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $(X_S\T{X_S})^{-1}$. -Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is -a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from -the noise terms $\varepsilon_i$). -%In particular, $\hat{\beta}$ has %mean $\beta$% (\emph{i.e.}, it is an \emph{unbiased estimator}) and -% covariance -%$(R+\T{X_S}X_S)^{-1}$. +%Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from the noise terms $\varepsilon_i$). +%In particular, $\hat{\beta}$ has %mean $\beta$% (\emph{i.e.}, it is an \emph{unbiased estimator}) and covariance $(R+\T{X_S}X_S)^{-1}$. Let $V:2^\mathcal{N}\to\reals$ be a \emph{value function}, quantifying how informative a set of experiments $S$ is in estimating $\beta$. The classical experimental design problem amounts to finding a set $S$ that maximizes $V(S)$ subject to the constraint $|S|\leq k$. -A variety of different value functions are used in literature~\cite{pukelsheim2006optimal}. -%; almost all make use of the covariance $(R+\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. -A value function that has natural advantages is the \emph{information gain}: %\emph{$D$-optimality criterion}: %which yields the following optimization problem +A variety of different value functions are used in the literature~\cite{pukelsheim2006optimal,boyd2004convex}; %; almost all make use of the covariance $(R+\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. +one that has natural advantages is the \emph{information gain}: %\emph{$D$-optimality criterion}: %which yields the following optimization problem \begin{align} V(S)= I(\beta;y_S) = \entropy(\beta)-\entropy(\beta\mid y_S). \label{informationgain} \end{align} @@ -57,7 +52,13 @@ Under the linear model \eqref{model}, and the Gaussian prior, the information ga \end{align} This value function is known in the experimental design literature as the $D$-optimality criterion -\cite{pukelsheim2006optimal,atkinson2007optimum,chaloner1995bayesian}. +\cite{pukelsheim2006optimal,atkinson2007optimum,chaloner1995bayesian}. + +Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from the noise terms $\varepsilon_i$). +In particular, $\hat{\beta}$ has %mean $\beta$% (\emph{i.e.}, it is an \emph{unbiased estimator}) and +covariance $(R+\T{X_S}X_S)^{-1}$. As such, maximizing $V(S)$ can alternatively be seen as a means of reducing the uncertainty on the estimator $\hat{\beta}$ by minimizing the product of the eigenvalues of its covariance.
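The two quantities introduced above, the ridge/MAP estimate with posterior covariance $(R+\T{X_S}X_S)^{-1}$ and the information-gain value $V(S)$, admit short closed-form computations. The sketch below is a hedged illustration rather than the paper's code: it uses the standard linear-Gaussian identity $I(\beta;y_S)=\tfrac{1}{2}\big(\log\det(R+\T{X_S}X_S)-\log\det R\big)$, which matches the log-determinant ($D$-optimality) criterion up to additive constants; the exact normalization in the patch's (elided) equation may differ.

```python
import numpy as np

def map_estimate(X_S, y_S, R):
    # Ridge / MAP estimate: beta_hat = (R + X_S^T X_S)^{-1} X_S^T y_S,
    # whose covariance is (R + X_S^T X_S)^{-1} as noted in the text above.
    return np.linalg.solve(R + X_S.T @ X_S, X_S.T @ y_S)

def information_gain(X_S, R):
    # Information gain / D-optimality value, up to additive constants:
    # 0.5 * (logdet(R + X_S^T X_S) - logdet(R)).
    _, ld_post = np.linalg.slogdet(R + X_S.T @ X_S)
    _, ld_prior = np.linalg.slogdet(R)
    return 0.5 * (ld_post - ld_prior)
```

For instance, with the synthetic X, y, R from the earlier sketch and a candidate set S of row indices, information_gain(X[S], R) scores the design and map_estimate(X[S], y[S], R) returns the corresponding estimate of beta.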
+ +%An alternative interpretation, given that $(R+ \T{X_S}X_S)^{-1}$ is the covariance of the estimator $\hat{\beta}$, is that it tries to minimize the %which is indeed a function of the covariance matrix $(R+\T{X_S}X_S)^{-1}$. %defined as $-\infty$ when $\mathrm{rank}(\T{X_S}X_S)\delta.$ The mechanism is \emph{truthful} if $\delta=0$. \item \emph{Budget Feasibility.} The sum of the payments should not exceed the budget constraint, \emph{i.e.}, $\sum_{i\in\mathcal{N}} p_i(c) \leq B.\label{budget}$ @@ -186,7 +210,7 @@ Given that the full information problem \EDP{} is NP-hard, we further seek mecha As noted in \cite{singer-mechanisms, chen}, budget feasible reverse auctions are \emph{single parameter} auctions: each agent has only one private value (namely, $c_i$). As such, Myerson's Theorem \cite{myerson} gives -a characterization of truthful mechanisms. +a characterization of truthful mechanisms. % NEEDS TO BE FIXED \begin{lemma}[\citeN{myerson}]\label{thm:myerson} \sloppy A normalized mechanism $\mathcal{M} = (S,p)$ for a single parameter auction is
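Although the hunk above cuts off before the lemma statement, Myerson's characterization for single-parameter reverse auctions is commonly operationalized as follows: if the allocation rule is monotone (a selected agent remains selected when she reports a lower cost), truthful payments are threshold payments, i.e., each winner is paid the largest cost she could have reported while still being selected. The sketch below illustrates this with a generic bisection; allocation_rule, cost_upper_bound, and the other names are hypothetical and not taken from the paper.

```python
import numpy as np

def myerson_payment(i, costs, allocation_rule, cost_upper_bound, tol=1e-6):
    """Threshold payment for a winning agent i under a *monotone* allocation rule.

    allocation_rule(bids) -> collection of selected agents (hypothetical interface).
    Returns (approximately) the largest cost agent i could report and still be
    selected, keeping the other agents' reports fixed.
    """
    bids = np.asarray(costs, dtype=float)
    assert i in allocation_rule(bids), "agent i must win at her true cost"
    lo, hi = bids[i], float(cost_upper_bound)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        trial = bids.copy()
        trial[i] = mid
        if i in allocation_rule(trial):
            lo = mid   # i still wins at the higher report
        else:
            hi = mid   # i loses; the threshold lies below mid
    return lo
```

Bisection is used here only for illustration; for specific allocation rules (e.g., greedy selection under a budget) the threshold typically has a closed form.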