| author | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-05 12:43:11 -0800 |
|---|---|---|
| committer | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-05 12:43:11 -0800 |
| commit | f96f3e9a7233fbd170a375232a7740a3913bcc18 (patch) | |
| tree | b3b199c125297c9ab385aec7ca404ded0b4d674c | |
| parent | ea386112229740787e940788c4f821c779732920 (diff) | |
| download | recommendation-f96f3e9a7233fbd170a375232a7740a3913bcc18.tar.gz | |
general by muthu
| -rw-r--r-- | general.tex | 54 |
1 files changed, 19 insertions, 35 deletions
diff --git a/general.tex b/general.tex
index 957521a..65b3c8f 100644
--- a/general.tex
+++ b/general.tex
@@ -1,14 +1,15 @@
 \subsection{Bayesian Experimental Design}\label{sec:bed}
-In this section, we extend our results to Bayesian experimental design
-\cite{chaloner1995bayesian}. We show that objective function \eqref{modified}
-has a natural interpretation in this context, further motivating its selection
-as our objective. Moreover, we extend Theorem~\ref{thm:main} to a more general
-Bayesian setting.
-In the Bayesian setting, it is assumed that the experimenter has a prior
+%In this section, we extend our results to Bayesian experimental design
+%\cite{chaloner1995bayesian}. We show that objective function \eqref{modified}
+%has a natural interpretation in this context, further motivating its selection
+%as our objective. Moreover,
+
+We extend Theorem~\ref{thm:main} to a more general
+Bayesian setting where it is assumed that the experimenter \E\ has a {\em prior}
 distribution on $\beta$: in particular, $\beta$ has a multivariate normal prior
 with yero mean and covariance $\sigma^2R^{-1}\in \reals^{d^2}$ (where
 $\sigma^2$ is the noise variance).
-The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
+\E\ estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$.
+Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
 \begin{displaymath}
 \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2 + \T{\beta}R\beta
@@ -27,13 +28,15 @@ Assuming normal noise variables, the information gain is equal (up to a constan
 \begin{align}
 V(S) = \frac{1}{2}\log\det(R + \T{X_S}X_S)\label{bayesianobjective}
 \end{align}
-Our objective \eqref{modified} clearly follows from \eqref{bayesianobjective}
-by setting $R=I_d$. Hence, our optimization can be interpreted as
+Our objective for \EDP\
+%\eqref{modified}
+clearly follows from \eqref{bayesianobjective}
+by setting $R=I_d$. Hence, the optimization discussed thus far can be interpreted as
 a maximization of the information gain when the prior distribution has a
 covariance $\sigma^2 I_d$, and the experimenter is solving a ridge regression
 problem with penalty term $\norm{x}_2^2$.
-Moreover, our results can be extended to the general Bayesian case, by
+Our results can be extended to the general Bayesian case, by
 replacing $I_d$ with the positive semidefinite matrix $R$. First, we re-set
 the origin of the value function so that $V(\emptyset) = 0$:
 \begin{align}\label{eq:normalized}
@@ -60,30 +63,9 @@ where $\mu$ is the smallest eigenvalue of $R$.
 \subsection{$D$-Optimality and Beyond}
-Ideally, motivated by the $D$-optimality criterion, we would like to design a mechanism that maximizes \eqref{dcrit} within a good approximation ratio. In what follows, we consider a slightly more general objective as follows:
-\begin{center}
-\textsc{ExperimentalDesignProblem} (EDP)
-\begin{subequations}
-\begin{align}
-\text{Maximize}\quad V(S) &= \log\det(I_d+\T{X_S}X_S) \label{modified} \\
-\text{subject to}\quad \sum_{i\in S} c_i&\leq B
-\end{align}\label{edp}
-\end{subequations}
-\end{center}
-where $I_d\in \reals^{d\times d}$ is the identity matrix.
-\junk{One possible interpretation of \eqref{modified} is that, prior to the auction, the experimenter has free access to $d$ experiments whose features form an orthonormal basis in $\reals^d$. However, \eqref{modified} can also be motivated in the context of \emph{Bayesian experimental design} \cite{chaloner1995bayesian}. In short, the objective \eqref{modified} arises naturally when the experimenter retrieves the model $\beta$ through \emph{ridge regression}, rather than the linear regression \eqref{leastsquares} over the observed data; we explore this connection in Section~\ref{sec:bed}.
-}
-
-We present our results with this version of the objective function because it is simple and captures the versions
-we will need later (including an arbitrary matrix in place of $I_d$ or a zero matrix which will correspond to ~\eqref{dcrit} ).
-
-%Note that maximizing \eqref{modified} is equivalent to maximizing \eqref{dcrit} in the full-information case. In particular, $\det(\T{X_S}X_S)> \det(\T{X_{S'}}X_{S'})$ iff $\det(I_d+\T{X_S}X_S)>\det(I_d+\T{X_{S'}}X_{S'})$. In addition,
-\EDP{} \emph{and} the corresponding problem with objective \eqref{dcrit} are NP-hard; to see this, note that \textsc{Knapsack} reduces to EDP with dimension $d=1$ by mapping the weight of each item $w_i$ to an experiment with $x_i=w_i$. Moreover, \eqref{modified} is submodular, monotone and satisfies $V(\emptyset) = 0$.
-%, allowing us to use the extensive machinery for the optimization of submodular functions, as well as recent results in the
-% context of budget feasible auctions \cite{chen,singer-mechanisms}.
-
-%\stratis{A potential problem is that all of the above properties hold also for, \emph{e.g.}, $V(S)=\log(1+\det(\T{X_S}X_S))$\ldots}
-
-As \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$.
+We now reexamine the classical $D$-optimality in \eqref{dcrit}, which is~\ref{bayesianobjective} with $R$ replaced by
+the zero matrix.
+Since \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$.
 However, the following lower bound implies that such an optimization goal cannot be attained under the constraints of truthfulness, budget feasibility, and individual rationality.
 \begin{lemma}
 For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individually rational mechanism for a budget feasible reverse auction with value function $V(S) = \det{\T{X_S}X_S}$.
@@ -93,6 +75,8 @@ For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individua
 \input{proof_of_lower_bound1}
 \end{proof}
+Beyond $D$-optimality,
+{\bf STRATIC, fill in..}
 \subsection{Beyond Linear Models}
 Selecting experiments that maximize the information gain in the Bayesian setup
@@ -135,7 +119,7 @@ variables is a submodular function. Thus, the value function is written in
 This lemma implies that learning an \emph{arbitrary hypothesis, under an
 arbitrary prior} when noise is conditionally independent leads to a submodular
-value function. Hence, we can apply the results from Chen \emph{et al.} to get
+value function. Hence, we can apply the previously known results to get
 the following corollary:
 \begin{corollary}
 For Bayesian experimental design with the objective given by the
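Editorial aside, not part of the commit: the first hunk's claim that MAP estimation under the Gaussian prior $\beta \sim N(0, \sigma^2 R^{-1})$ reduces to ridge regression can be sanity-checked numerically. A minimal NumPy sketch (all data here is synthetic and illustrative) using the closed form $\hat{\beta} = (R + X^\top X)^{-1} X^\top y$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 20
X = rng.normal(size=(n, d))        # rows are experiment features x_i
beta_true = rng.normal(size=d)
y = X @ beta_true + 0.1 * rng.normal(size=n)
R = np.eye(d)                      # prior precision (up to sigma^2); R = I_d here

# Closed-form MAP / ridge estimate: beta_hat = (R + X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(R + X.T @ X, X.T @ y)

# The gradient of  sum_i (y_i - beta^T x_i)^2 + beta^T R beta  vanishes at beta_hat
grad = -2 * X.T @ (y - X @ beta_hat) + 2 * R @ beta_hat
print(np.allclose(grad, 0, atol=1e-8))  # -> True
```

Swapping `np.eye(d)` for any positive definite `R` gives the general Bayesian case the patch discusses.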

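A second aside, also not part of the commit: the text removed in the third hunk states that the objective $V(S) = \log\det(I_d + X_S^\top X_S)$ is submodular, monotone, and satisfies $V(\emptyset) = 0$. A brute-force numerical check on a small random instance (illustrative only):

```python
import numpy as np
from itertools import combinations

def V(X, S):
    """V(S) = log det(I_d + X_S^T X_S), computed via slogdet for stability."""
    d = X.shape[1]
    XS = X[sorted(S)]
    _, logdet = np.linalg.slogdet(np.eye(d) + XS.T @ XS)
    return logdet

rng = np.random.default_rng(1)
n, d = 6, 3
X = rng.normal(size=(n, d))  # rows are experiment feature vectors x_i

# V(emptyset) = log det(I_d) = 0
assert abs(V(X, set())) < 1e-12

# Brute-force check of monotonicity and diminishing returns:
# for all S <= T and i not in T:  V(S+i) - V(S) >= V(T+i) - V(T) >= 0.
subsets = [set(c) for k in range(n + 1) for c in combinations(range(n), k)]
for S in subsets:
    for T in subsets:
        if S <= T:
            for i in set(range(n)) - T:
                gain_S = V(X, S | {i}) - V(X, S)
                gain_T = V(X, T | {i}) - V(X, T)
                assert gain_T >= -1e-9          # monotone
                assert gain_S >= gain_T - 1e-9  # submodular
print("V is monotone and submodular on this instance")
```

Setting `np.eye(d)` to the zero matrix recovers the $D$-optimality objective \eqref{dcrit}, for which the page's lemma shows no bounded-approximation truthful mechanism exists.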