path: root/general.tex
author    Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>    2012-11-05 12:43:11 -0800
committer Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>    2012-11-05 12:43:11 -0800
commit    f96f3e9a7233fbd170a375232a7740a3913bcc18 (patch)
tree      b3b199c125297c9ab385aec7ca404ded0b4d674c /general.tex
parent    ea386112229740787e940788c4f821c779732920 (diff)
download  recommendation-f96f3e9a7233fbd170a375232a7740a3913bcc18.tar.gz

general by muthu

Diffstat (limited to 'general.tex')
-rw-r--r--  general.tex  54
1 file changed, 19 insertions, 35 deletions
diff --git a/general.tex b/general.tex
index 957521a..65b3c8f 100644
--- a/general.tex
+++ b/general.tex
@@ -1,14 +1,15 @@
\subsection{Bayesian Experimental Design}\label{sec:bed}
-In this section, we extend our results to Bayesian experimental design
-\cite{chaloner1995bayesian}. We show that objective function \eqref{modified}
-has a natural interpretation in this context, further motivating its selection
-as our objective. Moreover, we extend Theorem~\ref{thm:main} to a more general
-Bayesian setting.
-In the Bayesian setting, it is assumed that the experimenter has a prior
+%In this section, we extend our results to Bayesian experimental design
+%\cite{chaloner1995bayesian}. We show that objective function \eqref{modified}
+%has a natural interpretation in this context, further motivating its selection
+%as our objective. Moreover,
+
+We extend Theorem~\ref{thm:main} to a more general
+Bayesian setting where it is assumed that the experimenter \E\ has a {\em prior}
distribution on $\beta$: in particular, $\beta$ has a multivariate normal prior
with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d\times d}$ (where $\sigma^2$ is the noise variance).
-The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
+\E\ estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following minimization \cite{hastie}:
\begin{displaymath}
\hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2
+ \T{\beta}R\beta
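The penalized least-squares problem in the hunk above has a well-known closed form, $(X^{\top}X + R)^{-1}X^{\top}y$. A minimal numerical sketch (the function name and the synthetic data are invented for illustration, not taken from the paper):

```python
import numpy as np

def map_estimate(X, y, R):
    """MAP estimate under the Gaussian prior beta ~ N(0, sigma^2 R^{-1}):
    minimizes  sum_i (y_i - beta^T x_i)^2 + beta^T R beta,
    whose closed form is  (X^T X + R)^{-1} X^T y  (ridge regression)."""
    return np.linalg.solve(X.T @ X + R, X.T @ y)

# Synthetic instance: 50 experiments in R^3 with small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=50)

# R = I_d recovers standard ridge regression, the case used for EDP.
beta_hat = map_estimate(X, y, np.eye(3))
```

With enough well-conditioned experiments the estimate lands close to the true parameter; the ridge term only contributes a small shrinkage bias.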
@@ -27,13 +28,15 @@ Assuming normal noise variables, the information gain is equal (up to a constan
\begin{align}
V(S) = \frac{1}{2}\log\det(R + \T{X_S}X_S)\label{bayesianobjective}
\end{align}
-Our objective \eqref{modified} clearly follows from \eqref{bayesianobjective}
-by setting $R=I_d$. Hence, our optimization can be interpreted as
+Our objective for \EDP\
+%\eqref{modified}
+clearly follows from \eqref{bayesianobjective}
+by setting $R=I_d$. Hence, the optimization discussed thus far can be interpreted as
a maximization of the information gain when the prior distribution has
a covariance $\sigma^2 I_d$, and the experimenter is solving a ridge regression
problem with penalty term $\norm{\beta}_2^2$.
-Moreover, our results can be extended to the general Bayesian case, by
+Our results can be extended to the general Bayesian case, by
replacing $I_d$ with the positive semidefinite matrix $R$. First, we re-set the
origin of the value function so that $V(\emptyset) = 0$:
\begin{align}\label{eq:normalized}
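The monotonicity and submodularity of the normalized information-gain objective can be spot-checked numerically. A hedged sketch on a small synthetic instance (data and names invented for illustration), verifying the diminishing-returns inequality exhaustively:

```python
import numpy as np
from itertools import combinations

# Small random instance: 5 candidate experiments in R^3, prior precision R.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
R = np.eye(3)

def V(S):
    """Normalized Bayesian objective:
    (1/2) log det(R + X_S^T X_S) - (1/2) log det(R), so V(empty set) = 0."""
    XS = X[list(S)]
    return 0.5 * (np.linalg.slogdet(R + XS.T @ XS)[1]
                  - np.linalg.slogdet(R)[1])

items = set(range(5))
subsets = [set(c) for r in range(6) for c in combinations(sorted(items), r)]
for B in subsets:
    for A in (s for s in subsets if s <= B):
        for i in items - B:
            # Submodularity: the marginal gain of adding experiment i
            # can only shrink as the chosen set grows (A subset of B).
            assert V(A | {i}) - V(A) >= V(B | {i}) - V(B) - 1e-9
```

The exhaustive check passes on this instance, consistent with log-det over a PSD matrix plus rank-one updates being monotone submodular.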
@@ -60,30 +63,9 @@ where $\mu$ is the smallest eigenvalue of $R$.
\subsection{$D$-Optimality and Beyond}
-Ideally, motivated by the $D$-optimality criterion, we would like to design a mechanism that maximizes \eqref{dcrit} within a good approximation ratio. In what follows, we consider a slightly more general objective as follows:
-\begin{center}
-\textsc{ExperimentalDesignProblem} (EDP)
-\begin{subequations}
-\begin{align}
-\text{Maximize}\quad V(S) &= \log\det(I_d+\T{X_S}X_S) \label{modified} \\
-\text{subject to}\quad \sum_{i\in S} c_i&\leq B
-\end{align}\label{edp}
-\end{subequations}
-\end{center} where $I_d\in \reals^{d\times d}$ is the identity matrix.
-\junk{One possible interpretation of \eqref{modified} is that, prior to the auction, the experimenter has free access to $d$ experiments whose features form an orthonormal basis in $\reals^d$. However, \eqref{modified} can also be motivated in the context of \emph{Bayesian experimental design} \cite{chaloner1995bayesian}. In short, the objective \eqref{modified} arises naturally when the experimenter retrieves the model $\beta$ through \emph{ridge regression}, rather than the linear regression \eqref{leastsquares} over the observed data; we explore this connection in Section~\ref{sec:bed}.
-}
-
-We present our results with this version of the objective function because it is simple and captures the versions
-we will need later (including an arbitrary matrix in place of $I_d$ or a zero matrix which will correspond to ~\eqref{dcrit} ).
-
-%Note that maximizing \eqref{modified} is equivalent to maximizing \eqref{dcrit} in the full-information case. In particular, $\det(\T{X_S}X_S)> \det(\T{X_{S'}}X_{S'})$ iff $\det(I_d+\T{X_S}X_S)>\det(I_d+\T{X_{S'}}X_{S'})$. In addition,
-\EDP{} \emph{and} the corresponding problem with objective \eqref{dcrit} are NP-hard; to see this, note that \textsc{Knapsack} reduces to EDP with dimension $d=1$ by mapping the weight of each item $w_i$ to an experiment with $x_i=w_i$. Moreover, \eqref{modified} is submodular, monotone and satisfies $V(\emptyset) = 0$.
-%, allowing us to use the extensive machinery for the optimization of submodular functions, as well as recent results in the
-% context of budget feasible auctions \cite{chen,singer-mechanisms}.
-
-%\stratis{A potential problem is that all of the above properties hold also for, \emph{e.g.}, $V(S)=\log(1+\det(\T{X_S}X_S))$\ldots}
-
-As \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$.
+We now reexamine the classical $D$-optimality criterion in \eqref{dcrit}, which is \eqref{bayesianobjective} with $R$ replaced by
+the zero matrix.
+Since \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$.
However, the following lower bound implies that such an optimization goal cannot be attained under the constraints of truthfulness, budget feasibility, and individual rationality.
\begin{lemma}
For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individually rational mechanism for a budget feasible reverse auction with value function $V(S) = \det{\T{X_S}X_S}$.
@@ -93,6 +75,8 @@ For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individua
\input{proof_of_lower_bound1}
\end{proof}
+Beyond $D$-optimality,
+{\bf STRATIS, fill in..}
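One intuition behind the lemma can be seen numerically: with $V(S) = \det\T{X_S}X_S$, any set of fewer than $d$ experiments has value exactly zero, so the objective jumps discontinuously once $S$ spans $\reals^d$. A tiny illustrative sketch (example data invented, not from the paper):

```python
import numpy as np

# Three candidate experiments in R^2.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

def V_det(S):
    """D-optimality value det(X_S^T X_S): zero unless the rows span R^2."""
    XS = X[list(S)]
    return np.linalg.det(XS.T @ XS)

print(V_det([0]))      # 0.0 -- one experiment cannot span R^2
print(V_det([0, 1]))   # 1.0 -- positive value appears only once S spans R^2
```

Because every proper subset of a spanning set can have value zero, a single seller can hold all the value, which is what the impossibility proof exploits.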
\subsection{Beyond Linear Models}
Selecting experiments that maximize the information gain in the Bayesian setup
@@ -135,7 +119,7 @@ variables is a submodular function. Thus, the value function is written in
This lemma implies that learning an \emph{arbitrary hypothesis, under an
arbitrary prior} when noise is conditionally independent leads to a submodular
-value function. Hence, we can apply the results from Chen \emph{et al.} to get
+value function. Hence, we can apply the previously known results to get
the following corollary:
\begin{corollary}
For Bayesian experimental design with the objective given by the