1 files changed, 28 insertions, 11 deletions
diff --git a/problem.tex b/problem.tex
index e820f85..546500d 100644
--- a/problem.tex
+++ b/problem.tex
@@ -1,7 +1,7 @@
 \label{sec:prel}
 \subsection{Experimental Design}
 
-The theory of experimental design \cite{pukelsheim2006optimal,atkinson2007optimum}  studies how an experimenter \E\  should select the parameters of a set of experiments she is about to conduct. In general, the optimality of a particular design depends on the purpose of the experiment, \emph{i.e.}, the quantity \E\  is trying to learn or the hypothesis she is trying to validate. Due to their ubiquity in statistical analysis, a large literature on the subject focuses on learning \emph{linear models}, where \E\ wishes to  fit  a linear map to the data she has collected.
+The theory of experimental design \cite{pukelsheim2006optimal,atkinson2007optimum}  studies how an experimenter \E\  should select the parameters of a set of experiments she is about to conduct. In general, the optimality of a particular design depends on the purpose of the experiment, \emph{i.e.}, the quantity \E\  is trying to learn or the hypothesis she is trying to validate. Due to their ubiquity in statistical analysis, a large literature on the subject focuses on learning \emph{linear models}, where \E\ wishes to  fit  a linear function to the data she has collected.
 
 More precisely, putting cost considerations aside, suppose that \E\ wishes to conduct $k$ among $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$,  related to the experiment features $x_i$ through a linear function, \emph{i.e.},
 \begin{align}\label{model}
@@ -27,6 +27,7 @@ A variety of different value functions are used in experimental design~\cite{puk
 \begin{align}
  V(S) &= \frac{1}{2}\log\det \T{X_S}X_S \label{dcrit} %\\
 \end{align}
+defined as $-\infty$ when $\mathrm{rank}(\T{X_S}X_S)<d$. 
 As $\hat{\beta}$ is a multidimensional normal random variable, the
 $D$-optimality criterion is equal (up to a constant) to the negative of the
 entropy of $\hat{\beta}$. Hence, selecting a set of experiments $S$ that
@@ -39,8 +40,7 @@ the uncertainty on $\beta$, as captured by the entropy of its estimator.
 %\end{align}
 %There are several  reasons
 
- Value function \eqref{dcrit} is undefined when $\mathrm{rank}(\T{X_S}X_S)<d$; in this case, we take $V(S)=-\infty$ (so that $V$ takes values in the extended reals).
-Note that \eqref{dcrit} is a submodular set function, \emph{i.e.}, 
+ Note that \eqref{dcrit} is a submodular set function, \emph{i.e.}, 
 $V(S)+V(T)\geq V(S\cup T)+V(S\cap T)$ for all $S,T\subseteq \mathcal{N}$; it is also monotone, \emph{i.e.}, $V(S)\leq V(T)$ for all $S\subset T$. %In addition, the maximization of convex relaxations of this function is a  well-studied problem \cite{boyd}.
 
 \subsection{Budget Feasible Reverse Auctions}
@@ -71,8 +71,7 @@ In addition to the above, mechanism design in budget feasible reverse auctions s
     \item \emph{Truthfulness.} An agent has no incentive to missreport her true cost. Formally, let $c_{-i}$
  be a vector of costs of all agents except $i$. %    $c_{-i}$ being the same). Let $[p_i']_{i\in \mathcal{N}} = p(c_i',
 %    c_{-i})$, then the 
-A mechanism is truthful iff for every $i \in \mathcal{N}$ and every two cost vectors $(c_i,c_{-i})$ and $(c_i',c_{-i})$
- $p_i(c_i,c_{-i}) - s_i(c_i,c_{-i})\cdot c_i \geq p_i(c_i',c_{-i}) - s(c_i',c_{-i})\cdot c_i$.
+A mechanism is truthful iff  $p_i(c_i,c_{-i}) - s_i(c_i,c_{-i})\cdot c_i \geq p_i(c_i',c_{-i}) - s(c_i',c_{-i})\cdot c_i$ for every $i \in \mathcal{N}$ and every two cost vectors $(c_i,c_{-i})$ and $(c_i',c_{-i})$.
     \item \emph{Budget Feasibility.} The sum of the payments should not exceed the
     budget constraint:
     %\begin{displaymath}
@@ -85,7 +84,8 @@ A mechanism is truthful iff for every $i \in \mathcal{N}$ and every two cost vec
     \begin{displaymath}
         OPT \leq \alpha V(S).
     \end{displaymath}
-    The approximation ratio captures the \emph{price of truthfulness}, \emph{i.e.},  the relative value loss incurred by adding the truthfulness constraint. % to the value function maximization.
+ %   The approximation ratio captures the \emph{price of truthfulness}, \emph{i.e.},  the relative value loss incurred by adding the truthfulness constraint. % to the value function maximization.
+    We stress here that the approximation ratio is defined with respect \emph{to the maximal value in the full information case}.
     \item \emph{Computational efficiency.} The allocation and payment function
     should be computable in polynomial time in the number of
     agents $n$. %\thibaut{Should we say something about the black-box model for $V$?  Needed to say something in general, but not in our case where the value function can be computed in polynomial time}.
@@ -130,7 +130,7 @@ biometric information (\emph{e.g.}, her red cell blood count, a genetic marker,
 etc.). The cost $c_i$ is the amount the subject deems sufficient to
 incentivize her participation in the study. Note that, in this setup, the feature vectors $x_i$ are public information that the experimenter can consult prior the experiment design. Moreover, though a subject may lie about her true cost $c_i$, she cannot lie about $x_i$ (\emph{i.e.}, all features are verifiable upon collection) or $y_i$ (\emph{i.e.}, she cannot falsify her measurement).
 
-Ideally, motivated by the $D$-optimality criterion, we would like to design a mechanism that maximizes \eqref{dcrit} within a good approximation ratio.  In what follows, we consider a slightly more general objective as follows:
+Ideally, motivated by the $D$-optimality criterion, we would like to design a mechanism that maximizes \eqref{dcrit} within a good approximation ratio. The latter however can take negative values, making it unfit for stating approximation ratios.  As such, we consider the following full information problem:
 \begin{center}
 \textsc{ExperimentalDesignProblem} (EDP)
 \begin{subequations}
@@ -140,19 +140,36 @@ Ideally, motivated by the $D$-optimality criterion, we would like to design a me
 \end{align}\label{edp}
 \end{subequations}
 \end{center} where $I_d\in \reals^{d\times d}$ is the identity matrix.
-\junk{One possible interpretation of \eqref{modified} is that, prior to the auction, the experimenter has free access to $d$ experiments whose features form an orthonormal basis in $\reals^d$. However, \eqref{modified} can also be motivated in the context of \emph{Bayesian experimental design} \cite{chaloner1995bayesian}. In short, the objective \eqref{modified} arises naturally when the experimenter retrieves the model $\beta$ through \emph{ridge regression}, rather than the linear regression \eqref{leastsquares} over the observed data; we explore this connection in Section~\ref{sec:bed}. 
-}
+The objective~\eqref{modified} is non-negative, and can be motivated in the context of \emph{Bayesian experimental design} \cite{chaloner1995bayesian}. In short, it arises naturally when the experimenter retrieves the model $\beta$ through \emph{ridge regression}, rather than the linear regression \eqref{leastsquares} over the observed data; we explore this connection in Section~\ref{sec:bed}. 
+
+
+
+
+%We present our results with this version of the objective function because it is simple and captures the versions
+%we will need later (including  an arbitrary matrix in place of $I_d$ or a zero matrix which will correspond to ~\eqref{dcrit} ).
+
+%\begin{center}
+%\textsc{ExperimentalDesignProblem} (EDP)
+%\begin{subequations}
+%\begin{align}
+%\text{Maximize}\quad V(S) &= \log\det(I_d+\T{X_S}X_S) \label{modified} \\
+%\text{subject to}\quad \sum_{i\in S} p_i&\leq B\\
+%p_i&\geq c_i, \quad i\in S\\
+%\end{align}\label{edp}
+%\end{subequations}
+%\end{center} 
 
-We present our results with this version of the objective function because it is simple and captures the versions
-we will need later (including  an arbitrary matrix in place of $I_d$ or a zero matrix which will correspond to ~\eqref{dcrit} ).
 
 %Note that maximizing \eqref{modified} is equivalent to maximizing \eqref{dcrit} in the full-information case. In particular, $\det(\T{X_S}X_S)> \det(\T{X_{S'}}X_{S'})$ iff $\det(I_d+\T{X_S}X_S)>\det(I_d+\T{X_{S'}}X_{S'})$. In addition,
 \EDP{} \emph{and} the corresponding problem with objective \eqref{dcrit} are NP-hard; to see this, note that \textsc{Knapsack} reduces to EDP with dimension $d=1$ by mapping the weight of each item $w_i$ to an experiment with $x_i=w_i$. Moreover, \eqref{modified} is submodular, monotone and satisfies $V(\emptyset) = 0$.
+Finally, we define the strategic version  of \EDP{} (denoted by \SEDP) by adding the additional constraints of truthfulness, individual rationality, and normal payments, as described in the previous section.
+
 %, allowing us to use the extensive machinery for the optimization of submodular functions, as well as recent results in the 
 % context of budget feasible auctions \cite{chen,singer-mechanisms}. 
 
 %\stratis{A potential problem is that all of the above properties hold also for, \emph{e.g.}, $V(S)=\log(1+\det(\T{X_S}X_S))$\ldots}
 
+
 \begin{comment}\junk{
 As \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider  the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$. 
 However, the following lower bound implies that such an optimization goal cannot be attained under the constraints of  truthfulness, budget feasibility, and individual rationality.