 definitions.tex |  3
 general.tex     | 53
 intro.tex       | 20
 notes.bib       | 48
 paper.tex       | 12
 related.tex     | 12
 6 files changed, 96 insertions(+), 52 deletions(-)
diff --git a/definitions.tex b/definitions.tex
index 526f544..de551ca 100644
--- a/definitions.tex
+++ b/definitions.tex
@@ -5,6 +5,7 @@
\newtheorem{example}{Example}
\newtheorem{prop}{Proposition}
\newtheorem{theorem}{Theorem}
+\newtheorem{corollary}{Corollary}
%\newcommand*{\defeq}{\stackrel{\text{def}}{=}}
\newcommand*{\defeq}{\equiv}
\newcommand{\var}{\mathop{\mathrm{Var}}}
@@ -26,4 +27,4 @@
\newcommand{\E}{{\tt E}}
\newcommand{\id}{\mathbbm{1}}
\newcommand{\junk}[1]{}
-\newcommand{\edp}{{\tt EDP}}
\ No newline at end of file
+\newcommand{\edp}{{\tt EDP}}
diff --git a/general.tex b/general.tex
index c11cb99..957521a 100644
--- a/general.tex
+++ b/general.tex
@@ -5,11 +5,13 @@ has a natural interpretation in this context, further motivating its selection
as our objective. Moreover, we extend Theorem~\ref{thm:main} to a more general
Bayesian setting.
-In the Bayesian setting, it is assumed that the experimenter has a prior distribution on $\beta$: in particular, $\beta$ has a multivariate normal prior with zero mean and covariance $\sigma^2R\in \reals^{d^2}$ (where $\sigma^2$ is the noise variance).
+In the Bayesian setting, it is assumed that the experimenter has a prior
+distribution on $\beta$: in particular, $\beta$ has a multivariate normal prior
+with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d\times d}$ (where $\sigma^2$ is the noise variance).
The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following minimization \cite{hastie}:
\begin{displaymath}
\hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2
- + \sum_i \norm{R\beta}_2^2
+ + \T{\beta}R\beta
\end{displaymath}
This optimization, commonly known as \emph{ridge regression}, includes an additional penalty term compared to the least squares estimation \eqref{leastsquares}.
@@ -47,7 +49,7 @@ Theorem~\ref{thm:main}:
\begin{theorem}
There exists a truthful, individually rational and budget feasible mechanism for the objective
function $\tilde{V}$ given by \eqref{eq:normalized}. Furthermore, for any $\varepsilon
- > 0$, in time $O(\text{poly}(|\mathcal{N}|, d, \log\log \varepsilon^{-1}))$,
+ > 0$, in time $O(\text{poly}(n, d, \log\log \varepsilon^{-1}))$,
the algorithm computes a set $S^*$ such that:
\begin{displaymath}
OPT \leq
@@ -93,11 +95,18 @@ For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individua
\subsection{Beyond Linear Models}
-Selecting experiments that maximize the information gain in the Bayesian setup leads to a natural generalization to other learning examples beyond linear regression. In particular, in the more general PAC learning setup \cite{...}, the features $x_i$, $i\in \mathcal{N}$ take values in some general set $\Omega$, called the \emph{query space}, and measurements $y_i\in\reals$ are given by
+Selecting experiments that maximize the information gain in the Bayesian setup
+leads to a natural generalization to learning problems beyond linear
+regression. In particular, in the more general PAC learning setup \cite{valiant}, the features $x_i$, $i\in \mathcal{N}$ take values in some general set $\Omega$, called the \emph{query space}, and measurements $y_i\in\reals$ are given by
\begin{equation}\label{eq:hypothesis-model}
y_i = h(x_i) + \varepsilon_i
\end{equation}
-where $h\in \mathcal{H}$ for some subset $\mathcal{H}$ of all possible mappings $h:\Omega\to\reals$, called the \emph{hypothesis space}, and $\varepsilon_i$ are random variables in $\reals$, not necessarily identically distributed, that are independent \emph{conditioned on $h$}. This model is quite broad, and captures many learning tasks, such as:
+where $h\in \mathcal{H}$ for some subset $\mathcal{H}$ of all possible mappings
+$h:\Omega\to\reals$, called the \emph{hypothesis space}, and $\varepsilon_i$
+are random variables in $\reals$, not necessarily identically distributed, that
+are independent \emph{conditioned on $h$}. This model is quite broad, and
+captures many learning tasks, such as support vector machines (SVM), linear
+discriminant analysis (LDA), and the following:
\begin{enumerate}
\item\textbf{Generalized Linear Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of linear maps $\{h(x) = \T{\beta}x \text{ s.t. } \beta\in \reals^d\}$, and $\varepsilon_i$ are independent zero-mean normal variables, where $\expt{\varepsilon_i^2}=\sigma_i$.
\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x_i),& \text{w.~prob.}\;h(x_i)\\-h(x_i),&\text{w.~prob.}\;1-h(x_i)\end{cases}$$
@@ -113,27 +122,37 @@ This is a monotone set function, and it clearly satisfies $V(\emptyset)=0$. Thou
The value function given by the information gain \eqref{general} is submodular.
\end{lemma}
\begin{proof}
-The lemma is proved in a slightly different context in \cite{krause2005near}; we
-repeat the proof here for the sake of completeness. Using the chain rule for
-the conditional entropy we get:
+Using the chain rule for conditional entropy, we get:
\begin{equation}\label{eq:chain-rule}
V(S) = H(y_S) - H(y_S \mid \beta)
= H(y_S) - \sum_{i\in S} H(y_i \mid \beta)
\end{equation}
where the second equality comes from the independence of the $y_i$'s
conditioned on $\beta$. Recall that the joint entropy of a set of random
-variables is a submodular function. Thus, our value function is written in
-\eqref{eq:chain-rule} as the sum of a submodular function and a modular function.
+variables is a submodular function. Thus, the value function is written in
+\eqref{eq:chain-rule} as the sum of a submodular function and a modular function.
\end{proof}
-This lemma that implies that learning an \emph{arbitrary hypothesis, under an
+This lemma implies that learning an \emph{arbitrary hypothesis, under an
arbitrary prior} when noise is conditionally independent leads to a submodular
-value function. Hence the universally truthful 7.91-approximate mechanism by
-Chen \emph{et al.} \cite{chen} immediately applies in the value query model.
-Moreover, in cases where maximizing \eqref{general} can be done in polynomial
-time in the full-information setup, the truthful $8.34$-appoximate mechanism of
-Chen \emph{et al.}~applies again. However that, in many scenarios covered by
-this model (inlcuding the last two examples above), even computing the entropy
+value function. Hence, we can apply the results of Chen \emph{et al.}~\cite{chen}
+to obtain the following corollary:
+\begin{corollary}
+ For Bayesian experimental design with the objective given by the
+ information gain \eqref{general}, there exists a randomized, budget
+ feasible, individually rational, and universally truthful mechanism. This
+ mechanism has complexity $O\big(\text{poly}(n,d)\big)$ and an
+ approximation ratio of $7.91$.
+
+ In cases where maximizing \eqref{general} can be done in polynomial time in
+ the full-information setup, there exists a randomized, budget feasible,
+ individually rational, and truthful mechanism for Bayesian experimental
+ design. This mechanism has complexity $O\big(\text{poly}(n,d)\big)$ and
+ an approximation ratio of $8.34$.
+\end{corollary}
+
+Note, however, that in many scenarios covered by
+this model (including the last two examples above), even computing the entropy
under a given set might be a hard task---\emph{i.e.}, the value query model may
not apply. Hence, identifying learning tasks in the above class for which
truthful or universally truthful constant approximation mechanisms exist, or
diff --git a/intro.tex b/intro.tex
index f0545bb..83d0f87 100644
--- a/intro.tex
+++ b/intro.tex
@@ -10,20 +10,22 @@ There is a rich literature about estimation procedures, as well as for means for
if \E\ can conduct only a limited number of experiments, so the estimation process returns $\beta$
that approximates the true parameter of the underlying population \cite{ginebra2007measure,le1996comparison,chaloner1995bayesian,boyd2004convex}.
-We depart from this classical set up by viewing experimental design in a strategic setting, and by studying mechanism design issues.
-In our setup, experiments cannot be manipulated and hence measurements are considered precise.\footnote{Thus, the experiments of our interest are statistically significant ones where each experiment provides a reliable outcome.} However, there
+We depart from this classical setup by viewing experimental design in a strategic setting, and by studying mechanism design issues.
+In our setting, experiments cannot be manipulated and hence measurements are considered reliable. %\footnote{Thus, the experiments of our interest are statistically significant ones where each experiment provides a reliable outcome.}
+However, there
is a cost $c_i$ associated with experimenting on
subject $i$ which varies from subject to subject. This may be viewed as the
-cost subject $i$ incurs when tested, and hence $i$ needs to be reimbursed; or, it might be viewed as the incentive for $i$
-to participate in the experiment; or, it might be the inherent value of the data. This economic aspect has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. However, we are not aware of principled study of this setting from a strategic point of view. When subjects are strategic, they may have an incentive to mis-report their cost.
+cost subject $i$ incurs when tested and for which she needs to be reimbursed; or, it might be viewed as the incentive for $i$
+to participate in the experiment; or, it might be the inherent value of the data. When subjects are strategic, they may have an incentive to misreport their cost. This economic aspect has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. However, we are not aware of a principled study of this setting from a strategic point of view.
Our contributions are as follows.
\begin{itemize}
\item
-We formulate the problem of experimental design subject to a given budget, in presence of strategic agents who specify their costs. In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem. The objective function is sophisticated and is related to the covariance of the $x_i$'s. In particular, we formulate the {\em Experimental Design Problem} (\EDP) as follows: the experimenter \E\ wishes to find set $S$ of subjects to maximize
+We formulate the problem of experimental design subject to a given budget, in the presence of strategic agents who may lie about their costs. In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem, in which the objective function %is sophisticated and
+is related to the covariance of the $x_i$'s. In particular, we formulate the {\em Experimental Design Problem} (\EDP) as follows: the experimenter \E\ wishes to find a set $S$ of subjects to maximize
\begin{align}V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i}) \label{obj}\end{align}
-with a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget. %, and other {\em strategic constraints} we don't list here.
+subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget. %, and other {\em strategic constraints} we don't list here.
The objective function, which is the key, is obtained by optimizing the information gain in $\beta$ when it is learned through linear regression methods, and is related to the so-called $D$-optimality criterion.
\item
The above objective is submodular.
@@ -36,14 +38,14 @@ For specific combinatorial problems such as \textsc{Knapsack} or \textsc{Coverag
deterministic, truthful, polynomial constant-approximation algorithms~\cite{singer-mechanisms,chen, singer-influence}, but they do not work for the linear-algebraic objective function in \EDP.
%{\bf S+T: could we verify that the above sentence is correct in its implication?}
-We present the first known, polynomial time truthful mechanism for \EDP{}. Our mechanism is a constant factor ($\approx 19.68$) approximation for \EDP{}. From a technical perspective, we present a convex relaxation to \EDP\; and show that it is within a constant factor approximation of a multi-linear relaxation for \EDP{}. This allows us to adopt the prior work in budget feasible mechanisms by Chen \emph{et al.}~\cite{chen} and Singer~\cite{singer-influence}. In contrast to this, no truthful algorithms are possible for \EDP{} within a factor 2 approximation. {\bf FIX the last sentence}
+We present the first known, polynomial time truthful mechanism for \EDP{}. Our mechanism is a constant factor ($\approx 19.68$) approximation for \EDP{}. From a technical perspective, we present a convex relaxation of \EDP{} and show that it is within a constant factor of the multi-linear relaxation for \EDP{}. This allows us to adopt the prior work on budget feasible mechanisms by Chen \emph{et al.}~\cite{chen} and Singer~\cite{singer-influence}. In contrast, we show that no truthful, budget feasible mechanism can approximate \EDP{} within a factor of 2. %{\bf FIX the last sentence}
\item
Our approach to mechanisms for experimental design --- by
- optimizing the information gain in parameters like $\beta$ which are estimated through the data analysis process --- is general. We give examples of this approach beyond linear regression to logistic regression, learning binary functions and others and derive budgeted mechanism design problems with submodular optimization where prior work immediately applies and gives constant factor approximations. We leave it open to get deterministic truthful polynomial time mechanisms for these problems, like we did for \EDP.
+ optimizing the information gain in parameters like $\beta$ which are estimated through the data analysis process --- is general. We give examples of this approach beyond linear regression to a general class that includes logistic regression and learning binary functions, and show that the corresponding budgeted mechanism design problem is also expressed as a submodular optimization. Hence, the prior work \cite{chen,singer-mechanisms} immediately applies, and gives universally truthful, polynomial time, constant factor approximation mechanisms for problems in this class. Obtaining deterministic, truthful, polynomial time mechanisms for this class, or for specific problems in it, as we did for \EDP, remains an open problem.
\end{itemize}
-In what follows, we described related work in Section~\ref{sec:related}. We describe the basics of experimental design and budget feasible mechanisms in Section~\ref{sec:peel} and define \EDP formally. In Section~\ref{sec:main} we present our mechanism for \EDP and prove our main results. In~\ref{sec:ext} we present applications of out general framework.
+In what follows, we discuss related work in Section~\ref{sec:related}. We describe the basics of experimental design and budget feasible mechanisms in Section~\ref{sec:peel} and define \EDP\ formally. In Section~\ref{sec:main} we present our mechanism for \EDP\ and prove our main results. In Section~\ref{sec:ext} we present applications of our general framework.
\junk{
diff --git a/notes.bib b/notes.bib
index 113f6d2..78e7656 100644
--- a/notes.bib
+++ b/notes.bib
@@ -5,6 +5,15 @@
publisher={Cambridge University Press}
}
+@inproceedings{roth-schoenebeck,
+ author = {Roth, Aaron and Schoenebeck, Grant},
+ title = {Conducting truthful surveys, cheaply},
+ booktitle = {Proceedings of the 13th ACM Conference on Electronic Commerce},
+ year = {2012}
+}
+
+
+
@inproceedings{pranav,
author="Pranav Dandekar and Nadia Fawaz and Stratis Ioannidis",
title= "Privacy Auctions for Recommender Systems",
@@ -587,23 +596,7 @@ year = 2012
}
-@inproceedings{roth-schoenebeck,
- author = {Roth, Aaron and Schoenebeck, Grant},
- title = {Conducting truthful surveys, cheaply},
- booktitle = {Proceedings of the 13th ACM Conference on Electronic Commerce},
- series = {EC '12},
- year = {2012},
- isbn = {978-1-4503-1415-2},
- location = {Valencia, Spain},
- pages = {826--843},
- numpages = {18},
- url = {http://doi.acm.org/10.1145/2229012.2229076},
- doi = {10.1145/2229012.2229076},
- acmid = {2229076},
- publisher = {ACM},
- address = {New York, NY, USA},
- keywords = {mechanism design, privacy},
-}
+
@inproceedings{roth-liggett,
author = {Katrina Ligett and
@@ -612,7 +605,6 @@ year = 2012
at a Cost}},
booktitle = {Proceedings of the 8th Workshop on Internet and Network Economics},
series = {WINE '12},
- pages = {To appear},
year = {2012},
}
@@ -626,4 +618,24 @@ number = "2011/005",
year = "2011"
}
+@inproceedings{valiant,
+ author = {Leslie G. Valiant},
+ title = {A Theory of the Learnable},
+ booktitle = {STOC},
+ year = {1984},
+ pages = {436--445},
+ ee = {http://doi.acm.org/10.1145/800057.808710},
+ crossref = {DBLP:conf/stoc/STOC16},
+ bibsource = {DBLP, http://dblp.uni-trier.de}
+}
+
+@proceedings{DBLP:conf/stoc/STOC16,
+ editor = {Richard A. DeMillo},
+ title = {Proceedings of the 16th Annual ACM Symposium on Theory of
+ Computing, April 30 - May 2, 1984, Washington, DC, USA},
+ booktitle = {STOC},
+ publisher = {ACM},
+ year = {1984},
+ bibsource = {DBLP, http://dblp.uni-trier.de}
+}
diff --git a/paper.tex b/paper.tex
index 581191c..def5c72 100644
--- a/paper.tex
+++ b/paper.tex
@@ -4,7 +4,13 @@
\usepackage{algorithm}
\usepackage{algpseudocode,bbm,color,verbatim}
\input{definitions}
-\title{Budgeted Mechanisms for Experimental Design}
+\title{Budget Feasible Mechanisms for Experimental Design}
+
+%Remove permission block empty space
+\makeatletter
+\let\@copyrightspace\relax
+\makeatother
+
\begin{document}
\maketitle
@@ -12,11 +18,11 @@
\section{Introduction}
\input{intro}
-\section{Preliminaries}
+\section{Preliminaries}\label{sec:peel}
\input{problem}
\section{Mechanism for \EDP{}}
\input{main}
-\section{Extension to Other Problems}
+\section{Extension to Other Problems}\label{sec:ext}
\input{general}
\section{Conclusion}
\input{conclusion}
diff --git a/related.tex b/related.tex
index b514244..6ecdb12 100644
--- a/related.tex
+++ b/related.tex
@@ -4,8 +4,10 @@
Budget feasible mechanism design was originally proposed by Singer \cite{singer-mechanisms}. Singer considers the problem of maximizing an arbitrary submodular function subject to a budget constraint in the \emph{value query} model, \emph{i.e.} assuming an oracle providing the value of the submodular objective on any given set.
Singer shows that there exists a randomized, 112-approximation mechanism for submodular maximization that is \emph{universally truthful} (\emph{i.e.}, it is a randomized mechanism sampled from a distribution over truthful mechanisms). Chen \emph{et al.}~\cite{chen} improve this result by providing a 7.91-approximate mechanism, and show a corresponding lower bound of $2$ among universally truthful mechanisms for submodular maximization.
+\sloppy
In contrast to the above results, no truthful, constant approximation mechanism that runs in polynomial time is presently known for submodular maximization. However, assuming access to an oracle providing the optimum in the full-information setup, Chen \emph{et al.},~provide a truthful, $8.34$-approximate mechanism; in cases for which the full information problem is NP-hard, as the one we consider here, this mechanism is not poly-time, unless P=NP. Chen \emph{et al.}~also prove a $1+\sqrt{2}$ lower bound for truthful mechanisms, improving upon an earlier bound of 2 by Singer \cite{singer-mechanisms}.
+\fussy
Improved bounds, as well as deterministic polynomial mechanisms, are known for specific submodular objectives. For symmetric submodular functions, a truthful mechanism with approximation ratio 2 is known, and this ratio is tight \cite{singer-mechanisms}. Singer also provides a 7.32-approximate truthful mechanism for the budget feasible version of \textsc{Matching}, and a corresponding lower bound of 2 \cite{singer-mechanisms}. Improving an earlier result by Singer, Chen \emph{et al.}~\cite{chen}, give a truthful, $2+\sqrt{2}$-approximate mechanism for \textsc{Knapsack}, and a lower bound of $1+\sqrt{2}$. Finally, a truthful, 31-approximate mechanism is also known for the budgeted version of \textsc{Coverage} \cite{singer-mechanisms,singer-influence}.
Beyond submodular objectives, it is known that no truthful mechanism with approximation ratio smaller than $n^{1/2-\epsilon}$ exists for maximizing fractionally subadditive functions (a class that includes submodular functions) assuming access to a value query oracle~\cite{singer-mechanisms}. Assuming access to a stronger oracle (the \emph{demand} oracle), there exists
@@ -13,14 +15,16 @@ a truthful, $O(\log^3 n)$-approximate mechanism
\cite{dobz2011-mechanisms} as well as a universally truthful, $O(\frac{\log n}{\log \log n})$-approximate mechanism for subadditive maximization
\cite{bei2012budget}. Moreover, in a Bayesian setup, assuming a prior distribution among the agent's costs, there exists a truthful mechanism with a 768/512-approximation ratio \cite{bei2012budget}. %(in terms of expectations)
- A series of recent papers \cite{mcsherrytalwar,approximatemechanismdesign,xiao:privacy-truthfulness,chen:privacy-truthfulness} consider the related problem of retreiving data from an \textit{unverified} database: the auctioneer cannot verify the data reported by individuals and therefore must incentivize them to report this truthfully.
-McSherry and Talwar \cite{mcsherrytalwar} argue that \emph{differentially private} mechanisms offer a form of \emph{approximate truthfulness}: if users have a utility that depends on their privacy, reporting their data untruthfully can only increase their utility by a small amount. Xiao \cite{xiao:privacy-truthfulness}, improving upon earlier work by Nissim \emph{et al.}~\cite{approximatemechanismdesign} construct mechanisms that
+ A series of recent papers \cite{mcsherrytalwar,approximatemechanismdesign,xiao:privacy-truthfulness,chen:privacy-truthfulness} consider the related problem of retrieving data from an \textit{unverified} database: the auctioneer cannot verify the data reported by individuals and therefore must incentivize them to report it truthfully.
+McSherry and Talwar \cite{mcsherrytalwar} argue that \emph{differentially private} mechanisms offer a form of \emph{approximate truthfulness}: if users have a utility that depends on their privacy, reporting their data untruthfully can only increase their utility by a small amount. Xiao \cite{xiao:privacy-truthfulness}, improving upon earlier work by Nissim \emph{et al.}~\cite{approximatemechanismdesign}, constructs mechanisms that
simultaneously achieve exact truthfulness as well as differential privacy. Eliciting private data through a \emph{survey} \cite{roth-liggett}, whereby individuals first decide whether to participate in the survey and then report their data,
- also fall under the unverified database setting \cite{xiao:privacy-truthfulness}. In the \emph{verified} database setting, Ghosh and Roth~\cite{ghosh-roth:privacy-auction} and Dandekar \emph{et al.}~\cite{pranav} consider budgeted auctions where users have a utility again captured by differential privacy. Our work departs from the above setups in that utilities do not involve privacy, whose effects are assumed to be internalized in the costs reported by the users; crucially, we also assume that experiments are tamper-proof, and individuals can misreport their costs but not their values.
+ also falls under the unverified database setting \cite{xiao:privacy-truthfulness}. In the \emph{verified} database setting, Ghosh and Roth~\cite{ghosh-roth:privacy-auction} and Dandekar \emph{et al.}~\cite{pranav} consider budgeted auctions where users have a utility again captured by differential privacy. Our work departs from the above setups in that utilities do not involve privacy, whose effects are assumed to be internalized in the costs reported by the users; crucially, we also assume that experiments are tamper-proof, and individuals can misreport their costs but not their values.
-Our work is closest to the survey setup of Roth and Schoenebeck~\cite{roth-schoenebeck}, who also consider how to sample individuals with different features who reported a hidden value at a certain cost. The authors assume a prior on the joint distribution between costs and features, and wish to obtain an unbiased estimate of the expectation of the hidden value under the constraints of truthfulness, budget feasibility and individual rationality. Our work departs by learning a more general statistic (a linear model) than data means. We note that, as in \cite{roth-shoenebeck}, costs and features can be arbitrarily corellated (our results are prior-free).
+\sloppy
+Our work is closest to the survey setup of Roth and Schoenebeck~\cite{roth-schoenebeck}, who also consider how to sample individuals with different features who report a hidden value at a certain cost. The authors assume a prior on the joint distribution of costs and features, and wish to obtain an unbiased estimate of the expectation of the hidden value under the constraints of truthfulness, budget feasibility and individual rationality. Our work departs by learning a more general statistic (a linear model) than data means. We note that, as in \cite{roth-schoenebeck}, costs and features can be arbitrarily correlated (our results are prior-free).
+\fussy
%\stratis{TODO: privacy discussion. Logdet objective. Should be one paragraph each.}
\begin{comment}