diff options
| author | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-05 09:40:54 -0800 |
|---|---|---|
| committer | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-05 09:40:54 -0800 |
| commit | 38152e6f14d7c5f91cee3c07e0ff2a00ef7b5a33 (patch) | |
| tree | 37373f53f9be005546346489f0383a36fff6b817 | |
| parent | c0400b1dcc65ea1018ff77f467721fc7d0ae9e4f (diff) | |
| parent | d4d9933432e1f15f9839d0e0ca14ff0f8656b814 (diff) | |
| download | recommendation-38152e6f14d7c5f91cee3c07e0ff2a00ef7b5a33.tar.gz | |
muthu commit
| -rw-r--r-- | general.tex | 6 | ||||
| -rw-r--r-- | intro.tex | 2 | ||||
| -rw-r--r-- | main.tex | 20 | ||||
| -rw-r--r-- | notes.bib | 22 | ||||
| -rw-r--r-- | paper.tex | 3 | ||||
| -rw-r--r-- | problem.tex | 35 | ||||
| -rw-r--r-- | related.tex | 11 |
7 files changed, 58 insertions, 41 deletions
diff --git a/general.tex b/general.tex index 46337d6..3815c83 100644 --- a/general.tex +++ b/general.tex @@ -45,7 +45,7 @@ analysis of the approximation ratio, we get the following result which extends Theorem~\ref{thm:main}: \begin{theorem} - There exists a truthful and budget feasible mechanism for the objective + There exists a truthful, individually rational and budget feasible mechanism for the objective function $\tilde{V}$ given by \eqref{eq:normalized}. Furthermore, for any $\varepsilon > 0$, in time $O(\text{poly}(|\mathcal{N}|, d, \log\log \varepsilon^{-1}))$, the algorithm computes a set $S^*$ such that: @@ -64,8 +64,8 @@ Selecting experiments that maximize the information gain in the Bayesian setup l where $h\in \mathcal{H}$ for some subset $\mathcal{H}$ of all possible mappings $h:\Omega\to\reals$, called the \emph{hypothesis space}, and $\varepsilon_i$ are random variables in $\reals$, not necessarily identically distributed, that are independent \emph{conditioned on $h$}. This model is quite broad, and captures many learning tasks, such as: \begin{enumerate} \item\textbf{Generalized Linear Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of linear maps $\{h(x) = \T{\beta}x \text{ s.t. } \beta\in \reals^d\}$, and $\varepsilon_i$ are independent zero-mean normal variables, where $\expt{\varepsilon_i^2}=\sigma_i$. -\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x_i),& \text{w.~prob.}~h(x_i)\\-h(x_i),&\text{w.~prob.}~1-h(x_i)\end{cases}$$ -\item\textbf{Learning Binary Functions with Bernoulli Noise.} $\Omega = \{0,1\}^d$, and $\mathcal{H}$ is some subset of $2^{\Omega\times\{0,1\}}$, and $$\varepsilon_i =\begin{cases}0, &\text{w.~prob.}~p\\\bar{h}(x_i)-h(x_i), \text{w.~prob.}~1-p\end{cases}$$ +\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x_i),& \text{w.~prob.}\;h(x_i)\\-h(x_i),&\text{w.~prob.}\;1-h(x_i)\end{cases}$$ +\item\textbf{Learning Binary Functions with Bernoulli Noise.} $\Omega = \{0,1\}^d$, and $\mathcal{H}$ is some subset of $2^{\Omega}$, and $$\varepsilon_i =\begin{cases}0, &\text{w.~prob.}\;p\\\bar{h}(x_i)-h(x_i), &\text{w.~prob.}\;1-p\end{cases}$$ \end{enumerate} In this setup, assume that the experimenter has a prior distribution on the hypothesis $h\in \mathcal{H}$. Then, the information gain objective can be written again as the mutual information between $\beta$ and $y_S$. @@ -24,7 +24,7 @@ Our contributions are as follows. We formulate the problem of experimental design subject to a given budget, in presence of strategic agents who specify their costs. In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem. The objective function is sophisticated and is related to the covariance of the $x_i$'s. In particular we formulate the {\em Experimental Design Problem} (\EDP) as follows: the experimenter \E\ wishes to find set $S$ of subjects to maximize \begin{align}V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i}) \label{obj}\end{align} with a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget. %, and other {\em strategic constraints} we don't list here. -The objective function, which is the key, is obtained by optimizing the information gain in $\beta$ when it is learned through linear regression methods, and is the so-called $D$-objective criterion in the literature. +The objective function, which is the key, is obtained by optimizing the information gain in $\beta$ when it is learned through linear regression methods, and is the so-called $D$-optimality criterion in the literature. \item The above objective is submodular. There are several recent results in budget feasible @@ -19,8 +19,12 @@ included in $S$ is the one with the highest \emph{marginal-value-per-cost}: \end{align} This process terminates when no more items can be added to $S$ using \eqref{greedy} under budget $B$. This is the generalization of the \emph{value-per-cost} ratio used in the greedy approximation algorithm for \textsc{Knapsack}. However, in contrast to \textsc{Knapsack}, for general submodular -functions, the marginal value of an element in \eqref{greedy} depends on the set to which it the element is added. -% +functions, the marginal value of an element in \eqref{greedy} depends on the set to which the element is added. +Similarly, the value of an element depends on the set in which it is +considered. However, in the following, the value of an element $i$ will refer to +its value as a singleton set: $V(\{i\})$. If there is no ambiguity, we will write +$V(i)$ instead of $V(\{i\})$. + Unfortunately, even for the full information case, the greedy algorithm gives an unbounded approximation ratio. Let $i^* = \argmax_{i\in\mathcal{N}} V(i)$ be the element of maximum value. @@ -28,7 +32,7 @@ unbounded approximation ratio. Let $i^* %It has been noted by Khuller \emph{et al.}~\cite{khuller} that For the maximum coverage problem, taking the maximum between the greedy solution and $V(i^*$) -gives a $\frac{2e}{e-1}$ approximation ratio ~\cite{khuller} . In the general case, +gives a $\frac{2e}{e-1}$ approximation ratio ~\cite{khuller}. In the general case, \junk{we have the following result from Singer \cite{singer-influence} which follows from Chen \emph{et al.}~\cite{chen}: \stratis{Is it Singer or Chen? Also, we need to introduce $V(i)$ somewhere...} @@ -83,10 +87,10 @@ problems, Chen et al.~\cite{chen} %for \textsc{Knapsack} and Singer propose comparing $V(i^*)$ to %$OPT(R_{\mathcal{N}\setminus\{i\}}, B)$, where $R$ denotes the optimal value of a \emph{fractional relaxation} of the function $V$ over the set -$\mathcal{N}$. A fuction $R:[0, 1]^{n}\to\reals_+$ defined on the hypercube $[0, 1]^{n}$ is a fractional relaxation of $V$ +$\mathcal{N}$. A function $R:[0, 1]^{n}\to\reals_+$ defined on the hypercube $[0, 1]^{n}$ is a fractional relaxation of $V$ over the set $\mathcal{N}$ if %(a) $R$ is a function defined on the hypercube $[0, 1]^{n}$ and (b) - $R(\mathbf{1}_S) = V(S)$ for all $S\subseteq\mathcal{N}$, where -$\mathbf{1}_S$ denotes the indicator vector of $S$. The optimization program + $R(\id_S) = V(S)$ for all $S\subseteq\mathcal{N}$, where +$\id_S$ denotes the indicator vector of $S$. The optimization program \eqref{eq:non-strategic} extends naturally to such relaxations: \begin{align} OPT' = \argmax_{\lambda\in[0, 1]^{n}} @@ -180,7 +184,7 @@ characterization from Singer \cite{singer-mechanisms} gives a formula to We can now state our main result: \begin{theorem}\label{thm:main} - The allocation described in Algorithm~\ref{mechanism}, along with threshold payments, is truthful, individiually rational + The allocation described in Algorithm~\ref{mechanism}, along with threshold payments, is truthful, individually rational and budget feasible. Furthermore, for any $\varepsilon>0$, the mechanism has complexity $O\left(\text{poly}(n, d, \log\log \varepsilon^{-1})\right)$ and returns a set $S^*$ such that: @@ -195,7 +199,7 @@ We can now state our main result: Note that this implies we construct a poly-time mechanism with accuracy arbitrarily close to 19.68, by taking $\varepsilon = \ldots$\stratis{fix me}. In addition, we prove the following lower bound. \begin{theorem}\label{thm:lowerbound} -There is no $2$-approximate, truthful, budget feasible, individionally rational mechanism for EDP. +There is no $2$-approximate, truthful, budget feasible, individually rational mechanism for EDP. \end{theorem} %\stratis{move the proof as appropriate} \begin{proof} @@ -5,6 +5,28 @@ publisher={Cambridge University Press} } +@inproceedings{pranav, +author="Pranav Dandekar and Nadia Fawaz and Stratis Ioannidis", +title= "Privacy Auctions for Recommender Systems", +booktitle = "WINE", +year = 2012 +} +@inproceedings{ghosh-roth:privacy-auction, + author = {Ghosh, Arpita and Roth, Aaron}, + title = {{Selling privacy at auction}}, + booktitle = {{Proc. ACM EC}}, + year = {2011}, + isbn = {978-1-4503-0261-6}, + location = {San Jose, California, USA}, + pages = {199--208}, + numpages = {10}, + doi = {http://doi.acm.org/10.1145/1993574.1993605}, + acmid = {1993605}, + keywords = {auctions, differential privacy}, +} + + + @article{approximatemechanismdesign, author = {Kobbi Nissim and Rann Smorodinsky and @@ -9,9 +9,6 @@ \maketitle -\begin{abstract} -TODO -\end{abstract} \section{Intro} \input{intro} diff --git a/problem.tex b/problem.tex index 4835be4..527c34a 100644 --- a/problem.tex +++ b/problem.tex @@ -68,7 +68,7 @@ no positive transfers ($p_i(c)\geq 0$). In addition to the above, mechanism design in budget feasible reverse auctions seeks mechanisms that have the following properties \cite{singer-mechanisms,chen}: \begin{itemize} - \item \emph{Truthfulness.} An agent has no incentive to misreport her true cost. Formally, let $c_{-i}$ + \item \emph{Truthfulness.} An agent has no incentive to missreport her true cost. Formally, let $c_{-i}$ be a vector of costs of all agents except $i$. % $c_{-i}$ being the same). Let $[p_i']_{i\in \mathcal{N}} = p(c_i', % c_{-i})$, then the A mechanism is truthful iff for every $i \in \mathcal{N}$ and every two cost vectors $(c_i,c_{-i})$ and $(c_i',c_{-i})$ @@ -130,25 +130,7 @@ biometric information (\emph{e.g.}, her red cell blood count, a genetic marker, etc.). The cost $c_i$ is the amount the subject deems sufficient to incentivize her participation in the study. Note that, in this setup, the feature vectors $x_i$ are public information that the experimenter can consult prior the experiment design. Moreover, though a subject may lie about her true cost $c_i$, she cannot lie about $x_i$ (\emph{i.e.}, all features are verifiable upon collection) or $y_i$ (\emph{i.e.}, she cannot falsify her measurement). -%\subsection{D-Optimality Criterion} -<<<<<<< HEAD -Ideally, motivated by the $D$-optimality criterion, we would like to design a mechanism that maximizes or approximates \eqref{dcrit} . Since \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$. -However, the following lower bound implies that such an optimization goal cannot be attained under the constraints of truthfulness, budget feasibility, and individual rationality. -\begin{lemma} -For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individionally rational mechanism for a budget feasible reverse auction with $V(S) = \det{\T{X_S}X_S}$. -======= Ideally, motivated by the $D$-optimality criterion, we would like to design a mechanism that maximizes \eqref{dcrit} within a good approximation ratio. In what follows, we consider a slightly more general objective as follows: -\junk{ -As \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$. -However, the following lower bound implies that such an optimization goal cannot be attained under the constraints of truthfulness, budget feasibility, and individual rationality. -\begin{lemma} -For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individually rational mechanism for a budget feasible reverse auction with value function $V(S) = \det{\T{X_S}X_S}$. ->>>>>>> c29302b25adf190f98019eb8ce5f79b10b66d54d -\end{lemma} -\begin{proof} -\input{proof_of_lower_bound1} -\end{proof} -This negative result motivates us to study a problem with a modified objective:} \begin{center} \textsc{ExperimentalDesignProblem} (EDP) \begin{subequations} @@ -158,17 +140,30 @@ This negative result motivates us to study a problem with a modified objective:} \end{align}\label{edp} \end{subequations} \end{center} where $I_d\in \reals^{d\times d}$ is the identity matrix. -One possible interpretation of \eqref{modified} is that, prior to the auction, the experimenter has free access to $d$ experiments whose features form an orthonormal basis in $\reals^d$. However, \eqref{modified} can also be motivated in the context of \emph{Bayesian experimental design} \cite{chaloner1995bayesian}. In short, the objective \eqref{modified} arises naturally when the experimenter retrieves the model $\beta$ through \emph{ridge regression}, rather than the linear regression \eqref{leastsquares} over the observed data; we explore this connection in Section~\ref{sec:bed}. +\junk{One possible interpretation of \eqref{modified} is that, prior to the auction, the experimenter has free access to $d$ experiments whose features form an orthonormal basis in $\reals^d$. However, \eqref{modified} can also be motivated in the context of \emph{Bayesian experimental design} \cite{chaloner1995bayesian}. In short, the objective \eqref{modified} arises naturally when the experimenter retrieves the model $\beta$ through \emph{ridge regression}, rather than the linear regression \eqref{leastsquares} over the observed data; we explore this connection in Section~\ref{sec:bed}. } We present our results with this version of the objective function because it is simple and captures the versions we will need later (including an arbitrary matrix in place of $I_d$ or a zero matrix which will correspond to ~\eqref{dcrit} ). + Note that maximizing \eqref{modified} is equivalent to maximizing \eqref{dcrit} in the full-information case. In particular, $\det(\T{X_S}X_S)> \det(\T{X_{S'}}X_{S'})$ iff $\det(I_d+\T{X_S}X_S)>\det(I_d+\T{X_{S'}}X_{S'})$. In addition, \eqref{edp}---and the equivalent problem with objective \eqref{dcrit}---are NP-hard; to see this, note that \textsc{Knapsack} reduces to EDP with dimension $d=1$ by mapping the weight of each item $w_i$ to an experiment with $x_i=w_i$. Nevertheless, \eqref{modified} is submodular, monotone and satisfies $V(\emptyset) = 0$. %, allowing us to use the extensive machinery for the optimization of submodular functions, as well as recent results in the % context of budget feasible auctions \cite{chen,singer-mechanisms}. %\stratis{A potential problem is that all of the above properties hold also for, \emph{e.g.}, $V(S)=\log(1+\det(\T{X_S}X_S))$\ldots} +\begin{comment}\junk{ +As \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$. +However, the following lower bound implies that such an optimization goal cannot be attained under the constraints of truthfulness, budget feasibility, and individual rationality. +\begin{lemma} +For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individually rational mechanism for a budget feasible reverse auction with value function $V(S) = \det{\T{X_S}X_S}$. +For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individually rational mechanism for a budget feasible reverse auction with $V(S) = \det{\T{X_S}X_S}$. +\end{lemma} +\begin{proof} +\input{proof_of_lower_bound1} +\end{proof} +This negative result motivates us to study a problem with a modified objective:}\end{comment} + \begin{comment} \subsection{Experimental Design} diff --git a/related.tex b/related.tex index 9573309..b514244 100644 --- a/related.tex +++ b/related.tex @@ -4,7 +4,7 @@ Budget feasible mechanism design was originally proposed by Singer \cite{singer-mechanisms}. Singer considers the problem of maximizing an arbitrary submodular function subject to a budget constraint in the \emph{value query} model, \emph{i.e.} assuming an oracle providing the value of the submodular objective on any given set. Singer shows that there exists a randomized, 112-approximation mechanism for submodular maximization that is \emph{universally truthful} (\emph{i.e.}, it is a randomized mechanism sampled from a distribution over truthful mechanisms). Chen \emph{et al.}~\cite{chen} improve this result by providing a 7.91-approximate mechanism, and show a corresponding lower bound of $2$ among universally truthful mechanisms for submodular maximization. -In contrast to the above results, no truthful, constant approximation mechanism that runs in polynomial time is presently known for submodular maximization. However, assuming access to an oracle providing the optimum in the full-information setup, Chen \emph{et al.},~provide a truthful, $8.34$-appoximate mechanism; in cases for which the full information problem is NP-hard, as the one we consider here, this mechanism is not poly-time, unless P=NP. Chen \emph{et al.}~also prove a $1+\sqrt{2}$ lower bound for truthful mechanisms, improving upon an earlier bound of 2 by Singer \cite{singer-mechanisms}. +In contrast to the above results, no truthful, constant approximation mechanism that runs in polynomial time is presently known for submodular maximization. However, assuming access to an oracle providing the optimum in the full-information setup, Chen \emph{et al.},~provide a truthful, $8.34$-approximate mechanism; in cases for which the full information problem is NP-hard, as the one we consider here, this mechanism is not poly-time, unless P=NP. Chen \emph{et al.}~also prove a $1+\sqrt{2}$ lower bound for truthful mechanisms, improving upon an earlier bound of 2 by Singer \cite{singer-mechanisms}. Improved bounds, as well as deterministic polynomial mechanisms, are known for specific submodular objectives. For symmetric submodular functions, a truthful mechanism with approximation ratio 2 is known, and this ratio is tight \cite{singer-mechanisms}. Singer also provides a 7.32-approximate truthful mechanism for the budget feasible version of \textsc{Matching}, and a corresponding lower bound of 2 \cite{singer-mechanisms}. Improving an earlier result by Singer, Chen \emph{et al.}~\cite{chen}, give a truthful, $2+\sqrt{2}$-approximate mechanism for \textsc{Knapsack}, and a lower bound of $1+\sqrt{2}$. Finally, a truthful, 31-approximate mechanism is also known for the budgeted version of \textsc{Coverage} \cite{singer-mechanisms,singer-influence}. @@ -13,13 +13,12 @@ a truthful, $O(\log^3 n)$-approximate mechanism \cite{dobz2011-mechanisms} as well as a universally truthful, $O(\frac{\log n}{\log \log n})$-approximate mechanism for subadditive maximization \cite{bei2012budget}. Moreover, in a Bayesian setup, assuming a prior distribution among the agent's costs, there exists a truthful mechanism with a 768/512-approximation ratio \cite{bei2012budget}. %(in terms of expectations) - A series of recent papers \cite{mcsherrytalwar,approximatemechanismdesign,xiao:privacy-truthfulness,chen:privacy-truthfulness} consider the related problem of conducting retrieve data from an \textit{unverified} database: the buyer of the data cannot verify the data reported by individuals and therefore must incentivize them to report truthfully. -McSherry and Talwar \cite{mcsherrytalwar} argue that \emph{differentially private} mechanisms also offer a form of \emph{approximate truthfulness}, as the mechanism's output is only slightly perturbed when an individual unilaterally changes her data value; as a result, reporting untruthfully can only increase an individual's utility by a small amount. Xiao \cite{xiao:privacy-truthfulness}, improving upon earlier work by Nissim \emph{et al.}~\cite{approximatemechanismdesign} construct mechanisms that + A series of recent papers \cite{mcsherrytalwar,approximatemechanismdesign,xiao:privacy-truthfulness,chen:privacy-truthfulness} consider the related problem of retreiving data from an \textit{unverified} database: the auctioneer cannot verify the data reported by individuals and therefore must incentivize them to report this truthfully. +McSherry and Talwar \cite{mcsherrytalwar} argue that \emph{differentially private} mechanisms offer a form of \emph{approximate truthfulness}: if users have a utility that depends on their privacy, reporting their data untruthfully can only increase their utility by a small amount. Xiao \cite{xiao:privacy-truthfulness}, improving upon earlier work by Nissim \emph{et al.}~\cite{approximatemechanismdesign} construct mechanisms that simultaneously achieve exact truthfulness as well as differential privacy. Eliciting private data through a \emph{survey} \cite{roth-liggett}, whereby individuals first decide whether to participate in the survey and then report their data, - also fall under the unverified database setting \cite{xiao:privacy-truthfulness}. Our work differs from the above works in that it does not capture user utility through differential privacy; crucially, we also assume that experiments conducted are \emph{tamper-proof}, and individuals can missreport their costs but not their values. + also fall under the unverified database setting \cite{xiao:privacy-truthfulness}. In the \emph{verified} database setting, Ghosh and Roth~\cite{ghosh-roth:privacy-auction} and Dandekar \emph{et al.}~\cite{pranav} consider budgeted auctions where users have a utility again captured by differential privacy. Our work departs from the above setups in that utilities do not involve privacy, whose effects are assumed to be internalized in the costs reported by the users; crucially, we also assume that experiments are tamper-proof, and individuals can misreport their costs but not their values. -Our work is closest to Roth and Schoenebeck \cite{roth-schoenebeck}, who also consider how to sample individuals from a population to obtain an unbiased estimate of a reported value. The authors assume a prior on the joint distribution -\stratis{to be continued} +Our work is closest to the survey setup of Roth and Schoenebeck~\cite{roth-schoenebeck}, who also consider how to sample individuals with different features who reported a hidden value at a certain cost. The authors assume a prior on the joint distribution between costs and features, and wish to obtain an unbiased estimate of the expectation of the hidden value under the constraints of truthfulness, budget feasibility and individual rationality. Our work departs by learning a more general statistic (a linear model) than data means. We note that, as in \cite{roth-shoenebeck}, costs and features can be arbitrarily corellated (our results are prior-free). %\stratis{TODO: privacy discussion. Logdet objective. Should be one paragraph each.} |
