From 705f11ce5e6cf6dbd41e8a055743d711b8f8fbdb Mon Sep 17 00:00:00 2001 From: Thibaut Horel Date: Mon, 5 Nov 2012 16:27:15 +0100 Subject: More fixes, reapply some lost changes overwritten by previous merge --- intro.tex | 2 +- main.tex | 14 +++++++++----- problem.tex | 4 ++-- related.tex | 4 ++-- 4 files changed, 14 insertions(+), 10 deletions(-) diff --git a/intro.tex b/intro.tex index 2e5c8af..2156be5 100644 --- a/intro.tex +++ b/intro.tex @@ -24,7 +24,7 @@ Our contributions are as follows. We formulate the problem of experimental design subject to a given budget, in presence of strategic agents who specify their costs. In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem. The objective function is sophisticated and is related to the covariance of the $x_i$'s. In particular we formulate the {\em Experimental Design Problem} (\EDP) as follows: the experimenter \E\ wishes to find set $S$ of subjects to maximize \begin{align}V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i}) \label{obj}\end{align} with a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget. %, and other {\em strategic constraints} we don't list here. -The objective function, which is the key, is obtained by optimizing the information gain in $\beta$ when it is learned through linear regression methods, and is the so-called $D$-objective criterion in the literature. +The objective function, which is the key, is obtained by optimizing the information gain in $\beta$ when it is learned through linear regression methods, and is the so-called $D$-optimality criterion in the literature. \item The above objective is submodular. There are several recent results in budget feasible diff --git a/main.tex b/main.tex index 482b1c9..b1f6519 100644 --- a/main.tex +++ b/main.tex @@ -19,8 +19,12 @@ included in $S$ is the one with the highest \emph{marginal-value-per-cost}: \end{align} This process terminates when no more items can be added to $S$ using \eqref{greedy} under budget $B$. This is the generalization of the \emph{value-per-cost} ratio used in the greedy approximation algorithm for \textsc{Knapsack}. However, in contrast to \textsc{Knapsack}, for general submodular -functions, the marginal value of an element in \eqref{greedy} depends on the set to which it the element is added. -% +functions, the marginal value of an element in \eqref{greedy} depends on the set to which the element is added. +Similarly, the value of an element depends on the set in which it is +considered. However, in the following, the value of an element $i$ will refer to +its value as a singleton set: $V(\{i\})$. If there is no ambiguity, we will write +$V(i)$ instead of $V(\{i\})$. + Unfortunately, even for the full information case, the greedy algorithm gives an unbounded approximation ratio. Let $i^* = \argmax_{i\in\mathcal{N}} V(i)$ be the element of maximum value. @@ -28,7 +32,7 @@ unbounded approximation ratio. Let $i^* %It has been noted by Khuller \emph{et al.}~\cite{khuller} that For the maximum coverage problem, taking the maximum between the greedy solution and $V(i^*$) -gives a $\frac{2e}{e-1}$ approximation ratio ~\cite{khuller} . In the general case, +gives a $\frac{2e}{e-1}$ approximation ratio ~\cite{khuller}. In the general case, \junk{we have the following result from Singer \cite{singer-influence} which follows from Chen \emph{et al.}~\cite{chen}: \stratis{Is it Singer or Chen? Also, we need to introduce $V(i)$ somewhere...} @@ -85,8 +89,8 @@ $V(i^*)$ to %$OPT(R_{\mathcal{N}\setminus\{i\}}, B)$, where $R$ denotes the optimal value of a \emph{fractional relaxation} of the function $V$ over the set $\mathcal{N}$. A fuction $R:[0, 1]^{n}\to\reals_+$ defined on the hypercube $[0, 1]^{n}$ is a fractional relaxation of $V$ over the set $\mathcal{N}$ if %(a) $R$ is a function defined on the hypercube $[0, 1]^{n}$ and (b) - $R(\mathbf{1}_S) = V(S)$ for all $S\subseteq\mathcal{N}$, where -$\mathbf{1}_S$ denotes the indicator vector of $S$. The optimization program + $R(\id_S) = V(S)$ for all $S\subseteq\mathcal{N}$, where +$\id_S$ denotes the indicator vector of $S$. The optimization program \eqref{eq:non-strategic} extends naturally to such relaxations: \begin{align} OPT' = \argmax_{\lambda\in[0, 1]^{n}} diff --git a/problem.tex b/problem.tex index f0eeb43..b07cd42 100644 --- a/problem.tex +++ b/problem.tex @@ -134,7 +134,7 @@ incentivize her participation in the study. Note that, in this setup, the featur Ideally, motivated by the $D$-optimality criterion, we would like to design a mechanism that maximizes or approximates \eqref{dcrit} . Since \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$. However, the following lower bound implies that such an optimization goal cannot be attained under the constraints of truthfulness, budget feasibility, and individual rationality. \begin{lemma} -For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individionally rational mechanism for a budget feasible reverse auction with $V(S) = \det{\T{X_S}X_S}$. +For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individually rational mechanism for a budget feasible reverse auction with $V(S) = \det{\T{X_S}X_S}$. \end{lemma} \begin{proof} \input{proof_of_lower_bound1} @@ -151,7 +151,7 @@ This negative result motivates us to study a problem with a modified objective: \end{center} where $I_d\in \reals^{d\times d}$ is the identity matrix. One possible interpretation of \eqref{modified} is that, prior to the auction, the experimenter has free access to $d$ experiments whose features form an orthonormal basis in $\reals^d$. However, \eqref{modified} can also be motivated in the context of \emph{Bayesian experimental design} \cite{chaloner1995bayesian}. In short, the objective \eqref{modified} arises naturally when the experimenter retrieves the model $\beta$ through \emph{ridge regression}, rather than the linear regression \eqref{leastsquares} over the observed data; we explore this connection in Section~\ref{sec:bed}. -Note that maximizing \eqref{modified} is equivalent to maximizing \eqref{dcrit} in the full-information case. In particular, $\det(\T{X_S}X_S)> \det(\T{X_{S'}}X_{S'})$ iff $\det(I_d+\T{X_S}X_S)>\det(I_d+\T{X_{S'}}X_{S'})$. In addition, \eqref{edp}---and the equivalent problem with objective \eqref{dcrit}---are NP-hard; to see this, note that \textsc{Knapsack} reduces to EDP with dimension $d=1$ by mapping the weight of each item $w_i$ to an experiment with $x_i=w_i$. Nevertheless, \eqref{modified} is submodular, monotone and satifies $V(\emptyset) = 0$. +Note that maximizing \eqref{modified} is equivalent to maximizing \eqref{dcrit} in the full-information case. In particular, $\det(\T{X_S}X_S)> \det(\T{X_{S'}}X_{S'})$ iff $\det(I_d+\T{X_S}X_S)>\det(I_d+\T{X_{S'}}X_{S'})$. In addition, \eqref{edp}---and the equivalent problem with objective \eqref{dcrit}---are NP-hard; to see this, note that \textsc{Knapsack} reduces to EDP with dimension $d=1$ by mapping the weight of each item $w_i$ to an experiment with $x_i=w_i$. Nevertheless, \eqref{modified} is submodular, monotone and satisfies $V(\emptyset) = 0$. %, allowing us to use the extensive machinery for the optimization of submodular functions, as well as recent results in the % context of budget feasible auctions \cite{chen,singer-mechanisms}. diff --git a/related.tex b/related.tex index 9573309..cd27ed5 100644 --- a/related.tex +++ b/related.tex @@ -4,7 +4,7 @@ Budget feasible mechanism design was originally proposed by Singer \cite{singer-mechanisms}. Singer considers the problem of maximizing an arbitrary submodular function subject to a budget constraint in the \emph{value query} model, \emph{i.e.} assuming an oracle providing the value of the submodular objective on any given set. Singer shows that there exists a randomized, 112-approximation mechanism for submodular maximization that is \emph{universally truthful} (\emph{i.e.}, it is a randomized mechanism sampled from a distribution over truthful mechanisms). Chen \emph{et al.}~\cite{chen} improve this result by providing a 7.91-approximate mechanism, and show a corresponding lower bound of $2$ among universally truthful mechanisms for submodular maximization. -In contrast to the above results, no truthful, constant approximation mechanism that runs in polynomial time is presently known for submodular maximization. However, assuming access to an oracle providing the optimum in the full-information setup, Chen \emph{et al.},~provide a truthful, $8.34$-appoximate mechanism; in cases for which the full information problem is NP-hard, as the one we consider here, this mechanism is not poly-time, unless P=NP. Chen \emph{et al.}~also prove a $1+\sqrt{2}$ lower bound for truthful mechanisms, improving upon an earlier bound of 2 by Singer \cite{singer-mechanisms}. +In contrast to the above results, no truthful, constant approximation mechanism that runs in polynomial time is presently known for submodular maximization. However, assuming access to an oracle providing the optimum in the full-information setup, Chen \emph{et al.},~provide a truthful, $8.34$-approximate mechanism; in cases for which the full information problem is NP-hard, as the one we consider here, this mechanism is not poly-time, unless P=NP. Chen \emph{et al.}~also prove a $1+\sqrt{2}$ lower bound for truthful mechanisms, improving upon an earlier bound of 2 by Singer \cite{singer-mechanisms}. Improved bounds, as well as deterministic polynomial mechanisms, are known for specific submodular objectives. For symmetric submodular functions, a truthful mechanism with approximation ratio 2 is known, and this ratio is tight \cite{singer-mechanisms}. Singer also provides a 7.32-approximate truthful mechanism for the budget feasible version of \textsc{Matching}, and a corresponding lower bound of 2 \cite{singer-mechanisms}. Improving an earlier result by Singer, Chen \emph{et al.}~\cite{chen}, give a truthful, $2+\sqrt{2}$-approximate mechanism for \textsc{Knapsack}, and a lower bound of $1+\sqrt{2}$. Finally, a truthful, 31-approximate mechanism is also known for the budgeted version of \textsc{Coverage} \cite{singer-mechanisms,singer-influence}. @@ -16,7 +16,7 @@ a truthful, $O(\log^3 n)$-approximate mechanism A series of recent papers \cite{mcsherrytalwar,approximatemechanismdesign,xiao:privacy-truthfulness,chen:privacy-truthfulness} consider the related problem of conducting retrieve data from an \textit{unverified} database: the buyer of the data cannot verify the data reported by individuals and therefore must incentivize them to report truthfully. McSherry and Talwar \cite{mcsherrytalwar} argue that \emph{differentially private} mechanisms also offer a form of \emph{approximate truthfulness}, as the mechanism's output is only slightly perturbed when an individual unilaterally changes her data value; as a result, reporting untruthfully can only increase an individual's utility by a small amount. Xiao \cite{xiao:privacy-truthfulness}, improving upon earlier work by Nissim \emph{et al.}~\cite{approximatemechanismdesign} construct mechanisms that simultaneously achieve exact truthfulness as well as differential privacy. Eliciting private data through a \emph{survey} \cite{roth-liggett}, whereby individuals first decide whether to participate in the survey and then report their data, - also fall under the unverified database setting \cite{xiao:privacy-truthfulness}. Our work differs from the above works in that it does not capture user utility through differential privacy; crucially, we also assume that experiments conducted are \emph{tamper-proof}, and individuals can missreport their costs but not their values. + also fall under the unverified database setting \cite{xiao:privacy-truthfulness}. Our work differs from the above works in that it does not capture user utility through differential privacy; crucially, we also assume that experiments conducted are \emph{tamper-proof}, and individuals can misreport their costs but not their values. Our work is closest to Roth and Schoenebeck \cite{roth-schoenebeck}, who also consider how to sample individuals from a population to obtain an unbiased estimate of a reported value. The authors assume a prior on the joint distribution \stratis{to be continued} -- cgit v1.2.3-70-g09d2 From 431085fd4bb4902922e1a4539e4ba4386537522f Mon Sep 17 00:00:00 2001 From: Thibaut Horel Date: Mon, 5 Nov 2012 17:52:37 +0100 Subject: General --- general.tex | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/general.tex b/general.tex index 46337d6..3815c83 100644 --- a/general.tex +++ b/general.tex @@ -45,7 +45,7 @@ analysis of the approximation ratio, we get the following result which extends Theorem~\ref{thm:main}: \begin{theorem} - There exists a truthful and budget feasible mechanism for the objective + There exists a truthful, individually rational and budget feasible mechanism for the objective function $\tilde{V}$ given by \eqref{eq:normalized}. Furthermore, for any $\varepsilon > 0$, in time $O(\text{poly}(|\mathcal{N}|, d, \log\log \varepsilon^{-1}))$, the algorithm computes a set $S^*$ such that: @@ -64,8 +64,8 @@ Selecting experiments that maximize the information gain in the Bayesian setup l where $h\in \mathcal{H}$ for some subset $\mathcal{H}$ of all possible mappings $h:\Omega\to\reals$, called the \emph{hypothesis space}, and $\varepsilon_i$ are random variables in $\reals$, not necessarily identically distributed, that are independent \emph{conditioned on $h$}. This model is quite broad, and captures many learning tasks, such as: \begin{enumerate} \item\textbf{Generalized Linear Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of linear maps $\{h(x) = \T{\beta}x \text{ s.t. } \beta\in \reals^d\}$, and $\varepsilon_i$ are independent zero-mean normal variables, where $\expt{\varepsilon_i^2}=\sigma_i$. -\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x_i),& \text{w.~prob.}~h(x_i)\\-h(x_i),&\text{w.~prob.}~1-h(x_i)\end{cases}$$ -\item\textbf{Learning Binary Functions with Bernoulli Noise.} $\Omega = \{0,1\}^d$, and $\mathcal{H}$ is some subset of $2^{\Omega\times\{0,1\}}$, and $$\varepsilon_i =\begin{cases}0, &\text{w.~prob.}~p\\\bar{h}(x_i)-h(x_i), \text{w.~prob.}~1-p\end{cases}$$ +\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x_i),& \text{w.~prob.}\;h(x_i)\\-h(x_i),&\text{w.~prob.}\;1-h(x_i)\end{cases}$$ +\item\textbf{Learning Binary Functions with Bernoulli Noise.} $\Omega = \{0,1\}^d$, and $\mathcal{H}$ is some subset of $2^{\Omega}$, and $$\varepsilon_i =\begin{cases}0, &\text{w.~prob.}\;p\\\bar{h}(x_i)-h(x_i), &\text{w.~prob.}\;1-p\end{cases}$$ \end{enumerate} In this setup, assume that the experimenter has a prior distribution on the hypothesis $h\in \mathcal{H}$. Then, the information gain objective can be written again as the mutual information between $\beta$ and $y_S$. -- cgit v1.2.3-70-g09d2 From b3c19ae2cbbdeec29f1c383b319119b0672e14f8 Mon Sep 17 00:00:00 2001 From: Thibaut Horel Date: Mon, 5 Nov 2012 17:53:19 +0100 Subject: Spell check in main.tex --- main.tex | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/main.tex b/main.tex index b1f6519..eb1b477 100644 --- a/main.tex +++ b/main.tex @@ -87,7 +87,7 @@ problems, Chen et al.~\cite{chen} %for \textsc{Knapsack} and Singer propose comparing $V(i^*)$ to %$OPT(R_{\mathcal{N}\setminus\{i\}}, B)$, where $R$ denotes the optimal value of a \emph{fractional relaxation} of the function $V$ over the set -$\mathcal{N}$. A fuction $R:[0, 1]^{n}\to\reals_+$ defined on the hypercube $[0, 1]^{n}$ is a fractional relaxation of $V$ +$\mathcal{N}$. A function $R:[0, 1]^{n}\to\reals_+$ defined on the hypercube $[0, 1]^{n}$ is a fractional relaxation of $V$ over the set $\mathcal{N}$ if %(a) $R$ is a function defined on the hypercube $[0, 1]^{n}$ and (b) $R(\id_S) = V(S)$ for all $S\subseteq\mathcal{N}$, where $\id_S$ denotes the indicator vector of $S$. The optimization program @@ -184,7 +184,7 @@ characterization from Singer \cite{singer-mechanisms} gives a formula to We can now state our main result: \begin{theorem}\label{thm:main} - The allocation described in Algorithm~\ref{mechanism}, along with threshold payments, is truthful, individiually rational + The allocation described in Algorithm~\ref{mechanism}, along with threshold payments, is truthful, individually rational and budget feasible. Furthermore, for any $\varepsilon>0$, the mechanism has complexity $O\left(\text{poly}(n, d, \log\log \varepsilon^{-1})\right)$ and returns a set $S^*$ such that: @@ -199,7 +199,7 @@ We can now state our main result: Note that this implies we construct a poly-time mechanism with accuracy arbitrarily close to 19.68, by taking $\varepsilon = \ldots$\stratis{fix me}. In addition, we prove the following lower bound. \begin{theorem}\label{thm:lowerbound} -There is no $2$-approximate, truthful, budget feasible, individionally rational mechanism for EDP. +There is no $2$-approximate, truthful, budget feasible, individually rational mechanism for EDP. \end{theorem} %\stratis{move the proof as appropriate} \begin{proof} -- cgit v1.2.3-70-g09d2 From c94a36daa447f4c7821fad729b80622f59083625 Mon Sep 17 00:00:00 2001 From: Stratis Ioannidis Date: Mon, 5 Nov 2012 09:12:44 -0800 Subject: related --- notes.bib | 22 ++++++++++++++++++++++ related.tex | 9 ++++----- 2 files changed, 26 insertions(+), 5 deletions(-) diff --git a/notes.bib b/notes.bib index 9a49c0e..113f6d2 100644 --- a/notes.bib +++ b/notes.bib @@ -5,6 +5,28 @@ publisher={Cambridge University Press} } +@inproceedings{pranav, +author="Pranav Dandekar and Nadia Fawaz and Stratis Ioannidis", +title= "Privacy Auctions for Recommender Systems", +booktitle = "WINE", +year = 2012 +} +@inproceedings{ghosh-roth:privacy-auction, + author = {Ghosh, Arpita and Roth, Aaron}, + title = {{Selling privacy at auction}}, + booktitle = {{Proc. ACM EC}}, + year = {2011}, + isbn = {978-1-4503-0261-6}, + location = {San Jose, California, USA}, + pages = {199--208}, + numpages = {10}, + doi = {http://doi.acm.org/10.1145/1993574.1993605}, + acmid = {1993605}, + keywords = {auctions, differential privacy}, +} + + + @article{approximatemechanismdesign, author = {Kobbi Nissim and Rann Smorodinsky and diff --git a/related.tex b/related.tex index cd27ed5..b514244 100644 --- a/related.tex +++ b/related.tex @@ -13,13 +13,12 @@ a truthful, $O(\log^3 n)$-approximate mechanism \cite{dobz2011-mechanisms} as well as a universally truthful, $O(\frac{\log n}{\log \log n})$-approximate mechanism for subadditive maximization \cite{bei2012budget}. Moreover, in a Bayesian setup, assuming a prior distribution among the agent's costs, there exists a truthful mechanism with a 768/512-approximation ratio \cite{bei2012budget}. %(in terms of expectations) - A series of recent papers \cite{mcsherrytalwar,approximatemechanismdesign,xiao:privacy-truthfulness,chen:privacy-truthfulness} consider the related problem of conducting retrieve data from an \textit{unverified} database: the buyer of the data cannot verify the data reported by individuals and therefore must incentivize them to report truthfully. -McSherry and Talwar \cite{mcsherrytalwar} argue that \emph{differentially private} mechanisms also offer a form of \emph{approximate truthfulness}, as the mechanism's output is only slightly perturbed when an individual unilaterally changes her data value; as a result, reporting untruthfully can only increase an individual's utility by a small amount. Xiao \cite{xiao:privacy-truthfulness}, improving upon earlier work by Nissim \emph{et al.}~\cite{approximatemechanismdesign} construct mechanisms that + A series of recent papers \cite{mcsherrytalwar,approximatemechanismdesign,xiao:privacy-truthfulness,chen:privacy-truthfulness} consider the related problem of retreiving data from an \textit{unverified} database: the auctioneer cannot verify the data reported by individuals and therefore must incentivize them to report this truthfully. +McSherry and Talwar \cite{mcsherrytalwar} argue that \emph{differentially private} mechanisms offer a form of \emph{approximate truthfulness}: if users have a utility that depends on their privacy, reporting their data untruthfully can only increase their utility by a small amount. Xiao \cite{xiao:privacy-truthfulness}, improving upon earlier work by Nissim \emph{et al.}~\cite{approximatemechanismdesign} construct mechanisms that simultaneously achieve exact truthfulness as well as differential privacy. Eliciting private data through a \emph{survey} \cite{roth-liggett}, whereby individuals first decide whether to participate in the survey and then report their data, - also fall under the unverified database setting \cite{xiao:privacy-truthfulness}. Our work differs from the above works in that it does not capture user utility through differential privacy; crucially, we also assume that experiments conducted are \emph{tamper-proof}, and individuals can misreport their costs but not their values. + also fall under the unverified database setting \cite{xiao:privacy-truthfulness}. In the \emph{verified} database setting, Ghosh and Roth~\cite{ghosh-roth:privacy-auction} and Dandekar \emph{et al.}~\cite{pranav} consider budgeted auctions where users have a utility again captured by differential privacy. Our work departs from the above setups in that utilities do not involve privacy, whose effects are assumed to be internalized in the costs reported by the users; crucially, we also assume that experiments are tamper-proof, and individuals can misreport their costs but not their values. -Our work is closest to Roth and Schoenebeck \cite{roth-schoenebeck}, who also consider how to sample individuals from a population to obtain an unbiased estimate of a reported value. The authors assume a prior on the joint distribution -\stratis{to be continued} +Our work is closest to the survey setup of Roth and Schoenebeck~\cite{roth-schoenebeck}, who also consider how to sample individuals with different features who reported a hidden value at a certain cost. The authors assume a prior on the joint distribution between costs and features, and wish to obtain an unbiased estimate of the expectation of the hidden value under the constraints of truthfulness, budget feasibility and individual rationality. Our work departs by learning a more general statistic (a linear model) than data means. We note that, as in \cite{roth-shoenebeck}, costs and features can be arbitrarily corellated (our results are prior-free). %\stratis{TODO: privacy discussion. Logdet objective. Should be one paragraph each.} -- cgit v1.2.3-70-g09d2