| author | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-04 09:57:24 -0800 |
|---|---|---|
| committer | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-04 09:57:24 -0800 |
| commit | 0d429224b86ee6b4002e53214c4d2c9222ecbe27 (patch) | |
| tree | d0e62495e61439550e15fb48ee5feb90e9070d3d | |
| parent | 51620a73f4bb1f641e7d14530066f80dc4ee13a3 (diff) | |
| download | recommendation-0d429224b86ee6b4002e53214c4d2c9222ecbe27.tar.gz | |
intro (main)
| -rw-r--r-- | general.tex | 4 |
| -rw-r--r-- | main.tex | 95 |
| -rw-r--r-- | problem.tex | 2 |
3 files changed, 54 insertions, 47 deletions
diff --git a/general.tex b/general.tex
index 5932c5f..12f88e8 100644
--- a/general.tex
+++ b/general.tex
@@ -74,8 +74,8 @@ Selecting experiments that maximize the information gain in the Bayesian setup l
 where $h\in \mathcal{H}$ for some subset $\mathcal{H}$ of all possible mappings $h:\Omega\to\reals$, called the \emph{hypothesis space}, and $\varepsilon_i$ are random variables in $\reals$, not necessarily identically distributed, that are independent \emph{conditioned on $h$}. This model is quite broad, and captures many learning tasks, such as:
 \begin{enumerate}
 \item\textbf{Generalized Linear Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of linear maps $\{h(x) = \T{\beta}x \text{ s.t. } \beta\in \reals^d\}$, and $\varepsilon_i$ are independent zero-mean normal variables, where $\expt{\varepsilon_i^2}=\sigma_i$.
-\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x),& \text{w.~prob.}~h(x)\\-h(x),&\text{w.~prob}1-h(x)\end{cases}$$
-\item\textbf{Learning Binary Functions with Bernoulli Noise.} $\Omega = \{0,1\}^d$, and $\mathcal{H}$ is some subset of $2^{\Omega\times\{0,1\}}$, and $$\varepsilon_i =\begin{cases}0, &\text{w.~prob.}p\\\bar{h}(x)-h(x), \text{w.~prob.}1-p\end{cases}$$
+\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x_i),& \text{w.~prob.}~h(x_i)\\-h(x_i),&\text{w.~prob.}~1-h(x_i)\end{cases}$$
+\item\textbf{Learning Binary Functions with Bernoulli Noise.} $\Omega = \{0,1\}^d$, and $\mathcal{H}$ is some subset of $2^{\Omega\times\{0,1\}}$, and $$\varepsilon_i =\begin{cases}0, &\text{w.~prob.}~p\\\bar{h}(x_i)-h(x_i),&\text{w.~prob.}~1-p\end{cases}$$
 \end{enumerate}
 In this setup, assume that the experimenter has a prior distribution on the hypothesis $h\in \mathcal{H}$. Then, the information gain objective can be written again as the mutual information between $\beta$ and $Y_S$.
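The logistic-regression observation model above is easy to simulate. Below is a minimal Python sketch (not from the paper; the instance `X`, `beta`, and the dimensions are illustrative assumptions): since $\varepsilon_i$ equals $1-h(x_i)$ with probability $h(x_i)$ and $-h(x_i)$ otherwise, the observation $y_i = h(x_i) + \varepsilon_i$ is simply a Bernoulli draw with success probability $h(x_i)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical instance: n experiments x_i in R^d and a "true" parameter beta.
d, n = 3, 5
beta = rng.standard_normal(d)
X = rng.standard_normal((n, d))

# y_i = h(x_i) + eps_i, where eps_i = 1 - h(x_i) w.p. h(x_i)
# and eps_i = -h(x_i) w.p. 1 - h(x_i); i.e., y_i ~ Bernoulli(h(x_i)).
h = sigmoid(X @ beta)
y = rng.binomial(1, h)
```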
diff --git a/main.tex b/main.tex
--- a/main.tex
+++ b/main.tex
@@ -5,27 +5,27 @@
 In this section we present a mechanism for \EDP. Previous works on maximizing
 submodular functions \cite{nemhauser, sviridenko-submodular} and designing
 auction mechanisms for submodular utility functions
 \cite{singer-mechanisms, chen, singer-influence} rely on a greedy heuristic.
 In this heuristic, elements
-are added to the solution set according to the following greedy selection rule:
-assume that you have already selected a set $S$, then the next element to be
-included in the solution set is:
-\begin{displaymath}
-  i = \argmax_{j\in\mathcal{N}\setminus S}\frac{V(S\cup\{i\}) - V(S)}{c_i}
-\end{displaymath}
-This is the generalization of the \emph{value-per-cost} ratio used in greedy
-heuristic for knapsack problems. However, note that for general submodular
-functions, the value of an element depends on the set to which it is added.
+are added to the solution set according to the following greedy selection rule.
+Assume that $S\subseteq \mathcal{N}$ is the solution set constructed thus far; then, the next element to be
+included in $S$ is:
+\begin{align}
+  i = \argmax_{j\in\mathcal{N}\setminus S}\frac{V(S\cup\{j\}) - V(S)}{c_j}\label{greedy}
+\end{align}
+This process terminates when no more items can be added to $S$ using \eqref{greedy} under budget $B$. This is the generalization of the \emph{value-per-cost} ratio used in the greedy
+heuristic for \textsc{Knapsack}. However, in contrast to \textsc{Knapsack}, for general submodular
+functions, the marginal value of an element depends on the set to which it is added.
 
-Unfortunately, even for the non-strategic case, the greedy heuristic gives an
-unbounded approximation ratio. Let us introduce $i^*
-= \argmax_{i\in\mathcal{N}} V(i)$, the element of maximum value (as a singleton
-set). It has been noted by Khuller et al. \cite{khuller} that for the maximum
+Unfortunately, even for the full information case, the greedy heuristic gives an
+unbounded approximation ratio. Let $i^*
+= \argmax_{i\in\mathcal{N}} V(i)$ be the element of maximum value (as a singleton
+set). It has been noted by Khuller \emph{et al.}~\cite{khuller} that for the maximum
 coverage problem, taking the maximum between the greedy solution and $V(i^*)$
 gives a $\frac{2e}{e-1}$ approximation ratio. In the general case, we have
 the following result from Singer \cite{singer-influence} which follows from
-Chen et al. \cite{chen}:
+Chen \emph{et al.}~\cite{chen}: \stratis{Is it Singer or Chen?}
 
 \begin{lemma}[Singer \cite{singer-influence}]\label{lemma:greedy-bound}
-Let $S_G$ be the set computed by the greedy heuristic and let us define $i^*$:
+Let $S_G$ be the set computed by the greedy heuristic and define $i^*$:
 \begin{displaymath}
   i^* = \argmax_{i\in\mathcal{N}} V(i)
 \end{displaymath}
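For concreteness, here is a minimal Python sketch of the budgeted greedy rule above combined with the best-singleton safeguard noted by Khuller \emph{et al.} It assumes (these names are illustrative, not from the paper) a value oracle `V` on sets, a cost map `c`, a ground set `N`, and at least one affordable item.

```python
def greedy_with_singleton(N, V, c, B):
    """Budgeted greedy by marginal-value-per-cost, safeguarded by the
    best feasible singleton; returns the better of the two solutions."""
    S, spent = set(), 0.0
    while True:
        # Candidates that still fit within the remaining budget.
        feas = [j for j in N if j not in S and spent + c[j] <= B]
        if not feas:
            break
        # Greedy rule: maximize (V(S + {j}) - V(S)) / c_j.
        j = max(feas, key=lambda j: (V(S | {j}) - V(S)) / c[j])
        S.add(j)
        spent += c[j]
    # i* = argmax_i V({i}) over affordable items (assumed non-empty).
    i_star = max((i for i in N if c[i] <= B), key=lambda i: V({i}))
    return S if V(S) >= V({i_star}) else {i_star}
```

Taking the maximum of the two candidate solutions is exactly what yields the bounded approximation ratio of the lemma; the greedy set alone can be arbitrarily bad.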
@@ -42,35 +42,43 @@
 applied to the strategic case. Indeed, assume that we are in a situation where
 the mechanism has allocated to the greedy set ($V(S_G) \geq V(i^*)$). If an
 agent $i$ from the greedy set reduces her cost, it could happen that $V(S_G)$
 becomes smaller than $V(i^*)$. In this case the mechanism will allocate to
-$i^*$ and $i$ will be out of the allocated set. This breaks the monotonicity of
-the allocation function.
+$i^*$ and $i$ will be out of the allocated set. This violates the monotonicity of
+the allocation function and hence also truthfulness, by Myerson's theorem.
 
-The way this issue has been addressed thus far \cite{chen, singer-influence},
-is to introduce a third quantity: if $V(i^*)$ is larger than this quantity, the
+To address this, Chen \emph{et al.}~\cite{chen} %and Singer~\cite{singer-influence},
+introduce a third quantity: if $V(i^*)$ is larger than this quantity, the
 mechanism allocates to $i^*$, otherwise it allocates to the greedy set $S_G$.
 This quantity must be provably close to $V(S_G)$, to keep a bounded
 approximation ratio, while maintaining the monotonicity of the allocation
 algorithm. Furthermore, it must be computable in polynomial time to keep an
-overall polynomial complexity for the allocation algorithm. If the underlying
-non-strategic optimization program \eqref{eq:non-strategic} can be solved in
-polynomial time, Chen et al. \cite{chen} prove that allocating to $i^*$ when
-$V(i^*) \geq C\cdot OPT(V,\mathcal{N}\setminus\{i^*\}$ (for some constant $C$)
-and to $S_G$ otherwise yields a 8.34 approximation mechanism. For specific
-problems, Chen et al. \cite{chen} for knapsack on one hand, Singer
-\cite{singer-influence} for maximum coverage on the other hand, instead compare
-$V(i^*)$ to $OPT(R_{\mathcal{N}\setminus\{i\}}, B)$, where $R_\mathcal{N}$
-denotes a fractional relaxation of the function $V$ over the set
-$\mathcal{N}$. We say that $R_\mathcal{N}$ is a fractional relaxation of $V$
-over the set $\mathcal{N}$, if (a) $R_\mathcal{N}$ is a function defined on the
-hypercube $[0, 1]^{|\mathcal{N}|}$ and (b) for all $S\subseteq\mathcal{N}$, if
-$\mathbf{1}_S$ denotes the indicator vector of $S$ we have
-$R_\mathcal{N}(\mathbf{1}_S) = V(S)$. The optimization program
-\eqref{eq:non-strategic} extends naturally to relaxations:
-\begin{displaymath}
+overall polynomial complexity for the allocation algorithm.
+
+\sloppy
+When the underlying
+full information problem \eqref{eq:non-strategic} can be solved in
+polynomial time, Chen \emph{et al.}~\cite{chen} prove that $OPT(V,\mathcal{N}\setminus\{i^*\},B)$ can play the role of this quantity: that is, allocating to $i^*$ when
+$V(i^*) \geq C\cdot OPT(V,\mathcal{N}\setminus\{i^*\},B)$ (for some constant $C$)
+and to $S_G$ otherwise yields an 8.34-approximation mechanism. However, this is not a polynomial-time mechanism when the underlying problem is NP-hard (unless P=NP), as is the case for \EDP.
+
+\fussy
+For NP-hard
+problems, Chen \emph{et al.}~\cite{chen} %for \textsc{Knapsack} and Singer
+%\cite{singer-influence} for \textsc{Coverage} instead
+propose comparing
+$V(i^*)$ to %$OPT(R_{\mathcal{N}\setminus\{i\}}, B)$, where $R_\mathcal{N}$ denotes
+the optimal value of a \emph{fractional relaxation} of the function $V$ over the set
+$\mathcal{N}$. A function $R_\mathcal{N}:[0, 1]^{|\mathcal{N}|}\to\reals_+$ defined on the hypercube $[0, 1]^{|\mathcal{N}|}$ is a fractional relaxation of $V$
+over the set $\mathcal{N}$ if %(a) $R_\mathcal{N}$ is a function defined on the hypercube $[0, 1]^{|\mathcal{N}|}$ and (b)
+$R_\mathcal{N}(\mathbf{1}_S) = V(S)$ for all $S\subseteq\mathcal{N}$, where
+$\mathbf{1}_S$ denotes the indicator vector of $S$. The optimization program
+\eqref{eq:non-strategic} extends naturally to such relaxations:
+\begin{align}
   OPT(R_\mathcal{N}, B) = \max_{\lambda\in[0, 1]^{|\mathcal{N}|}}
   \left\{R_\mathcal{N}(\lambda) \mid \sum_{i=1}^{|\mathcal{N}|} \lambda_i c_i
-  \leq B\right\}
-\end{displaymath}
+  \leq B\right\}\label{relax}
+\end{align}
+To attain a truthful constant-approximation mechanism for \textsc{Knapsack}, Chen \emph{et al.}~\cite{chen} substitute $OPT(R_\mathcal{N},B)$ for $OPT(V,\mathcal{N},B)$ in the above program, where $R_{\mathcal{N}}$ are relaxations of the \textsc{Knapsack} objectives.
+Similarly, Singer~\cite{singer-influence} follows the same approach to obtain a mechanism for \textsc{Coverage}.
 
 A relaxation which is commonly used, due to its well-behaved properties in
 the context of maximizing submodular functions, is the \emph{multi-linear}
@@ -87,18 +87,17 @@ $\lambda_i$ or to reject it with probability $1-\lambda_i$:
 \begin{displaymath}
   P_\mathcal{N}^\lambda(S) = \prod_{i\in S} \lambda_i
   \prod_{i\in\mathcal{N}\setminus S} (1 - \lambda_i)
 \end{displaymath}
-For knapsack, Chen et al. \cite{chen} directly use the multi-linear extension as
-the relaxation to compare $V(i^*)$ to. For maximum coverage however, the
+For \textsc{Knapsack}, Chen \emph{et al.}~\cite{chen} directly use the multi-linear extension, as \eqref{relax} can be solved in polynomial time for the \textsc{Knapsack} objective. For \textsc{Coverage} however, the
 optimal value of the multi-linear extension cannot be computed in polynomial
 time. Thus, Singer \cite{singer-influence} introduces a second relaxation of
 the value function which can be proven to be close to the multi-linear
-extension, by using the \emph{pipage rounding} method from Ageev and Sviridenko
+extension, by using the \emph{pipage rounding} method of Ageev and Sviridenko
 \cite{pipage}.
 
 Here, following these ideas, we introduce another relaxation of the value
-function. Note that in our case, the multi-linear extension can be written:
+function of \EDP. Note that in our case, the multi-linear extension can be written:
 \begin{displaymath}
   F_\mathcal{N}(\lambda) = \mathbb{E}_{S\sim
   P_\mathcal{N}^\lambda}\left[\log\det A(S) \right] \;\text{with}\; A(S) =
   I_d + \sum_{i\in S} x_i\T{x_i}
 \end{displaymath}
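A quick Monte Carlo sketch of this multi-linear extension for the \EDP objective (a sketch in Python with assumed, illustrative names; `X` holds the experiment vectors $x_i$ as rows): each $i$ is kept independently with probability $\lambda_i$, exactly the distribution $P_\mathcal{N}^\lambda$. At an integer point $\lambda = \mathbf{1}_S$ every sample equals $S$, so the estimate returns $V(S)$ exactly, matching the defining property $R_\mathcal{N}(\mathbf{1}_S) = V(S)$ of a fractional relaxation.

```python
import numpy as np

rng = np.random.default_rng(1)

def V(S, X, d):
    """EDP value: log det(I_d + sum_{i in S} x_i x_i^T)."""
    A = np.eye(d)
    for i in S:
        A += np.outer(X[i], X[i])
    return np.linalg.slogdet(A)[1]

def F_montecarlo(lam, X, d, samples=2000):
    """Estimate F_N(lambda) = E_{S ~ P^lambda_N}[V(S)] by sampling:
    keep each element i independently with probability lambda_i."""
    total = 0.0
    for _ in range(samples):
        S = [i for i in range(len(lam)) if rng.random() < lam[i]]
        total += V(S, X, d)
    return total / samples
```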
@@ -110,10 +110,10 @@ in the above formula:
   &= \log\det\left(I_d + \sum_{i\in\mathcal{N}} \lambda_i x_i\T{x_i}\right)
 \end{align}
 
-This function is well-known to be concave and even self-concordant (see \emph{e.g.}
+This function is well-known to be concave and even self-concordant (see \emph{e.g.},
 \cite{boyd2004convex}). In this case, the analysis of Newton's method for
 self-concordant functions in \cite{boyd2004convex} shows that it is possible
-to find the maxmimum of $L_\mathcal{N}$ to any precision $\varepsilon$ in
+to find the maximum of $L_\mathcal{N}$ to any precision $\varepsilon$ in
 a number of iterations $O(\log\log\varepsilon^{-1})$. The main challenge will
 be to prove that $OPT(L_\mathcal{N}, B)$ is close to $V(S_G)$. To do so, our
 main technical contribution is to prove that $L_\mathcal{N}$ has a bounded
diff --git a/problem.tex b/problem.tex
index 363731a..e3ee84b 100644
--- a/problem.tex
+++ b/problem.tex
@@ -85,7 +85,7 @@ A mechanism is truthful iff every $i \in \mathcal{N}$ and every two cost vector
     OPT(V,\mathcal{N}, B) \leq \alpha V(S).
   \end{displaymath}
   The approximation ratio captures the \emph{price of truthfulness}, \emph{i.e.}, the relative value loss incurred by adding the truthfulness constraint. % to the value function maximization.
-  \item \emph{Computationally efficient.} The allocation and payment function
+  \item \emph{Computational efficiency.} The allocation and payment function
   should be computable in polynomial time in the number of agents $n$. %\thibaut{Should we say something about the black-box model for $V$? Needed to say something in general, but not in our case where the value function can be computed in polynomial time}.
 \end{itemize}
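Since $L_\mathcal{N}(\lambda) = \log\det(I_d + \sum_i \lambda_i x_i\T{x_i})$ from the main.tex hunk above is concave, program \eqref{relax} with $R_\mathcal{N} = L_\mathcal{N}$ is a convex problem. Below is a minimal sketch (illustrative names throughout; it uses SciPy's generic SLSQP solver rather than the Newton scheme the text references), relying on the gradient $\partial L_\mathcal{N}/\partial\lambda_i = \T{x_i} A(\lambda)^{-1} x_i$:

```python
import numpy as np
from scipy.optimize import minimize

def solve_relaxation(X, c, B):
    """Maximize L_N(lam) = log det(I_d + sum_i lam_i x_i x_i^T)
    over lam in [0,1]^n subject to sum_i lam_i c_i <= B."""
    n, d = X.shape

    def neg_L(lam):
        A = np.eye(d) + (X.T * lam) @ X  # I_d + sum_i lam_i x_i x_i^T
        return -np.linalg.slogdet(A)[1]

    def neg_grad(lam):
        A = np.eye(d) + (X.T * lam) @ X
        Ainv = np.linalg.inv(A)
        # dL/dlam_i = x_i^T A^{-1} x_i
        return -np.einsum('ij,jk,ik->i', X, Ainv, X)

    res = minimize(
        neg_L,
        x0=np.full(n, min(1.0, B / c.sum())),  # feasible starting point
        jac=neg_grad,
        bounds=[(0.0, 1.0)] * n,
        constraints=[{'type': 'ineq', 'fun': lambda lam: B - lam @ c}],
        method='SLSQP',
    )
    return res.x, -res.fun  # optimizer lambda and OPT(L_N, B)
```

The returned optimal value is the third quantity the mechanism compares $V(i^*)$ against; the bounded-curvature argument sketched in the text is what makes this value provably close to $V(S_G)$.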
