| author | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-04 09:57:24 -0800 |
|---|---|---|
| committer | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-04 09:57:24 -0800 |
| commit | 0d429224b86ee6b4002e53214c4d2c9222ecbe27 (patch) | |
| tree | d0e62495e61439550e15fb48ee5feb90e9070d3d | |
| parent | 51620a73f4bb1f641e7d14530066f80dc4ee13a3 (diff) | |
| download | recommendation-0d429224b86ee6b4002e53214c4d2c9222ecbe27.tar.gz | |
intro (main)
| -rw-r--r-- | general.tex | 4 |
| -rw-r--r-- | main.tex | 95 |
| -rw-r--r-- | problem.tex | 2 |
3 files changed, 54 insertions, 47 deletions
diff --git a/general.tex b/general.tex
index 5932c5f..12f88e8 100644
--- a/general.tex
+++ b/general.tex
@@ -74,8 +74,8 @@ Selecting experiments that maximize the information gain in the Bayesian setup l
 where $h\in \mathcal{H}$ for some subset $\mathcal{H}$ of all possible mappings $h:\Omega\to\reals$, called the \emph{hypothesis space}, and $\varepsilon_i$ are random variables in $\reals$, not necessarily identically distributed, that are independent \emph{conditioned on $h$}. This model is quite broad, and captures many learning tasks, such as:
 \begin{enumerate}
 \item\textbf{Generalized Linear Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of linear maps $\{h(x) = \T{\beta}x \text{ s.t. } \beta\in \reals^d\}$, and $\varepsilon_i$ are independent zero-mean normal variables, where $\expt{\varepsilon_i^2}=\sigma_i$.
-\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x),& \text{w.~prob.}~h(x)\\-h(x),&\text{w.~prob}1-h(x)\end{cases}$$
-\item\textbf{Learning Binary Functions with Bernoulli Noise.} $\Omega = \{0,1\}^d$, and $\mathcal{H}$ is some subset of $2^{\Omega\times\{0,1\}}$, and $$\varepsilon_i =\begin{cases}0, &\text{w.~prob.}p\\\bar{h}(x)-h(x), \text{w.~prob.}1-p\end{cases}$$
+\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x_i),& \text{w.~prob.}~h(x_i)\\-h(x_i),&\text{w.~prob.}~1-h(x_i)\end{cases}$$
+\item\textbf{Learning Binary Functions with Bernoulli Noise.} $\Omega = \{0,1\}^d$, and $\mathcal{H}$ is some subset of $2^{\Omega\times\{0,1\}}$, and $$\varepsilon_i =\begin{cases}0, &\text{w.~prob.}~p\\\bar{h}(x_i)-h(x_i),&\text{w.~prob.}~1-p\end{cases}$$
 \end{enumerate}
 In this setup, assume that the experimenter has a prior distribution on the hypothesis $h\in \mathcal{H}$. Then, the information gain objective can be written again as the mutual information between $\beta$ and $Y_S$.
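The logistic-regression observation model above is easy to simulate. Below is a minimal Python sketch (not from the paper; the instance `X`, `beta`, and the dimensions are illustrative assumptions): since $\varepsilon_i$ equals $1-h(x_i)$ with probability $h(x_i)$ and $-h(x_i)$ otherwise, the observation $y_i = h(x_i) + \varepsilon_i$ is simply a Bernoulli draw with success probability $h(x_i)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical instance: n experiments x_i in R^d and a "true" parameter beta.
d, n = 3, 5
beta = rng.standard_normal(d)
X = rng.standard_normal((n, d))

# y_i = h(x_i) + eps_i, where eps_i = 1 - h(x_i) w.p. h(x_i)
# and eps_i = -h(x_i) w.p. 1 - h(x_i); i.e., y_i ~ Bernoulli(h(x_i)).
h = sigmoid(X @ beta)
y = rng.binomial(1, h)
```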
diff --git a/main.tex b/main.tex
--- a/main.tex
+++ b/main.tex
@@ -5,27 +5,27 @@
 In this section we present a mechanism for \EDP. Previous works on maximizing
 submodular functions \cite{nemhauser, sviridenko-submodular} and designing
 auction mechanisms for submodular utility functions
 \cite{singer-mechanisms, chen, singer-influence} rely on a greedy heuristic.
 In this heuristic, elements
-are added to the solution set according to the following greedy selection rule:
-assume that you have already selected a set $S$, then the next element to be
-included in the solution set is:
-\begin{displaymath}
-  i = \argmax_{j\in\mathcal{N}\setminus S}\frac{V(S\cup\{i\}) - V(S)}{c_i}
-\end{displaymath}
-This is the generalization of the \emph{value-per-cost} ratio used in greedy
-heuristic for knapsack problems. However, note that for general submodular
-functions, the value of an element depends on the set to which it is added.
+are added to the solution set according to the following greedy selection rule.
+Assume that $S\subseteq \mathcal{N}$ is the solution set constructed thus far; then, the next element to be
+included in $S$ is:
+\begin{align}
+  i = \argmax_{j\in\mathcal{N}\setminus S}\frac{V(S\cup\{j\}) - V(S)}{c_j}\label{greedy}
+\end{align}
+This process terminates when no more items can be added to $S$ using \eqref{greedy} under budget $B$. This is the generalization of the \emph{value-per-cost} ratio used in the greedy
+heuristic for \textsc{Knapsack}. However, in contrast to \textsc{Knapsack}, for general submodular
+functions, the marginal value of an element depends on the set to which it is added.
 
-Unfortunately, even for the non-strategic case, the greedy heuristic gives an
-unbounded approximation ratio. Let us introduce $i^*
-= \argmax_{i\in\mathcal{N}} V(i)$, the element of maximum value (as a singleton
-set). It has been noted by Khuller et al. \cite{khuller} that for the maximum
+Unfortunately, even for the full information case, the greedy heuristic gives an
+unbounded approximation ratio. Let $i^*
+= \argmax_{i\in\mathcal{N}} V(i)$ be the element of maximum value (as a singleton
+set). It has been noted by Khuller \emph{et al.}~\cite{khuller} that for the maximum
 coverage problem, taking the maximum between the greedy solution and $V(i^*)$
 gives a $\frac{2e}{e-1}$ approximation ratio. In the general case, we have
 the following result from Singer \cite{singer-influence} which follows from
-Chen et al. \cite{chen}:
+Chen \emph{et al.}~\cite{chen}: \stratis{Is it Singer or Chen?}
 
 \begin{lemma}[Singer \cite{singer-influence}]\label{lemma:greedy-bound}
-Let $S_G$ be the set computed by the greedy heuristic and let us define $i^*$:
+Let $S_G$ be the set computed by the greedy heuristic and define $i^*$:
 \begin{displaymath}
   i^* = \argmax_{i\in\mathcal{N}} V(i)
 \end{displaymath}
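For concreteness, here is a minimal Python sketch of the budgeted greedy rule above combined with the best-singleton safeguard noted by Khuller \emph{et al.} It assumes (these names are illustrative, not from the paper) a value oracle `V` on sets, a cost map `c`, a ground set `N`, and at least one affordable item.

```python
def greedy_with_singleton(N, V, c, B):
    """Budgeted greedy by marginal-value-per-cost, safeguarded by the
    best feasible singleton; returns the better of the two solutions."""
    S, spent = set(), 0.0
    while True:
        # Candidates that still fit within the remaining budget.
        feas = [j for j in N if j not in S and spent + c[j] <= B]
        if not feas:
            break
        # Greedy rule: maximize (V(S + {j}) - V(S)) / c_j.
        j = max(feas, key=lambda j: (V(S | {j}) - V(S)) / c[j])
        S.add(j)
        spent += c[j]
    # i* = argmax_i V({i}) over affordable items (assumed non-empty).
    i_star = max((i for i in N if c[i] <= B), key=lambda i: V({i}))
    return S if V(S) >= V({i_star}) else {i_star}
```

Taking the maximum of the two candidate solutions is exactly what yields the bounded approximation ratio of the lemma; the greedy set alone can be arbitrarily bad.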
@@ -42,35 +42,43 @@
 applied to the strategic case. Indeed, assume that we are in a situation where
 the mechanism has allocated to the greedy set ($V(S_G) \geq V(i^*)$). If an
 agent $i$ from the greedy set reduces her cost, it could happen that $V(S_G)$
 becomes smaller than $V(i^*)$. In this case the mechanism will allocate to
-$i^*$ and $i$ will be out of the allocated set. This breaks the monotonicity of
-the allocation function.
+$i^*$ and $i$ will be out of the allocated set. This violates the monotonicity of
+the allocation function and hence also truthfulness, by Myerson's theorem.
 
-The way this issue has been addressed thus far \cite{chen, singer-influence},
-is to introduce a third quantity: if $V(i^*)$ is larger than this quantity, the
+To address this, Chen \emph{et al.}~\cite{chen} %and Singer~\cite{singer-influence},
+introduce a third quantity: if $V(i^*)$ is larger than this quantity, the
 mechanism allocates to $i^*$, otherwise it allocates to the greedy set $S_G$.
 This quantity must be provably close to $V(S_G)$, to keep a bounded
 approximation ratio, while maintaining the monotonicity of the allocation
 algorithm. Furthermore, it must be computable in polynomial time to keep an
-overall polynomial complexity for the allocation algorithm. If the underlying
-non-strategic optimization program \eqref{eq:non-strategic} can be solved in
-polynomial time, Chen et al. \cite{chen} prove that allocating to $i^*$ when
-$V(i^*) \geq C\cdot OPT(V,\mathcal{N}\setminus\{i^*\}$ (for some constant $C$)
-and to $S_G$ otherwise yields a 8.34 approximation mechanism. For specific
-problems, Chen et al. \cite{chen} for knapsack on one hand, Singer
-\cite{singer-influence} for maximum coverage on the other hand, instead compare
-$V(i^*)$ to $OPT(R_{\mathcal{N}\setminus\{i\}}, B)$, where $R_\mathcal{N}$
-denotes a fractional relaxation of the function $V$ over the set
-$\mathcal{N}$. We say that $R_\mathcal{N}$ is a fractional relaxation of $V$
-over the set $\mathcal{N}$, if (a) $R_\mathcal{N}$ is a function defined on the
-hypercube $[0, 1]^{|\mathcal{N}|}$ and (b) for all $S\subseteq\mathcal{N}$, if
-$\mathbf{1}_S$ denotes the indicator vector of $S$ we have
-$R_\mathcal{N}(\mathbf{1}_S) = V(S)$. The optimization program
-\eqref{eq:non-strategic} extends naturally to relaxations:
-\begin{displaymath}
+overall polynomial complexity for the allocation algorithm.
+
+\sloppy
+When the underlying
+full information problem \eqref{eq:non-strategic} can be solved in
+polynomial time, Chen \emph{et al.}~\cite{chen} prove that $OPT(V,\mathcal{N}\setminus\{i^*\},B)$ can play the role of this quantity: that is, allocating to $i^*$ when
+$V(i^*) \geq C\cdot OPT(V,\mathcal{N}\setminus\{i^*\},B)$ (for some constant $C$)
+and to $S_G$ otherwise yields an 8.34-approximation mechanism. However, this is not a polynomial-time mechanism when the underlying problem is NP-hard (unless P=NP), as is the case for \EDP.
+
+\fussy
+For NP-hard
+problems, Chen \emph{et al.}~\cite{chen} %for \textsc{Knapsack} and Singer
+%\cite{singer-influence} for \textsc{Coverage} instead
+propose comparing
+$V(i^*)$ to %$OPT(R_{\mathcal{N}\setminus\{i\}}, B)$, where $R_\mathcal{N}$ denotes
+the optimal value of a \emph{fractional relaxation} of the function $V$ over the set
+$\mathcal{N}$. A function $R_\mathcal{N}:[0, 1]^{|\mathcal{N}|}\to\reals_+$ defined on the hypercube $[0, 1]^{|\mathcal{N}|}$ is a fractional relaxation of $V$
+over the set $\mathcal{N}$ if %(a) $R_\mathcal{N}$ is a function defined on the hypercube $[0, 1]^{|\mathcal{N}|}$ and (b)
+$R_\mathcal{N}(\mathbf{1}_S) = V(S)$ for all $S\subseteq\mathcal{N}$, where
+$\mathbf{1}_S$ denotes the indicator vector of $S$. The optimization program
+\eqref{eq:non-strategic} extends naturally to such relaxations:
+\begin{align}
   OPT(R_\mathcal{N}, B) = \max_{\lambda\in[0, 1]^{|\mathcal{N}|}}
   \left\{R_\mathcal{N}(\lambda) \mid \sum_{i=1}^{|\mathcal{N}|} \lambda_i c_i
-  \leq B\right\}
-\end{displaymath}
+  \leq B\right\}\label{relax}
+\end{align}
+To attain a truthful constant-approximation mechanism for \textsc{Knapsack}, Chen \emph{et al.}~\cite{chen} substitute $OPT(R_\mathcal{N},B)$ for $OPT(V,\mathcal{N},B)$ in the above program, where $R_{\mathcal{N}}$ are relaxations of the \textsc{Knapsack} objectives.
+Similarly, Singer~\cite{singer-influence} follows the same approach to obtain a mechanism for \textsc{Coverage}.
 
 A relaxation which is commonly used, due to its well-behaved properties in
 the context of maximizing submodular functions, is the \emph{multi-linear}
@@ -87,18 +87,17 @@ $\lambda_i$ or to reject it with probability $1-\lambda_i$:
 \begin{displaymath}
   P_\mathcal{N}^\lambda(S) = \prod_{i\in S} \lambda_i
   \prod_{i\in\mathcal{N}\setminus S} (1 - \lambda_i)
 \end{displaymath}
-For knapsack, Chen et al. \cite{chen} directly use the multi-linear extension as
-the relaxation to compare $V(i^*)$ to. For maximum coverage however, the
+For \textsc{Knapsack}, Chen \emph{et al.}~\cite{chen} directly use the multi-linear extension, as \eqref{relax} can be solved in polynomial time for the \textsc{Knapsack} objective. For \textsc{Coverage} however, the
 optimal value of the multi-linear extension cannot be computed in polynomial
 time. Thus, Singer \cite{singer-influence} introduces a second relaxation of
 the value function which can be proven to be close to the multi-linear
-extension, by using the \emph{pipage rounding} method from Ageev and Sviridenko
+extension, by using the \emph{pipage rounding} method of Ageev and Sviridenko
 \cite{pipage}.
 
 Here, following these ideas, we introduce another relaxation of the value
-function. Note that in our case, the multi-linear extension can be written:
+function of \EDP. Note that in our case, the multi-linear extension can be written:
 \begin{displaymath}
   F_\mathcal{N}(\lambda) = \mathbb{E}_{S\sim
   P_\mathcal{N}^\lambda}\left[\log\det A(S) \right] \;\text{with}\; A(S) =
   I_d + \sum_{i\in S} x_i\T{x_i}
 \end{displaymath}
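A quick Monte Carlo sketch of this multi-linear extension for the \EDP objective (a sketch in Python with assumed, illustrative names; `X` holds the experiment vectors $x_i$ as rows): each $i$ is kept independently with probability $\lambda_i$, exactly the distribution $P_\mathcal{N}^\lambda$. At an integer point $\lambda = \mathbf{1}_S$ every sample equals $S$, so the estimate returns $V(S)$ exactly, matching the defining property $R_\mathcal{N}(\mathbf{1}_S) = V(S)$ of a fractional relaxation.

```python
import numpy as np

rng = np.random.default_rng(1)

def V(S, X, d):
    """EDP value: log det(I_d + sum_{i in S} x_i x_i^T)."""
    A = np.eye(d)
    for i in S:
        A += np.outer(X[i], X[i])
    return np.linalg.slogdet(A)[1]

def F_montecarlo(lam, X, d, samples=2000):
    """Estimate F_N(lambda) = E_{S ~ P^lambda_N}[V(S)] by sampling:
    keep each element i independently with probability lambda_i."""
    total = 0.0
    for _ in range(samples):
        S = [i for i in range(len(lam)) if rng.random() < lam[i]]
        total += V(S, X, d)
    return total / samples
```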
@@ -110,10 +110,10 @@ in the above formula:
   &= \log\det\left(I_d + \sum_{i\in\mathcal{N}} \lambda_i x_i\T{x_i}\right)
 \end{align}
 
-This function is well-known to be concave and even self-concordant (see \emph{e.g.}
+This function is well-known to be concave and even self-concordant (see \emph{e.g.},
 \cite{boyd2004convex}). In this case, the analysis of Newton's method for
 self-concordant functions in \cite{boyd2004convex} shows that it is possible
-to find the maxmimum of $L_\mathcal{N}$ to any precision $\varepsilon$ in
+to find the maximum of $L_\mathcal{N}$ to any precision $\varepsilon$ in
 a number of iterations $O(\log\log\varepsilon^{-1})$. The main challenge will
 be to prove that $OPT(L_\mathcal{N}, B)$ is close to $V(S_G)$. To do so, our
 main technical contribution is to prove that $L_\mathcal{N}$ has a bounded
diff --git a/problem.tex b/problem.tex
index 363731a..e3ee84b 100644
--- a/problem.tex
+++ b/problem.tex
@@ -85,7 +85,7 @@ A mechanism is truthful iff every $i \in \mathcal{N}$ and every two cost vector
     OPT(V,\mathcal{N}, B) \leq \alpha V(S).
   \end{displaymath}
   The approximation ratio captures the \emph{price of truthfulness}, \emph{i.e.}, the relative value loss incurred by adding the truthfulness constraint. % to the value function maximization.
-  \item \emph{Computationally efficient.} The allocation and payment function
+  \item \emph{Computational efficiency.} The allocation and payment function
   should be computable in polynomial time in the number of agents $n$. %\thibaut{Should we say something about the black-box model for $V$? Needed to say something in general, but not in our case where the value function can be computed in polynomial time}.
 \end{itemize}
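Since $L_\mathcal{N}(\lambda) = \log\det(I_d + \sum_i \lambda_i x_i\T{x_i})$ from the main.tex hunk above is concave, program \eqref{relax} with $R_\mathcal{N} = L_\mathcal{N}$ is a convex problem. Below is a minimal sketch (illustrative names throughout; it uses SciPy's generic SLSQP solver rather than the Newton scheme the text references), relying on the gradient $\partial L_\mathcal{N}/\partial\lambda_i = \T{x_i} A(\lambda)^{-1} x_i$:

```python
import numpy as np
from scipy.optimize import minimize

def solve_relaxation(X, c, B):
    """Maximize L_N(lam) = log det(I_d + sum_i lam_i x_i x_i^T)
    over lam in [0,1]^n subject to sum_i lam_i c_i <= B."""
    n, d = X.shape

    def neg_L(lam):
        A = np.eye(d) + (X.T * lam) @ X  # I_d + sum_i lam_i x_i x_i^T
        return -np.linalg.slogdet(A)[1]

    def neg_grad(lam):
        A = np.eye(d) + (X.T * lam) @ X
        Ainv = np.linalg.inv(A)
        # dL/dlam_i = x_i^T A^{-1} x_i
        return -np.einsum('ij,jk,ik->i', X, Ainv, X)

    res = minimize(
        neg_L,
        x0=np.full(n, min(1.0, B / c.sum())),  # feasible starting point
        jac=neg_grad,
        bounds=[(0.0, 1.0)] * n,
        constraints=[{'type': 'ineq', 'fun': lambda lam: B - lam @ c}],
        method='SLSQP',
    )
    return res.x, -res.fun  # optimizer lambda and OPT(L_N, B)
```

The returned optimal value is the third quantity the mechanism compares $V(i^*)$ against; the bounded-curvature argument sketched in the text is what makes this value provably close to $V(S_G)$.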
