author     Thibaut Horel <thibaut.horel@gmail.com>    2012-11-03 01:17:54 +0100
committer  Thibaut Horel <thibaut.horel@gmail.com>    2012-11-03 01:17:54 +0100
commit     b1c62876ed5b5d81efcda1a342ac0d29f8a86568 (patch)
tree       4482913d46ab5b7696d88b544001fddeaf813653
parent     fb9fa7fc2a39eb415e3348b6756d397e6b820d09 (diff)
download   recommendation-b1c62876ed5b5d81efcda1a342ac0d29f8a86568.tar.gz
Finish the discussion before the mechanism; other corrections following Stratis's comments
-rw-r--r--  main.tex    | 170
-rw-r--r--  problem.tex |  26
2 files changed, 87 insertions(+), 109 deletions(-)
diff --git a/main.tex b/main.tex
index f799410..24c7629 100644
--- a/main.tex
+++ b/main.tex
@@ -35,7 +35,8 @@ unbounded approximation ratio. Let us introduce $i^*
set). It has been noted by Khuller et al. \cite{khuller} that for the maximum
coverage problem, taking the maximum between the value of the greedy solution and $V(i^*)$
gives a $\frac{2e}{e-1}$ approximation ratio. In the general case, we have the
-following result from \cite{singer-influence} which follows from \cite{chen}:
+following result from Singer \cite{singer-influence} which follows from
+Bei et al. \cite{chen}:
\begin{lemma}[Singer \cite{singer-influence}]\label{lemma:greedy-bound}
Let $S_G$ be the set computed by the greedy heuristic and let us define $i^*$:
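For intuition, here is a minimal Python sketch of this greedy-versus-best-singleton rule. The value oracle V, the cost dictionary, and the plain budget check are illustrative assumptions only, not the paper's actual mechanism (whose stopping rule is more involved, see below):

    def greedy_with_singleton(V, costs, budget):
        # Greedily add the element with the best marginal-value-per-cost
        # ratio while the budget allows.
        S, spent, remaining = set(), 0.0, set(costs)
        while True:
            affordable = [i for i in remaining if spent + costs[i] <= budget]
            if not affordable:
                break
            i = max(affordable, key=lambda j: (V(S | {j}) - V(S)) / costs[j])
            if V(S | {i}) <= V(S):   # no remaining element adds value; stop
                break
            S.add(i); spent += costs[i]; remaining.remove(i)
        # i*: the single affordable element of largest value
        i_star = max((i for i in costs if costs[i] <= budget),
                     key=lambda j: V({j}))
        # Return whichever of S_G and {i*} has the larger value.
        return S if V(S) >= V({i_star}) else {i_star}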
@@ -56,27 +57,71 @@ the mechanism has allocated to the greedy set ($V(S_G) \geq V(i^*)$). If an
agent $i$ from the greedy set reduces her cost, it could happen that $V(S_G)$
becomes smaller than $V(i^*)$. In this case the mechanism will allocate to
$i^*$ and $i$ will be out of the allocated set. This breaks the monotonicity of
-the allocation rule.
+the allocation function.
-To address this issue, \cite{chen, singer-influence} compare $V(i^*)$ to
-a quantity which can be proven to be close from $V(S_G)$ while keeping
-keeping monotonicity. In \cite{chen}, the authors suggest comparing $V(i^*)$ to
-$OPT(V,\mathcal{N}\setminus\{i^*\}, B)$. While this yields an approximation
-ratio of 8.34, in the general case, this quantity cannot be computed in
-polynomial time. In \cite{chen, singer-influence}, for the specific problems of
-maximum set coverage and knapsack, the authors compare $V(i^*)$ to
-a fractional relaxation of the optimization program \eqref{eq:non-strategic}.
+The way this issue has been addressed thus far \cite{chen, singer-influence}
+is to introduce a third quantity: if $V(i^*)$ is larger than this quantity, the
+mechanism allocates to $i^*$; otherwise it allocates to the greedy set $S_G$.
+To keep a bounded approximation ratio, this quantity must be provably close to
+$V(S_G)$ while maintaining the monotonicity of the allocation algorithm.
+Furthermore, it must be computable in polynomial time to keep an overall
+polynomial complexity for the allocation algorithm. If the underlying
+non-strategic optimization program \eqref{eq:non-strategic} can be solved in
+polynomial time, Bei et al. \cite{chen} prove that allocating to $i^*$ when
+$V(i^*) \geq C\cdot OPT(V,\mathcal{N}\setminus\{i^*\}, B)$ (for some constant
+$C$) and to $S_G$ otherwise yields an 8.34-approximation mechanism. For
+specific problems (Bei et al. \cite{chen} for knapsack, Singer
+\cite{singer-influence} for maximum coverage), the authors instead compare
+$V(i^*)$ to $OPT(R_{\mathcal{N}\setminus\{i^*\}}, B)$, where $R_\mathcal{N}$
+denotes a fractional relaxation of the function $V$ over the set
+$\mathcal{N}$.
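In code, the resulting allocation rule is a single comparison; a hypothetical sketch, where threshold stands for whichever comparison quantity is used ($OPT(V,\mathcal{N}\setminus\{i^*\}, B)$, or the optimum of a fractional relaxation):

    def allocate(V, i_star, greedy_set, threshold, C):
        # Allocate to i* when its value dominates C times the comparison
        # quantity; otherwise allocate to the greedy set S_G.
        if V({i_star}) >= C * threshold:
            return {i_star}
        return greedy_set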
-Here, following the same path as in \cite{chen, singer-influence}, we choose to
-introduce the following relaxation of the value function. Let us define the
-function $L_\mathcal{N}$:
+We say that $R_\mathcal{N}$ is a fractional relaxation of $V$ over the set
+$\mathcal{N}$ if (a) $R_\mathcal{N}$ is a function defined on the hypercube
+$[0, 1]^{|\mathcal{N}|}$ and (b) for all $S\subseteq\mathcal{N}$, if
+$\mathbf{1}_S$ denotes the indicator vector of $S$, we have
+$R_\mathcal{N}(\mathbf{1}_S) = V(S)$. A relaxation which is commonly used, due
+to its nice properties in the context of maximizing submodular functions, is the
+\emph{multi-linear} extension:
+\begin{equation}\label{eq:multilinear}
+ F_\mathcal{N}(\lambda)
+ = \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\left[V(S)\right]
+ = \sum_{S\subseteq\mathcal{N}} P_\mathcal{N}^\lambda(S) V(S)
+\end{equation}
+where $P_\mathcal{N}^\lambda(S)$ is the probability of choosing the set $S$ if
+we decide for each element $i$ in $\mathcal{N}$ to pick it with probability
+$\lambda_i$ or to reject it with probability $1-\lambda_i$:
+\begin{displaymath}
+ P_\mathcal{N}^\lambda(S) = \prod_{i\in S} \lambda_i
+ \prod_{i\in\mathcal{N}\setminus S} (1 - \lambda_i)
+\end{displaymath}
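As a concrete (exponential-time, illustration-only) reading of \eqref{eq:multilinear}, the multi-linear extension can be evaluated by direct enumeration over subsets; a Python sketch assuming a value oracle V:

    from itertools import combinations

    def multilinear_extension(V, items, lam):
        # F(lambda) = sum_S P(S) V(S), with P(S) the product form above.
        total = 0.0
        for r in range(len(items) + 1):
            for T in combinations(items, r):
                S = set(T)
                p = 1.0
                for i in items:
                    p *= lam[i] if i in S else 1.0 - lam[i]
                total += p * V(S)
        return total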
+For knapsack, Bei et al. \cite{chen} directly use the multi-linear extension as
+the relaxation to compare $V(i^*)$ to. For maximum coverage however, the
+optimal value of the multi-linear extension cannot be computed in polynomial
+time. Thus, Singer \cite{singer-influence} introduces a second relaxation of
+the value function which, using the \emph{pipage rounding} method of Ageev and
+Sviridenko \cite{pipage}, can be proven to be close to the multi-linear
+extension.
+
+Here, following these ideas, we introduce another relaxation of the value
+function. Note that in our case, the multi-linear extension can be written:
\begin{displaymath}
- \forall\lambda\in[0,1]^{|\mathcal{N}|}\,\quad L_{\mathcal{N}}(\lambda) \defeq
- \log\det\left(I_d + \sum_{i\in\mathcal{N}} \lambda_i x_i \T{x_i}\right)
+ F_\mathcal{N}(\lambda) = \mathbb{E}_{S\sim
+ P_\mathcal{N}^\lambda}\left[\log\det A(S) \right]
+ \;\text{with}\; A(S) = I_d + \sum_{i\in S} x_i\T{x_i}
\end{displaymath}
-Our mechanism will thus consist in comparing $V(i^*)$ to $OPT(L_\mathcal{N},
-B)$ to decide whether we should allocate to $i^*$ or the greedy set $S_G$. The
-main challenge will be to prove that $OPT(L_\mathcal{N}, B)$ is close to
+Our relaxation follows naturally by swapping the expectation and the $\log\det$
+in the above formula:
+\begin{align}\label{eq:concave}
+ L_\mathcal{N}(\lambda) & \defeq \log\det\mathbb{E}_{S\sim
+ P_\mathcal{N}^\lambda}\big[A(S)\big] \\
+ &= \log\det\left(I_d + \sum_{i\in\mathcal{N}}
+ \lambda_i x_i\T{x_i}\right)
+\end{align}
+This function is well-known to be concave (see \emph{e.g.}
+\cite{boyd2004convex}). Hence, its maximum over the polytope defined by the
+budget constraint can be computed in polynomial time by standard convex
+programming methods, and its value can be used as a threshold in our mechanism.
+The main challenge will be to prove that $OPT(L_\mathcal{N}, B)$ is close to
$V(S_G)$. To do so, our main technical contribution is to show that
$L_\mathcal{N}$ approximates the value function $V$ within a bounded ratio
(lemma~\ref{lemma:relaxation}).
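Concretely, maximizing $L_\mathcal{N}$ under the budget constraint is a concave program; a sketch of how one might solve it, assuming the cvxpy modeling library (the text does not prescribe a particular solver):

    import numpy as np
    import cvxpy as cp

    def opt_L(X, costs, budget):
        # max  logdet(I_d + sum_i lambda_i x_i x_i^T)
        # s.t. 0 <= lambda <= 1,  <costs, lambda> <= budget
        n, d = X.shape
        lam = cp.Variable(n)
        M = np.eye(d) + sum(lam[i] * np.outer(X[i], X[i]) for i in range(n))
        prob = cp.Problem(cp.Maximize(cp.log_det(M)),
                          [lam >= 0, lam <= 1, costs @ lam <= budget])
        prob.solve()
        return prob.value, lam.value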
@@ -104,21 +149,10 @@ The mechanism for \EDP{} is presented in algorithm \ref{mechanism}.
\end{algorithmic}
\end{algorithm}
-\emph{Remarks} \stratis{write prose.}
-\begin{enumerate}
- \item the function $L_\mathcal{N}$ is concave (see
- lemma~\ref{lemma:concave}) thus the maximization step on line 2 of the
- mechanism can be computed in polynomial time, which proves that the
- mechanism overall has a polynomial complexity. \stratis{citation on complexity? are there approximation issues? Also, concavity is well known, cite as approriately.}
- \item the stopping rule in the while loop is more sophisticated \stratis{informal} than just
- checking against the budget constraint ($c(S) \leq B$). This is to
- ensure budget feasibility (see lemma~\ref{lemma:budget-feasibility}). \stratis{what is $c(S)$?}
-\end{enumerate}
-
We can now state the main result of this section:
\begin{theorem}\label{thm:main}
The mechanism described in Algorithm~\ref{mechanism} is truthful,
- individually rational and budget feasible. Furthermore, choosing:
+ and budget feasible. Furthermore, choosing:
\begin{displaymath}
C = C^* = \frac{12e-1 + \sqrt{160e^2-48e + 9}}{2(e-1)}
\end{displaymath}
@@ -128,8 +162,7 @@ We can now state the main result of this section:
\end{displaymath}
\end{theorem}
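For reference, a two-line numeric evaluation of the constant (a quick arithmetic check, not part of the mechanism):

    from math import e, sqrt

    C_star = (12*e - 1 + sqrt(160*e**2 - 48*e + 9)) / (2*(e - 1))
    print(C_star)  # approximately 18.68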
-\stratis{Add lowerbound here too. Remove ``pseudo proof'' below. Start new subsection called proof of Thm... }
-
+\stratis{Add lowerbound here too.}
\subsection{Proof of theorem~\ref{thm:main}}
@@ -138,15 +171,13 @@ We can now state the main result of this section:
The mechanism is monotone.
\end{lemma}
-
-\stratis{We assume by contradiction->Suppose, for contradiction}
\begin{proof}
- We assume by contradiction that there exists a user $i$ that has been
+ Suppose, for contradiction, that there exists a user $i$ who has been
selected by the mechanism and who would not be selected had she reported
a cost $c_i'\leq c_i$ (all the other costs staying the same).
If $i\neq i^*$ and $i$ has been selected, then we are in the case where
- $L(x^*) \geq C V(i^*)$ and $i$ was included in the result set by the greedy
+ $L_\mathcal{N}(x^*) \geq C V(i^*)$ and $i$ was included in the result set by the greedy
part of the mechanism. By reporting a cost $c_i'\leq c_i$, using the
submodularity of $V$, we see that $i$ will satisfy the greedy selection
rule:
@@ -164,10 +195,10 @@ The mechanism is monotone.
\end{align*}
Hence $i$ will still be included in the result set.
- If $i = i^*$, $i$ is included iff $L(x^*) \leq C V(i^*)$. Reporting $c_i'$
- instead of $c_i$ does not change the value $V(i^*)$ nor $L(x^*)$ (which is
+ If $i = i^*$, $i$ is included iff $L_\mathcal{N}(x^*) \leq C V(i^*)$. Reporting $c_i'$
+ instead of $c_i$ does not change the value $V(i^*)$ nor $L_\mathcal{N}(x^*)$ (which is
computed over $\mathcal{N}\setminus\{i^*\}$). Thus $i$ is still included by
- reporting a different cost. \stratis{$L$ lost its subscript}
+ reporting a different cost.
\end{proof}
\begin{lemma}\label{lemma:budget-feasibility}
@@ -273,67 +304,6 @@ Our main contribution is lemma~\ref{lemma:relaxation} which gives an
approximation ratio for $L_\mathcal{N}$ and is key in the proof of
theorem~\ref{thm:main}.
-
-\stratis{Given that the relaxation is the major contribution, and it plays a crucial role in your design, you might want to bring a lot of this before the statement of the theorem. Explain how both chen and singer use two relaxations to sandwitch the optimum. How this is motivated by the work by sviridenko. And, finallly, that you bring in this function that has a natural interpretation as the log det of the expectation.}
-
-We say that $R_\mathcal{N}:[0,1]^{|\mathcal{N}|}\rightarrow\reals$ is
-a relaxation of the value function $V$ over $\mathcal{N}$ if it coincides with
-$V$ at binary points. Formally, for any $S\subseteq\mathcal{N}$, let
-$\mathbf{1}_S$ denote the indicator vector of $S$. $R_\mathcal{N}$ is
-a relaxation of $V$ over $\mathcal{N}$ iff:
-\begin{displaymath}
- \forall S\subseteq\mathcal{N},\; R_\mathcal{N}(\mathbf{1}_S) = V(S)
-\end{displaymath}
-
-We can extend the optimisation problem defined above to a relaxation by
-extending the cost function:
-\begin{displaymath}
- \forall \lambda\in[0,1]^{|\mathcal{N}|},\; c(\lambda)
- = \sum_{i\in\mathcal{N}}\lambda_ic_i
-\end{displaymath}
-The optimisation problem becomes:
-\begin{displaymath}
- OPT(R_\mathcal{N}, B) =
- \max_{\lambda\in[0,1]^{|\mathcal{N}|}}\left\{R_\mathcal{N}(\lambda)\,|\, c(\lambda)\leq B\right\}
-\end{displaymath}
-
-The relaxations we will consider here rely on defining a probability
-distribution over subsets of $\mathcal{N}$.
-
-Let $\lambda\in[0,1]^{|\mathcal{N}|}$, let us define:
-\begin{displaymath}
- P_\mathcal{N}^\lambda(S) = \prod_{i\in S}\lambda_i
- \prod_{i\in\mathcal{N}\setminus S}(1-\lambda_i)
-\end{displaymath}
-$P_\mathcal{N}^\lambda(S)$ is the probability of picking the set $S$ if we select
-a subset of $\mathcal{N}$ at random by deciding independently for each point to
-include it in the set with probability $\lambda_i$ (and to exclude it with
-probability $1-\lambda_i$).
-
-For readability, let us define $A(S) = I_d + \T{X_S}X_S$, so that the value
-function is $V(S) = \log\det A(S)$.
-
-We have already defined the function $L_\mathcal{N}$ which is a relaxation of
-the value function. Note that it is possible to express it in terms of the
-probabilities $P_\mathcal{N}$:
-\begin{align*}
- L_{\mathcal{N}}(\lambda)
- & = \log\det \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[A(S)\big]\\
- & = \log\det\left(\sum_{S\subseteq N}
- P_\mathcal{N}^\lambda(S)A(S)\right)\\
- & = \log\det\left(I_d + \sum_{i\in\mathcal{N}}
- \lambda_ix_i\T{x_i}\right)\\
- & \defeq \log\det \tilde{A}(\lambda)
-\end{align*}
-
-We will also use the \emph{multi-linear extension}:
-\begin{align*}
- F_\mathcal{N}(\lambda)
- & = \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[\log\det A(S)\big]\\
- & = \sum_{S\subseteq\mathcal{N}} P_\mathcal{N}^\lambda(S) V(S)\\
- & = \sum_{S\subseteq\mathcal{N}} P_\mathcal{N}^\lambda(S) \log\det A(S)\\
-\end{align*}
-
\begin{lemma}\label{lemma:concave}
The \emph{concave relaxation} $L_\mathcal{N}$ is concave\footnote{Hence
this relaxation is well-named!
diff --git a/problem.tex b/problem.tex
index 0ba9db9..a8b89f4 100644
--- a/problem.tex
+++ b/problem.tex
@@ -14,15 +14,23 @@ $y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
& = (\T{X_S}X_S)^{-1}\T{X_S}y_S\label{leastsquares}\end{align}
%The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $(X_S\T{X_S})^{-1}$.
-Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from the noise terms $\varepsilon_i$). In particular, $\hat{\beta}$ has mean $\beta$ (\emph{i.e.}, it is an \emph{unbianced estimator}) and covariance $(\T{X_S}X_S)^{-1}$.
+Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is
+a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from
+the noise terms $\varepsilon_i$). In particular, $\hat{\beta}$ has mean $\beta$
+(\emph{i.e.}, it is an \emph{unbiased estimator}) and covariance
+$(\T{X_S}X_S)^{-1}$.
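Both facts are easy to check empirically; a small numpy simulation under the model's assumptions (unit-variance Gaussian noise, fixed design; all names illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, trials = 3, 50, 5000
    X = rng.normal(size=(n, d))          # design matrix X_S
    beta = rng.normal(size=d)            # ground-truth parameter
    est = np.empty((trials, d))
    for t in range(trials):
        y = X @ beta + rng.normal(size=n)            # unit-variance noise
        est[t] = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimate
    print(np.abs(est.mean(axis=0) - beta).max())     # ~0: unbiased
    print(np.abs(np.cov(est.T) - np.linalg.inv(X.T @ X)).max())  # ~0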
Let $V:2^\mathcal{N}\to\reals$ be a value function, quantifying how informative a set of experiments $S$ is in estimating $\beta$. The standard optimal experimental design problem amounts to finding a set $S$ that maximizes $V(S)$ subject to the constraint $|S|\leq k$.
-A variety of different value functions are used in experimental design\cite{pukelsheim2006optimal}; almost all make use of the the covariance $(\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. A value functioned preferred because of its relationship to entropy is the \emph{$D$-optimality criterion}: %which yields the following optimization problem
+A variety of different value functions are used in experimental design~\cite{pukelsheim2006optimal}; almost all make use of the covariance $(\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. A value function preferred because of its relationship to entropy is the \emph{$D$-optimality criterion}: %which yields the following optimization problem
\begin{align}
V(S) &= \frac{1}{2}\log\det \T{X_S}X_S \label{dcrit} %\\
\end{align}
-As $\hat{\beta}$ is a multidimensional normal random variable, the $D$-optimality criterion is equal (up to a costant) to the negative of the entropy of $\hat{\beta}$. Hence, selecting a set of experiments $S$ that maximizes $V(S)$ is equivalent to finding the set of experiments that minimizes the uncertainty on $\beta$, as captured by the entropy of its estimator.
+As $\hat{\beta}$ is a multidimensional normal random variable, the
+$D$-optimality criterion is equal (up to a constant) to the negative of the
+entropy of $\hat{\beta}$. Hence, selecting a set of experiments $S$ that
+maximizes $V(S)$ is equivalent to finding the set of experiments that minimizes
+the uncertainty on $\beta$, as captured by the entropy of its estimator.
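Computationally, the criterion is a single log-determinant; a sketch using numpy's slogdet for numerical stability:

    import numpy as np

    def d_criterion(X_S):
        # V(S) = (1/2) logdet(X_S^T X_S); up to an additive constant, the
        # negative entropy of the Gaussian estimator beta_hat.
        sign, logdet = np.linalg.slogdet(X_S.T @ X_S)
        return 0.5 * logdet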
%As discussed in the next section, in this paper, we work with a modified measure of information function, namely
%\begin{align}
@@ -45,7 +53,7 @@ maximizing the value $V(S)$ subject to the constraint $\sum_{i\in S} c_i\leq
B$. We write:
\begin{equation}\label{eq:non-strategic}
OPT(V,\mathcal{N}, B) = \max_{S\subseteq\mathcal{N}} \left\{V(S) \mid
- \sum_{i\in S}\leq B\right\}
+ \sum_{i\in S}c_i\leq B\right\}
\end{equation}
the best value we can reach under the budget constraint $B$ when selecting
experiments from the set $\mathcal{N}$.
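On small instances, $OPT(V,\mathcal{N}, B)$ can be computed by exhaustive search over the $2^{|\mathcal{N}|}$ subsets, which is useful for sanity-checking even though the search space is exponential; an illustrative sketch:

    from itertools import combinations

    def opt_brute_force(V, items, costs, budget):
        # Exhaustive search over all budget-feasible subsets.
        best = 0.0
        for r in range(len(items) + 1):
            for T in combinations(items, r):
                if sum(costs[i] for i in T) <= budget:
                    best = max(best, V(set(T)))
        return best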
@@ -116,11 +124,11 @@ $\inf\{c_i: i\in f(c_i, c_{-i})\}$.
\end{enumerate}
\end{theorem}
-This theorem is particularly useful when designing a truthful mechanism: we
-can focus on designing a monotone allocation function. Then the mechanism will
-be truthful as long as we give each agent her threshold payment. Finally, it
-suffices to prove that the sum of threshold payments does not exceed the budget
-to ensure budget feasibility.
+This theorem is particularly useful when designing a budget feasible truthful
+mechanism: we can focus on designing a monotone allocation function. Then, the
+mechanism will be truthful as long as we give each agent her threshold payment.
+Finally, it suffices to prove that the sum of threshold payments does not
+exceed the budget to ensure budget feasibility.
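For a monotone allocation rule given as a black box, the threshold payment of a selected agent can be located by binary search on her reported cost; a hypothetical sketch (the allocate interface is ours, not the paper's):

    def threshold_payment(allocate, i, costs, hi, tol=1e-6):
        # Largest reported cost at which the monotone rule still selects i;
        # by the theorem above, paying each selected agent this amount
        # makes the mechanism truthful.
        lo = 0.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            bids = dict(costs)
            bids[i] = mid
            if i in allocate(bids):
                lo = mid
            else:
                hi = mid
        return lo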
\begin{comment}
\subsection{Experimental Design}