author     Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>   2012-11-05 13:14:05 -0800
committer  Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>   2012-11-05 13:14:05 -0800
commit     a4ca8e7701d146f95a927595de13cd2dc9ef42b7 (patch)
tree       e312debdca55e41f84135d1cc4e26c6706bc6191
parent     b0d1e82017eb270398a5747ff2b46969e720b52a (diff)
parent     5dcc97d94169ba8f7fa6bb5d486cd80f7f85e155 (diff)
download   recommendation-a4ca8e7701d146f95a927595de13cd2dc9ef42b7.tar.gz
Merge branch 'master' of ssh://palosgit01/git/data_value
-rw-r--r--  general.tex  15
-rw-r--r--  main.tex     67
2 files changed, 47 insertions, 35 deletions
diff --git a/general.tex b/general.tex
index 65b3c8f..5162887 100644
--- a/general.tex
+++ b/general.tex
@@ -6,7 +6,7 @@
%as our objective. Moreover,
We extend Theorem~\ref{thm:main} to a more general
-Bayesian setting where it is assumed that the experimenter \E\ has a {\em prior}
+Bayesian setting, where it is assumed that the experimenter \E\ has a {\em prior}
distribution on $\beta$: in particular, $\beta$ has a multivariate normal prior
with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d^2}$ (where $\sigma^2$ is the noise variance).
\E\ estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
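In standard form (a sketch, assuming in addition to \eqref{model} i.i.d.\ Gaussian noise of variance $\sigma^2$; cf.\ \cite{hastie}), this maximization is the ridge-type program
\begin{displaymath}
\min_{\beta\in\reals^d} \;\; \norm{y_S - X_S\beta}_2^2 + \T{\beta}R\beta,
\end{displaymath}
whose quadratic penalty is induced by the prior covariance $\sigma^2R^{-1}$.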
@@ -34,7 +34,7 @@ clearly follows from \eqref{bayesianobjective}
by setting $R=I_d$. Hence, the optimization discussed thus far can be interpreted as
a maximization of the information gain when the prior distribution has
a covariance $\sigma^2 I_d$, and the experimenter is solving a ridge regression
-problem with penalty term $\norm{x}_2^2$.
+problem with penalty term $\norm{\beta}_2^2$.
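Explicitly, for a zero-mean Gaussian prior with covariance $\sigma^2 I_d$ and i.i.d.\ Gaussian noise of variance $\sigma^2$, a standard computation gives the information gain (the reduction in the entropy of $\beta$ upon observing $y_S$) as
\begin{displaymath}
\frac{1}{2}\log\det\Big(I_d + \sum_{i\in S} x_i\T{x_i}\Big),
\end{displaymath}
which, up to constant factors, is the objective discussed above.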
Our results can be extended to the general Bayesian case, by
replacing $I_d$ with the positive semidefinite matrix $R$. First, we re-set the
@@ -63,7 +63,7 @@ where $\mu$ is the smallest eigenvalue of $R$.
\subsection{$D$-Optimality and Beyond}
-We now reexamine the classical $D$-optimality in \eqref{dcrit}, which is~\ref{bayesianobjective} with $R$ replaced by
+We now reexamine the classical $D$-optimality criterion in \eqref{dcrit}, which is given by objective~\eqref{bayesianobjective} with $R$ replaced by
the zero matrix.
Since \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$. %, for some strictly increasing, on-to function $f:\reals_+\to\reals_+$.
However, the following lower bound implies that such an optimization goal cannot be attained under the constraints of truthfulness, budget feasibility, and individual rationality.
@@ -75,13 +75,12 @@ For any $M>1$, there is no $M$-approximate, truthful, budget feasible, individua
\input{proof_of_lower_bound1}
\end{proof}
-Beyond $D$-optimality,
-{\bf STRATIC, fill in..}
+Beyond $D$-optimality, several other objectives, such as $E$-optimality (maximizing the smallest eigenvalue of $\T{X_S}X_S$) or $T$-optimality (maximizing $\mathrm{trace}(\T{X_S}X_S)$), are encountered in the literature \cite{pukelsheim2006optimal}, though they do not relate to entropy as $D$-optimality does. We leave the maximization of such objectives from a strategic point of view as an open problem.
\subsection{Beyond Linear Models}
Selecting experiments that maximize the information gain in the Bayesian setup
leads to a natural generalization to other learning examples beyond linear
-regression. In particular, in the more general PAC learning setup \cite{valiant}, the features $x_i$, $i\in \mathcal{N}$ take values in some general set $\Omega$, called the \emph{query space}, and measurements $y_i\in\reals$ are given by
+regression. In particular, in the more general PAC learning setup \cite{valiant}, the features $x_i$, $i\in \mathcal{N}$ take values in some general set $\Omega$, called the \emph{query space}. Measurements $y_i\in\reals$ are given by
\begin{equation}\label{eq:hypothesis-model}
y_i = h(x_i) + \varepsilon_i
\end{equation}
@@ -89,8 +88,8 @@ where $h\in \mathcal{H}$ for some subset $\mathcal{H}$ of all possible mappings
$h:\Omega\to\reals$, called the \emph{hypothesis space}, and $\varepsilon_i$
are random variables in $\reals$, not necessarily identically distributed, that
are independent \emph{conditioned on $h$}. This model is quite broad, and
-captures many learning tasks, such as support vector machines (SVM), linear
-discriminant analysis (LDA), and the following tasks:
+captures many learning tasks, such as support vector machines (SVM) and linear
+discriminant analysis (LDA); we give a few concrete examples below:
\begin{enumerate}
\item\textbf{Generalized Linear Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of linear maps $\{h(x) = \T{\beta}x \text{ s.t. } \beta\in \reals^d\}$, and $\varepsilon_i$ are independent zero-mean normal variables, where $\expt{\varepsilon_i^2}=\sigma_i$.
\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and $\varepsilon_i$ are independent conditioned on $h$ such that $$\varepsilon_i=\begin{cases} 1- h(x_i),& \text{w.~prob.}\;h(x_i)\\-h(x_i),&\text{w.~prob.}\;1-h(x_i)\end{cases}$$
diff --git a/main.tex b/main.tex
index f3939ec..f1eae87 100644
--- a/main.tex
+++ b/main.tex
@@ -40,7 +40,7 @@ following result from Singer \cite{singer-influence} which follows from
Chen \emph{et al.}~\cite{chen}: \stratis{Is it Singer or Chen? Also, we need to introduce $V(i)$ somewhere...}
}
}
-\begin{lemma}~\cite{singer-influence}\label{lemma:greedy-bound}
+\begin{lemma}~\cite{chen}\label{lemma:greedy-bound}
Let $S_G$ be the set computed by the greedy algorithm and let
%define $i^*$:
%\begin{displaymath}
@@ -56,7 +56,7 @@ Thus,
taking the maximum between $V(S_G)$ and $V(i^*)$ yields an approximation
ratio of $\frac{5e}{e-1}$. However,
this approach breaks incentive compatibility and therefore cannot be directly
-applied to the strategic case.~\cite{singer-influence}
+applied to the strategic case~\cite{singer-influence}.
\junk{
Indeed, suppose this allocation mechanism is used, and consider a case where
the mechanism allocates the greedy set ($V(S_G) \geq V(i^*)$). If an
@@ -71,7 +71,7 @@ For the strategic case,
\item
When the underlying
full information problem \eqref{eq:non-strategic} can be solved in
-polynomial time, Chen et al. \cite{chen} prove that
+polynomial time, Chen \emph{et al.}~\cite{chen} prove that
%$OPT_{-i^*}$, the optimal solution to \eqref{eq:non-strategic} when $i^*$ is excluded from set $\mathcal{N}$, can play the %role of this quantity: that is,
allocating to $i^*$ when
$V(i^*) \geq C\cdot OPT_{-i^*}$ (for some constant $C$)
@@ -187,7 +187,7 @@ follows naturally by swapping the expectation and the $\log\det$
in the above formula:
\begin{align}\label{eq:concave}
L(\lambda) & \defeq \log\det\Big(\mathbb{E}_{S\sim
- P_\mathcal{N}^\lambda}\big[\log\det (I_d + \sum_{i\in S} x_i\T{x_i})\big]\Big)\notag \\
+ P_\mathcal{N}^\lambda}\big[I_d + \sum_{i\in S} x_i\T{x_i}\big]\Big)\notag \\
&= \log\det\left(I_d + \sum_{i\in\mathcal{N}}
\lambda_i x_i\T{x_i}\right)
\end{align}
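The second equality is by linearity of expectation, recalling that under $P_\mathcal{N}^\lambda$ each $i\in\mathcal{N}$ is included in $S$ independently with probability $\lambda_i$:
\begin{displaymath}
\mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\Big[\sum_{i\in S} x_i\T{x_i}\Big]
= \sum_{i\in\mathcal{N}} \Pr[i\in S]\, x_i\T{x_i}
= \sum_{i\in\mathcal{N}} \lambda_i x_i\T{x_i}.
\end{displaymath}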
@@ -197,7 +197,7 @@ self-concordant functions in \cite{boyd2004convex}, shows that it is possible
to find the maximum of $L$ to any precision $\varepsilon$ in
a number of iterations $O(\log\log\varepsilon^{-1})$.
The main challenge will
-be to prove that $OPT'$ in~\ref{relax}, for the relaxation $R=L$, is close to $V(S_G)$, which we will address later and is the technical
+be to prove that $OPT'$ in~\eqref{relax}, for the relaxation $R=L$, is close to $V(S_G)$; we address this later, as it constitutes the technical
bulk of the
paper.
\junk{
@@ -235,7 +235,7 @@ The resulting mechanism for \EDP{} is composed of
the allocation function
presented in Algorithm~\ref{mechanism}, and
\item
-he payment function which pays each
+the payment function which pays each
allocated agent $i$ her threshold payment as described in Myerson's Theorem.
%The constant $C$ is an absolute
%constant that determined in Section~\ref{sec:proofofmainthm}
@@ -250,6 +250,7 @@ characterization from~\cite{singer-mechanisms} gives a formula to
We can now state our main result:
\begin{theorem}\label{thm:main}
+ \sloppy
The allocation described in Algorithm~\ref{mechanism}, along with threshold payments, is truthful, individually rational
and budget feasible. Furthermore, for any $\varepsilon>0$, the mechanism
runs in time $O\left(\text{poly}(n, d,
@@ -261,8 +262,11 @@ We can now state our main result:
& \simeq 19.68V(S^*) + \varepsilon
\end{align*}
\end{theorem}
+
+\fussy
+
%\stratis{Add lowerbound here too.}
-Note that this implies we construct a poly-time mechanism with accuracy arbitrarily close to 19.68, by taking $\varepsilon = \ldots$\stratis{fix me}.
+%Note that this implies we construct a poly-time mechanism with accuracy arbitrarily close to 19.68, by taking $\varepsilon = \ldots$\stratis{fix me}.
In addition, we prove the following simple lower bound.
\begin{theorem}\label{thm:lowerbound}
There is no $2$-approximate, truthful, budget feasible, individually rational mechanism for EDP.
@@ -280,7 +284,7 @@ We now present the proof of Theorem~\ref{thm:main}. Truthfulness and individual
rationality follow from monotonicity and threshold payments. Monotonicity and
budget feasibility follow essentially from the analysis of Chen \emph{et al.}~\cite{chen};
we briefly restate the proofs below for the sake of completeness.
- Our proof of the approximation ratio and uses a bound on our concave relaxation
+ Our proof of the approximation ratio uses a bound on our concave relaxation
$L$ (Lemma~\ref{lemma:relaxation}). This is our main technical
contribution; the proof of this lemma can be found in Section~\ref{sec:relaxation}.
\begin{lemma}\label{lemma:monotone}
@@ -374,7 +378,6 @@ following lemma which establishes that $OPT'$, the optimal value \eqref{relax}
\end{lemma}
The proof of Lemma~\ref{lemma:relaxation} is our main technical contribution, and can be found in Section \ref{sec:relaxation}.
-
\paragraph{Finishing the Proof of Theorem~\ref{thm:main}}
Note that the lower bound $OPT' \geq OPT$ also holds trivially, as $L$ is a fractional relaxation of $V$ over $\mathcal{N}$.
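Indeed, any feasible set $S$ corresponds to the feasible fractional point $\lambda$ with $\lambda_i=1$ for $i\in S$ and $\lambda_i=0$ otherwise, at which \eqref{eq:concave} reduces to $\log\det\big(I_d + \sum_{i\in S} x_i\T{x_i}\big)$, the value of $S$ itself; maximizing over the larger fractional feasible set can therefore only increase the optimum.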
@@ -447,8 +450,6 @@ This solution is:
gives the approximation ratio in \eqref{approxbound}, and concludes the proof of Theorem~\ref{thm:main}.\hspace*{\stretch{1}}\qed
%\end{proof}
-
-
\subsection{Proof of Lemma~\ref{lemma:relaxation}}\label{sec:relaxation}
%Recall that, since $L$ is a fractional relaxation of $V$,
%\begin{displaymath}
@@ -470,12 +471,12 @@ This is called \emph{cross-convexity} in the literature (see, \emph{e.g.},
\cite{dughmi}), or $\varepsilon$-convexity by Ageev and Sviridenko~\cite{pipage}.
}
-
We prove that our multi-linear function $F$ has a property
which allows us to trade one fractional component of the solution for another until
one of them becomes integral, without losing any value.
This property is referred to in the literature as \emph{cross-convexity} (see, \emph{e.g.},
-\cite{dughmi}), or $\varepsilon$-convexity by Ageev and Sviridenko~\cite{pipage} .
+\cite{dughmi}), or $\varepsilon$-convexity by Ageev and
+Sviridenko~\cite{pipage}.
\junk{
a variant of the $\varepsilon$-convexity of the multi-linear
@@ -556,7 +557,6 @@ a point $\lambda$, we mean that it satisfies the budget constraint $\sum_{i=1}^n
$\lambda_\varepsilon$ becomes integral.
\end{proof}
-
Next, we prove the central result of bounding $L$ appropriately in terms of $F$.
\junk{
@@ -597,21 +597,21 @@ For all $\lambda\in[0,1]^{n},$
\end{aligned}\right.
\end{displaymath}
Hence:
- \begin{multline*}
- \partial_i F =
+ \begin{displaymath}
+ \partial_i F(\lambda) =
\sum_{\substack{S\subseteq\mathcal{N}\\ i\in S}}
P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})V(S)
- \sum_{\substack{S\subseteq\mathcal{N}\\ i\in \mathcal{N}\setminus S}}
P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)
- \end{multline*}
+ \end{displaymath}
Now, using that every $S$ such that $i\in S$ can be uniquely written as
$S'\cup\{i\}$, we can write:
- \begin{multline*}
- \partial_i F =
+ \begin{displaymath}
+ \partial_i F(\lambda) =
\sum_{\substack{S\subseteq\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\big(V(S\cup\{i\})
- V(S)\big)
- \end{multline*}
+ \end{displaymath}
The marginal contribution of $i$ to
$S$ can be written as
\begin{align*}
@@ -632,12 +632,24 @@ Using this,
\log\Big(1 + \T{x_i}A(S)^{-1}x_i\Big)
\end{displaymath}
The computation of the derivative of $L$ uses standard matrix
- calculus and gives:
+ calculus. Writing $\tilde{A}(\lambda) = I_d+\sum_{i\in
+ \mathcal{N}}\lambda_ix_i\T{x_i}$:
+ \begin{align*}
+ \det \tilde{A}(\lambda + h\cdot e_i) & = \det\big(\tilde{A}(\lambda)
+ + hx_i\T{x_i}\big)\\
+ & =\det \tilde{A}(\lambda)\big(1+
+ h\T{x_i}\tilde{A}(\lambda)^{-1}x_i\big)
+ \end{align*}
+ Hence:
+ \begin{displaymath}
+ \log\det\tilde{A}(\lambda + h\cdot e_i)= \log\det\tilde{A}(\lambda)
+ + h\T{x_i}\tilde{A}(\lambda)^{-1}x_i + o(h)
+ \end{displaymath}
+ Finally:
\begin{displaymath}
\partial_i L(\lambda)
=\frac{1}{2} \T{x_i}\tilde{A}(\lambda)^{-1}x_i
\end{displaymath}
- where $\tilde{A}(\lambda) = I_d+\sum_{i\in \mathcal{N}}\lambda_ix_i\T{x_i}$.
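The determinant expansion used above is an instance of the matrix determinant lemma: for any invertible $A\in\reals^{d\times d}$ and $u,v\in\reals^d$,
\begin{displaymath}
\det\big(A + u\T{v}\big) = \det(A)\big(1 + \T{v}A^{-1}u\big),
\end{displaymath}
applied here with $A=\tilde{A}(\lambda)$, $u=hx_i$ and $v=x_i$.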
For two symmetric matrices $A$ and $B$, we write $A\succ B$ ($A\succeq B$) if $A-B$ is positive definite (positive semi-definite).
This order allows us to define the notion of a \emph{decreasing} as well as \emph{convex}
@@ -692,15 +704,16 @@ In particular,
\begin{align*}
\partial_i F(\lambda) &\geq
\frac{1}{4}
- \T{x_i}\bigg(\sum_{S\subseteq\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)\bigg)^{-1}x_i
- \geq \frac{1}{2}
- \partial_i L(\lambda) \end{align*}
+ \T{x_i}\bigg(\sum_{S\subseteq\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)\bigg)^{-1}x_i\\
+ & = \frac{1}{4}\T{x_i}\tilde{A}(\lambda)^{-1}x_i\\
+ & = \frac{1}{2}
+ \partial_i L(\lambda)
+ \end{align*}
Having bounded the ratio between the partial derivatives, we now bound the ratio $F(\lambda)/L(\lambda)$ from below. Consider the following cases.
First, if the minimum of the ratio
$F(\lambda)/L(\lambda)$ is attained at a point interior to the hypercube, then it is
- a critical point, \emph{i.e.}, $\partial_i \big(F(\lambda)/L(\lambda)\big)=0$ for all $i\in \mathcal{N}$; hence, at such a critical point is
- characterized by:
+ a critical point, \emph{i.e.}, $\partial_i \big(F(\lambda)/L(\lambda)\big)=0$ for all $i\in \mathcal{N}$; hence, at such a critical point:
\begin{equation}\label{eq:lhopital}
\frac{F(\lambda)}{L(\lambda)}
= \frac{\partial_i F(\lambda)}{\partial_i