| author | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-04 00:42:18 -0700 |
|---|---|---|
| committer | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-11-04 00:42:18 -0700 |
| commit | 5f00b2d6f1486318794959f5b0735dbfd6f2b3a8 (patch) | |
| tree | ce8a49fd0a22ded3eff2abf6bace470fc55fe5c0 /general.tex | |
| parent | 722aa31fef933d37bbf14c27a4055a20a1df5b2f (diff) | |
| download | recommendation-5f00b2d6f1486318794959f5b0735dbfd6f2b3a8.tar.gz | |
general
Diffstat (limited to 'general.tex')
| -rw-r--r-- | general.tex | 24 |
1 file changed, 22 insertions, 2 deletions
diff --git a/general.tex b/general.tex
index cb2154b..5932c5f 100644
--- a/general.tex
+++ b/general.tex
@@ -67,6 +67,26 @@
 eigenvalue is larger than 1. Hence $\log\det R\geq 0$ and an approximation on $\tilde{V}$ gives an approximation ratio on $V$ (see discussion above).
 \subsection{Beyond Linear Models}
-Selecting experiments that maximize the information gain in the Bayesian setup leads to a natural generalization to other learning examples beyond linear regression. In particular, suppose that the measurements
+Selecting experiments that maximize the information gain in the Bayesian setup leads to a natural generalization to other learning examples beyond linear regression. In particular, in the more general PAC learning setup \cite{...}, the features $x_i$, $i\in \mathcal{N}$, take values in some general set $\Omega$, called the \emph{query space}, and the measurements $y_i\in\reals$ are given by
+\begin{equation}\label{eq:hypothesis-model}
+ y_i = h(x_i) + \varepsilon_i,
+\end{equation}
+where $h\in \mathcal{H}$ for some subset $\mathcal{H}$ of all possible mappings $h:\Omega\to\reals$, called the \emph{hypothesis space}, and the $\varepsilon_i$ are random variables in $\reals$, not necessarily identically distributed, that are independent \emph{conditioned on $h$}. This model is quite broad and captures many learning tasks, such as:
+\begin{enumerate}
+\item\textbf{Generalized Linear Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of linear maps $\{h(x) = \T{\beta}x \text{ s.t. } \beta\in \reals^d\}$, and the $\varepsilon_i$ are independent zero-mean normal variables with $\expt{\varepsilon_i^2}=\sigma_i^2$.
+\item\textbf{Logistic Regression.} $\Omega=\reals^d$, $\mathcal{H}$ is the set of maps $\{h(x) = \frac{e^{\T{\beta} x}}{1+e^{\T{\beta} x}} \text{ s.t. } \beta\in\reals^d\}$, and the $\varepsilon_i$ are independent conditioned on $h$, with $$\varepsilon_i=\begin{cases} 1-h(x_i), & \text{w.~prob.~} h(x_i),\\ -h(x_i), & \text{w.~prob.~} 1-h(x_i).\end{cases}$$
+\item\textbf{Learning Binary Functions with Bernoulli Noise.} $\Omega = \{0,1\}^d$, $\mathcal{H}$ is some subset of $\{0,1\}^{\Omega}$ (the binary functions on $\Omega$), and $$\varepsilon_i =\begin{cases}0, & \text{w.~prob.~} p,\\ \bar{h}(x_i)-h(x_i), & \text{w.~prob.~} 1-p.\end{cases}$$
+\end{enumerate}
-TODO: Independent noise model. Captures models such as logistic regression, classification, etc. Arbitrary prior. Show that change in the entropy is submodular (cite Krause, Guestrin).
+In this setup, assume that the experimenter has a prior distribution on the hypothesis $h\in \mathcal{H}$. The information gain objective can then again be written as the mutual information between $h$ and $Y_S$:
+\begin{align}V(S) = \entropy(h) -\entropy(h\mid Y_S),\quad S\subseteq\mathcal{N}. \label{general} \end{align}
+This is a monotone set function, and it clearly satisfies $V(\emptyset)=0$. Although mutual information is not a submodular set function in general, this specific setup does lead to a submodular formulation.
+\begin{theorem}The value function given by the information gain \eqref{general} is submodular.
+\end{theorem}
+\begin{proof}
+The theorem is proved in a slightly different context in \cite{guestrin}; we repeat the proof here for the sake of completeness: \stratis{add one-line proof}
+\end{proof}
+
+This theorem implies that learning an \emph{arbitrary hypothesis, under an arbitrary prior}, leads to a submodular value function whenever the noise is conditionally independent. Hence the universally truthful $7.91$-approximate mechanism by Chen \emph{et al.} \cite{chen} immediately applies in the value query model. Moreover, in cases where maximizing \eqref{general} can be done in polynomial time in the full-information setup, the truthful $8.34$-approximate mechanism of Chen \emph{et al.}~applies again. Note, however, that in many scenarios covered by this model (including the last two examples above), even computing the entropy for a given set might be a hard task---\emph{i.e.}, the value query model may not apply. Hence, identifying learning tasks in the above class for which truthful or universally truthful constant-approximation mechanisms exist, or studying these problems in the context of stronger query models, such as the demand model \cite{dobz2011-mechanisms,bei2012budget}, remains an interesting open question.
+
+%TODO: Independent noise model. Captures models such as logistic regression, classification, etc. Arbitrary prior. Show that change in the entropy is submodular (cite Krause, Guestrin).
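The proof of the theorem is left as a placeholder in this revision; a sketch of the standard one-line argument, following \cite{guestrin} and written in the notation of the added passage (a reconstruction, not the authors' omitted text), is:

\begin{align*}
V(S\cup\{i\}) - V(S)
  &= \big[\entropy(Y_{S\cup\{i\}}) - \entropy(Y_{S\cup\{i\}}\mid h)\big]
   - \big[\entropy(Y_S) - \entropy(Y_S\mid h)\big]\\
  &= \entropy(y_i\mid Y_S) - \entropy(y_i\mid Y_S,h)
   = \entropy(y_i\mid Y_S) - \entropy(y_i\mid h).
\end{align*}
% The first equality uses the symmetry of mutual information, the second the chain rule,
% and the third the fact that the measurements are independent conditioned on $h$.
% Since conditioning never increases entropy, $\entropy(y_i\mid Y_S)$ is non-increasing
% in $S$, while $\entropy(y_i\mid h)$ does not depend on $S$; hence the marginal gain is
% non-increasing and $V$ is submodular.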
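As a concrete instance of \eqref{general}, the generalized linear regression example admits a closed form. The expression below assumes a zero-mean Gaussian prior $\beta\sim N(0,\Sigma_0)$; the symbol $\Sigma_0$ and this choice of prior are assumptions made here for illustration and need not match the prior used elsewhere in the paper.

\begin{align*}
V(S) \;=\; I(\beta;Y_S)
  \;=\; \tfrac{1}{2}\log\det\Big(I_d + \Sigma_0\sum_{i\in S}\sigma_i^{-2}\,x_i\T{x_i}\Big).
\end{align*}
% Here $I_d$ is the $d\times d$ identity and the $\sigma_i^2$ are the noise variances; the
% identity follows from the Gaussian entropy formula together with Sylvester's determinant
% identity.

This is a $\log\det$ objective of the same flavor as the one in the context lines at the top of the hunk, consistent with the passage's claim that \eqref{general} generalizes the linear-regression value function.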
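On the remark that even computing the entropy for a given set might be hard: one way to see this for the binary-function example (an illustration added here, not an argument from the text) is to expand the conditional entropy directly:

\begin{align*}
\entropy(h\mid Y_S) \;=\; \sum_{y\in\{0,1\}^{|S|}} \Pr[Y_S=y]\,\entropy(h\mid Y_S=y).
\end{align*}
% A brute-force evaluation sums over all $2^{|S|}$ possible outcomes and, within each
% term, over the hypotheses in $\mathcal{H}$, which may itself be exponentially large;
% so even answering a single value query for \eqref{general} can be expensive here.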
