-rw-r--r--  general.tex   2
-rw-r--r--  problem.tex   8
2 files changed, 6 insertions, 4 deletions
diff --git a/general.tex b/general.tex
index 589b176..cb2154b 100644
--- a/general.tex
+++ b/general.tex
@@ -67,4 +67,6 @@ eigenvalue is larger than 1. Hence $\log\det R\geq 0$ and an approximation on
$\tilde{V}$ gives an approximation ratio on $V$ (see discussion above).
\subsection{Beyond Linear Models}
+Selecting experiments that maximize the information gain in the Bayesian setup leads to a natural generalization to learning settings beyond linear regression. In particular, suppose that the measurements $y_i$ are conditionally independent given the model parameters.
+
TODO: Independent noise model. Captures models such as logistic regression, classification, etc. Arbitrary prior. Show that change in the entropy is submodular (cite Krause, Guestrin).
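A minimal sketch of the generalization the TODO outlines, assuming, per its independent-noise model, that the measurements are conditionally independent given the parameter $\beta$; the set function $f$ and the shorthand $y_S$ are illustrative and not part of the draft:

\begin{align*}
f(S) \;=\; \entropy(\beta) - \entropy(\beta \mid y_S), \qquad S \subseteq \mathcal{N},
\end{align*}

Here $f(S)$ is the information gain from observing $y_S = (y_i)_{i \in S}$. When the $y_i$ are conditionally independent given $\beta$, this change in entropy is monotone and submodular (Krause and Guestrin), covering likelihood models such as logistic regression and classification under an arbitrary prior.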
diff --git a/problem.tex b/problem.tex
index 5631f7a..363731a 100644
--- a/problem.tex
+++ b/problem.tex
@@ -121,12 +121,12 @@ Each experiment $i\in
private. In order to obtain the measurement $y_i$, the experimenter needs to
pay agent $i$ a price that exceeds her cost.
-For example, each $i$ may correspond to a human participant; the
+For example, each $i$ may correspond to a human subject; the
feature vector $x_i$ may correspond to a normalized vector of her age, weight,
gender, income, \emph{etc.}, and the measurement $y_i$ may capture some
biometric information (\emph{e.g.}, her red cell blood count, a genetic marker,
-etc.). The cost $c_i$ is the amount the participant deems sufficient to
-incentivize her participation in the study. Note that, in this setup, the feature vectors $x_i$ are public information that the experimenter can consult prior the experiment design. Moreover, though a participant may lie about her true cost $c_i$, she cannot lie about $x_i$ (\emph{i.e.}, all features are verifiable upon collection) or $y_i$ (\emph{i.e.}, she cannot falsify her measurement).
+etc.). The cost $c_i$ is the amount the subject deems sufficient to
+incentivize her participation in the study. Note that, in this setup, the feature vectors $x_i$ are public information that the experimenter can consult prior to designing the experiment. Moreover, though a subject may lie about her true cost $c_i$, she cannot lie about $x_i$ (\emph{i.e.}, all features are verifiable upon collection) or $y_i$ (\emph{i.e.}, she cannot falsify her measurement).
%\subsection{D-Optimality Criterion}
Ideally, motivated by the $D$-optimality criterion, we would like to design a mechanism that maximizes \eqref{dcrit} within a good approximation ratio. As \eqref{dcrit} may be negative and unbounded below, to define a meaningful approximation one would instead consider the (equivalent) maximization of $V(S) = \det(\T{X_S}X_S)$. %, for some strictly increasing, onto function $f:\reals_+\to\reals_+$.
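To make the switch to $V$ concrete (a sketch; the ratio $\alpha$ and the feasible family $\mathcal{S}$ are illustrative names, not defined in this excerpt): a selected set $S$ would be called an $\alpha$-approximation, for $\alpha \in (0,1]$, if

\begin{align*}
V(S) \;=\; \det\big(\T{X_S} X_S\big) \;\geq\; \alpha \cdot \max_{S' \in \mathcal{S}} V(S').
\end{align*}

This guarantee is well defined because $V$ is always nonnegative ($\T{X_S}X_S$ is a Gram matrix), whereas the same multiplicative statement on \eqref{dcrit} $= \log V(S)$ breaks down when the optimum is negative.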
@@ -170,7 +170,7 @@ Though a variety of different measures of information exist in literature (see,
\end{align}
%where $\entropy(\beta)$ is the entropy of the prior distribution and $\entropy(\beta \mid E)$ is the conditional entropy of $\beta$ given the outcome of experiment $E$.
\subsection{Linear Regression}
-In this paper, we focus on \emph{linear regression} experiments, aiming to discover a linear function from user data. In particular, we consider a set of $n$ users $\mathcal{N} = \{1,\ldots, n\}$. Each user
+In this paper, we focus on \emph{linear regression} experiments, aiming to discover a linear function from data. In particular, we consider a set of $n$ users $\mathcal{N} = \{1,\ldots, n\}$. Each user
$i\in\mathcal{N}$ has a public vector of features $x_i\in\reals^d$, $\norm{x_i}_2\leq 1$, and an
undisclosed piece of information $y_i\in\reals$.
For example, the features could be the age, weight, or height of user $i$, while $y_i$ could be her propensity to contract a disease.
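As a concrete instance of this setup (a sketch under the standard Gaussian-noise assumption, which this excerpt does not state; $\beta$, $\varepsilon_i$, and $\sigma^2$ are the usual linear-model notation):

\begin{align*}
y_i \;=\; \T{\beta} x_i + \varepsilon_i, \qquad i \in \mathcal{N},
\end{align*}

where the $\varepsilon_i$ are i.i.d.\ zero-mean Gaussian noise terms with variance $\sigma^2$, so learning the linear function amounts to estimating $\beta \in \reals^d$ from the measurements the experimenter purchases.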