 general.tex | 2 ++
 problem.tex | 8 ++++----
 2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/general.tex b/general.tex
index 589b176..cb2154b 100644
--- a/general.tex
+++ b/general.tex
@@ -67,4 +67,6 @@ eigenvalue is larger than 1. Hence $\log\det R\geq 0$ and an
 approximation on $\tilde{V}$ gives an approximation ratio on $V$ (see
 discussion above).
 
 \subsection{Beyond Linear Models}
+Selecting experiments that maximize the information gain in the Bayesian setup leads to a natural generalization to learning problems beyond linear regression. In particular, suppose that the measurements
+TODO: Independent noise model. Captures models such as logistic regression, classification, etc. Arbitrary prior. Show that the change in entropy is submodular (cite Krause, Guestrin).
diff --git a/problem.tex b/problem.tex
index 5631f7a..363731a 100644
--- a/problem.tex
+++ b/problem.tex
@@ -121,12 +121,12 @@ Each experiment $i\in
 private. In order to obtain the measurement $y_i$, the experimenter needs to
 pay agent $i$ a price that exceeds her cost.
-For example, each $i$ may correspond to a human participant; the
+For example, each $i$ may correspond to a human subject; the
 feature vector $x_i$ may correspond to a normalized vector of her age, weight, gender, income, \emph{etc.}, and the measurement $y_i$ may capture some biometric information (\emph{e.g.}, her red cell blood count, a genetic marker,
-etc.). The cost $c_i$ is the amount the participant deems sufficient to
-incentivize her participation in the study. Note that, in this setup, the feature vectors $x_i$ are public information that the experimenter can consult prior to the experiment design. Moreover, though a participant may lie about her true cost $c_i$, she cannot lie about $x_i$ (\emph{i.e.}, all features are verifiable upon collection) or $y_i$ (\emph{i.e.}, she cannot falsify her measurement).
+etc.). The cost $c_i$ is the amount the subject deems sufficient to
+incentivize her participation in the study. Note that, in this setup, the feature vectors $x_i$ are public information that the experimenter can consult prior to the experiment design. Moreover, though a subject may lie about her true cost $c_i$, she cannot lie about $x_i$ (\emph{i.e.}, all features are verifiable upon collection) or $y_i$ (\emph{i.e.}, she cannot falsify her measurement).
 %\subsection{D-Optimality Criterion}
 Ideally, motivated by the $D$-optimality criterion, we would like to design a mechanism that maximizes \eqref{dcrit} within a good approximation ratio. As \eqref{dcrit} may take arbitrarily small negative values, to define a meaningful approximation one would consider the (equivalent) maximization of $V(S) = \det\T{X_S}X_S$.
 %, for some strictly increasing, onto function $f:\reals_+\to\reals_+$.
@@ -170,7 +170,7 @@ Though a variety of different measures of information exist in literature (see,
 \end{align}
 %where $H(\beta)$ is the entropy of the prior distribution and $\entropy(\beta \mid E)$ is the conditional entropy con
 \subsection{Linear Regression}
-In this paper, we focus on \emph{linear regression} experiments, aiming to discover a linear function from user data. In particular, we consider a set of $n$ users $\mathcal{N} = \{1,\ldots, n\}$. Each user
+In this paper, we focus on \emph{linear regression} experiments, aiming to discover a linear function from data. In particular, we consider a set of $n$ users $\mathcal{N} = \{1,\ldots, n\}$. Each user
 $i\in\mathcal{N}$ has a public vector of features $x_i\in\reals^d$, $\norm{x_i}_2\leq 1$, and an undisclosed piece of information $y_i\in\reals$.
 For example, the features could be the age, weight, or height of user $i$, while the latter can be her propensity to contract a disease.
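
The general.tex addition points at the submodularity of the Bayesian information gain, and the problem.tex hunk motivates maximizing the D-optimality objective $\log\det\T{X_S}X_S$. The two connect for linear regression: under a standard Gaussian prior $\beta\sim N(0, I_d)$ and i.i.d. Gaussian noise of variance $\sigma^2$, the information gain of a set $S$ is $\frac{1}{2}\log\det(I_d + \sigma^{-2}\T{X_S}X_S)$, a monotone submodular set function (Krause and Guestrin), so greedy selection under a cardinality constraint achieves a $(1-1/e)$ approximation. A minimal sketch of this, where the Gaussian prior, the noise variance, and all identifiers are illustrative assumptions rather than anything taken from the patch:

# Sketch: greedy experiment selection maximizing Bayesian information gain.
# Assumes (not from the patch) beta ~ N(0, I_d) and i.i.d. Gaussian noise
# with variance sigma2, so IG(S) = 0.5 * logdet(I_d + X_S^T X_S / sigma2),
# a monotone submodular set function.
import numpy as np

def info_gain(X, S, sigma2=1.0):
    """Information gain of selecting the rows of X indexed by S."""
    d = X.shape[1]
    Xs = X[sorted(S)]
    _, logdet = np.linalg.slogdet(np.eye(d) + Xs.T @ Xs / sigma2)
    return 0.5 * logdet

def greedy_select(X, k, sigma2=1.0):
    """Pick k experiments greedily; submodularity yields a (1 - 1/e) factor."""
    S = set()
    for _ in range(k):
        best = max((i for i in range(len(X)) if i not in S),
                   key=lambda i: info_gain(X, S | {i}, sigma2))
        S.add(best)
    return S

# Toy usage: 100 candidate experiments with d = 5 features, normalized so
# that ||x_i||_2 <= 1 as in the problem.tex setup; select 10 of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
print(sorted(greedy_select(X, k=10)))

With private costs $c_i$ and a budget, as in the problem.tex setup, plain greedy is neither truthful nor budget feasible in general; that mechanism design question is outside this sketch.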
