| author | Thibaut Horel <thibaut.horel@gmail.com> | 2012-10-26 11:13:49 -0700 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2012-10-26 11:13:49 -0700 |
| commit | 5810ae8660dffd8805317a83a0d4761ab2018cec (patch) | |
| tree | b0e3018489e62bbb8eb9d826c53b61b6a5b8c1dc | |
| parent | f6b1d3078d0116554d404123ced21664268c3b48 (diff) | |
| download | recommendation-5810ae8660dffd8805317a83a0d4761ab2018cec.tar.gz | |
Some progress in section 2
| -rw-r--r-- | paper.tex | 170 |
1 files changed, 109 insertions, 61 deletions
@@ -44,21 +44,24 @@ \subsection{Notations}
-\begin{itemize}
-    \item If $x\in\mathbf{R}^d$, $x^*$ will denote the transpose of vector $x$.
-    If $(x, y)\in\mathbf{R}^d\times\mathbf{R}^d$, $x^*y$ is the standard
-    inner product of $\mathbf{R}^d$.
-    \item We will often use the following order over symmetric matrices: if $A$
-    and $B$ are two $d\times d$ real symmetric matrices, we write that
-    $A\leq B$ iff:
-    \begin{displaymath}
-        \forall x\in\mathbf{R}^d,\quad x^*Ax \leq x^*Bx
-    \end{displaymath}
-    That is, iff $B-A$ is symmetric semi-definite positive.
-    \item Let us recall this property of the ordered defined above: if $A$ and
-    $B$ are two symmetric definite positive matrices such that $A\leq B$,
-    then $A^{-1}\leq B^{-1}$.
-\end{itemize}
+Throughout the paper, we will make use of the following notations: if $x$ is
+a (column) vector in $\mathbf{R}^d$, $x^*$ denotes its transposed (row)
+vector. Thus, the standard inner product between two vectors $x$ and $y$ is
+simply $x^*y$, and $\norm{x}_2 = \sqrt{x^*x}$ denotes the $L_2$ norm of $x$.
+
+We will also often use the following order over symmetric matrices: if $A$
+and $B$ are two $d\times d$ real symmetric matrices, we write $A\leq B$ iff:
+\begin{displaymath}
+    \forall x\in\mathbf{R}^d,\quad x^*Ax \leq x^*Bx
+\end{displaymath}
+that is, iff $B-A$ is symmetric positive semi-definite.
+
+This order lets us define the notions of a \emph{decreasing} and of
+a \emph{convex} matrix function similarly to their real counterparts. In
+particular, let us recall that matrix inversion is decreasing and convex over
+symmetric positive definite matrices.

\subsection{Data model}

@@ -77,7 +80,9 @@ we assume a linear model:
\forall i\in\mathcal{N},\quad y_i = \beta^* x_i + \varepsilon_i
\end{displaymath}
where $\beta\in\mathbf{R}^d$ and $\varepsilon_i\in\mathbf{R}$ follows a normal
-distribution of mean $0$ and variance $\sigma^2$.
+distribution of mean $0$ and variance $\sigma^2$.
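The inversion property recalled in the added notation paragraph (if $A\leq B$ for symmetric positive definite matrices, then $B^{-1}\leq A^{-1}$) can be sanity-checked numerically. The following is a hypothetical sketch, not part of the commit; the two matrices are made up, with $B-A$ diagonal and non-negative so that $A\leq B$ holds by construction:

```python
# Hypothetical sketch (not from the paper): a 2x2 check that the order
# A <= B (i.e. B - A positive semi-definite) reverses under inversion:
# B^{-1} <= A^{-1} for symmetric positive definite A and B.

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad(m, x):
    """Quadratic form x^* M x for a 2x2 matrix M."""
    return sum(x[i] * m[i][j] * x[j] for i in range(2) for j in range(2))

A = [[2.0, 0.5], [0.5, 1.0]]      # symmetric positive definite
B = [[3.0, 0.5], [0.5, 2.5]]      # B - A = diag(1, 1.5) >= 0, so A <= B
invA, invB = inv2(A), inv2(B)

# Check both quadratic-form inequalities on a few directions x.
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.7, -0.4], [-0.3, 0.9]):
    assert quad(A, x) <= quad(B, x) + 1e-12
    assert quad(invB, x) <= quad(invA, x) + 1e-12
```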
+Furthermore, we assume the error $\varepsilon$ to be independent of the user:
+the $(\varepsilon_i)_{i\in\mathcal{N}}$ are mutually independent.

After observing the data, the experimenter could simply do linear regression
to learn the model parameter $\beta$. However, in a more general setup, the

@@ -99,48 +104,91 @@ which is the well-known \emph{ridge regression}
This can thus be seen as linear regression with a regularization term which
prevents $\beta$ from having a large $L_2$-norm.

+\subsection{Value of data}
+
-\begin{itemize}
-    \item set of $n$ users $\mathcal{N} = \{1,\ldots, n\}$
-    \item each user has a public vector of features $x_i\in\mathbf{R}^d$ and
-    some undisclosed variable $y_i\in\mathbf{R}$
-    \item we assume that the data has been normalized: $\forall
-    i\in\mathcal{N}$, $\norm{x_i}\leq 1$
-    \item \textbf{Ridge regression model:}
-    \begin{itemize}
-        \item $y_i = \beta^*x_i + \varepsilon_i$
-        \item $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$,
-        $(\varepsilon_i)_{i\in \mathcal{N}}$ are mutually independent.
-        \item prior knowledge of $\beta$: $\beta\sim\mathcal{N}(0,\kappa I_d)$
-        \item $\mu = \frac{\kappa}{\sigma^2}$ is the regularization factor.
-        \item after the variables $y_i$ are disclosed, the experimenter
-        does \emph{maximum a posteriori estimation} which is equivalent
-        under Gaussian prior to solving the ridge regression
-        maximisation problem:
-        \begin{displaymath}
-            \beta_{\text{max}} = \argmax \sum_i (y_i - \beta^*x_i)^2
-            + \frac{1}{\mu}\sum_i \norm{\beta}_2^2
-        \end{displaymath}
-    \end{itemize}
-\end{itemize}
+Because the users' private variables $y_i$ have not been observed yet when
+the experimenter has to decide which users to include in his experiment, we
+treat $\beta$ as a random variable whose distribution is updated after
+observing the data.
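The ridge-regression-as-MAP claim above can be illustrated in one dimension. The sketch below is hypothetical (data and parameter values are made up, and the standard argmin form of the ridge objective is assumed): with prior $\beta\sim\mathcal{N}(0,\kappa)$, noise variance $\sigma^2$, and $\mu = \kappa/\sigma^2$, the closed-form estimate $\hat\beta = \sum x_iy_i / (\sum x_i^2 + 1/\mu)$ minimizes the regularized squared error:

```python
# Hypothetical 1-d sketch (not from the paper; data are synthetic): the MAP
# estimate under a Gaussian prior beta ~ N(0, kappa) is the minimizer of the
# ridge objective with regularization factor mu = kappa / sigma^2.
import random

random.seed(1)
sigma2, kappa = 0.25, 4.0
mu = kappa / sigma2
beta_true = 0.7
xs = [random.uniform(-1, 1) for _ in range(50)]   # normalized: |x_i| <= 1
ys = [beta_true * x + random.gauss(0, sigma2 ** 0.5) for x in xs]

# Closed-form ridge estimate for d = 1.
beta_hat = (sum(x * y for x, y in zip(xs, ys))
            / (sum(x * x for x in xs) + 1 / mu))

def ridge_loss(b):
    """Ridge objective: squared error plus (1/mu) * ||beta||^2."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) + b ** 2 / mu

# The closed form is the exact minimizer of this convex quadratic.
assert all(ridge_loss(beta_hat) <= ridge_loss(beta_hat + e)
           for e in (-0.05, -0.01, 0.01, 0.05))
```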
-\subsection{Economics}
+Let us recall that if $\beta$ is a random variable over $\mathbf{R}^d$ whose
+probability distribution has a density function $f$ with respect to the
+Lebesgue measure, its entropy is given by:
+\begin{displaymath}
+    \mathbb{H}(\beta) \defeq -\int_{b\in\mathbf{R}^d} f(b)\log f(b)\,\text{d}b
+\end{displaymath}

-Value function: following the experiment design theory, the value of
-data is the decrease of uncertainty about the model. Common measure of
-uncertainty, the entropy. Thus, the value of a set $S$ of users is:
+A usual way to measure the decrease of uncertainty induced by the observation
+of data is to use the entropy. This leads to the following definition of the
+value of data, called the \emph{value of information}:
\begin{displaymath}
-    V(S) = \mathbb{H}(\beta) - \mathbb{H}(\beta|Y_S)
+    \forall S\subset\mathcal{N},\quad V(S) = \mathbb{H}(\beta)
+    - \mathbb{H}(\beta\,|\,Y_S)
\end{displaymath}
-where $Y_S = \{y_i,\,i\in S\}$ is the set of observations.
+where $Y_S = \{y_i,\,i\in S\}$ is the set of observed data.

-In our case (under Gaussian prior):
+\begin{theorem}
+    Under the ridge regression model explained in section TODO, the value of
+    data is equal to:
+    \begin{align*}
+        \forall S\subset\mathcal{N},\; V(S)
+        & = \frac{1}{2}\log\det\left(I_d + \mu\sum_{i\in S} x_ix_i^*\right)\\
+        & \defeq \frac{1}{2}\log\det A(S)
+    \end{align*}
+\end{theorem}
+
+\begin{proof}
+Let us denote by $X_S$ the matrix whose rows are the vectors
+$(x_i^*)_{i\in S}$.
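Before the proof, the theorem can be sanity-checked numerically. The sketch below is hypothetical (parameters and vectors are made up, $d=2$, $|S|=3$): it computes the entropy drop from the Gaussian prior and posterior covariances of $\beta$, assuming the standard Bayesian posterior precision $\kappa^{-1}I_d + \sigma^{-2}X_S^*X_S$, and checks that it matches $\frac{1}{2}\log\det A(S)$ and, via Sylvester's identity, the $n\times n$ determinant appearing later in the proof:

```python
# Hypothetical numeric check of the theorem (not from the paper), d = 2,
# |S| = n = 3.  Gaussian entropies reduce to log-determinants, the (2 pi e)^d
# factors cancel, and the entropy drop becomes half a log-determinant ratio.
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

kappa, sigma2 = 2.0, 0.5
mu = kappa / sigma2
xs = [[0.6, 0.2], [0.1, 0.9], [0.5, 0.5]]          # the x_i, norms <= 1

# d x d side: A(S) = I_d + mu * X_S^* X_S  (X_S has the x_i^* as rows).
gram = [[sum(x[i] * x[j] for x in xs) for j in range(2)] for i in range(2)]
A = [[(i == j) + mu * gram[i][j] for j in range(2)] for i in range(2)]

# n x n side: I_n + (kappa / sigma^2) * X_S X_S^*.
K = [[(i == j) + mu * sum(a * b for a, b in zip(xs[i], xs[j]))
      for j in range(3)] for i in range(3)]

# Entropy drop: half the log of det(prior cov) / det(posterior cov), with
# posterior precision (assumed) kappa^{-1} I_d + sigma^{-2} X_S^* X_S.
post_prec = [[(i == j) / kappa + gram[i][j] / sigma2 for j in range(2)]
             for i in range(2)]
entropy_drop = 0.5 * math.log(kappa ** 2 * det2(post_prec))

assert abs(entropy_drop - 0.5 * math.log(det2(A))) < 1e-9
assert abs(0.5 * math.log(det3(K)) - 0.5 * math.log(det2(A))) < 1e-9
```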
+Observe that $A(S)$ can simply be written as:
+\begin{displaymath}
+    A(S) = I_d + \mu X_S^* X_S
+\end{displaymath}
+
+Let us recall that the entropy of a multivariate normal variable $B$ over
+$\mathbf{R}^d$ with covariance matrix $\Sigma$ is given by:
+\begin{equation}\label{eq:multivariate-entropy}
+    \mathbb{H}(B) = \frac{1}{2}\log\big((2\pi e)^d \det \Sigma\big)
+\end{equation}
+
+Using the chain rule for conditional entropy, we get that:
+\begin{displaymath}
+    V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\,\beta)
+\end{displaymath}
+
+Conditioned on $\beta$, $Y_S$ follows a multivariate normal distribution of
+mean $X_S\beta$ and of covariance matrix $\sigma^2 I_n$, where $n = |S|$.
+Hence:
+\begin{equation}\label{eq:h1}
+    \mathbb{H}(Y_S\,|\,\beta)
+    = \frac{1}{2}\log\left((2\pi e)^n \det(\sigma^2I_n)\right)
+\end{equation}
+
+$Y_S$ also follows a multivariate normal distribution of mean zero. Let us
+compute its covariance matrix, $\Sigma_Y$:
+\begin{align*}
+    \Sigma_Y & = \expt{Y_SY_S^*} = \expt{(X_S\beta + E)(X_S\beta + E)^*}\\
+    & = \kappa X_S X_S^* + \sigma^2I_n
+\end{align*}
+Thus, we get that:
+\begin{equation}\label{eq:h2}
+    \mathbb{H}(Y_S)
+    = \frac{1}{2}\log\left((2\pi e)^n
+    \det(\kappa X_S X_S^* + \sigma^2 I_n)\right)
+\end{equation}
+
+Combining \eqref{eq:h1} and \eqref{eq:h2}, we get:
+\begin{displaymath}
+    V(S) = \frac{1}{2}\log\det\left(I_n
+    + \frac{\kappa}{\sigma^2}X_S X_S^*\right)
+\end{displaymath}
+
+Finally, we can use Sylvester's determinant theorem,
+$\det(I_n + MN) = \det(I_d + NM)$, with $M = \frac{\kappa}{\sigma^2}X_S$ and
+$N = X_S^*$, to get the result.
+\end{proof}
+
+It is also interesting to consider the marginal contribution of a user to
+a set: the increase in value induced by adding a user to an already existing
+set of users. We have the following lemma.
+\begin{lemma}[Marginal contribution]
+    \begin{displaymath}
@@ -160,18 +208,18 @@ In our case (under Gaussian prior):
\end{align*}
where the last equality comes from Sylvester's determinant formula.
\end{proof}
-\begin{itemize}
-    \item each user $i$ has a cost $c_i$
-    \item the auctioneer has a budget constraint $B$
-    \item optimisation problem:
-    \begin{displaymath}
-        OPT(V,\mathcal{N}, B) = \max_{S\subset\mathcal{N}} \left\{ V(S)\,|\,
-        \sum_{i\in S}c_i\leq B\right\}
-    \end{displaymath}
-\end{itemize}
-Problem, the costs are reported by users, they can lie. It is an auction
-problem. Goal design a mechanism (selection and payment scheme) which is:
+Because $A(S)$ is symmetric positive definite, the marginal contribution is
+positive, which proves that the value function is set-increasing.
+Furthermore, it is easy to see that if $S\subset S'$, then $A(S)\leq A(S')$.
+Using the fact that matrix inversion is decreasing, we see that the marginal
+contribution of a fixed user is a set-decreasing function. This is the
+\emph{submodularity} of the value function.
+
+\subsection{Auction}
+
+Explain the optimization problem, why it has to be formulated as an auction
+problem. Explain the goals:
\begin{itemize}
    \item truthful
    \item individually rational
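The submodularity claim added in this hunk (the marginal contribution of a fixed user shrinks as the set grows) can be illustrated numerically. This is a hypothetical sketch with made-up vectors, $d = 2$ and $\mu = 1$:

```python
# Hypothetical illustration (not from the paper): the marginal contribution
# V(S + {i}) - V(S) of a fixed user decreases as S grows, i.e. the value
# function V(S) = (1/2) log det A(S) is submodular.
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def A(users, mu=1.0):
    """A(S) = I_d + mu * sum_{i in S} x_i x_i^* for d = 2."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for x in users:
        for i in range(2):
            for j in range(2):
                m[i][j] += mu * x[i] * x[j]
    return m

def V(users):
    return 0.5 * math.log(det2(A(users)))

x1, x2, x3 = [0.8, 0.1], [0.3, 0.7], [0.5, 0.5]
small, large = [x1], [x1, x2]            # small is a subset of large
gain_small = V(small + [x3]) - V(small)  # marginal contribution of x3
gain_large = V(large + [x3]) - V(large)
assert 0 < gain_large <= gain_small      # positive and diminishing
```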
