| Field | Value | Date |
|---|---|---|
| author | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-10-30 08:28:13 -0700 |
| committer | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-10-30 08:28:13 -0700 |
| commit | 2d54b5d27ffd2782aec0bab8200b5b9e55585805 (patch) | |
| tree | 7921aeb0ed2a13a0e9ecb29c2ee26dbe0acaf6ae /problem.tex | |
| parent | dacff6f8d498ef281066742305db90d1121d7f3b (diff) | |
| parent | fe8aa3a0134a9493a2443a0965392254a7125094 (diff) | |
| download | recommendation-2d54b5d27ffd2782aec0bab8200b5b9e55585805.tar.gz | |
conf
Diffstat (limited to 'problem.tex')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | problem.tex | 55 |
1 file changed, 27 insertions(+), 28 deletions(-)
```diff
diff --git a/problem.tex b/problem.tex
index c3fd38f..db7108b 100644
--- a/problem.tex
+++ b/problem.tex
@@ -15,7 +15,7 @@ For example, the features could be the age, weight, or height of user $i$, while
 The variables $x_i$ and $y_i$ are related through the following relationship:
 \begin{align}
-  y_i = \beta^T x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},\label{model}
+  y_i = \T{\beta} x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},\label{model}
 \end{align}
 where $\beta\in\reals^d$ is the unknown parameter and $\varepsilon_i\in\reals$ are independent, normally distributed random variables, with zero mean and variance $\sigma^2_0$.
 The experimenter wishes to learn $\beta$ by observing the undisclosed $y_i$ of a subset of users $i\in S$. Hence, the set of possible experiments comprises all random variables $y_{S}\in\reals^{|S|}$, $S\subseteq \mathcal{N}$.
@@ -31,7 +31,7 @@ I_d$. Under this prior and the linear model \eqref{model}, the value of informat
 \begin{align}\label{vs}
 V(S) & \defeq I(\beta;y_S) = \frac{1}{2}\log\det\left(I_d
-  + \mu\sum_{i\in S} x_ix_i^T\right),
+  + \mu\sum_{i\in S} x_i\T{x_i}\right),
 % & \defeq \frac{1}{2}\log\det A(S)
 \end{align}
 where $\mu = \sigma_1^2/\sigma_0^2$.
@@ -40,7 +40,7 @@ Some intuition behind the Gaussian prior on $\beta$ and the role that parameter
 It is easy to show that, under the linearity assumption \eqref{model} and the gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization problem:
 \begin{displaymath}
-  \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \beta^*x_i)^2
+  \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2
   + \frac{1}{\mu}\sum_i \norm{\beta}_2^2
 \end{displaymath}
 This optimization, commonly known as \emph{ridge regression}, reduces to a least squares fit for $\mu=\infty$. For finite $\mu$, ridge regression
@@ -55,18 +55,17 @@ problem. Explain the goals:
 \item individually rational
 \item budget feasible
 \item has a good approximation ratio
-
-TODO Explain what is already known: it is ok when the function is submodular. When
-should we introduce the notion of submodularity?
 \end{itemize}
+\thibaut{TODO Explain what is already known: it is ok when the function is submodular. When
+should we introduce the notion of submodularity?}
 
 \begin{comment}
 \subsection{Linear and Ridge Regression}
 Linear regression, a true workhorse of statistical analysis, fits a linear function to a set of observable input and output variables. In particular, consider a set of $n$ value pairs $(x_i,y_i)$, $i\in \mathcal{N} =\{1,\ldots,n\}$. The variables $x_i\in \reals^d$ are usually referred to as \emph{input} variables or \emph{covariates} and $y_i\in \reals$ are referred to as \emph{output} or \emph{response} variables. For example, the former could be the age, weight, or height of a person, while the latter can be their propensity to contract a disease. Assume that input and output variables are related through the following relationship:
 \begin{displaymath}
-  y_i = \beta^* x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},
+  y_i = \T{\beta} x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},
 \end{displaymath}
 where $\beta\in\reals^d$, is called the \emph{model parameter vector} and $\varepsilon_i\in\reals$ are independent, normally distributed random variables, with zero mean and variance $\sigma^2$.
@@ -85,7 +84,7 @@ multivariate normal distribution of mean zero and covariance matrix $\kappa
 I_d$.
 Under this assumption, maximum a posteriori estimation leads to the following maximization problem, commonly known as \emph{ridge regression}:
 \begin{displaymath}
-  \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \beta^*x_i)^2
+  \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - T{\beta}x_i)^2
   + \frac{1}{\mu}\sum_i \norm{\beta}_2^2
 \end{displaymath}
 where
@@ -102,16 +101,16 @@ acts as a sort of ``Occam's razor'', favoring a \emph{parsimonious} model for $\
 Throughout the paper, we will make use of the following notations: if $x$ is
-a (column) vector in $\reals^d$, $x^*$ denotes its transposed (line)
+a (column) vector in $\reals^d$, $\T{x}$ denotes its transposed (line)
 vector. Thus, the standard inner product between two vectors $x$ and $y$ is
-simply $x^* y$. $\norm{x}_2 = x^*x$ will denote the $L_2$ norm of $x$.
+simply $\T{x}y$. $\norm{x}_2 = \T{x}x$ will denote the $L_2$ norm of $x$.
 
 We will also often use the following order over symmetric matrices: if $A$
 and $B$ are two $d\times d$
 and $B$ are two $d\times d$ real symmetric matrices, we write that $A\leq B$ iff:
 \begin{displaymath}
 \forall x\in\reals^d,\quad
-  x^*Ax \leq x^*Bx
+  \T{x}Ax \leq \T{x}Bx
 \end{displaymath}
 That is, iff $B-A$ is symmetric semi-definite positive.
@@ -127,7 +126,7 @@ definite positive matrices.
 Linear regression, a true workhorse of statistical analysis, fits a linear function to a set of observable input and output variables. In particular, consider a set of $n$ value pairs $(x_i,y_i)$, $i\in \mathcal{N} =\{1,\ldots,n\}$. The variables $x_i\in \reals^d$ are usually referred to as \emph{input} variables or \emph{covariates} and $y_i\in \reals$ are referred to as \emph{output} or \emph{response} variables. For example, the former could be the age, weight, or height of a person, while the latter can be their propensity to contract a disease. Assume that input and output variables are related through the following relationship:
 \begin{displaymath}
-  y_i = \beta^* x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},
+  y_i = \T{\beta} x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},
 \end{displaymath}
 where $\beta\in\reals^d$, is called the \emph{model parameter vector} and $\varepsilon_i\in\reals$ are independent, normally distributed random variables, with zero mean and variance $\sigma^2$.
@@ -136,9 +135,9 @@ The purpose of linear regression is to recover the vector $\beta$ upon the obser
 
-In the absense of any additional information, a natural method for estimating $\beta$ is through a least squares fit. However, in a more general setup,
+In the absence of any additional information, a natural method for estimating $\beta$ is through a least squares fit. However, in a more general setup,
 a prior knowledge about $\beta$, captured by a distribution over
-$\reals^d$, can also be taken into account. In this setup, after observing the data, $\beta$ can be retreived through \emph{maximum a posteriori estimation}: computing the parameters which maximize the
+$\reals^d$, can also be taken into account. In this setup, after observing the data, $\beta$ can be retrieved through \emph{maximum a posteriori estimation}: computing the parameters which maximize the
 posterior distribution of $\beta$ given the observations.
 
 A common prior assumption for linear regression is that $\beta$ follows a
@@ -146,7 +145,7 @@ multivariate normal distribution of mean zero and covariance matrix $\kappa
 I_d$.
 Under this assumption, maximum a posteriori estimation leads to the following maximization problem, commonly known as \emph{ridge regression}:
 \begin{displaymath}
-  \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \beta^*x_i)^2
+  \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2
   + \frac{1}{\mu}\sum_i \norm{\beta}_2^2
 \end{displaymath}
 where
@@ -184,7 +183,7 @@ experimenter has not seen the data yet, but he wants to know which users he
 should be selecting. His goal is to learn the model underlying the data.
 Here, we assume a linear model:
 \begin{displaymath}
-  \forall i\in\mathcal{N},\quad y_i = \beta^* x_i + \varepsilon_i
+  \forall i\in\mathcal{N},\quad y_i = \T{\beta}x_i + \varepsilon_i
 \end{displaymath}
 where $\beta\in\reals^d$ and $\varepsilon_i\in\reals$ follows a normal
 distribution of mean $0$ and variance $\sigma^2$. Furthermore, we assume the
@@ -203,7 +202,7 @@ a multivariate normal distribution of mean zero and covariance matrix $\kappa
 I_d$. Maximum a posteriori estimation leads to the following maximization
 problem:
 \begin{displaymath}
-  \beta_{\text{max}} = \argmax_{\beta\in\reals^d} \sum_i (y_i - \beta^*x_i)^2
+  \beta_{\text{max}} = \argmax_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2
   + \frac{1}{\mu}\sum_i \norm{\beta}_2^2
 \end{displaymath}
 which is the well-known \emph{ridge regression}. $\mu
@@ -241,17 +240,17 @@ where $Y_S = \{y_i,\,i\in S\}$ is the set of observed data.
 \begin{align*}
 \forall S\subset\mathcal{N},\; V(S) & = \frac{1}{2}\log\det\left(I_d
-  + \mu\sum_{i\in S} x_ix_i^*\right)\\
+  + \mu\sum_{i\in S} x_i\T{x_i}\right)\\
 & \defeq \frac{1}{2}\log\det A(S)
 \end{align*}
 \end{theorem}
 
 \begin{proof}
-Let us denote by $X_S$ the matrix whose rows are the vectors $(x_i^*)_{i\in
+Let us denote by $X_S$ the matrix whose rows are the vectors $(\T{x_i})_{i\in
 S}$. Observe that $A_S$ can simply be written as:
 \begin{displaymath}
-  A_S = I_d + \mu X_S^* X_S
+  A_S = I_d + \mu \T{X_S} X_S
 \end{displaymath}
 
 Let us recall that the entropy of a multivariate normal variable $B$ over
@@ -275,19 +274,19 @@ distribution of mean $X\beta$ and of covariance matrix $\sigma^2 I_n$. Hence:
 $(Y_S)$ also follows a multivariate normal distribution of mean zero. Let us
 compute its covariance matrix, $\Sigma_Y$:
 \begin{align*}
-  \Sigma_Y & = \expt{YY^*} = \expt{(X_S\beta + E)(X_S\beta + E)^*}\\
-  & = \kappa X_S X_S^* + \sigma^2I_n
+  \Sigma_Y & = \expt{Y\T{Y}} = \expt{(X_S\beta + E)\T{(X_S\beta + E)}}\\
+  & = \kappa X_S \T{X_S} + \sigma^2I_n
 \end{align*}
 Thus, we get that:
 \begin{equation}\label{eq:h2}
 \mathbb{H}(Y_S)
-  = \frac{1}{2}\log\left((2\pi e)^n \det(\kappa X_S X_S^* + \sigma^2 I_n)\right)
+  = \frac{1}{2}\log\left((2\pi e)^n \det(\kappa X_S \T{X_S} + \sigma^2 I_n)\right)
 \end{equation}
 Combining \eqref{eq:h1} and \eqref{eq:h2} we get:
 \begin{displaymath}
 V(S) = \frac{1}{2}\log\det\left(I_n+\frac{\kappa}{\sigma^2}X_S
-  X_S^*\right)
+  \T{X_S}\right)
 \end{displaymath}
 Finally, we can use Sylvester's determinant theorem to get the result.
@@ -300,7 +299,7 @@ We have the following lemma.
 \begin{lemma}[Marginal contribution]
 \begin{displaymath}
 \Delta_i V(S)\defeq V(S\cup\{i\}) - V(S)
-  = \frac{1}{2}\log\left(1 + \mu x_i^*A(S)^{-1}x_i\right)
+  = \frac{1}{2}\log\left(1 + \mu \T{x_i} A(S)^{-1}x_i\right)
 \end{displaymath}
 \end{lemma}
@@ -308,10 +307,10 @@ We have the following lemma.
 We have:
 \begin{align*}
 V(S\cup\{i\}) & = \frac{1}{2}\log\det A(S\cup\{i\})\\
-  & = \frac{1}{2}\log\det\left(A(S) + \mu x_i x_i^*\right)\\
+  & = \frac{1}{2}\log\det\left(A(S) + \mu x_i \T{x_i}\right)\\
   & = V(S) + \frac{1}{2}\log\det\left(I_d + \mu A(S)^{-1}x_i
-  x_i^*\right)\\
-  & = V(S) + \frac{1}{2}\log\left(1 + \mu \T{x_i} A(S)^{-1}x_i\right)
+  \T{x_i}\right)\\
+  & = V(S) + \frac{1}{2}\log\left(1 + \mu \T{x_i} A(S)^{-1}x_i\right)
 \end{align*}
 where the last equality comes from Sylvester's determinant formula.
 \end{proof}
```
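The value-of-information formula $V(S) = \frac{1}{2}\log\det(I_d + \mu\sum_{i\in S} x_i x_i^T)$ and the marginal-contribution lemma $\Delta_i V(S) = \frac{1}{2}\log(1 + \mu\, x_i^T A(S)^{-1} x_i)$ quoted in the hunks above are easy to sanity-check numerically. The sketch below does so on random data and also checks the Sylvester determinant identity used in the proof of the theorem; it assumes NumPy, and the problem sizes and variable names are illustrative rather than anything taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, mu = 3, 10, 0.5               # illustrative dimension, number of users, mu = sigma_1^2 / sigma_0^2
X = rng.normal(size=(n, d))         # row i plays the role of the feature vector x_i of user i

def A(S):
    """A(S) = I_d + mu * sum_{i in S} x_i x_i^T."""
    return np.eye(d) + mu * sum(np.outer(X[j], X[j]) for j in S)

def V(S):
    """Value of information V(S) = 1/2 * log det A(S)."""
    return 0.5 * np.linalg.slogdet(A(S))[1]

S, i = [0, 1, 2, 3], 7

# Marginal contribution: V(S u {i}) - V(S) = 1/2 * log(1 + mu * x_i^T A(S)^{-1} x_i)
lhs = V(S + [i]) - V(S)
rhs = 0.5 * np.log1p(mu * X[i] @ np.linalg.solve(A(S), X[i]))
print(np.isclose(lhs, rhs))         # True

# Sylvester's determinant identity used in the proof:
# det(I_n + c * X_S X_S^T) = det(I_d + c * X_S^T X_S)
XS = X[S]
print(np.isclose(np.linalg.det(np.eye(len(S)) + mu * XS @ XS.T),
                 np.linalg.det(np.eye(d) + mu * XS.T @ XS)))   # True
```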
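Several hunks also touch the ridge-regression objective $\hat{\beta} = \arg\min_{\beta} \sum_i (y_i - \beta^T x_i)^2 + \frac{1}{\mu}\|\beta\|_2^2$, which has the closed-form solution $\hat{\beta} = (X^T X + I_d/\mu)^{-1} X^T y$ and reduces to a plain least-squares fit as $\mu \to \infty$. A minimal sketch of that closed form, again assuming NumPy and synthetic data with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, mu, sigma = 200, 3, 10.0, 0.1              # illustrative sizes; mu plays the role of sigma_1^2 / sigma_0^2
beta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))                      # row i is x_i
y = X @ beta_true + sigma * rng.normal(size=n)   # y_i = beta^T x_i + eps_i

# Ridge regression / MAP estimate under the Gaussian prior on beta:
# beta_hat = argmin ||y - X beta||^2 + (1/mu) * ||beta||^2
#          = (X^T X + I_d / mu)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X + np.eye(d) / mu, X.T @ y)
print(beta_hat)                                  # close to beta_true for mild regularization and enough samples
```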
