author Thibaut Horel <thibaut.horel@gmail.com> 2012-01-16 18:32:54 -0800
committer Thibaut Horel <thibaut.horel@gmail.com> 2012-01-16 18:32:54 -0800
commit 424a6e62941f77c0633beb46c1314679de69f366
tree 4187d6802aa517421275b3a9ee9cc5af143bb75f /notes.tex
parent 8d311a511a7c673698b08d48e80fa19dc8247a71
More details added to the notes
Diffstat (limited to 'notes.tex')
-rw-r--r-- notes.tex 8
1 files changed, 5 insertions, 3 deletions
diff --git a/notes.tex b/notes.tex
index f75d2b7..b4460f3 100644
--- a/notes.tex
+++ b/notes.tex
@@ -25,19 +25,21 @@ vector of explanatory variables $x$.
The cost of the regression error will be measured by the MSE:
\begin{displaymath}
- \mathrm{MSE}(f_n) = \expt{\big(f_n(x)-y\big)^2}
+ \mse(f_n) = \expt{\big(f_n(x)-y\big)^2}
\end{displaymath}
The general goal is to understand how the size of the database impacts
the MSE of the derived regression function.
\subsection{From the bivariate normal case to linear regression}
-If $(X,Y)$ is drawn from a bivariate normal distribution. Then, one can
+If $(X,Y)$ is drawn from a bivariate normal distribution with mean
+vector $\mu$ and covariance matrix $\Sigma$, then one can
write:
\begin{displaymath}
Y = \condexp{Y}{X} + \big(Y-\condexp{Y}{X}\big)
\end{displaymath}
-
+In this particular case, $\condexp{Y}{X}$ is a linear function of $X$:
+writing $\varepsilon = Y-\condexp{Y}{X}$, it is easy to see that $\expt{X\varepsilon}=0$.
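% A sketch of why both claims above hold. The component names $\mu_X$,
% $\mu_Y$, $\sigma_X^2$ and $\sigma_{XY}$ for the entries of $\mu$ and $\Sigma$ are
% assumed notation, not taken from the notes; $\sigma_X^2 > 0$ is assumed.
\begin{displaymath}
\condexp{Y}{X} = \mu_Y + \frac{\sigma_{XY}}{\sigma_X^2}\,(X-\mu_X)
\end{displaymath}
% Orthogonality of $X$ and $\varepsilon$ then follows from the tower
% property, $\expt{X\,\condexp{Y}{X}} = \expt{\condexp{XY}{X}} = \expt{XY}$:
\begin{displaymath}
\expt{X\varepsilon}
= \expt{XY} - \expt{X\,\condexp{Y}{X}}
= \expt{XY} - \expt{XY}
= 0
\end{displaymath}
% This second step does not use normality; normality is only needed for
% the linear form of $\condexp{Y}{X}$.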
\subsection{Linear regression}
We assume a linear model: