Diffstat (limited to 'notes.tex')

 notes.tex | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
@@ -29,7 +29,7 @@ The cost of the regression error will be measured by the MSE:
 \end{displaymath}
 
 The general goal is to understand how the size of the database impacts
-the MSE of the derived regression function.
+the MSE of the regression function.
 
 \subsection{From the bivariate normal case to linear regression}
 If $(X,Y)$ is drawn from a bivariate normal distribution with mean
@@ -39,7 +39,16 @@ write:
 Y = \condexp{Y}{X} + \big(Y-\condexp{Y}{X}\big)
 \end{displaymath}
 In this particular case, $\condexp{Y}{X}$ is a linear function of $X$:
-writing $\varepsilon = Y-\condexp{Y}{X}$, it is easy to see that $\expt{X\varepsilon}=0$.
+\begin{displaymath}
+\condexp{Y}{X} = \alpha X + \beta
+\end{displaymath}
+where $\alpha$ and $\beta$ can be expressed as a function of $\mu$ and
+$\Sigma$. Writing $\varepsilon = Y-\condexp{Y}{X}$, it is easy to see
+that $\expt{X\varepsilon}=0$. Furthermore $\varepsilon$ is also normally
+distributed. Under these assumptions, it can be proven that the least
+squares estimator for $(\alpha,\beta)$ is optimal (it reaches the
+Cramér-Rao bound).
+
 \subsection{Linear regression}
 We assume a linear model:
@@ -158,7 +167,7 @@ y)^2\big)}
 By the Cauchy-Schwarz inequality:
 \begin{displaymath}
 (1+\norm{y}^2)(1+\norm{x_0}^2)-(x_0\cdot
-y)^2 \geq 0
+y)^2 > 0
 \end{displaymath}
 Thus the previous inequality is consecutively equivalent to:
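The added hunk claims that for a bivariate normal $(X,Y)$ the conditional expectation is linear, with $\alpha$ and $\beta$ determined by $\mu$ and $\Sigma$, and that the residual satisfies $\expt{X\varepsilon}=0$. A minimal numerical sketch of those claims (the closed forms $\alpha = \Sigma_{XY}/\Sigma_{XX}$ and $\beta = \mu_Y - \alpha\mu_X$ are standard, but the specific $\mu$, $\Sigma$, and sample size below are illustrative assumptions, not taken from the notes):

```python
import numpy as np

# Illustrative parameters (not from the notes): mean mu and covariance Sigma
# of a bivariate normal (X, Y).
rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Closed-form regression coefficients: E[Y|X] = alpha * X + beta.
alpha = Sigma[0, 1] / Sigma[0, 0]   # Cov(X, Y) / Var(X)
beta = mu[1] - alpha * mu[0]

# Empirical check that the residual eps = Y - E[Y|X] satisfies E[X * eps] = 0.
X, Y = rng.multivariate_normal(mu, Sigma, size=1_000_000).T
eps = Y - (alpha * X + beta)

print(np.mean(X * eps))   # close to 0 for a large sample
print(np.mean(eps))       # residual also has mean close to 0
```

The vanishing of $\expt{X\varepsilon}$ is exactly the orthogonality condition that makes $(\alpha,\beta)$ the least squares solution; the check above only confirms it empirically under the assumed parameters.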
