| field | value | date |
|---|---|---|
| author | Thibaut Horel <thibaut.horel@gmail.com> | 2015-10-08 22:37:26 -0400 |
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2015-10-08 22:37:26 -0400 |
| commit | 0f9fc1bdb1755f5677e3201bcd35706e09235844 | |
| tree | ec4cac94a4f06c873f163abdc0b6c8f77fedfdd4 /hw1 | |
| parent | fb90afefbf3e6d10164a4c1a496e71a688e600f0 | |
| download | cs281-0f9fc1bdb1755f5677e3201bcd35706e09235844.tar.gz | |
[hw1] old changes
Diffstat (limited to 'hw1')
| -rw-r--r-- | hw1/main.tex | 15 |
1 file changed, 8 insertions, 7 deletions
diff --git a/hw1/main.tex b/hw1/main.tex
index ee9b915..02d80ae 100644
--- a/hw1/main.tex
+++ b/hw1/main.tex
@@ -212,14 +212,14 @@ and we obtain the desired formula for the MAP estimator.
 (f) $n$ need not be greater than $m$ since $(\lambda Id + X^TX)$ might be
 invertible even if $X^TX$ is not.
 
-(g) Replacing $X$ by $X'$:
+(g) Let us define $X'$:
 \begin{displaymath}
   X' = \begin{pmatrix}
     X\\
     \sqrt{\lambda}Id
   \end{pmatrix}
 \end{displaymath}
-and $Y^T$ by $Y'^ = [Y\; 0]$ and replacing $X$ by $X'$ in the least square
+Replacing $Y^T$ by $Y'^T = [Y^T\; 0]$ and replacing $X$ by $X'$ in the least squares
 estimator $\hat{\beta}$, we see that $X'^TY' = X^TY$. Furthermore:
 \begin{displaymath}
   X'^TX' = \begin{pmatrix}
@@ -230,8 +230,9 @@ estimator $\hat{\beta}$, we see that $X'^TY' = X^TY$. Furthermore:
     \sqrt{\lambda}Id
   \end{pmatrix}
   = X^TX +\lambda Id
 \end{displaymath}
-Hence we get $\hat{\beta} = (X^TX+\lambda Id)^{-1}X^TY$, which is exactly the
-expression of the MAP estimator in ridge regression.
+Hence the least squares estimator yields $\hat{\beta} = (X^TX+\lambda
+Id)^{-1}X^TY$, which is exactly the expression of the MAP estimator in ridge
+regression.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{problem}[The Dirichlet and multinomial distributions, 12pts]
@@ -507,13 +508,13 @@ by the method L-BFGS for each value of $d$. Answer the following questions:
 We see that the non-linear transformations help reduce the prediction error:
 the data is probably not very well modeled by a linear relationship. Allowing
 more complex relationships (as $d$ becomes larger the relationship becomes more
-complex) fits it better.
+complex) is a better fit and improves prediction.
 
 QR is faster than L-BFGS when the number of features is not too large ($d\leq
 400$). For $d=600$, L-BFGS is faster. The asymptotic complexity of QR is
 $O(nd^2 + d^3)$ whereas the complexity of L-BFGS is $O(nd)$ (for a fixed number
-of iterations). So asymptotically, L-BFGS is be faster as we observe for
-$d=600$ (for smaller values of $d$ the constants hidden in the $O$ do not allow
+of iterations). So asymptotically, L-BFGS is faster, as we observe for $d=600$
+(for smaller values of $d$ the constants hidden in the $O$ do not allow
 a formal comparison).
 
 \end{document}
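The equivalence used in part (g) above is easy to check numerically. Below is a minimal sketch (hypothetical code, not part of this repository; the dimensions and variable names are made up) showing that ordinary least squares on the augmented design matrix recovers the ridge/MAP estimator:

```python
# Hypothetical check, not from the cs281 repo: least squares on the
# augmented system recovers the ridge / MAP estimator from part (g).
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 50, 5, 0.3               # sample count, feature count, penalty
X = rng.normal(size=(n, m))
Y = rng.normal(size=n)

# Stack sqrt(lambda) * I under X and zeros under Y, as in the write-up.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(m)])
Y_aug = np.concatenate([Y, np.zeros(m)])

# Ordinary least squares on the augmented problem ...
beta_ls, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)
# ... versus the closed-form ridge / MAP estimator.
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ Y)

assert np.allclose(beta_ls, beta_map)
```

The assertion holds because $X'^TX' = X^TX + \lambda Id$ and $X'^TY' = X^TY$, exactly as derived in the hunk above.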

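The QR-versus-L-BFGS discussion in the last hunk can likewise be sketched. The snippet below is illustrative only, assuming synthetic data and SciPy's generic `L-BFGS-B` solver (the homework's actual solver code is not shown in this diff); wrapping each path in a timer and varying $d$ should reproduce the crossover described above:

```python
# Illustrative sketch, assuming synthetic data and SciPy's L-BFGS-B.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, d, lam = 5000, 600, 0.1
X = rng.normal(size=(n, d))
Y = rng.normal(size=n)

# Direct path, O(nd^2 + d^3): QR on the augmented matrix from part (g).
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(d)])
Y_aug = np.concatenate([Y, np.zeros(d)])
Q, R = np.linalg.qr(X_aug)           # reduced QR: R is d x d
beta_qr = np.linalg.solve(R, Q.T @ Y_aug)

# Iterative path, O(nd) per iteration: L-BFGS on the ridge objective.
def objective(beta):
    r = X @ beta - Y
    return 0.5 * (r @ r) + 0.5 * lam * (beta @ beta)

def gradient(beta):
    return X.T @ (X @ beta - Y) + lam * beta

res = minimize(objective, np.zeros(d), jac=gradient, method="L-BFGS-B")
print(np.max(np.abs(res.x - beta_qr)))  # small, up to solver tolerance
```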