author    Thibaut Horel <thibaut.horel@gmail.com>  2015-10-08 22:37:26 -0400
committer Thibaut Horel <thibaut.horel@gmail.com>  2015-10-08 22:37:26 -0400
commit    0f9fc1bdb1755f5677e3201bcd35706e09235844 (patch)
tree      ec4cac94a4f06c873f163abdc0b6c8f77fedfdd4 /hw1
parent    fb90afefbf3e6d10164a4c1a496e71a688e600f0 (diff)
download  cs281-0f9fc1bdb1755f5677e3201bcd35706e09235844.tar.gz
[hw1] old changes
Diffstat (limited to 'hw1')
-rw-r--r--  hw1/main.tex | 15
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/hw1/main.tex b/hw1/main.tex
index ee9b915..02d80ae 100644
--- a/hw1/main.tex
+++ b/hw1/main.tex
@@ -212,14 +212,14 @@ and we obtain the desired formula for the MAP estimator.
(f) $n$ need not be greater than $m$, since $(\lambda Id + X^TX)$ may be
invertible even when $X^TX$ is not.
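The claim in (f) is easy to check numerically; a minimal sketch with NumPy, using an assumed toy example with $n=1$ sample and $m=2$ features so that $X^TX$ is rank-deficient:

```python
import numpy as np

lam = 1.0
X = np.array([[1.0, 2.0]])   # n=1 sample, m=2 features: X^T X is 2x2 but rank 1

XtX = X.T @ X
A = lam * np.eye(2) + XtX    # lambda*Id + X^T X

assert np.linalg.matrix_rank(XtX) < 2   # X^T X is singular
assert np.linalg.matrix_rank(A) == 2    # the regularized matrix is invertible
```

Adding $\lambda Id$ shifts every eigenvalue of the positive semi-definite matrix $X^TX$ up by $\lambda > 0$, which is why invertibility holds regardless of the rank of $X$.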
-(g) Replacing $X$ by $X'$:
+(g) Let us define $X'$:
\begin{displaymath}
X' = \begin{pmatrix}
X\\
\sqrt{\lambda}Id
\end{pmatrix}
\end{displaymath}
-and $Y^T$ by $Y'^T = [Y^T\; 0]$ and replacing $X$ by $X'$ in the least squares
+Replacing $Y^T$ by $Y'^T = [Y^T\; 0]$ and replacing $X$ by $X'$ in the least squares
estimator $\hat{\beta}$, we see that $X'^TY' = X^TY$. Furthermore:
\begin{displaymath}
X'^TX' = \begin{pmatrix}
@@ -230,8 +230,9 @@ estimator $\hat{\beta}$, we see that $X'^TY' = X^TY$. Furthermore:
\sqrt{\lambda}Id
\end{pmatrix} = X^TX +\lambda Id
\end{displaymath}
-Hence we get $\hat{\beta} = (X^TX+\lambda Id)^{-1}X^TY$, which is exactly the
-expression of the MAP estimator in ridge regression.
+Hence the least squares estimator yields $\hat{\beta} = (X^TX+\lambda
+Id)^{-1}X^TY$, which is exactly the expression of the MAP estimator in ridge
+regression.
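The equivalence derived in (g) can be verified numerically: solving ordinary least squares on the augmented data $(X', Y')$ recovers the ridge/MAP estimator. A minimal sketch with NumPy (the sizes $n$, $m$ and the penalty $\lambda$ are assumed values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 20, 5, 0.5
X = rng.standard_normal((n, m))
Y = rng.standard_normal(n)

# Closed-form ridge / MAP estimator: (X^T X + lambda*Id)^{-1} X^T Y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ Y)

# Augmented least squares: X' = [X; sqrt(lambda)*Id], Y' = [Y; 0]
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(m)])
Y_aug = np.concatenate([Y, np.zeros(m)])
beta_ls = np.linalg.lstsq(X_aug, Y_aug, rcond=None)[0]

assert np.allclose(beta_ridge, beta_ls)  # the two estimators coincide
```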
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{problem}[The Dirichlet and multinomial distributions, 12pts]
@@ -507,13 +508,13 @@ by the method L-BFGS for each value of $d$. Answer the following questions:
We see that the non-linear transformations help reduce the prediction error:
the data is probably not very well modeled by a linear relationship. Allowing
more complex relationships (as $d$ becomes larger the relationship becomes more
-complex) fits it better.
+complex) yields a better fit and improves the predictions.
QR is faster than L-BFGS when the number of features is not too large ($d\leq
400$). For $d=600$, L-BFGS is faster. The asymptotic complexity of QR is
$O(nd^2 + d^3)$ whereas the complexity of L-BFGS is $O(nd)$ (for a fixed number
-of iterations). So asymptotically, L-BFGS is be faster as we observe for
-$d=600$ (for smaller values of $d$ the constants hidden in the $O$ do not allow
+of iterations). So asymptotically, L-BFGS is faster as we observe for $d=600$
+(for smaller values of $d$ the constants hidden in the $O$ do not allow
a formal comparison).
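The QR route mentioned above can be sketched as follows: factor $X = QR$ and solve the triangular system $R\beta = Q^TY$, at a cost dominated by the $O(nd^2)$ factorization. This is a hedged illustration, not the homework's actual implementation; the sizes are assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 50
X = rng.standard_normal((n, d))
Y = rng.standard_normal(n)

# QR route: X = QR (reduced: Q is n x d, R is d x d upper triangular),
# then the least squares solution satisfies R beta = Q^T Y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ Y)

# Reference solution via the normal equations: (X^T X) beta = X^T Y
beta_ne = np.linalg.solve(X.T @ X, X.T @ Y)

assert np.allclose(beta_qr, beta_ne)  # both routes give the same estimator
```

The QR route is preferred in practice over the normal equations for numerical stability, while an iterative method such as L-BFGS avoids the $d^2$ factor entirely, which matches the asymptotic comparison above.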
\end{document}