| author | Thibaut Horel <thibaut.horel@gmail.com> | 2015-09-22 16:15:25 -0400 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2015-09-22 16:15:25 -0400 |
| commit | 83203eee32a460bfe967557a8cc292b39b90d5c6 (patch) | |
| tree | 15ff422803bfdbf0959a71c8f926e4ef465a732e /hw1 | |
| parent | d8fc73b493a8a03cb09bd558d9472a33453608a6 (diff) | |
| download | cs281-83203eee32a460bfe967557a8cc292b39b90d5c6.tar.gz | |
[hw1] Problem 3
Diffstat (limited to 'hw1')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | hw1/main.tex | 69 |
1 file changed, 64 insertions, 5 deletions
diff --git a/hw1/main.tex b/hw1/main.tex
index da77359..3aea165 100644
--- a/hw1/main.tex
+++ b/hw1/main.tex
@@ -97,7 +97,7 @@ $\sigma_X^2+\sigma_Y^2$.
 (c) If $X$ has mean $\mu_X$ and $Y$ has mean $\mu_Y$, $X-\mu_X$ and $Y-\mu_Y$
 are still independent variables with unchanged variance and zero mean. Applying
 the previous result, $X+Y-\mu_X-\mu_Y$ is a Gaussian variable with
-mean $0$ and variance $\sigma_X^2 + \sigma_Y^2$. Hence, $X+Y$ is normally
+mean $0$ and variance $\sigma_X^2 + \sigma_Y^2$. Hence, $X+Y$ is normally
 distributed with mean $\mu_X+\mu_Y$ and variance $\sigma_X^2+\sigma_Y^2$.

 (d) Since $x\mapsto e^x$ is a bijection mapping $\mathbb{R}$ to $\mathbb{R}^+$,
@@ -151,7 +151,7 @@ $\beta$.
 (b) We have $H^T = H$, and $H^2 = X(X^TX)^{-1}X^TX(X^TX)^{-1}X^T
 = X(X^TX)^{-1}X^T = H$. This shows that $H$ is an orthogonal projection
 matrix. Furthermore, we see that if $x$ is in the column space of $X$, that is, $x
-= Xe_i$ where $e_i$ is a vector of the canonical basis, then $Hx$
+= Xe_i$ where $e_i$ is a vector of the canonical basis, then $Hx$
 = $X(X^TX)^{-1}X^TXe_i = Xe_i = x$. That is, $H$ is the identity matrix on the
 column space of $X$. This is enough to conclude that $H$ is the orthogonal
 projection on this subspace.
@@ -160,7 +160,7 @@ projection on this subspace.
 \begin{displaymath}
   \E(\hat{\beta}) = (X^TX)^{-1}X^TX\beta = \beta
 \end{displaymath}
-for the covariance. Writing $Y = X\beta + \epsilon$ where $\epsilon\sim
+for the covariance. Writing $Y = X\beta + \epsilon$ where $\epsilon\sim
 \mathcal{N}(0, \sigma^2I)$, we see that $\hat{\beta} = \beta
 + (X^TX)^{-1}X^T\epsilon$. Hence:
 \begin{displaymath}
@@ -168,7 +168,7 @@ for the covariance. Writing $Y = X\beta + \epsilon$ where $\epsilon\sim
   = \E\big[(X^TX)^{-1}X^T\epsilon\epsilon^TX(X^TX)^{-1}\big]
   = \sigma^2(X^TX)^{-1}
 \end{displaymath}
-where the last equality used $E(\epsilon\epsilon^T) = \sigma^2 Id$.
+where the last equality used $E(\epsilon\epsilon^T) = \sigma^2 Id$.

 (d) The log-likelihood is:
 \begin{displaymath}
@@ -254,7 +254,66 @@ where $a_{k,n} = a_k + \sum_{i=1}^{n-1} \mathbb{1}\{X_i = k\}$. (Bonus points if
 \paragraph{Solution}
 (a) We have $\E(X) = \frac{1}{\sum_{k=1}^K a_k} a$.

-(b)
+(b) Using Bayes' theorem, we see that the posterior $\theta\given X$ is
+proportional to:
+\begin{displaymath}
+  f(\theta\given a)\prob(X=i\given\theta) \sim \theta_i\prod_{k=1}^K \theta_k^{a_k-1}
+\end{displaymath}
+where the multiplicative constant does not depend on $\theta$. In particular we
+see that $\theta\given X$ follows a Dirichlet distribution of parameter $a + e_X$,
+where $e_i$ denotes the $i$th vector of the canonical basis. That is, the
+new shape parameter is $a$ with the coordinate corresponding to the
+observation $X$ increased by 1.
+
+(c) Using Bayes' rule, we have:
+\begin{equation}
+  \label{eq:foo}
+  \prob(X_n=k\given X_1,\ldots,X_{n-1}) = \int \prob(X_n=k\given \theta)
+  \prob(\theta \given X_1,\ldots,X_{n-1})d\theta
+\end{equation}
+Now to compute $\prob(\theta\given X_1,\ldots,X_{n-1})$ we use:
+\begin{displaymath}
+  \prob(\theta\given X_1,\ldots, X_{n-1})
+  \sim \prob(\theta)\prod_{i=1}^{n-1}\prob(X_i\given\theta)
+\end{displaymath}
+where we used Bayes' rule and independence. By a computation similar to
+the one in (b), we see that $\theta\given X_1,\ldots, X_{n-1}$ follows
+a Dirichlet distribution with shape parameter $a_n = (a_{1,n},\dots, a_{K,n})$
+(each observation increases the associated coordinate by one).
+Using this in
+\eqref{eq:foo}, we see that $\prob(X_n=k\given X_1,\ldots,X_{n-1})$ is exactly the
+expectation of the $k$th coordinate of the Dirichlet distribution of parameter
+$a_n$. Using part (a), this is equal to:
+\begin{displaymath}
+  \prob(X_n=k\given X_1,\ldots,X_{n-1}) = \frac{a_{k,n}}{\sum_{j=1}^K a_{j,n}}
+\end{displaymath}
+
+(d) Write $\hat{X}_n = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{X_i=k\}$. By the
+strong law of large numbers, conditionally on $\theta$, $\hat{X}_n$ converges
+almost surely to $\prob(X_i=k\given\theta) = \theta_k$. In particular, the CDF
+of the (almost sure) limit $Y$ of $\hat{X}_n$ is the step function equal to $0$
+for $z< \theta_k$ and $1$ for $z\geq \theta_k$. To get the CDF of the marginal
+distribution, we integrate over $\theta_k$:
+\begin{displaymath}
+  \prob(Y\leq z) = \int_{0}^1 \mathbb{1}\{z\geq
+  \theta_k\}p(\theta_k)d\theta_k
+  = \int_{0}^z p(\theta_k)d\theta_k
+\end{displaymath}
+We recognize on the right-hand side the CDF of $\theta_k$, which is the $k$th
+marginal of a Dirichlet distribution: this is a Beta distribution with
+parameters $(a_k, \sum_i a_i - a_k)$. Hence the limit (in distribution) $Y$ of
+$\hat{X}_n$ is Beta distributed with those parameters.
+
+(e) If we denote by $X_n$ the color of the ball drawn at the $n$th time step,
+then $X_n\given X_1,\ldots, X_{n-1}$ has the same law as the one obtained in
+part (c). Then we can write:
+\begin{displaymath}
+  \rho_{k,n} = \frac{a_k +\sum_{i=1}^{n-1}\mathbb{1}\{X_i=k\}}{\sum_{j=1}^K a_j
+  + n} = \frac{a_k}{\sum_j a_j + n} + \frac{1}{1 + \sum_j a_j/n}
+  \frac{1}{n}\sum_{i=1}^{n-1}\mathbb{1}\{X_i=k\}
+\end{displaymath}
+Using part (d), we obtain that $\rho_{k,n}$ converges (at least in distribution)
+to a Beta distributed variable with parameters $(a_k, \sum_i a_i - a_k)$.

 \section*{Physicochemical Properties of Protein Tertiary Structure}
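The earlier hunks touch the hat-matrix and $\mathrm{Cov}(\hat{\beta})$ derivations. A minimal sketch of a numerical check (made-up design matrix and noise level, assuming NumPy is available) that $H = X(X^TX)^{-1}X^T$ is symmetric and idempotent and that the sampling covariance of $\hat{\beta}$ is close to $\sigma^2(X^TX)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up design matrix, coefficients, and noise level (for illustration only).
n, p, sigma = 200, 3, 0.5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])

# Hat matrix H = X (X^T X)^{-1} X^T: check symmetry and idempotence numerically.
XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T
print(np.allclose(H, H.T), np.allclose(H @ H, H))   # both should print True

# Empirical covariance of beta_hat over many noise draws vs sigma^2 (X^T X)^{-1}.
draws = np.array([
    XtX_inv @ X.T @ (X @ beta + sigma * rng.normal(size=n))
    for _ in range(20_000)
])
print(np.cov(draws.T))          # should be close to the matrix below
print(sigma**2 * XtX_inv)
```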
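As a numerical sanity check of the conjugacy argument in parts (b)–(c), the sketch below (made-up prior parameter `a` and observation sequence `obs`, assuming NumPy) compares the closed-form predictive $a_{k,n}/\sum_j a_{j,n}$ against a Monte Carlo estimate that only uses the prior and the likelihood, so it does not presuppose the Dirichlet posterior form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up prior shape parameter and observed categories X_1, ..., X_{n-1}.
a = np.array([1.0, 2.0, 3.0])
obs = np.array([0, 2, 2, 1, 2])
counts = np.bincount(obs, minlength=len(a))

# Closed form from part (c): P(X_n = k | X_1, ..., X_{n-1}) = a_{k,n} / sum_j a_{j,n},
# where a_{k,n} = a_k + #{i < n : X_i = k}.
a_n = a + counts
closed_form = a_n / a_n.sum()

# Monte Carlo estimate that does not assume conjugacy: sample theta from the
# prior, weight each sample by the likelihood of the observations, and average
# theta under the normalized weights (self-normalized importance sampling).
theta = rng.dirichlet(a, size=200_000)
log_w = (counts * np.log(theta)).sum(axis=1)
w = np.exp(log_w - log_w.max())
w /= w.sum()
mc_estimate = w @ theta

print(closed_form)   # [2/11, 3/11, 6/11] ~= [0.182, 0.273, 0.545]
print(mc_estimate)   # should agree to two or three decimal places
```

Agreement between the two printed vectors supports the posterior-update and predictive formulas derived above.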
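For parts (d)–(e), a small simulation sketch of the urn dynamics implied by $\rho_{k,n}$ (draw a ball with probability proportional to the current counts, return it with one extra ball of the same colour), with made-up initial counts and assuming NumPy and SciPy: across independent runs, $\rho_{0,n}$ should be approximately $\mathrm{Beta}(a_0, \sum_i a_i - a_0)$ distributed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up initial composition of a two-colour urn: a_0 = 2, a_1 = 3 balls.
a = np.array([2.0, 3.0])
n_steps, n_runs = 2_000, 1_000

rho = np.empty(n_runs)
for r in range(n_runs):
    n0, total = a[0], a.sum()
    for _ in range(n_steps):
        # Draw a ball uniformly; put it back together with one extra ball of its colour.
        if rng.random() < n0 / total:
            n0 += 1
        total += 1
    rho[r] = n0 / total   # rho_{0,n}: fraction of colour-0 balls after n steps

# Part (e) predicts rho_{0,n} is approximately Beta(a_0, a_1) for large n.
print(rho.mean())                                     # should be close to a_0/(a_0+a_1) = 0.4
print(stats.kstest(rho, stats.beta(a[0], a[1]).cdf))  # KS statistic should be small
```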
