Diffstat (limited to 'hw1/main.tex')
-rw-r--r--  hw1/main.tex | 69
1 file changed, 64 insertions, 5 deletions
diff --git a/hw1/main.tex b/hw1/main.tex
index da77359..3aea165 100644
--- a/hw1/main.tex
+++ b/hw1/main.tex
@@ -97,7 +97,7 @@ $\sigma_X^2+\sigma_Y^2$.
(c) If $X$ has mean $\mu_X$ and $Y$ has mean $\mu_Y$, $X-\mu_X$ and $Y-\mu_Y$
are still independent variables with unchanged variance and zero mean.
Applying the previous result, $X+Y-\mu_X-\mu_Y$ is a Gaussian variable with
-mean $0$ and variance $\sigma_X^2 + \sigma_Y^2$. Hence, $X+Y$ is normally
+mean $0$ and variance $\sigma_X^2 + \sigma_Y^2$. Hence, $X+Y$ is normally
distributed with mean $\mu_X+\mu_Y$ and variance $\sigma_X^2+\sigma_Y^2$.
(d) Since $x\mapsto e^X$ is a bijection mapping $\mathbb{R}$ to $\mathbb{R}^+$,
@@ -151,7 +151,7 @@ $\beta$.
(b) We have $H^T = H$, and $H^2 = X(X^TX)^{-1}X^TX(X^TX)^{-1}X^T
= X(X^TX)^{-1}X^T = H$. This shows that $H$ is an orthogonal projection matrix.
Furthermore, we see that if $x$ is in the column space of $X$, that is, $x
-= Xe_i$ where $e_i$ is a vector of the canonical basis, then $Hx$
+= Xe_i$ where $e_i$ is a vector of the canonical basis, then $Hx$
= $X(X^TX)^{-1}X^TXe_i = Xe_i = x$. That is, $H$ is the identity matrix on the
column space of $X$. This is enough to conclude that $H$ is the orthogonal
projection on this subspace.
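+To make the last step concrete (a short check using only the definition of
+$H$): any vector $y$ decomposes as
+\begin{displaymath}
+  y = Hy + (I-H)y, \qquad X^T(I-H)y = X^Ty - X^TX(X^TX)^{-1}X^Ty = 0,
+\end{displaymath}
+so $Hy$ lies in the column space of $X$ while the residual $(I-H)y$ is
+orthogonal to it: exactly the defining decomposition of the orthogonal
+projection onto that subspace.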
@@ -160,7 +160,7 @@ projection on this subspace.
\begin{displaymath}
\E(\hat{\beta}) = (X^TX)^{-1}X^TX\beta = \beta
\end{displaymath}
-for the covariance. Writing $Y = X\beta + \epsilon$ where $\epsilon\sim
+for the covariance. Writing $Y = X\beta + \epsilon$ where $\epsilon\sim
\mathcal{N}(0, \sigma^2I)$, we see that $\hat{\beta}
= \beta + (X^TX)^{-1}X^T\epsilon$. Hence:
\begin{displaymath}
@@ -168,7 +168,7 @@ for the covariance. Writing $Y = X\beta + \epsilon$ where $\epsilon\sim
= \E\big[(X^TX)^{-1}X^T\epsilon\epsilon^TX(X^TX)^{-1}\big]
= \sigma^2(X^TX)^{-1}
\end{displaymath}
-where the last equality used $E(\epsilon\epsilon^T) = \sigma^2 Id$.
+where the last equality used $\E(\epsilon\epsilon^T) = \sigma^2 I$.
(d) The log-likelihood is:
\begin{displaymath}
@@ -254,7 +254,66 @@ where $a_{k,n} = a_k + \sum_{i=1}^{n-1} \mathbb{1}\{X_i = k\}$. (Bonus points if
\paragraph{Solution} (a) We have $\E(X) = \frac{1}{\sum_{k=1}^K a_k} a$.
-(b)
+(b) Using Bayes' theorem, we see that the posterior density of
+$\theta\given X$ is proportional to:
+\begin{displaymath}
+  f(\theta\given a)\prob(X=i\given\theta) \propto \theta_i\prod_{k=1}^K \theta_k^{a_k-1}
+\end{displaymath}
+where the multiplicative constant does not depend on $\theta$. In particular,
+$\theta\given X$ follows a Dirichlet distribution with parameter $a + e_X$,
+where $e_i$ denotes the $i$th vector of the canonical basis. That is, the new
+shape parameter is $a$ with the coordinate corresponding to the observation
+$X$ increased by 1.
+
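+Written out with its normalizing constant (a quick check using the standard
+form of the Dirichlet density), the posterior reads:
+\begin{displaymath}
+  f(\theta\given X=i, a)
+  = \frac{\Gamma\big(1+\sum_{k=1}^K a_k\big)}
+  {\Gamma(a_i+1)\prod_{k\neq i}\Gamma(a_k)}\,
+  \theta_i^{a_i}\prod_{k\neq i}\theta_k^{a_k-1}
+\end{displaymath}
+which is indeed the Dirichlet density with parameter $a + e_i$.
+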
+(c) Conditioning on $\theta$ and using the law of total probability, we have:
+\begin{equation}
+ \label{eq:foo}
+ \prob(X_n=k\given X_1,\ldots,X_{n-1}) = \int \prob(X_n=k\given \theta)
+ \prob(\theta \given X_1,\ldots,X_{n-1})d\theta
+\end{equation}
+Now to compute $\prob(\theta\given X_1,\ldots,X_{n-1})$ we use:
+\begin{displaymath}
+ \prob(\theta\given X_1,\ldots, X_{n-1})
+  \propto \prob(\theta)\prod_{i=1}^{n-1}\prob(X_i\given\theta)
+\end{displaymath}
+where we used Bayes' rule and the conditional independence of the
+observations given $\theta$. Using a computation similar to
+the one in (b), we see that $\theta\given X_1,\ldots, X_{n-1}$ follows
+a Dirichlet distribution with shape parameter $a_n = (a_{1,n},\dots, a_{K,n})$
+(each observation increases the associated coordinate by one). Using this in
+\eqref{eq:foo} we see that $\prob(X_n=k\given X_1,\ldots,X_{n-1})$ is exactly the
+expectation of the $k$th coordinate of the Dirichlet distribution of parameter
+$a_n$. Using part (a), this is equal to:
+\begin{displaymath}
+  \prob(X_n=k\given X_1,\ldots,X_{n-1}) = \frac{a_{k,n}}{\sum_{j=1}^K a_{j,n}}
+\end{displaymath}
+
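+As a concrete illustration (numbers chosen arbitrarily): with $K=2$ colors, a
+uniform prior $a=(1,1)$ and observations $X_1=1$, $X_2=1$, $X_3=2$, the
+predictive probability of color $1$ at step $4$ is
+\begin{displaymath}
+  \prob(X_4=1\given X_1,X_2,X_3) = \frac{1+2}{(1+1)+3} = \frac{3}{5},
+\end{displaymath}
+which is Laplace's rule of succession.
+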
+(d) Write $\hat{X}_n = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{X_i=k\}$.
+Conditionally on $\theta$, the $X_i$ are i.i.d., so by the strong law of large
+numbers $\hat{X}_n$ converges almost surely to
+$\E\big[\mathbb{1}\{X_i=k\}\given\theta\big] = \theta_k$. In particular,
+conditionally on $\theta_k$, the CDF of the limit $Y$ of $\hat{X}_n$ is the
+step function equal to $0$ for $z< \theta_k$ and $1$ for $z\geq \theta_k$. To
+get the CDF of the marginal distribution, we integrate over $\theta_k$:
+\begin{displaymath}
+ \prob(Y\leq z) = \int_{0}^1 \mathbb{1}\{z\geq
+ \theta_k\}p(\theta_k)d\theta_k
+ = \int_{0}^z p(\theta_k)d\theta_k
+\end{displaymath}
+We recognize on the right-hand side the CDF of $\theta_k$, which is the $k$th
+marginal of a Dirichlet distribution: this is a Beta distribution with
+parameters $(a_k, \sum_j a_j - a_k)$. Hence the limit (in distribution) $Y$ of
+$\hat{X}_n$ is Beta distributed with those parameters.
+
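+For reference (this is just the standard form of that marginal), the density
+of the limit $Y$ is
+\begin{displaymath}
+  p(y) = \frac{y^{a_k-1}(1-y)^{\sum_{j\neq k} a_j - 1}}
+  {B\big(a_k, \sum_{j\neq k} a_j\big)},
+  \qquad y\in(0,1),
+\end{displaymath}
+where $B$ denotes the Beta function.
+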
+(e) If we denote by $X_n$ the color of the ball drawn at the $n$th time step,
+then $X_n\given X_1,\ldots,X_{n-1}$ has the same law as
+the one obtained in part (c). Then we can write:
+\begin{displaymath}
+  \rho_{k,n} = \frac{a_k +\sum_{i=1}^{n-1}\mathbb{1}\{X_i=k\}}{\sum_{j=1}^K a_j
+  + n} = \frac{a_k}{\sum_j a_j + n} + \frac{1}{1 + \sum_j a_j/n}
+  \frac{1}{n}\sum_{i=1}^{n-1}\mathbb{1}\{X_i=k\}
+\end{displaymath}
+Using part (d) we obtain that $\rho_{k,n}$ converges (at least in
+distribution) to a Beta distributed variable with parameters
+$(a_k, \sum_j a_j - a_k)$; the term-by-term limits are spelled out below.
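+Spelling out the last step: in the decomposition above,
+\begin{displaymath}
+  \frac{a_k}{\sum_j a_j + n} \to 0,
+  \qquad
+  \frac{1}{1 + \sum_j a_j/n} \to 1,
+  \qquad
+  \frac{1}{n}\sum_{i=1}^{n-1}\mathbb{1}\{X_i=k\} \to Y
+\end{displaymath}
+as $n\to\infty$, where $Y$ is the Beta-distributed limit from part (d) and the
+last convergence holds in distribution; hence $\rho_{k,n}\to Y$ in
+distribution.
+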
\section*{Physicochemical Properties of Protein Tertiary Structure}