From cf9120310760d4f6141061e985675f3d1115b9de Mon Sep 17 00:00:00 2001
From: unknown
Date: Fri, 2 Mar 2012 22:39:27 -0800
Subject: Algorithms

---
 algorithm.tex | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

(limited to 'algorithm.tex')

diff --git a/algorithm.tex b/algorithm.tex
index 450f881..8bdd5a9 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -1,39 +1,43 @@
-\section{Classification Algorithm}
-\label{sec:algorithm}
+\section{Algorithms}
+\label{sec:algorithms}
 
-Provide a guideline for the rest of the section.
+In Section~\ref{sec:uniqueness}, we showed that a nearest-neighbor classifier can accurately predict whether two skeletons belong to the same person when the error of the skeleton measurements is small. In this section, we suggest a probabilistic model for skeleton recognition. In this model, a skeleton is classified based on its distance from the average skeleton profiles of the people in the training set.
 
 \subsection{Mixture of Gaussians}
 \label{sec:mixture of Gaussians}
 
-The nearest-neighbor (NN) classifier worked well on the dead body dataset. It is a well-know fact that the decision boundary of the NN classifier is piecewise linear \cite{bishop06pattern}. A mixture of Gaussians \cite{bishop06pattern} is a generative probabilistic model that can be used for NN classification. In our domain, each person is represented by a single Gaussian, which is centered in the person's mean profile. All Gaussians have the same covariance matrix, which encodes the variance and covariance of attributes around the mean profiles. In particular, our probabilistic model is given by:
+A mixture of Gaussians \cite{bishop06pattern} is a generative probabilistic model, which is typically applied to modeling problems where class densities are unimodal and the feature space is low-dimensional. The joint probability distribution of the model is given by:
 \begin{align}
   P(\bx, y) = \cN(\bx | \bar{\bx}_y, \Sigma) P(y),
   \label{eq:mixture of Gaussians}
 \end{align}
-where $P(y)$ is the probability that the person y appears in front of the camera and $\cN(\bx | \bar{\bx}_y, \Sigma)$ is the probability that the profile $\bx$ belongs to the person $y$. The probability $\cN(\bx | \bar{\bx}_y, \Sigma)$ is a normal distribution, which is centered in the mean profile of the person $y$ and has a covariance matrix $\Sigma$. Since all Gaussians in the mixture have the same covariance matrix, all decision boundaries, $P(\bx, y_1) = P(\bx, y_2)$ for any $y_1$ and $y_2$, are linear.
+where $P(y)$ is the prior probability of class $y$ and $\cN(\bx | \bar{\bx}_y, \Sigma)$ is a class conditional that is modeled as a multivariate normal distribution. The distribution is centered at the mean vector $\bar{\bx}_y$, and the spread of the class around this center is described by the covariance matrix $\Sigma$. The decision boundary between any two classes $y_1$ and $y_2$, $P(\bx, y_1) \! = \! P(\bx, y_2)$, is linear when all class conditionals share the same covariance matrix $\Sigma$ \cite{bishop06pattern}. Thus, the mixture of Gaussians model is a probabilistic version of the nearest-neighbor (NN) classifier from Section~\ref{sec:uniqueness}.
 
-The parameters of our model are learned using maximum-likelihood (ML) estimation \cite{bishop06pattern}. The probability $P(y)$ is the fraction of time when the person $y$ appears in training set. The mean profile $\bar{\bx}_y$ is estimated as the mean of all profiles corresponding to the person $y$. The covariance matrix $\Sigma$ is estimated as a weighted sum $\Sigma = \sum_y P(y) \Sigma_y$, where $\Sigma_y$ is the covariance matrix of profiles for the person $y$.
-
-The inference in our model can be performed in a closed form. In particular, note that the predicted label is given by $\hat{y} = \arg\max_y P(y | \bx)$, where:
+The mixture of Gaussians model has many advantages. First, the model can be easily learned using maximum-likelihood (ML) estimation \cite{bishop06pattern}. In particular, $P(y)$ is the frequency of class $y$ in the training data, $\bar{\bx}_y$ is the sample mean of $\bx$ in class $y$, and the covariance matrix $\Sigma$ is estimated as a weighted sum $\Sigma = \sum_y P(y) \Sigma_y$, where $\Sigma_y$ is the covariance matrix corresponding to class $y$. Second, the inference in the model can be performed in a closed form. In particular, the predicted label is given by $\hat{y} = \arg\max_y P(y | \bx)$, where:
 \begin{align}
   P(y | \bx)
   = \frac{P(\bx | y) P(y)}{\sum_y P(\bx | y) P(y)}
   = \frac{\cN(\bx | \bar{\bx}_y, \Sigma) P(y)}{\sum_y \cN(\bx | \bar{\bx}_y, \Sigma) P(y)}.
   \label{eq:inference}
 \end{align}
-The prediction $\hat{y}$ is accepted when the classifier is sufficiently confident. In other words, $P(\hat{y} | \bx) \! > \! h$, where $h \in (0, 1)$ is a threshold that controls the precision and recall of the classifier. In practice, the higher the value of $h$, the lower the recall and the higher the precision.
+In practice, the prediction $\hat{y}$ is accepted when the classifier is confident. In other words, $P(\hat{y} | \bx) \! > \! \delta$, where $\delta \in (0, 1)$ is a threshold that controls the precision and recall of the classifier. In general, the higher the threshold $\delta$, the lower the recall and the higher the precision.
+
+In this work, we use the mixture of Gaussians model for skeleton recognition. In this problem, the feature vector $\bx$ consists of skeleton measurements and each person corresponds to one class $y$.
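
For concreteness, the following is a minimal NumPy/SciPy sketch of the model above: ML estimation of the class priors, mean profiles, and the pooled covariance matrix, followed by the thresholded closed-form inference. The function names, the array-based data layout, and the default threshold are illustrative choices of this sketch, not details from the paper.

    import numpy as np
    from scipy.stats import multivariate_normal

    def fit(X, y):
        # X: (n, d) array of profiles, y: (n,) array of class labels.
        # ML estimates: P(y) as class frequencies, per-class mean profiles,
        # and the pooled covariance Sigma = sum_y P(y) Sigma_y.
        classes = np.unique(y)
        priors = np.array([np.mean(y == c) for c in classes])
        means = np.array([X[y == c].mean(axis=0) for c in classes])
        sigma = sum(p * np.cov(X[y == c], rowvar=False, bias=True)
                    for p, c in zip(priors, classes))
        return classes, priors, means, sigma

    def classify(x, classes, priors, means, sigma, delta=0.9):
        # Closed-form posterior P(y | x); the prediction is accepted only
        # when the classifier is confident, i.e., P(yhat | x) > delta.
        post = np.array([multivariate_normal.pdf(x, mean=m, cov=sigma)
                         for m in means]) * priors
        post /= post.sum()
        best = int(np.argmax(post))
        return (classes[best] if post[best] > delta else None), post[best]

Because all classes share the covariance matrix, the argmax reduces to the linear decision rule discussed above; scikit-learn's LinearDiscriminantAnalysis implements essentially this model.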
 
 \subsection{Sequential hypothesis testing}
 \label{sec:SHT}
 
-The mixture of Gaussians model can be extended to temporal inference through sequential hypothesis testing. Sequential hypothesis testing \cite{wald47sequential} is an established statistical framework, where a subject is sequentially tested for belonging to one of several classes. The probability that a sequence of data $\bx^{(1)}, \dots, \bx^{(t)}$ belongs to the class $y$ at time $t$ is given by:
+The mixture of Gaussians model can be extended to temporal inference through sequential hypothesis testing. Sequential hypothesis testing \cite{wald47sequential} is an established statistical framework, where a subject is sequentially tested for belonging to one of several classes. The probability that the sequence of data $\bx^{(1)}, \dots, \bx^{(t)}$ belongs to the class $y$ at time $t$ is given by:
 \begin{align}
   P(y | \bx^{(1)}, \dots, \bx^{(t)})
   = \frac{\prod_{i = 1}^t \cN(\bx^{(i)} | \bar{\bx}_y, \Sigma) P(y)}
   {\sum_y \prod_{i = 1}^t \cN(\bx^{(i)} | \bar{\bx}_y, \Sigma) P(y)}.
   \label{eq:SHT}
 \end{align}
-The result $\hat{y} = \arg\max_y P(y | \bx^{(1)}, \dots, \bx^{(t)})$ is accepted when $P(\hat{y} | \bx^{(1)}, \dots, \bx^{(t)}) > h$, where the threshold $h \in (0, 1)$ controls the precision and recall of the predictor. In practice, sequential hypothesis testing smooths out the predictions of the base model. As a result, the new predictions are more precise at the same recall.
+In practice, the prediction $\hat{y} = \arg\max_y P(y | \bx^{(1)}, \dots, \bx^{(t)})$ is accepted when the classifier is confident. In other words, $P(\hat{y} | \bx^{(1)}, \dots, \bx^{(t)}) > \delta$, where the threshold $\delta \in (0, 1)$ controls the precision and recall of the predictor. In general, the higher the threshold $\delta$, the higher the precision and the lower the recall.
+
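
Under the same assumptions as the previous sketch, the sequential test can be implemented by accumulating per-frame log-likelihoods; working in log space is a numerical-stability choice of this sketch, mathematically equivalent to the product of densities above.

    import numpy as np
    from scipy.stats import multivariate_normal

    def sequential_test(stream, classes, priors, means, sigma, delta=0.99):
        # stream yields measurements x^(1), x^(2), ... of a single subject.
        log_post = np.log(priors)               # log P(y) before any data
        for x_t in stream:
            log_post = log_post + np.array(
                [multivariate_normal.logpdf(x_t, mean=m, cov=sigma)
                 for m in means])               # add log N(x_t | mean_y, Sigma)
            post = np.exp(log_post - log_post.max())
            post /= post.sum()                  # P(y | x^(1), ..., x^(t))
            best = int(np.argmax(post))
            if post[best] > delta:              # confident: stop and predict
                return classes[best], post[best]
        return None, None                       # never reached the threshold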
+Sequential hypothesis testing is a common technique for smoothing temporal predictions. In particular, note that the prediction at time $t$ depends on all data up to time $t$. This reduces the variance of the predictions, especially when the input data are noisy, as in real-world skeleton recognition.
+
+In skeleton recognition, the sequence $\bx^{(1)}, \dots, \bx^{(t)}$ consists of skeleton measurements of a single person, for instance one walking towards the camera. If the camera detects multiple people, we use the tracking provided by the camera to identify the individual skeleton sequences.
--
cgit v1.2.3-70-g09d2