path: root/algorithm.tex
author    Jon Whiteaker <jbw@berkeley.edu>    2012-03-05 11:12:04 -0800
committer Jon Whiteaker <jbw@berkeley.edu>    2012-03-05 11:12:04 -0800
commit    73834f5ef1c7be73d054baf4cf5f14a39fef17dc (patch)
tree      81e50a05b90a8ad2e7b9eb781c32a44d8fe9146f /algorithm.tex
parent    19bb0ff0935f8adc4b63ffd8e8aa58706bdcf7a2 (diff)
download  kinect-73834f5ef1c7be73d054baf4cf5f14a39fef17dc.tar.gz
jon's final pass part one
Diffstat (limited to 'algorithm.tex')
-rw-r--r--    algorithm.tex    77
1 files changed, 67 insertions, 10 deletions
diff --git a/algorithm.tex b/algorithm.tex
index b3e7648..9bc0fd6 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -1,29 +1,69 @@
\section{Algorithms}
\label{sec:algorithms}
-In Section~\ref{sec:uniqueness}, we showed that a nearest-neighbor classifier can accurately predict if a skeleton belongs to the same person if the error of the skeleton measurements is small. In this section, we suggest a probabilistic model for skeleton recognition. In this model, a skeleton is classified based on the distance from average skeleton profiles of people in the training set.
+In Section~\ref{sec:uniqueness}, we showed that a nearest-neighbor classifier
+can accurately predict whether two sets of skeletal measurements belong to the
+same person, provided that the measurement error is small. In this section, we
+suggest a probabilistic model for skeleton recognition. In this model, a
+skeleton is classified based on its distance from the average skeleton
+profiles of people in the training set.
\subsection{Mixture of Gaussians}
\label{sec:mixture of Gaussians}
-A mixture of Gaussians \cite{bishop06pattern} is a generative probabilistic model, which is typically applied to modeling problems where class densities are unimodal and the feature space is low-dimensional. The joint probability distribution of the model is given by:
+A mixture of Gaussians \cite{bishop06pattern} is a generative probabilistic
+model, which is typically applied to modeling problems where class densities
+are unimodal and the feature space is low-dimensional. The joint probability
+distribution of the model is given by:
\begin{align}
P(\bx, y) = \cN(\bx | \bar{\bx}_y, \Sigma) P(y),
\label{eq:mixture of Gaussians}
\end{align}
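
For concreteness, the following is a minimal Python sketch of evaluating the
joint distribution above. The class priors, means, and shared covariance below
are illustrative toy values, not parameters estimated from our dataset.

    # Minimal sketch of the joint P(x, y) = N(x | x_bar_y, Sigma) * P(y).
    # All numbers below are illustrative toy values, not measurements.
    import numpy as np
    from scipy.stats import multivariate_normal

    priors = np.array([0.5, 0.5])                    # P(y) for two people
    means = np.array([[0.30, 0.45],                  # x_bar_y: per-person mean
                      [0.33, 0.50]])                 # skeleton features (e.g., in m)
    cov = 0.01 * np.eye(2)                           # shared covariance Sigma

    x = np.array([0.31, 0.46])                       # one skeleton measurement
    likelihoods = np.array([multivariate_normal.pdf(x, mean=m, cov=cov)
                            for m in means])         # N(x | x_bar_y, Sigma)
    joint = likelihoods * priors                     # P(x, y) for each class y
    print(joint)
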
-where $P(y)$ is the probability of class $y$ and $\cN(\bx | \bar{\bx}_y, \Sigma)$ is a multivariate normal distribution, which models the density of $\bx$ given $y$. The mean of the distribution is $\bar{\bx}_y$ and the variance of $\bx$ is captured by the covariance matrix $\Sigma$. The decision boundary between any two classes is known to be is linear when all conditionals $\cN(\bx | \bar{\bx}_y, \Sigma)$ have the same covariance matrix \cite{bishop06pattern}. In this setting, the mixture of Gaussians model can be viewed as a probabilistic variant of the nearest-neighbor (NN) classifier in Section~\ref{sec:uniqueness}.
+where $P(y)$ is the probability of class $y$ and $\cN(\bx | \bar{\bx}_y,
+\Sigma)$ is a multivariate normal distribution, which models the density of
+$\bx$ given $y$. The mean of the distribution is $\bar{\bx}_y$ and the variance
+of $\bx$ is captured by the covariance matrix $\Sigma$. The decision boundary
+between any two classes is known to be linear when all conditionals $\cN(\bx
+| \bar{\bx}_y, \Sigma)$ have the same covariance matrix \cite{bishop06pattern},
+because the quadratic terms in $\bx$ cancel in the log posterior odds.
+In this setting, the mixture of Gaussians model can be viewed as a
+probabilistic variant of the nearest-neighbor (NN) classifier in
+Section~\ref{sec:uniqueness}.
+
-The mixture of Gaussians model has many advantages. First, the model can be easily learned using maximum-likelihood (ML) estimation \cite{bishop06pattern}. In particular, $P(y)$ is the frequency of $y$ in the training set, $\bar{\bx}_y$ is the expectation of $\bx$ given $y$, and the covariance matrix is computed as $\Sigma = \sum_y P(y) \Sigma_y$, where $\Sigma_y$ represents the covariance of $\bx$ given $y$. Second, the inference in the model can be performed in a closed form. In particular, the model predicts $\hat{y} = \arg\max_y P(y | \bx)$, where:
+The mixture of Gaussians model has many advantages. First, the model can be
+easily learned using maximum-likelihood (ML) estimation \cite{bishop06pattern}.
+In particular, $P(y)$ is the frequency of $y$ in the training set,
+$\bar{\bx}_y$ is the expectation of $\bx$ given $y$, and the covariance matrix
+is computed as $\Sigma = \sum_y P(y) \Sigma_y$, where $\Sigma_y$ represents the
+covariance of $\bx$ given $y$. Second, inference in the model can be
+performed in closed form. In particular, the model predicts $\hat{y} =
+\arg\max_y P(y | \bx)$, where:
\begin{align}
P(y | \bx) =
\frac{P(\bx | y) P(y)}{\sum_y P(\bx | y) P(y)} =
\frac{\cN(\bx | \bar{\bx}_y, \Sigma) P(y)}{\sum_y \cN(\bx | \bar{\bx}_y, \Sigma) P(y)}.
\label{eq:inference}
\end{align}
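
The ML estimates and the posterior above can be written down directly. The
following is a minimal Python sketch under assumed array conventions (X holds
one skeleton feature vector per row, y holds integer labels); the helper names
fit, posterior, and predict are our illustrative assumptions, not part of any
released implementation.

    # Minimal sketch of ML estimation and closed-form inference for the
    # shared-covariance mixture of Gaussians. Names and shapes are assumptions.
    import numpy as np
    from scipy.stats import multivariate_normal

    def fit(X, y):
        """X: (n, d) skeleton feature vectors; y: (n,) integer class labels.
        Assumes at least two samples per class for the covariance estimate."""
        classes = np.unique(y)
        priors = np.array([np.mean(y == c) for c in classes])        # P(y)
        means = np.array([X[y == c].mean(axis=0) for c in classes])  # x_bar_y
        # Shared covariance: prior-weighted average of per-class covariances.
        cov = sum(p * np.cov(X[y == c], rowvar=False)
                  for p, c in zip(priors, classes))
        return classes, priors, means, cov

    def posterior(x, priors, means, cov):
        """P(y | x) for a single feature vector x, via Bayes' rule."""
        lik = np.array([multivariate_normal.pdf(x, mean=m, cov=cov)
                        for m in means])
        joint = lik * priors
        return joint / joint.sum()

    def predict(x, classes, priors, means, cov, delta=0.9):
        """Return the predicted class, or None when the classifier is not
        confident, i.e., when max_y P(y | x) <= delta (delta is illustrative)."""
        p = posterior(x, priors, means, cov)
        k = int(np.argmax(p))
        return classes[k] if p[k] > delta else None
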
-In practice, the prediction $\hat{y}$ is accepted when the classifier is confident. In other words, $P(\hat{y} | \bx) \! > \! \delta$, where $\delta \in (0, 1)$ is a threshold that controls the precision and recall of the classifier. In general, the higher the threshold $\delta$, the lower the recall and the higher the precision.
-In this work, we use the mixture of Gaussians model for skeleton recognition. Skeleton measurements are represented by a vector $\bx$ and each person is assigned to one class $y$. In particlar, our dataset $\cD = \set{(\bx_1, y_1), \dots, (\bx_n, y_n)}$ consists of $n$ pairs $(\bx_i, y_i)$, where $y_i$ is the label of the skeleton $\bx_i$. To verify that our method is suitable for skeleton recognition, we plot for each skeleton feature $x_k$ (Section~\ref{sec:experiment}) the histogram of differences between all measurements and the expectation given the class $(\bx_i)_k - \E{}{x_k | y_i}$ (Figure~\ref{fig:error marginals}). All histograms look approximately normal. This indicates that all class conditionals $P(\bx | y)$ are multivariate normal and our generative model, although very simple, may be nearly optimal \cite{bishop06pattern}.
+In practice, the prediction $\hat{y}$ is accepted when the classifier is
+confident. In other words, $P(\hat{y} | \bx) \! > \! \delta$, where $\delta \in
+(0, 1)$ is a threshold that controls the precision and recall of the
+classifier. In general, the higher the threshold $\delta$, the lower the recall
+and the higher the precision.
+
+In this work, we use the mixture of Gaussians model for skeleton recognition.
+Skeleton measurements are represented by a vector $\bx$ and each person is
+assigned to one class $y$. In particular, our dataset $\cD = \set{(\bx_1, y_1),
+\dots, (\bx_n, y_n)}$ consists of $n$ pairs $(\bx_i, y_i)$, where $y_i$ is the
+label of the skeleton $\bx_i$. To verify that our method is suitable for
+skeleton recognition, we plot, for each skeleton feature $x_k$
+(Section~\ref{sec:experiment}), the histogram of the differences $(\bx_i)_k -
+\E{}{x_k | y_i}$ between the measurements and their class-conditional
+expectations (Figure~\ref{fig:error marginals}). All histograms look
+approximately normal. This suggests that all class conditionals $P(\bx | y)$
+are approximately multivariate normal and our generative model, although very
+simple, may be nearly optimal \cite{bishop06pattern}.
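
The histogram check above is straightforward to reproduce. A minimal sketch,
assuming the classes and fitted means from the previous sketch and matplotlib
for plotting; this is not the plotting code used to produce the figure.

    # Sketch of the normality check: for each feature k, histogram the
    # residuals (x_i)_k - E[x_k | y_i] around the per-class mean.
    import numpy as np
    import matplotlib.pyplot as plt

    def plot_residual_histograms(X, y, classes, means):
        class_index = {c: i for i, c in enumerate(classes)}
        # Subtract each sample's class-conditional mean from its features.
        residuals = X - means[[class_index[c] for c in y]]
        d = X.shape[1]
        fig, axes = plt.subplots(1, d, figsize=(3 * d, 3))
        for k, ax in enumerate(np.atleast_1d(axes)):
            ax.hist(residuals[:, k], bins=30)
            ax.set_title(f"feature {k}")
        plt.show()
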
\begin{figure}[t]
\centering
@@ -39,15 +79,32 @@ In this work, we use the mixture of Gaussians model for skeleton recognition. Sk
\subsection{Sequential hypothesis testing}
\label{sec:SHT}
-The mixture of Gaussians model can be extended to temporal inference through sequential hypothesis testing. Sequential hypothesis testing \cite{wald47sequential} is an established statistical framework, where a subject is sequentially tested for belonging to one of several classes. The probability that the sequence of data $\bx^{(1)}, \dots, \bx^{(t)}$ belongs to the class $y$ at time $t$ is given by:
+The mixture of Gaussians model can be extended to temporal inference through
+sequential hypothesis testing. Sequential hypothesis testing
+\cite{wald47sequential} is an established statistical framework in which a
+subject is sequentially tested for membership in one of several classes. The
+probability that the sequence of data $\bx^{(1)}, \dots, \bx^{(t)}$ belongs to
+the class $y$ at time $t$ is given by:
\begin{align}
P(y | \bx^{(1)}, \dots, \bx^{(t)}) =
\frac{\prod_{i = 1}^t \cN(\bx^{(i)} | \bar{\bx}_y, \Sigma) P(y)}
{\sum_y \prod_{i = 1}^t \cN(\bx^{(i)} | \bar{\bx}_y, \Sigma) P(y)}.
\label{eq:SHT}
\end{align}
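
The product in the equation above is best accumulated as a sum of
log-likelihoods for numerical stability. A minimal sketch, reusing the fitted
parameters from the earlier sketch; the per-frame renormalization is
mathematically equivalent to the equation above.

    # Minimal sketch of sequential hypothesis testing: accumulate per-class
    # log-likelihoods over frames and renormalize. Names are assumptions.
    import numpy as np
    from scipy.stats import multivariate_normal

    def sequential_posterior(xs, priors, means, cov):
        """P(y | x^(1..t)) after each frame in the sequence xs of shape (t, d).
        A prediction would be accepted once max_y P(y | x^(1..t)) > delta."""
        log_p = np.log(priors)                       # log P(y) at t = 0
        out = []
        for x in xs:
            log_p += np.array([multivariate_normal.logpdf(x, mean=m, cov=cov)
                               for m in means])
            post = np.exp(log_p - log_p.max())       # stabilized normalization
            out.append(post / post.sum())
        return out
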
-In practice, the prediction $\hat{y} = \arg\max_y P(y | \bx^{(1)}, \dots, \bx^{(t)})$ is accepted when the classifier is confident. In other words, $P(\hat{y} | \bx^{(1)}, \dots, \bx^{(t)}) > \delta$, where the threshold $\delta \in (0, 1)$ controls the precision and recall of the predictor. In general, the higher the threshold $\delta$, the higher the precision and the lower the recall.
+In practice, the prediction $\hat{y} = \arg\max_y P(y | \bx^{(1)}, \dots,
+\bx^{(t)})$ is accepted when the classifier is confident. In other words,
+$P(\hat{y} | \bx^{(1)}, \dots, \bx^{(t)}) > \delta$, where the threshold
+$\delta \in (0, 1)$ controls the precision and recall of the predictor. In
+general, the higher the threshold $\delta$, the higher the precision and the
+lower the recall.
-Sequential hypothesis testing is a common technique for smoothing temporal predictions. In particular, note that the prediction at time $t$ depends on all data up to time $t$. This reduces the variance of predictions, especially when input data are noisy, such as in the domain of skeleton recognition.
+Sequential hypothesis testing is a common technique for smoothing temporal
+predictions. In particular, note that the prediction at time $t$ depends on all
+data up to time $t$. This reduces the variance of predictions, especially when
+input data are noisy, such as in the domain of skeleton recognition.
-In skeleton recognition, the sequence $\bx^{(1)}, \dots, \bx^{(t)}$ are skeleton measurements of a person walking towards the camera, for instance. If the camera detects more people, we use tracking to distinguish individual skeleton sequences.
+In skeleton recognition, the sequence $\bx^{(1)}, \dots, \bx^{(t)}$ consists
+of the skeleton measurements of, for instance, a person walking towards the
+camera. If the Kinect detects multiple people, we use the figure tracking
+provided by the tools in \xref{sec:experiment:dataset} to distinguish
+individual skeleton sequences.