From fc4a35726bdaf8bc31481664a3a6a9442f99f73a Mon Sep 17 00:00:00 2001
From: Thibaut Horel
Date: Fri, 14 Sep 2012 18:43:20 -0700
Subject: Tune up of the algorithm section

---
 algorithm.tex | 130 +++++++++++++++++++++++++++++++---------------------------
 1 file changed, 69 insertions(+), 61 deletions(-)

diff --git a/algorithm.tex b/algorithm.tex
index c02a0e5..97d0b42 100644
--- a/algorithm.tex
+++ b/algorithm.tex
@@ -4,57 +4,67 @@
 In Section~\ref{sec:uniqueness}, we showed that a nearest-neighbor classifier
 can accurately predict if two sets of skeletal measurements belong to the same
 person if the error of the skeleton measurements is small. In this section, we
-suggest a probabilistic model for skeleton recognition. In this model, a
-skeleton is classified based on the distance from average skeleton profiles of
-people in the training set.
-
+suggest a probabilistic model for skeleton measurements, which leads to a
+simple algorithm for skeleton recognition.
 
 \subsection{Mixture of Gaussians}
 \label{sec:mixture of Gaussians}
 
-A mixture of Gaussians \cite{bishop06pattern} is a generative probabilistic
-model, which is typically applied to modeling problems where class densities
-are unimodal and the feature space is low-dimensional. The joint probability
-distribution of the model is given by:
-\begin{align}
-  P(\bx, y) = \cN(\bx | \bar{\bx}_y, \Sigma) P(y),
-  \label{eq:mixture of Gaussians}
-\end{align}
-where $\bx$ denotes an observation, $y$ is a class, $P(y)$ is the probability of the class, and $\cN(\bx | \bar{\bx}_y, \Sigma)$ is the conditional probability of $\bx$ given $y$. The conditional is a multivariate normal distribution. Its mean is $\bar{\bx}_y$ and the variance of $\bx$ is captured by the covariance matrix $\Sigma$. The decision boundary between two classes $y_1$ and $y_2$, $P(\bx, y_1) = P(\bx, y_2)$ for all $\bx$, is linear when all conditionals $\cN(\bx | \bar{\bx}_y, \Sigma)$ have the same covariance matrix $\Sigma$ \cite{bishop06pattern}. In this case, the mixture of Gaussians model can be viewed as a probabilistic variant of the nearest-neighbor (NN) classifier in Section~\ref{sec:uniqueness}.
+In this section, skeleton measurements are represented by a feature vector
+$\bx\in\mathbf{R}^9$ (see Section~\ref{sec:experiment-design} for a description
+of the skeleton features). Each measurement is labeled with a class $y$
+representing the identity of the individual it belongs to (there are as many
+classes as people taking part in the experiment).
+
+We want a probabilistic model for the joint distribution $P(\bx, y)$. Our
+dataset $\cD = \set{(\bx_1, y_1),\ldots,(\bx_n, y_n)}$ consists of $n$ pairs
+$(\bx_i, y_i)$, where $y_i$ is the label of the skeleton $\bx_i$. For each
+feature $k\in \set{1,\ldots, 9}$, we plot the histogram of the set of
+differences $\set{(\bx_i)_k - (\bar{\bx}_{y_i})_k,\; i\in\set{1,\ldots, n}}$
+between a measurement $(\bx_i)_k$ of a feature and the average
+$(\bar{\bx}_{y_i})_k$ of all measurements of the same feature within the same
+class. These histograms are presented in Figure~\ref{fig:error marginals}.
+All histograms look approximately normal, which suggests that all class
+conditionals $P(\bx | y)$ are multivariate normal.
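As an editor's illustration (not part of the patch), the histogram diagnostic
described in the added text can be sketched in Python. The array X of shape
(n, 9), the label vector y, and the function names below are hypothetical:

    # Per-feature residual histograms, as in the "error marginals" figure.
    import numpy as np
    import matplotlib.pyplot as plt

    def per_feature_residuals(X, y):
        """(x_i)_k - (xbar_{y_i})_k for every measurement i and feature k."""
        class_means = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
        return X - np.stack([class_means[c] for c in y])

    def plot_error_marginals(X, y, bins=30):
        R = per_feature_residuals(X, y)
        for k in range(R.shape[1]):
            plt.figure()
            plt.hist(R[:, k], bins=bins)
            plt.title("feature %d" % k)
        plt.show()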
+
+Based on these observations, our model takes the form:
+\begin{equation}\label{eq:mixture of Gaussians}
+  P(\bx, y) = \cN(\bx | \bar{\bx}_y, \Sigma) P(y),
+\end{equation}
+which means that the conditional distribution of $\bx$ given $y$ is
+multivariate normal with mean $\bar{\bx}_y = \E{}{\bx | y}$ and covariance
+$\Sigma$. Note that the covariance matrix $\Sigma$ does not depend on the
+class $y$. This is because we assume that the observed variance is caused
+only by the noise introduced by the measurement process (the low resolution
+of the Kinect camera, the approximations of the skeleton mapping algorithm
+provided by the Microsoft SDK) and not by the identity of the individual
+walking in front of the camera.
+
+This probabilistic model is called a mixture of Gaussians
+\cite{bishop06pattern} and has many advantages. First, the parameters of the
+model can be easily learned: $P(y)$, the probability of belonging to class
+$y$, is simply the frequency of $y$ in the training set, $\bar{\bx}_y$ is the
+sample mean of the observations belonging to class $y$, and $\Sigma$ is
+learned as $\Sigma = \sum_y P(y)\Sigma_y$, where $\Sigma_y$ represents the
+sample covariance matrix of $\bx$ given $y$. Second, inference in the model
+can be performed in closed form: given an observation $\bx$, the model
+predicts the maximum a posteriori class $\hat{y} = \arg\max_y P(y | \bx)$,
+where:
+\begin{equation}\label{eq:inference}
+  P(y|\bx) = \frac{P(\bx, y)}{\sum_y P(\bx|y)P(y)}
+  = \frac{\cN(\bx|\bar{\bx}_y, \Sigma)P(y)}{\sum_y\cN(\bx|\bar{\bx}_y,
+  \Sigma)P(y)}.
+\end{equation}
+In this setting, the decision boundary between two classes $y_1$ and $y_2$,
+$\set{\bx | P(\bx, y_1) = P(\bx, y_2)}$, is a hyperplane \cite{bishop06pattern},
+and the mixture of Gaussians model can be viewed as a probabilistic variant of
+the nearest-neighbor (NN) classifier in Section~\ref{sec:uniqueness}.
 
-The mixture of Gaussians model has many advantages. First, the model can be
-easily learned using maximum-likelihood (ML) estimation \cite{bishop06pattern}.
-In particular, $P(y)$ is the frequency of $y$ in the training set,
-$\bar{\bx}_y$ is the expectation of $\bx$ given $y$, and the covariance matrix
-is computed as $\Sigma = \sum_y P(y) \Sigma_y$, where $\Sigma_y$ represents the
-covariance of $\bx$ given $y$. Second, the inference in the model can be
-performed in a closed form. In particular, the model predicts $\hat{y} =
-\arg\max_y P(y | \bx)$, where:
-\begin{align}
-  P(y | \bx) =
-  \frac{P(\bx | y) P(y)}{\sum_y P(\bx | y) P(y)} =
-  \frac{\cN(\bx | \bar{\bx}_y, \Sigma) P(y)}{\sum_y \cN(\bx | \bar{\bx}_y, \Sigma) P(y)}.
-  \label{eq:inference}
-\end{align}
 In practice, the prediction $\hat{y}$ is accepted when the classifier is
-confident. In other words, $P(\hat{y} | \bx) \! > \! \delta$, where $\delta \in
+confident, that is, $P(\hat{y} | \bx) \! > \! \delta$, where $\delta \in
 (0, 1)$ is a threshold that controls the precision and recall of the
 classifier. In general, the higher the threshold $\delta$, the lower the
 recall and the higher the precision.
 
-In this work, we use the mixture of Gaussians model for skeleton recognition.
-Skeleton measurements are represented by a vector $\bx$ and each person is
-assigned to one class $y$. In particlar, our dataset $\cD = \set{(\bx_1, y_1),
-\dots, (\bx_n, y_n)}$ consists of $n$ pairs $(\bx_i, y_i)$, where $y_i$ is the
-label of the skeleton $\bx_i$. To verify that our method is suitable for
-skeleton recognition, we plot for each skeleton feature $x_k$
-(Section~\ref{sec:experiment}) the histogram of differences between all
-measurements and the expectation given the class $(\bx_i)_k - \E{}{x_k | y_i}$
-(Figure~\ref{fig:error marginals}). All histograms look approximately normal.
-This suggests that all class conditionals $P(\bx | y)$ are multivariate normal
-and our generative model, although very simple, may be nearly optimal
-\cite{bishop06pattern}.
-
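A minimal sketch (editor's addition, not from the paper) of the learning and
inference steps just described, under the same hypothetical X and y; SciPy's
multivariate_normal supplies the class-conditional density:

    # Fit P(y), the class means, and the pooled covariance, then apply
    # the posterior of Eq. (eq:inference).
    import numpy as np
    from scipy.stats import multivariate_normal

    def fit(X, y):
        classes = np.unique(y)
        priors = {c: float(np.mean(y == c)) for c in classes}   # P(y)
        means = {c: X[y == c].mean(axis=0) for c in classes}    # xbar_y
        # Sigma = sum_y P(y) Sigma_y, Sigma_y = per-class sample covariance.
        Sigma = sum(priors[c] * np.cov(X[y == c], rowvar=False)
                    for c in classes)
        return priors, means, Sigma

    def posterior(x, priors, means, Sigma):
        """P(y | x): class-conditional densities times priors, normalized."""
        joint = {c: priors[c] * multivariate_normal.pdf(x, means[c], Sigma)
                 for c in priors}
        Z = sum(joint.values())
        return {c: p / Z for c, p in joint.items()}

    def predict(x, priors, means, Sigma, delta=0.9):
        """MAP prediction, accepted only when the classifier is confident."""
        post = posterior(x, priors, means, Sigma)
        y_hat = max(post, key=post.get)
        return y_hat if post[y_hat] > delta else None

Pooling the per-class covariances into a single Sigma mirrors the assumption
above that the variance comes from measurement noise rather than from the
identity of the person.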
 \begin{figure*}[t]
 \centering
 \includegraphics[width=0.99\textwidth]{graphics/limbs.pdf}
@@ -65,35 +75,33 @@ and our generative model, although very simple, may be nearly optimal
 \label{fig:error marginals}
 \end{figure*}
 
-
 \subsection{Sequential hypothesis testing}
 \label{sec:SHT}
 
-The mixture of Gaussians model can be extended to temporal inference through
-sequential hypothesis testing. Sequential hypothesis testing
-\cite{wald47sequential} is an established statistical framework, where a
-subject is sequentially tested for belonging to one of several classes. The
-probability that the sequence of data $\bx^{(1)}, \dots, \bx^{(t)}$ belongs to
-the class $y$ at time $t$ is given by:
-\begin{align}
+In our setting (see Section~\ref{sec:experiment-design}), skeleton
+measurements are not isolated. On the contrary, every time a person walks in
+front of the camera, we obtain a set of time-indexed measurements, all
+belonging to the same individual, that we want to classify. The mixture of
+Gaussians model can be extended to temporal inference through the sequential
+hypothesis testing \cite{wald47sequential} framework. In this framework, a
+subject is sequentially tested for belonging to one of several classes, by
+assuming that, conditioned on the class, the measurements are independent
+realizations of the same random variable. In our case, the probability that
+the sequence of data $\bx^{(1)}, \dots, \bx^{(t)}$ belongs to the class $y$
+at time $t$ is given by:
+\begin{equation}\label{eq:SHT}
   P(y | \bx^{(1)}, \dots, \bx^{(t)}) =
   \frac{\prod_{i = 1}^t \cN(\bx^{(i)} | \bar{\bx}_y, \Sigma) P(y)}
   {\sum_y \prod_{i = 1}^t \cN(\bx^{(i)} | \bar{\bx}_y, \Sigma) P(y)}.
-  \label{eq:SHT}
-\end{align}
-In practice, the prediction $\hat{y} = \arg\max_y P(y | \bx^{(1)}, \dots,
-\bx^{(t)})$ is accepted when the classifier is confident. In other words,
-$P(\hat{y} | \bx^{(1)}, \dots, \bx^{(t)}) > \delta$, where the threshold
-$\delta \in (0, 1)$ controls the precision and recall of the predictor. In
-general, the higher the threshold $\delta$, the higher the precision and the
-lower the recall.
+\end{equation}
+
+As in the previous section, the prediction is made by maximizing the
+posterior: $\hat{y} = \arg\max_y P(y | \bx^{(1)}, \dots, \bx^{(t)})$. In
+practice, the prediction is accepted when the classifier is confident, that
+is, $P(\hat{y} | \bx^{(1)}, \dots, \bx^{(t)}) > \delta$, where the threshold
+$\delta \in (0, 1)$ controls the precision and recall of the predictor.
 
 Sequential hypothesis testing is a common technique for smoothing temporal
 predictions. In particular, note that the prediction at time $t$ depends on
 all data up to time $t$. This reduces the variance of predictions, especially
 when input data are noisy, such as in the domain of skeleton recognition.
 
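The sequential update in Eq. (eq:SHT) amounts to accumulating per-class
log-likelihoods over time; working in log space avoids the underflow that the
raw product of densities would cause for long sequences. A sketch under the
same assumptions as above (all names illustrative):

    # Posterior over identities after each new measurement x^(t).
    import numpy as np
    from scipy.stats import multivariate_normal

    def sequential_posterior(xs, priors, means, Sigma):
        """Yield P(y | x^(1), ..., x^(t)) after each measurement in xs."""
        classes = list(priors)
        log_post = np.log([priors[c] for c in classes])
        for x in xs:
            log_post = log_post + np.array(
                [multivariate_normal.logpdf(x, means[c], Sigma)
                 for c in classes])
            p = np.exp(log_post - log_post.max())   # stable normalization
            yield dict(zip(classes, p / p.sum()))

    # Accept as soon as the classifier is confident:
    # for post in sequential_posterior(xs, priors, means, Sigma):
    #     y_hat = max(post, key=post.get)
    #     if post[y_hat] > delta:
    #         break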
-In skeleton recognition, the sequence $\bx^{(1)}, \dots, \bx^{(t)}$ is skeleton
-measurements of a person walking towards the camera, for instance. If the
-Kinect detects multiple people, we use the figure tracking of the tools in
-\xref{sec:experiment:dataset} to distinguish individual skeleton sequences.