author    Jon Whiteaker <jbw@berkeley.edu>    2012-03-04 18:19:20 -0800
committer Jon Whiteaker <jbw@berkeley.edu>    2012-03-04 18:21:49 -0800
commit    ed0be68bfe1098830cc860a0bf3862ec8693aa2e (patch)
tree      66a44d2cf2522e569ff295080a8ea4b614a90afc
parent    3dc183008e040aef7c64a4a5ede9557856326e31 (diff)
download  kinect-ed0be68bfe1098830cc860a0bf3862ec8693aa2e.tar.gz
jon's pass on first half of section 5
-rw-r--r--  conclusion.tex                    2
-rw-r--r--  data/combined/scriptSkeleton.m    6
-rw-r--r--  experimental.tex                139
-rw-r--r--  uniqueness.tex                    2
4 files changed, 88 insertions, 61 deletions
diff --git a/conclusion.tex b/conclusion.tex
index 1270fce..7a3e9ed 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -3,7 +3,7 @@
 In this paper, we introduce skeleton recognition. We show that skeleton
 measurements are unique enough to distinguish individuals using a dataset of
-real skeletons. We present an probabilistic model for recognition, and extend
+real skeletons. We present a probabilistic model for recognition, and extend
 it to take advantage of consecutive frames. Finally we test our model by
 collecting data for a week in a real-world setting. Our results show that
 skeleton recognition performs close to face recognition, and it can be used in
diff --git a/data/combined/scriptSkeleton.m b/data/combined/scriptSkeleton.m
index 82f04b7..ef5b382 100644
--- a/data/combined/scriptSkeleton.m
+++ b/data/combined/scriptSkeleton.m
@@ -124,9 +124,9 @@ for ex = 1 : numEx
 % evaluation
 for i = 1 : 1000
-  %th = 5 - i;
-  %sub = test(norm1(test) < th);
-  th = i/1000
+  th = 5 - i/100;
+  sub = test(norm1(test) < th);
+  %th = i/1000
   sub = test(exp(conf(test)) > th);
   prec(ex, i) = mean(y(sub) == yp(sub));
   recall(ex, i) = length(sub) / length(test);
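As an aside for readers following the evaluation outside MATLAB, the threshold sweep in the hunk above can be sketched in NumPy. This is an illustrative re-implementation, not project code: `conf` (per-frame log-confidences), `y` (true labels), `yp` (predicted labels), and `test` (test-frame indices) are hypothetical stand-ins for the script's variables.

```python
import numpy as np

def precision_recall_sweep(conf, y, yp, test, n_steps=1000):
    """Sweep a confidence threshold as in the scriptSkeleton.m hunk:
    keep only test frames whose confidence exceeds the threshold, then
    measure precision (accuracy on the kept frames) and recall (the
    fraction of test frames kept)."""
    prec = np.full(n_steps, np.nan)   # undefined when no frame is kept
    recall = np.zeros(n_steps)
    for i in range(1, n_steps + 1):
        th = 5 - i / 100              # threshold decreases, as in the patch
        sub = test[np.exp(conf[test]) > th]
        if len(sub):
            prec[i - 1] = np.mean(y[sub] == yp[sub])
        recall[i - 1] = len(sub) / len(test)
    return prec, recall
```

Lowering the threshold only ever admits more frames, so recall is nondecreasing across the sweep; precision typically drops as lower-confidence frames are admitted.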
diff --git a/experimental.tex b/experimental.tex
index 8bd7b9d..6c8146c 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -2,27 +2,32 @@ \label{sec:experiment}
 
 We conduct a real-life uncontrolled experiment using the Kinect to test to the
-algorithm. First we present the manner and environment in which we perform
+algorithm. First we describe our approach to data collection. Second we
 describe how the data is processed and classified. Finally, we discuss the
 results.
 
 \subsection{Dataset}
 
 The Kinect outputs three primary signals in real-time: a color image stream, a
-depth image stream, and microphone output. For our purposes, we focus on the
-depth image stream. As the Kinect was designed to interface directly with the
-Xbox 360, the tools to interact with it on a PC are limited.
-Libfreenect~\cite{libfreenect} is a reverse engineered driver which gives
-access to the raw depth images from the Kinect. This raw data could be used to
-implement the algorithms \eg of Plagemann~\etal{}~\cite{plagemann:icra10}.
-Alternatively, OpenNI~\cite{openni}, a framework sponsored by
-PrimeSense~\cite{primesense}, the company behind the technology of the Kinect,
-offers figure detection and skeleton fitting algorithms on top of raw access to
-the data streams. However, the skeleton fitting algorithm of OpenNI requires
-each individual to strike a specific pose for calibration. More recently, the
-Kinect for Windows SDK~\cite{kinect-sdk} was released, and its skeleton fitting
-algorithm operates in real-time without calibration. Given that the Kinect for
-Windows SDK is the state-of-the-art, we use it to perform our data collection.
+depth image stream, and microphone output (\fref{fig:hallway}). For our
+purposes, we focus on the depth image stream. As the Kinect was designed to
+interface directly with the Xbox 360, the tools to interact with it on a PC are
+limited. The OpenKinect project released libfreenect~\cite{libfreenect}, a
+reverse engineered driver which gives access to the raw depth images of the
+Kinect. This raw data could be used to implement skeleton fitting algorithms,
+\eg those of Plagemann~\etal{}~\cite{plagemann:icra10}. Alternatively,
+OpenNI~\cite{openni}, an open framework led by PrimeSense, the company behind
+the technology of the Kinect, offers figure detection and skeleton fitting
+algorithms on top of raw access to the data streams. More recently, the Kinect
+for Windows SDK~\cite{kinect-sdk} was released, and its skeleton fitting
+algorithm operates in real-time without calibration.
+
+Prior to the release of the Kinect SDK, we experimented with using OpenNI for
+skeleton recognition with positive results. Unfortunately, the skeleton
+fitting algorithm of OpenNI requires each individual to strike a specific pose
+for calibration, making it more difficult to collect a lot of data. Upon the
+release of the Kinect SDK, we selected it to perform our data collection, given
+that it is the state-of-the-art and does not require calibration.
 
 We collect data using the Kinect SDK over a period of a week in a research
 laboratory setting. The Kinect is placed at the tee of a well traversed
@@ -66,20 +71,43 @@ Second, we reduce the number of features to nine by using
 the vertical symmetry of the human body: if two body parts are symmetric
 about the vertical axis, we bundle them into one feature by averaging their
 lengths. If only one of them is present, we take its value. If neither of
 them is present, the feature is
-reported as missing for the frame. The resulting nine features include the six
-arm, leg, and pelvis measurements from \xref{sec:uniqueness}, and three
-additional measurements: spine length, shoulder breadth, and head size.
-Finally, any frame with a missing feature is filtered out.
+reported as missing for the frame. Finally, any frame with a missing feature is
+filtered out. The resulting nine features include the six arm, leg, and pelvis
+measurements from \xref{sec:uniqueness}, and three additional measurements:
+spine length, shoulder breadth, and head size. Here we list the nine features
+as pairs of joints:
 %The resulting nine features are: Head-ShoulderCenter, ShoulderCenter-Shoulder,
 %Shoulder-Elbow, Elbow-Wrist, ShoulderCenter-Spine, Spine-HipCenter,
 %HipCenter-HipSide, HipSide-Knee, Knee-Ankle.
+\vspace{-1.5\baselineskip}
+\begin{table}
+\begin{center}
+\begin{tabular}{ll}
+Head-ShoulderCenter & Spine-HipCenter\\
+ShoulderCenter-Shoulder & HipCenter-Hip\\
+Shoulder-Elbow & Hip-Knee\\
+Elbow-Wrist & Knee-Ankle\\
+ShoulderCenter-Spine &\\
+\end{tabular}
+\end{center}
+\end{table}
+\vspace{-2.5\baselineskip}
+
 Each detected skeleton also has an ID number which identifies the figure it
 maps to from the figure detection stage. When there are consecutive frames with
 the same ID, it means that the skeleton-fitting algorithm was able to detect
 the skeleton in a contiguous way. This allows us to define the concept of a
 \emph{run}: a sequence of frames with the same skeleton ID.
 
+We perform five experiments. First, we test the performance of skeleton
+recognition using traditional 10-fold cross validation, to represent an offline
+setting. Second, we run our algorithms in an online setting by training and
+testing the data over time. Third, we pit skeleton recognition against the
+state-of-the-art in face recognition. Next, we test how our solution performs
+when people are walking away from the camera. Finally, we study what happens
+if the noise from the Kinect is reduced.
+
 %\begin{table}
 %\begin{center}
 %\caption{Data set statistics. The right part of the table shows the
@@ -110,26 +138,28 @@ the skeleton in a contiguous way.
 \subsection{Offline learning setting}
 
+In the first experiment, we study the accuracy of skeleton recognition using
+10-fold cross validation. The data set is partitioned into 10 continuous time
+sequences of equal size. For a given recall threshold, the algorithm is trained
+on 9 continuous time sequences and tested on the last one. This is repeated
+for the 10 possible testing subsamples. Averaging the prediction rate over
+these 10 training-testing experiments yields the prediction rate for the chosen
+threshold. We test the mixture of Gaussians (MoG) and sequential hypothesis
+testing (SHT) models, and find that SHT generally performs better than MoG, and
+that accuracy increases as group size decreases.
-The mixture of Gaussians model is evaluated on the whole dataset by
-doing 10-fold cross validation: the data set is partitioned into 10
-subsamples of equal size. For a given recall threshold, the algorithm
-is trained on 9 subsamples and trained on the last one. This is
-repeated for the 10 possible testing subsample. Averaging the
-prediction rate over these 10 training-testing experiments yields the
-prediction rate for the chosen threshold.
-\fref{fig:offline} shows the precision-recall plot as the
-threshold varies. Several curves are obtained for different group
-sizes: people are ordered based on their numbers of frames, and all
-the frames belonging to someone beyond a given rank in this ordering
-are removed from the data set. The decrease of performance when
-increasing the number of people in the data set can be explained
-by the overlaps between skeleton profiles due to the noise, as
-discussed in Section~\ref{sec:uniqueness}, but also by the very few
-number of runs available for the least present people, as seen in
-\fref{fig:frames}, which does not permit a proper training of
-the algorithm.
+\fref{fig:offline} shows the precision-recall plot as the threshold varies.
+Both algorithms perform at more than three times the majority class baseline of
+15\% with a recall of 100\% on all people. Several curves are obtained for
+different group sizes: people are ordered based on their frequency of
+appearance (\fref{fig:frames}), and all the frames belonging to people beyond a
+given rank in this ordering are removed. The decrease of performance when
+increasing the number of people in the data set can be explained by the
+overlaps between skeleton profiles due to the noise, as discussed in
+Section~\ref{sec:uniqueness}, but also by the very small number of runs
+available for the least present people, as seen in \fref{fig:frames}, which
+does not permit a proper training of the algorithm.
 
 \begin{figure*}[t]
 \begin{center}
@@ -141,9 +171,7 @@ the algorithm.
 \includegraphics[]{graphics/offline-sht.pdf}
 \label{fig:offline:sht}
 }
-  \caption{Precision-recall curve for the mixture of Gaussians model
-    with 10-fold cross validation. The data set is restricted to the top
-    $n_p$ most present people}
+  \caption{Results with 10-fold cross validation for the top $n_p$ most present people}
 \label{fig:offline}
 \end{center}
 \end{figure*}
@@ -160,15 +188,16 @@ the algorithm.
 \subsection{Online learning setting}
 
-Even though the previous evaluation is standard, it does not properly
-reflect the reality. A real-life setting could be the following: the
-camera is placed at the entrance of a building. When a person enters
-the building, his identity is detected based on the electronic key
-system and a new labeled run is added to the data set. The
-identification algorithm is then retrained on the augmented data set,
-and the newly obtained classifier can be deployed in the building.
+In the second experiment, we evaluate skeleton recognition in an online
+setting. Even though the previous evaluation is standard, it does not properly
+reflect reality. A real-life setting could be as follows. The camera is placed
+at the entrance of a building. When a person enters the building, his identity
+is detected based on the electronic key system and a new labeled run is added
+to the data set. The identification algorithm is then retrained on the
+augmented data set, and the newly obtained classifier can be deployed in the
+building.
 
-In this setting, the Sequential Hypothesis Testing (SHT) algorithm is more
+In this setting, the sequential hypothesis testing (SHT) algorithm is more
 suitable than the algorithm used in the previous paragraph, because it
 accounts for the fact that a person identity does not change across a
 run. The analysis is therefore performed by partitioning the dataset
@@ -185,11 +214,11 @@ experiments.
 \includegraphics[width=0.49\textwidth]{graphics/online-nb.pdf}
 \label{fig:online:nb}
 }
-\subfloat[Sequential Hypothesis Learning]{
+\subfloat[Sequential hypothesis testing]{
 \includegraphics[width=0.49\textwidth]{graphics/online-sht.pdf}
 \label{fig:online:sht}
 }
-\caption{Precision-recall curves for the online setting. $n_p$ is the size of
+\caption{Results for the online setting, where $n_p$ is the size of
 the group as in Figure~\ref{fig:offline}}
 \label{fig:online}
 \end{center}
@@ -218,7 +247,7 @@ algorithm and would thus bias the results in favor of skeleton recognition.
 \includegraphics[width=0.49\textwidth]{graphics/face.pdf}
 \end{center}
 \vspace{-1.5\baselineskip}
-  \caption{Precision-recall curve for face recognition and skeleton recognition}
+  \caption{Results for face recognition versus skeleton recognition}
 \label{fig:face}
 }
 \parbox[t]{0.49\linewidth}{
@@ -226,9 +255,7 @@ algorithm and would thus bias the results in favor of skeleton recognition.
 \includegraphics[width=0.49\textwidth]{graphics/back.pdf}
 \end{center}
 \vspace{-1.5\baselineskip}
-  \caption{Precision-recall curve
-    with people walking away
-    from and toward the camera}
+  \caption{Results with people walking away from and toward the camera}
 \label{fig:back}
 }
 \end{figure}
@@ -298,9 +325,7 @@ the newly obtained data set.
 \includegraphics[width=0.49\textwidth]{graphics/var.pdf}
 \end{center}
 \vspace{-1.5\baselineskip}
-  \caption{Precision-recall curve for the sequential hypothesis
-    testing algorithm in the online setting for all the people with and
-    without halving the variance of the noise}
+  \caption{Results with and without halving the variance of the noise}
 \label{fig:var}
 \end{figure}
diff --git a/uniqueness.tex b/uniqueness.tex
index 84ec9ce..5bd55e4 100644
--- a/uniqueness.tex
+++ b/uniqueness.tex
@@ -106,6 +106,8 @@ there are two distinct sources of noise:
 \item the noise of the new measurement: this comes from the device doing
 the measurement.
 \end{itemize}
+In \xref{sec:experiment} we show that we can learn good models despite this
+noise.
 %%% Local Variables:
 %%% mode: latex
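The feature-reduction rule added in experimental.tex (average a vertically symmetric pair of limb lengths, fall back to whichever side is present, and otherwise mark the feature missing so the frame can be filtered) can be sketched as follows. The function name and the use of `None` for a missing measurement are assumptions for illustration, not the paper's code.

```python
def bundle_symmetric(left, right):
    """Collapse a symmetric pair of limb lengths into one feature:
    average when both sides are measured, take the measured side when
    only one is present, and report the feature as missing (None)
    otherwise, so the whole frame can be filtered out later."""
    if left is not None and right is not None:
        return (left + right) / 2
    if left is not None:
        return left
    if right is not None:
        return right
    return None
```

Applying this rule to the left/right arm, leg, and hip pairs, and keeping the unpaired spine, shoulder-breadth, and head measurements, yields the nine features listed in the hunk's table.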
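The online-setting hunks rest on the observation that identity cannot change within a run, so evidence from every frame of a run can be pooled before deciding. A minimal sketch of that sequential-hypothesis-testing idea, assuming per-frame log-likelihoods are already available from per-person skeleton models (the dict-based interface is hypothetical):

```python
def classify_run(frame_loglikelihoods):
    """frame_loglikelihoods: list of dicts, one per frame of a run,
    mapping each candidate person to the log-likelihood of that
    person's skeleton model for the frame. Because identity is
    constant within a run, log-likelihoods are summed across frames
    and the run is assigned to the best-scoring candidate."""
    totals = {}
    for frame in frame_loglikelihoods:
        for person, ll in frame.items():
            totals[person] = totals.get(person, 0.0) + ll
    return max(totals, key=totals.get)
```

Summing per-frame evidence is what lets a run-level decision outperform frame-by-frame classification when individual frames are noisy.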
