 conclusion.tex   | 14
 experimental.tex | 86
 intro.tex        | 30
 references.bib   | 16
 related.tex      | 22
 5 files changed, 98 insertions(+), 70 deletions(-)
diff --git a/conclusion.tex b/conclusion.tex
index 0d435b1..d79938f 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -28,11 +28,13 @@ give false positives, which caused skeletons to be fit on a window and a
 vacuum cleaner during our data collection.
 
 Finally, as the resolution of range cameras increases and skeleton fitting
-algorithms improve, so will the accuracy of skeleton recognition. Microsoft is
-planning on putting the Kinect technology inside
-laptops\footnote{\url{http://www.thedaily.com/page/2012/01/27/012712-tech-kinect-laptop/}}
-and the Asus Xtion
-pro\footnote{\url{http://www.asus.com/Multimedia/Motion_Sensor/Xtion_PRO/}} is
-a range camera like the Kinect designed for PCs. The increased usage of range
+algorithms improve, so will the accuracy of skeleton recognition.
+%Microsoft is
+%planning on putting the Kinect technology inside
+%laptops\footnote{\url{http://www.thedaily.com/page/2012/01/27/012712-tech-kinect-laptop/}}
+%and the Asus Xtion
+%pro\footnote{\url{http://www.asus.com/Multimedia/Motion_Sensor/Xtion_PRO/}} is
+%a range camera like the Kinect designed for PCs.
+The increased usage of range
 cameras and competition among vendors can only lead to advancements in the
 associated technologies.
diff --git a/experimental.tex b/experimental.tex
index 76678ca..a7871eb 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -34,24 +34,24 @@ results.
 \end{center}
 \end{figure*}
-
 The Kinect outputs three primary signals in real-time: a color image stream,
 a depth image stream, and microphone output (\fref{fig:hallway}). For our
-purposes, we focus on the depth image stream. As the Kinect was designed to
-interface directly with the Xbox 360, the tools to interact with it on a PC are
-limited. The OpenKinect project released
-\textsf{libfreenect}~\cite{libfreenect}, a reverse engineered driver which
-gives access to the raw depth images of the Kinect. This raw data could be
-used to implement skeleton fitting algorithms, \eg those of
-Plagemann~\etal{}~\cite{plagemann:icra10}. Alternatively,
-OpenNI~\cite{openni}, an open framework led by PrimeSense, the company behind
-the technology of the Kinect, offers figure tracking and skeleton fitting
-algorithms on top of raw access to the data streams. More recently, the Kinect
-for Windows SDK~\cite{kinect-sdk} was released, also with figure tracking
-and skeleton fitting algorithms.
-%and its skeleton fitting
-%algorithm operates in real-time without calibration.
-
+purposes, we focus on the depth image stream.
+%As the Kinect was designed to
+%interface directly with the Xbox 360, the tools to interact with it on a PC are
+%limited. The OpenKinect project released
+%\textsf{libfreenect}~\cite{libfreenect}, a reverse engineered driver which
+%gives access to the raw depth images of the Kinect. This raw data could be
+%used to implement skeleton fitting algorithms, \eg those of
+%Plagemann~\etal{}~\cite{plagemann:icra10}. Alternatively,
+%OpenNI~\cite{openni}, an open framework led by PrimeSense, the company behind
+%the technology of the Kinect, offers figure tracking and skeleton fitting
+%algorithms on top of raw access to the data streams. More recently, the Kinect
+%for Windows SDK~\cite{kinect-sdk} was released, also with figure tracking
+%and skeleton fitting algorithms.
+%%and its skeleton fitting
+%%algorithm operates in real-time without calibration.
+%
 We evaluated both OpenNI and the Kinect SDK for skeleton recognition. The
 skeleton fitting algorithm of OpenNI requires each individual to strike a
 specific pose for calibration, making it more difficult to collect a lot of
@@ -262,20 +262,28 @@ recognition rates mostly above 90\% for group sizes of 3 and 5.
 
 \subsection{Face recognition}
 
 In the third experiment, we compare the performance of skeleton recognition
-with the performance of face recognition as given by \textsf{face.com}. At the
-time of writing, this is the best performing face recognition algorithm on the
-LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
+with the performance of face recognition. For this experiment we set $n_p = 5$
+and train on one half of the data and test on the remaining half. For
+comparison, the MoG algorithm is run with the same training-testing
+partitioning of the dataset. The results are shown in \fref{fig:face}.
+Skeleton recognition performs within 10\% of face recognition at most
+thresholds.
-We use the REST API of \textsf{face.com} to do face recognition on our dataset.
-Due to the restrictions of the API, for this experiment we set $n_p = 5$ and
-train on one half of the data and test on the remaining half. For comparison,
-the MoG algorithm is run with the same training-testing partitioning of the
-dataset. In this setting, SHT is not relevant for the comparison, because
-\textsf{face.com} does not give the possibility to mark a sequence of frames as
-belonging to the same run. This additional information would be used by the SHT
-algorithm and would thus bias the experiment in favor of skeleton recognition.
-The results are shown in \fref{fig:face}. Skeleton recognition performs
-within 10\% of face recognition at most thresholds.
+%In the third experiment, we compare the performance of skeleton recognition
+%with the performance of face recognition as given by \textsf{face.com}. At the
+%time of writing, this is the best performing face recognition algorithm on the
+%LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
+%
+%We use the REST API of \textsf{face.com} to do face recognition on our dataset.
+%Due to the restrictions of the API, for this experiment we set $n_p = 5$ and
+%train on one half of the data and test on the remaining half. For comparison,
+%the MoG algorithm is run with the same training-testing partitioning of the
+%dataset. In this setting, SHT is not relevant for the comparison, because
+%\textsf{face.com} does not give the possibility to mark a sequence of frames as
+%belonging to the same run. This additional information would be used by the SHT
+%algorithm and would thus bias the experiment in favor of skeleton recognition.
+%The results are shown in \fref{fig:face}. Skeleton recognition performs
+%within 10\% of face recognition at most thresholds.
 %outperforms
 %skeleton recognition, but by less than 10\% at most thresholds.
 %These results are promising, given that \textsf{face.com} is the
@@ -288,8 +296,8 @@ within 10\% of face recognition at most thresholds.
 
 \subsection{Walking away}
 
 In the next experiment, we include the runs in which people are walking away
-from the Kinect that we could positively identify. The performance of face
-recognition outperforms skeleton recognition in the previous setting. However,
+from the Kinect that we could positively identify. While face
+recognition outperforms skeleton recognition in the previous setting,
 there are many cases where only skeleton recognition is possible. For example,
 when people are walking away from the Kinect. Coming back to the raw data
 collected during the experiment design, we manually label the runs of people
@@ -310,14 +318,6 @@ they are walking towards the camera.
 %  \label{fig:back}
 %\end{figure}
 
-\begin{figure}[t]
-  \centering
-  \includegraphics[width=0.49\textwidth]{graphics/back.pdf}
-\vspace{-1.5\baselineskip}
-\caption{Results with people walking away from and toward the camera}
-\label{fig:back}
-\end{figure}
-
 \fref{fig:back} compares the results obtained in \xref{sec:experiment:offline}
 with people walking toward the camera, with the results of the same experiment
 on the dataset of runs of people walking away from the camera.
@@ -330,6 +330,14 @@ from the camera with similar performance. Note that while we could not
 obtain enough labeled data for a full comparison when it is dark, manual
 experiments show similar performance when there is no visible light.
 
+\begin{figure}[t!]
+  \centering
+  \includegraphics[width=0.49\textwidth]{graphics/back.pdf}
+\vspace{-1.5\baselineskip}
+\caption{Results with people walking away from and toward the camera}
+\label{fig:back}
+\end{figure}
+
 \subsection{Reducing the noise}
 
 For the final experiment, we explore the potential of skeleton recognition with
diff --git a/intro.tex b/intro.tex
--- a/intro.tex
+++ b/intro.tex
@@ -11,7 +11,7 @@ recognition.
 
 In recent years, advances in range cameras have given us access to increasingly
 accurate real-time depth imaging. Furthermore, the low-cost and widely
 available Kinect~\cite{kinect} has brought range imaging to the masses. In
-parallel, the automatic detection of body parts from depth images has led to
+parallel, automatic detection of body parts from depth images has led to
 real-time skeleton fitting.
@@ -19,10 +19,10 @@ real-time skeleton fitting.
 
 %In this paper we show that skeleton fitting is accurate and unique enough in
 
 We make the following contributions. First, we show that ground truth skeleton
 measurements can uniquely identify a person. Second, we propose two models for
 skeleton recognition. Finally, we evaluate our hypothesis using real-world
-data collected with the Kinect. Our results show that skeleton recognition can
+data collected with a Kinect. Our results show that skeleton recognition can
 identify three people with 95\% accuracy, and five people with 85\% accuracy.
-Furthermore, skeleton recognition can be performed in more situations than face
-recognition, such as when a person is not facing the camera.
+Furthermore, skeleton recognition can be performed in situations where face
+recognition cannot, such as when a person is not facing the camera.
 
 %As the resolution and accuracy of range cameras improve, so will the accuracy
 %and precision of skeleton fitting algorithms.
@@ -33,16 +33,14 @@ recognition, such as when a person is not facing the camera.
 
 The paper is organized as follows. First we discuss prior methods of
 person recognition, in addition to the advances in the technologies
-pertaining to skeleton fitting (Section~\ref{sec:related}). Next we
-use a dataset of actual skeletal measurements to show that
-skeletons are a unique enough descriptor for person recognition.
-(Section~\ref{sec:uniqueness}). We then discuss an error model and the
-resulting algorithm to do person recognition
-(Section~\ref{sec:algorithms}). Finally, we collect skeleton data with
-a Kinect in an uncontrolled setting and we apply preprocessing and
-classification algorithms to this dataset
-(Section~\ref{sec:experiment}). We evaluate the performance of
-skeleton recognition with varying group size and compare it to face
-recognition. We conclude by discussing challenges working with the
-Kinect and future work (Section~\ref{sec:conclusion}).
+pertaining to skeleton fitting (Section~\ref{sec:related}). Next we use a
+dataset of actual skeletal measurements to show that skeletons are a unique
+enough descriptor for person recognition (Section~\ref{sec:uniqueness}). We
+then discuss an error model and the resulting algorithm for person
+recognition (Section~\ref{sec:algorithms}). Finally, we collect skeleton data
+with a Kinect in an uncontrolled setting. We apply preprocessing and
+classification algorithms to this dataset and evaluate the performance of
+skeleton recognition with varying group size (Section~\ref{sec:experiment}).
+We conclude by discussing challenges working with the Kinect and future work
+(Section~\ref{sec:conclusion}).
diff --git a/references.bib b/references.bib
index 0dc0d9e..8fb99e0 100644
--- a/references.bib
+++ b/references.bib
@@ -425,3 +425,19 @@ pages = "331--342"
 address = "New York, NY",
 year = "1947"
 }
+
+@inproceedings{munsell:eccv12,
+ title={Person identification using full-body motion and anthropometric biometrics from kinect videos},
+ author={Munsell, Brent C and Temlyakov, Andrew and Qu, Chengzheng and Wang, Song},
+ booktitle={Computer Vision--ECCV 2012. Workshops and Demonstrations},
+ pages={91--100},
+ year={2012},
+ organization={Springer}
+}
+@inproceedings{hofmann:btas12,
+ title={2.5D Gait Biometrics using the Depth Gradient Histogram Energy Image},
+ author={Hofmann, Martin and Bachmann, Sebastian and Rigoll, Gerhard},
+ booktitle={Biometrics: Theory, Applications and Systems (BTAS), 2012 IEEE Fifth International Conference on},
+ year={2012},
+ pages={399--403}
+}
diff --git a/related.tex b/related.tex
index 1ac25d9..561c041 100644
--- a/related.tex
+++ b/related.tex
@@ -19,7 +19,7 @@ Physiological traits include faces, fingerprints, and irises; speech and gait
 are behavioral. Faces and gait are the most relevant biometrics for this paper
 as they both can be collected passively and involve image processing.
 
-Approaches to gait recognition fall into two categories: silhouette-based and
+Approaches to gait recognition typically fall into two categories: silhouette-based and
 model-based. Silhouette-based techniques recognize gaits from a binary
 representation of the silhouette as extracted from each image, while
 model-based techniques fit a 3-D model to the silhouette to better track
@@ -56,17 +56,21 @@ limited. However, Zhao~\etal~\cite{zhao20063d} perform gait recognition in
 3-D using multiple cameras. By moving to 3-D, many of the problems related to
 silhouette extraction and model fitting are removed. Additionally we can take
 advantage of the wealth of research relating to 3-D motion
-capture~\cite{mocap-survey}. Specifically, range cameras offer real-time depth
-imaging, and the Kinect~\cite{kinect} in particular is a widely available range
-camera with a low price point. Figure detection and skeleton fitting have also
-been studied in motion capture, mapping to region of interest detectors and
-human body part identification or pose estimation respectively in this
+capture~\cite{mocap-survey}.
+%Specifically, range cameras offer real-time depth
+%imaging, and
+The Kinect~\cite{kinect} is a widely available 3-D sensor (also known as a
+range camera) with a low price point that has been leveraged to improve gait
+recognition~\cite{munsell:eccv12,hofmann:btas12}. Figure detection and
+skeleton fitting have also been studied in motion capture, mapping to region of
+interest detectors and human body part identification or pose estimation
+respectively in this
 context~\cite{plagemann:icra10,ganapathi:cvpr10,shotton:cvpr11}.
 Furthermore, OpenNI~\cite{openni} and the Kinect for Windows
 SDK~\cite{kinect-sdk} are two systems that perform figure detection and
 skeleton fitting for the Kinect.
-Given the maturity of the solutions, we will use existing implementations of figure
-detection and skeleton fitting. Therefore this paper will focus primarily on
-the classification part of skeleton recognition.
+Given the maturity of the solutions, we will use existing implementations of
+figure detection and skeleton fitting. Therefore this paper will focus
+primarily on the classification part of skeleton recognition.
 %a person from an image to measure gait, but can also be measured from floor
