| author | Thibaut Horel <thibaut.horel@gmail.com> | 2012-03-05 14:34:54 -0800 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2012-03-05 14:34:54 -0800 |
| commit | 898a85e9d27cac39f403a9a499ea49578a856f4f (patch) | |
| tree | 2501910f1d4aec0123804d0fe5ab76714758f00f | |
| parent | 75bdb4858889f2af6e074ed9448b6ded1a81cbc4 (diff) | |
Final changes
| -rw-r--r-- | conclusion.tex | 21 | ||||
| -rw-r--r-- | experimental.tex | 27 | ||||
| -rw-r--r-- | related.tex | 2 | ||||
| -rw-r--r-- | uniqueness.tex | 4 |
4 files changed, 27 insertions, 27 deletions
diff --git a/conclusion.tex b/conclusion.tex index d423e29..0d435b1 100644 --- a/conclusion.tex +++ b/conclusion.tex @@ -2,14 +2,15 @@ \label{sec:conclusion} In this paper, we present exciting and promising results for skeleton -recognition. With greater than 90\% accuracy for less than 10 people, skeleton -recognition can already be used in households, \eg to load personalized -settings for a home entertainment system. Skeleton recognition performs less -than 10\% worse than face recognition in the current setting. This is a good -result considering face recognition has been studied for years and is more -mature. Furthermore, skeleton recognition works in many situations when face -recognition does not. For example, when a person is not facing the camera or -when it is dark. +recognition. With greater than 90\% accuracy for less than 10 people, +skeleton recognition can already be used in households, \eg to load +personalized settings for a home entertainment system. Skeleton +recognition performs less than 10\% worse than face recognition in the +current setting. This is a good result considering face recognition +has been studied for years and is more mature \cite{face-survey}. +Furthermore, skeleton recognition works in many situations when face +recognition does not. For example, when a person is not facing the +camera or when it is dark. %we introduce skeleton recognition. We show that skeleton %measurements are unique enough to distinguish individuals using a dataset of @@ -29,9 +30,9 @@ vacuum cleaner during our data collection. Finally, as the resolution of range cameras increases and skeleton fitting algorithms improve, so will the accuracy of skeleton recognition. 
Microsoft is planning on putting the Kinect technology inside -laptops~\footnote{\url{http://www.thedaily.com/page/2012/01/27/012712-tech-kinect-laptop/}} +laptops\footnote{\url{http://www.thedaily.com/page/2012/01/27/012712-tech-kinect-laptop/}} and the Asus Xtion -pro~\footnote{\url{http://www.asus.com/Multimedia/Motion_Sensor/Xtion_PRO/}} is +pro\footnote{\url{http://www.asus.com/Multimedia/Motion_Sensor/Xtion_PRO/}} is a range camera like the Kinect designed for PCs. The increased usage of range cameras and competition among vendors can only lead to advancements in the associated technologies. diff --git a/experimental.tex b/experimental.tex index 8085321..cc66af4 100644 --- a/experimental.tex +++ b/experimental.tex @@ -45,9 +45,9 @@ hallway. For each frame, the Kinect SDK performs figure detection to identify regions of interest. Then, it fits a skeleton to the identified figures and outputs a set of joints in real world coordinates. The view of the Kinect is seen in \fref{fig:hallway}, showing the color image, the depth image with -figures, and the fitted skeleton of a person in a single frame. Skeletons are +detected figures, and the fitted skeleton of a person in a single frame. Skeletons are fit from roughly 1-5 meters away from the Kinect. For each frame with a -skeleton we record color image and the positions of the joints. +skeleton we record the color image and the positions of the joints. \begin{figure}[t] \begin{center} @@ -78,7 +78,7 @@ Second, we reduce the number of features to nine by using the vertical symmetry of the human body: if two body parts are symmetric about the vertical axis, we bundle them into one feature by averaging their lengths. If only one of them is present, we take its value. If neither of them is present, the feature is -reported as missing for the frame. Finally, any frame with a missing feature is +reported as missing for the frame. Any frame with a missing feature is filtered out. 
The resulting nine features include the six arm, leg, and pelvis measurements from \xref{sec:uniqueness}, and three additional measurements: spine length, shoulder breadth, and head size. Here we list the nine features as @@ -110,9 +110,9 @@ range of the camera, we only keep the frames of a run that are 2-3 meters away from the Kinect. Ground truth person identification is obtained by manually labelling each run -based on the images captured by the color camera of the Kinect. For ease of -labelling, only the runs with people walking toward the camera are kept. These -are the runs where the average distance from the skeleton joints to the camera +based on the images captured from the color image stream of the Kinect. For ease of +labelling, only the runs with people walking toward the Kinect are kept. These +are the runs where the average distance from the skeleton joints to the Kinect is increasing. We perform five experiments. First, we test the performance of @@ -219,7 +219,7 @@ we reach 90\% accuracy at 60\% recall for a group size of 10 people. In the second experiment, we evaluate skeleton recognition in an online setting. Even though the previous evaluation is standard, it does not properly -reflect reality. A real-life setting could be as follows. The camera is placed +reflect reality. A real-world setting could be as follows. The camera is placed at the entrance of a building. When a person enters the building, his identity is detected based on the electronic key system and a new labeled run is added to the dataset. The identification algorithm is then retrained on the @@ -279,9 +279,7 @@ recognition rates mostly above 90\% for group sizes of 3 and 5. In the third experiment, we compare the performance of skeleton recognition with the performance of face recognition as given by \textsf{face.com}. At the time of writing, this is the best performing face recognition algorithm on the -LFW dataset~\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}. 
-The results show that face recognition has better accuracy than skeleton -recognition, but not by a large margin. +LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}. We use the publicly available REST API of \textsf{face.com} to do face recognition on our dataset. Due to the restrictions of the API, for this @@ -381,13 +379,14 @@ observation $\bx_i$ is replaced by $\bx_i'$ defined by: \begin{equation} \bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2} \end{equation} -We believe that a reducing factor of 2 for the noise's variance is realistic -given the relative low resolution of the Kinect's infrared camera. +We believe that reducing the noise's variance by half is realistic +given the relatively low resolution of the Kinect's infrared camera. \fref{fig:var} compares the precision-recall curve of \fref{fig:offline:sht} to the curve of the same experiment run on the newly obtained dataset. We observe -a roughly 20\% increase in performace across most thresholds. Note that these -results would significantly outperform face recognition. +a roughly 20\% increase in performace across most thresholds. We +believe these results would significantly outperform face recognition +in a similar setting. %\begin{figure}[t] % \begin{center} diff --git a/related.tex b/related.tex index f6ac87f..a7fd7e3 100644 --- a/related.tex +++ b/related.tex @@ -64,7 +64,7 @@ human body part identification or pose estimation respectively in this context~\cite{plagemann:icra10,ganapathi:cvpr10,shotton:cvpr11}. Furthermore, OpenNI~\cite{openni} and the Kinect for Windows SDK~\cite{kinect-sdk} are two systems that perform figure detection and skeleton fitting for the Kinect. -Given the maturity of the solutions, we will use implementations of figure +Given the maturity of the solutions, we will use existing implementations of figure detection and skeleton fitting. Therefore this paper will focus primarily on the classification part of skeleton recognition. 
diff --git a/uniqueness.tex b/uniqueness.tex index 6c90310..1814446 100644 --- a/uniqueness.tex +++ b/uniqueness.tex @@ -83,10 +83,10 @@ as well as the ROC of the best performing face detection algorithm in the image-restricted LFW benchmark: \emph{Associate-Predict} \cite{associate}. -The results show that with a standard deviation of 3mm, nearest +The results show that with a standard deviation of 3 millimeters, nearest neighbor performs quite similarly to face detection at low false-positive rate. At this noise level, the error is smaller than -1cm with 99.9\% probability. Even with a standard deviation of 5mm, it +1 centimeter with 99.9\% probability. Even with a standard deviation of 5 millimeters, it is still possible to detect 90\% of the matched pairs with a false positive rate of 6\%. |
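The experimental.tex hunk above keeps the transformation $\bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2}$, which moves each observation halfway toward the mean of its own class. As a rough illustration only, here is a minimal Python sketch of that formula. The function names `shrink_toward_class_means` and `variance`, the `factor` parameter, and the list-of-lists data layout are assumptions made for this sketch and do not come from the repository. Under this transformation each deviation from the class mean is divided by 2, so the per-class standard deviation is halved and the per-class variance falls by a factor of 4.

```python
from collections import defaultdict


def shrink_toward_class_means(observations, labels, factor=2.0):
    """Apply x_i' = mean_{y_i} + (x_i - mean_{y_i}) / factor to every observation.

    observations: list of equal-length feature vectors (lists of floats).
    labels: class label y_i for each observation.
    Hypothetical helper written for illustration, not the paper's code.
    """
    # Accumulate per-class feature sums and counts to get the class means.
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(observations, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for k, value in enumerate(x):
            sums[y][k] += value
        counts[y] += 1
    means = {y: [s / counts[y] for s in sums[y]] for y in sums}

    # Shrink each observation's deviation from its own class mean.
    return [
        [m + (value - m) / factor for value, m in zip(x, means[y])]
        for x, y in zip(observations, labels)
    ]


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    obs = [[1.0, 10.0], [3.0, 14.0], [5.0, 5.0]]
    labels = ["a", "a", "b"]
    shrunk = shrink_toward_class_means(obs, labels)
    print(shrunk)
    # Variance of the first feature within class "a", before and after.
    print(variance([1.0, 3.0]), variance([shrunk[0][0], shrunk[1][0]]))
```

The class means are unchanged by the transformation, so only the within-class spread is affected.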
