-rw-r--r--  conclusion.tex    21
-rw-r--r--  experimental.tex  27
-rw-r--r--  related.tex        2
-rw-r--r--  uniqueness.tex     4
4 files changed, 27 insertions, 27 deletions
diff --git a/conclusion.tex b/conclusion.tex
index d423e29..0d435b1 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -2,14 +2,15 @@
\label{sec:conclusion}
In this paper, we present exciting and promising results for skeleton
-recognition. With greater than 90\% accuracy for less than 10 people, skeleton
-recognition can already be used in households, \eg to load personalized
-settings for a home entertainment system. Skeleton recognition performs less
-than 10\% worse than face recognition in the current setting. This is a good
-result considering face recognition has been studied for years and is more
-mature. Furthermore, skeleton recognition works in many situations when face
-recognition does not. For example, when a person is not facing the camera or
-when it is dark.
+recognition. With greater than 90\% accuracy for groups of fewer than
+10 people, skeleton recognition can already be used in households,
+\eg to load personalized settings for a home entertainment system.
+Skeleton recognition performs less than 10\% worse than face
+recognition in the current setting. This is a good result considering
+that face recognition has been studied for years and is more mature
+\cite{face-survey}. Furthermore, skeleton recognition works in many
+situations where face recognition does not, for example when a person
+is not facing the camera or when it is dark.
%we introduce skeleton recognition. We show that skeleton
%measurements are unique enough to distinguish individuals using a dataset of
@@ -29,9 +30,9 @@ vacuum cleaner during our data collection.
Finally, as the resolution of range cameras increases and skeleton fitting
algorithms improve, so will the accuracy of skeleton recognition. Microsoft is
planning on putting the Kinect technology inside
-laptops~\footnote{\url{http://www.thedaily.com/page/2012/01/27/012712-tech-kinect-laptop/}}
+laptops\footnote{\url{http://www.thedaily.com/page/2012/01/27/012712-tech-kinect-laptop/}}
and the Asus Xtion
-pro~\footnote{\url{http://www.asus.com/Multimedia/Motion_Sensor/Xtion_PRO/}} is
+pro\footnote{\url{http://www.asus.com/Multimedia/Motion_Sensor/Xtion_PRO/}} is
a range camera like the Kinect designed for PCs. The increased usage of range
cameras and competition among vendors can only lead to advancements in the
associated technologies.
diff --git a/experimental.tex b/experimental.tex
index 8085321..cc66af4 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -45,9 +45,9 @@ hallway. For each frame, the Kinect SDK performs figure detection to identify
regions of interest. Then, it fits a skeleton to the identified figures and
outputs a set of joints in real world coordinates. The view of the Kinect is
seen in \fref{fig:hallway}, showing the color image, the depth image with
-figures, and the fitted skeleton of a person in a single frame. Skeletons are
+detected figures, and the fitted skeleton of a person in a single frame. Skeletons are
fit from roughly 1-5 meters away from the Kinect. For each frame with a
-skeleton we record color image and the positions of the joints.
+skeleton we record the color image and the positions of the joints.
\begin{figure}[t]
\begin{center}
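As a rough illustration of the per-frame recording step described in the hunk above: the color image and the real-world joint positions are kept for every frame in which a skeleton was fit. The joint names and storage layout below are placeholders, not the paper's; the actual capture goes through the Kinect SDK.

    from typing import Dict, Tuple

    Joint = Tuple[float, float, float]          # (x, y, z) in meters

    def record_frame(frame_id: int, joints: Dict[str, Joint]) -> dict:
        """Bundle the saved color image path with the fitted joint positions."""
        return {"color_image": f"frames/{frame_id:06d}.png", "joints": joints}

    # Example: one frame with two (made-up) joints.
    recording = [record_frame(0, {"head": (0.1, 1.6, 2.4), "spine": (0.1, 1.1, 2.4)})]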
@@ -78,7 +78,7 @@ Second, we reduce the number of features to nine by using the vertical symmetry
of the human body: if two body parts are symmetric about the vertical axis, we
bundle them into one feature by averaging their lengths. If only one of them is
present, we take its value. If neither of them is present, the feature is
-reported as missing for the frame. Finally, any frame with a missing feature is
+reported as missing for the frame. Any frame with a missing feature is
filtered out. The resulting nine features include the six arm, leg, and pelvis
measurements from \xref{sec:uniqueness}, and three additional measurements:
spine length, shoulder breadth, and head size. Here we list the nine features as
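A minimal sketch of the symmetry-bundling rule in the hunk above: symmetric body parts are averaged into one feature, a lone present side is used as-is, and a frame missing any required feature is dropped. The "left_"/"right_" keys are placeholders, not the paper's exact feature names; the real pipeline uses the nine measurements listed in the text.

    from typing import Dict, List, Optional

    def bundle(lengths: Dict[str, float], parts: List[str]) -> Optional[List[float]]:
        """Average symmetric parts; return None to filter out the frame
        if a required feature is missing entirely."""
        features = []
        for part in parts:
            sides = [lengths[k] for k in (f"left_{part}", f"right_{part}", part)
                     if k in lengths]
            if not sides:
                return None                 # feature missing -> frame filtered out
            features.append(sum(sides) / len(sides))
        return features

    # Example frame: the right forearm was not fitted, so the left value is used.
    frame = {"left_upper_arm": 0.31, "right_upper_arm": 0.30,
             "left_forearm": 0.26, "spine": 0.52}
    print(bundle(frame, ["upper_arm", "forearm", "spine"]))   # [0.305, 0.26, 0.52]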
@@ -110,9 +110,9 @@ range of the camera, we only keep the frames of a run that are 2-3 meters away
from the Kinect.
Ground truth person identification is obtained by manually labelling each run
-based on the images captured by the color camera of the Kinect. For ease of
-labelling, only the runs with people walking toward the camera are kept. These
-are the runs where the average distance from the skeleton joints to the camera
+based on the frames captured from the Kinect's color image stream. For ease of
+labelling, only the runs with people walking toward the Kinect are kept. These
+are the runs where the average distance from the skeleton joints to the Kinect
is increasing.
We perform five experiments. First, we test the performance of
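A rough sketch of the run filtering described in the hunk above: keep only frames whose mean joint-to-camera distance lies in the 2-3 m band, and use the trend of that distance over a run to decide the walking direction. Function and variable names are illustrative, not from the paper.

    import math
    from typing import Dict, List, Tuple

    Joint = Tuple[float, float, float]

    def mean_distance(joints: Dict[str, Joint]) -> float:
        """Average Euclidean distance of the joints to the camera origin."""
        return sum(math.dist((0.0, 0.0, 0.0), p) for p in joints.values()) / len(joints)

    def filter_run(frames: List[Dict[str, Joint]]):
        distances = [mean_distance(f) for f in frames]
        kept = [f for f, d in zip(frames, distances) if 2.0 <= d <= 3.0]
        trend = distances[-1] - distances[0]   # sign indicates walking direction
        return kept, trend

    run = [{"head": (0.0, 0.3, 2.9)}, {"head": (0.0, 0.3, 2.5)}, {"head": (0.0, 0.3, 2.1)}]
    kept, trend = filter_run(run)
    print(len(kept), "frames kept; distance trend", round(trend, 2))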
@@ -219,7 +219,7 @@ we reach 90\% accuracy at 60\% recall for a group size of 10 people.
In the second experiment, we evaluate skeleton recognition in an online
setting. Even though the previous evaluation is standard, it does not properly
-reflect reality. A real-life setting could be as follows. The camera is placed
+reflect reality. A real-world setting could be as follows. The camera is placed
at the entrance of a building. When a person enters the building, his identity
is detected based on the electronic key system and a new labeled run is added
to the dataset. The identification algorithm is then retrained on the
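A minimal sketch of this online setting: each time a person enters, the new run is first identified from the skeleton data, then the electronic key system supplies the label, the run is added, and the classifier is retrained on everything seen so far. The nearest-neighbor classifier from scikit-learn and the majority vote over frames are assumptions for illustration, not the paper's exact pipeline.

    from collections import Counter
    from sklearn.neighbors import KNeighborsClassifier

    X, y = [], []                            # per-frame features and labels seen so far
    clf = KNeighborsClassifier(n_neighbors=1)

    def on_entry(run_frames, identity):
        """Predict the identity of the new run (majority vote over its frames),
        then add the key-system label and retrain on the full dataset."""
        guess = Counter(clf.predict(run_frames)).most_common(1)[0][0] if X else None
        X.extend(run_frames)
        y.extend([identity] * len(run_frames))
        clf.fit(X, y)
        return guess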
@@ -279,9 +279,7 @@ recognition rates mostly above 90\% for group sizes of 3 and 5.
In the third experiment, we compare the performance of skeleton recognition
with the performance of face recognition as given by \textsf{face.com}. At the
time of writing, this is the best performing face recognition algorithm on the
-LFW dataset~\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
-The results show that face recognition has better accuracy than skeleton
-recognition, but not by a large margin.
+LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
We use the publicly available REST API of \textsf{face.com} to do face
recognition on our dataset. Due to the restrictions of the API, for this
@@ -381,13 +379,14 @@ observation $\bx_i$ is replaced by $\bx_i'$ defined by:
\begin{equation}
\bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2}
\end{equation}
-We believe that a reducing factor of 2 for the noise's variance is realistic
-given the relative low resolution of the Kinect's infrared camera.
+We believe that halving the noise's standard deviation in this way is
+realistic given the relatively low resolution of the Kinect's infrared
+camera.
\fref{fig:var} compares the precision-recall curve of \fref{fig:offline:sht} to
the curve of the same experiment run on the newly obtained dataset. We observe
-a roughly 20\% increase in performace across most thresholds. Note that these
-results would significantly outperform face recognition.
+a roughly 20\% increase in performance across most thresholds. We
+believe these results would significantly outperform face recognition
+in a similar setting.
%\begin{figure}[t]
% \begin{center}
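A direct sketch of the transformation in the equation above: every observation is pulled halfway toward the mean of its class, which halves its deviation (the simulated noise) around that mean. numpy is assumed; array names are illustrative.

    import numpy as np

    def shrink_toward_class_means(X: np.ndarray, y: np.ndarray) -> np.ndarray:
        """Replace x_i by mean(class of x_i) + (x_i - mean(class of x_i)) / 2."""
        X_new = X.astype(float).copy()
        for label in np.unique(y):
            mask = y == label
            class_mean = X[mask].mean(axis=0)
            X_new[mask] = class_mean + (X[mask] - class_mean) / 2.0
        return X_new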
diff --git a/related.tex b/related.tex
index f6ac87f..a7fd7e3 100644
--- a/related.tex
+++ b/related.tex
@@ -64,7 +64,7 @@ human body part identification or pose estimation respectively in this
context~\cite{plagemann:icra10,ganapathi:cvpr10,shotton:cvpr11}. Furthermore,
OpenNI~\cite{openni} and the Kinect for Windows SDK~\cite{kinect-sdk} are two
systems that perform figure detection and skeleton fitting for the Kinect.
-Given the maturity of the solutions, we will use implementations of figure
+Given the maturity of these solutions, we will use existing implementations of figure
detection and skeleton fitting. Therefore this paper will focus primarily on
the classification part of skeleton recognition.
diff --git a/uniqueness.tex b/uniqueness.tex
index 6c90310..1814446 100644
--- a/uniqueness.tex
+++ b/uniqueness.tex
@@ -83,10 +83,10 @@ as well as the ROC of the best performing face detection algorithm in
the image-restricted LFW benchmark: \emph{Associate-Predict}
\cite{associate}.
-The results show that with a standard deviation of 3mm, nearest
+The results show that with a standard deviation of 3 millimeters, nearest
neighbor performs quite similarly to face detection at low
false-positive rate. At this noise level, the error is smaller than
-1cm with 99.9\% probability. Even with a standard deviation of 5mm, it
+1 centimeter with 99.9\% probability. Even with a standard deviation of 5 millimeters, it
is still possible to detect 90\% of the matched pairs with a false
positive rate of 6\%.
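A hedged sketch of the probability statement above: with measurement noise of standard deviation 3 mm, the chance that the error stays below 1 cm is about 99.9% under a Gaussian model. scipy is assumed; this only reproduces that figure, not the paper's nearest-neighbor experiment.

    from scipy.stats import norm

    sigma_mm = 3.0
    p_within_1cm = norm.cdf(10.0 / sigma_mm) - norm.cdf(-10.0 / sigma_mm)
    print(f"P(|error| < 1 cm) = {p_within_1cm:.4f}")   # ~0.9991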