author    Thibaut Horel <thibaut.horel@gmail.com>    2012-03-05 14:34:54 -0800
committer Thibaut Horel <thibaut.horel@gmail.com>    2012-03-05 14:34:54 -0800
commit    898a85e9d27cac39f403a9a499ea49578a856f4f (patch)
tree      2501910f1d4aec0123804d0fe5ab76714758f00f /experimental.tex
parent    75bdb4858889f2af6e074ed9448b6ded1a81cbc4 (diff)
download  kinect-898a85e9d27cac39f403a9a499ea49578a856f4f.tar.gz
Final changes
Diffstat (limited to 'experimental.tex')
-rw-r--r--    experimental.tex    27
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/experimental.tex b/experimental.tex
index 8085321..cc66af4 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -45,9 +45,9 @@ hallway. For each frame, the Kinect SDK performs figure detection to identify
regions of interest. Then, it fits a skeleton to the identified figures and
outputs a set of joints in real world coordinates. The view of the Kinect is
seen in \fref{fig:hallway}, showing the color image, the depth image with
-figures, and the fitted skeleton of a person in a single frame. Skeletons are
+detected figures, and the fitted skeleton of a person in a single frame. Skeletons are
fit from roughly 1-5 meters away from the Kinect. For each frame with a
-skeleton we record color image and the positions of the joints.
+skeleton we record the color image and the positions of the joints.
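
To make the capture loop concrete, here is a minimal Python sketch of the per-frame recording described above. The sensor object and its methods are hypothetical stand-ins (the real Kinect SDK is accessed from C#/C++); only the structure of the loop follows the text.

def record_frames(sensor, out):
    # Hypothetical sensor API standing in for the Kinect SDK pipeline:
    # figure detection -> skeleton fitting -> joints in world coordinates.
    while sensor.has_frames():
        frame = sensor.next_frame()
        for skeleton in frame.fitted_skeletons:  # empty if no figure found
            out.append({
                "color_image": frame.color_image,    # kept for labelling
                "joints": skeleton.joint_positions,  # {name: (x, y, z) in meters}
            })
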
\begin{figure}[t]
\begin{center}
@@ -78,7 +78,7 @@ Second, we reduce the number of features to nine by using the vertical symmetry
of the human body: if two body parts are symmetric about the vertical axis, we
bundle them into one feature by averaging their lengths. If only one of them is
present, we take its value. If neither of them is present, the feature is
-reported as missing for the frame. Finally, any frame with a missing feature is
+reported as missing for the frame. Any frame with a missing feature is
filtered out. The resulting nine features include the six arm, leg, and pelvis
measurements from \xref{sec:uniqueness}, and three additional measurements:
spine length, shoulder breadth, and head size. Here we list the nine features as
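
A minimal sketch of this symmetry bundling and filtering, assuming per-frame limb lengths arrive as a dict with None for undetected parts; the part names below are an illustrative subset, not the paper's exact nine-feature list.

def bundle(left, right):
    # Average a symmetric pair; fall back to whichever side is present;
    # None if both are missing.
    if left is not None and right is not None:
        return (left + right) / 2
    return left if left is not None else right

def reduce_frame(raw):
    # raw: per-frame part lengths in meters, None when not detected.
    pairs = [("upper_arm_l", "upper_arm_r"),
             ("forearm_l", "forearm_r"),
             ("thigh_l", "thigh_r"),
             ("shin_l", "shin_r")]
    singles = ["pelvis_width", "spine_length", "shoulder_breadth", "head_size"]
    feats = [bundle(raw.get(l), raw.get(r)) for l, r in pairs]
    feats += [raw.get(s) for s in singles]
    # A frame with any missing feature is filtered out.
    return None if any(f is None for f in feats) else feats
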
@@ -110,9 +110,9 @@ range of the camera, we only keep the frames of a run that are 2-3 meters away
from the Kinect.
Ground truth person identification is obtained by manually labelling each run
-based on the images captured by the color camera of the Kinect. For ease of
-labelling, only the runs with people walking toward the camera are kept. These
-are the runs where the average distance from the skeleton joints to the camera
+based on the images captured from the color stream of the Kinect. For ease of
+labelling, only the runs with people walking toward the Kinect are kept. These
+are the runs where the average distance from the skeleton joints to the Kinect
is decreasing.
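
As a sketch, assuming each run is a chronological list of frames whose joints are (x, y, z) positions in meters with the Kinect at the origin, the distance filter and direction test could look like this (function names are illustrative):

import math

def avg_dist(frame_joints):
    # Mean Euclidean distance (meters) of the joints from the sensor,
    # assuming the Kinect sits at the origin of the world coordinates.
    return sum(math.dist((0, 0, 0), j) for j in frame_joints) / len(frame_joints)

def filter_run(run):
    # run: chronological list of frames, each a list of (x, y, z) joints.
    frames = [f for f in run if 2.0 <= avg_dist(f) <= 3.0]  # keep 2-3 m band
    if len(frames) < 2:
        return None
    # Keep only runs approaching the Kinect: average distance decreasing.
    return frames if avg_dist(frames[-1]) < avg_dist(frames[0]) else None
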
We perform five experiments. First, we test the performance of
@@ -219,7 +219,7 @@ we reach 90\% accuracy at 60\% recall for a group size of 10 people.
In the second experiment, we evaluate skeleton recognition in an online
setting. Even though the previous evaluation is standard, it does not properly
-reflect reality. A real-life setting could be as follows. The camera is placed
+reflect reality. A real-world setting could be as follows. The camera is placed
at the entrance of a building. When a person enters the building, his identity
is detected based on the electronic key system and a new labeled run is added
to the dataset. The identification algorithm is then retrained on the
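
A minimal sketch of this online loop, with illustrative names and a scikit-learn-style fit/predict classifier interface assumed:

class OnlineIdentifier:
    def __init__(self, make_classifier):
        # Any classifier exposing fit/predict can be plugged in.
        self.make_classifier = make_classifier
        self.runs, self.labels = [], []
        self.model = None

    def enroll(self, features, identity):
        # A person enters: the electronic key system supplies the label,
        # and the model is retrained on the augmented dataset.
        self.runs.append(features)
        self.labels.append(identity)
        self.model = self.make_classifier()
        self.model.fit(self.runs, self.labels)

    def identify(self, features):
        return None if self.model is None else self.model.predict([features])[0]
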
@@ -279,9 +279,7 @@ recognition rates mostly above 90\% for group sizes of 3 and 5.
In the third experiment, we compare the performance of skeleton recognition
with the performance of face recognition as given by \textsf{face.com}. At the
time of writing, this is the best performing face recognition algorithm on the
-LFW dataset~\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
-The results show that face recognition has better accuracy than skeleton
-recognition, but not by a large margin.
+LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
We use the publicly available REST API of \textsf{face.com} to do face
recognition on our dataset. Due to the restrictions of the API, for this
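
Since the face.com service has since been shut down, its actual endpoints and parameters are not reproduced here; the sketch below only illustrates the shape of such a REST call, with a placeholder URL and fields.

import requests

API_URL = "https://api.example.com/faces/recognize"  # hypothetical placeholder

def recognize(image_path, api_key):
    # POST one color image to a recognize-style endpoint and return the
    # parsed JSON response (candidate identities with confidences).
    with open(image_path, "rb") as f:
        resp = requests.post(API_URL, params={"api_key": api_key},
                             files={"image": f})
    resp.raise_for_status()
    return resp.json()
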
@@ -381,13 +379,14 @@ observation $\bx_i$ is replaced by $\bx_i'$ defined by:
\begin{equation}
\bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2}
\end{equation}
-We believe that a reducing factor of 2 for the noise's variance is realistic
-given the relative low resolution of the Kinect's infrared camera.
+We believe that halving the noise's standard deviation is realistic
+given the relatively low resolution of the Kinect's infrared camera.
\fref{fig:var} compares the precision-recall curve of \fref{fig:offline:sht} to
the curve of the same experiment run on the newly obtained dataset. We observe
-a roughly 20\% increase in performace across most thresholds. Note that these
-results would significantly outperform face recognition.
+a roughly 20\% increase in performance across most thresholds. We
+believe these results would significantly outperform face recognition
+in a similar setting.
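
A sketch of the transform above, shrinking each observation halfway toward its class mean, which halves the noise's standard deviation; array names are illustrative.

import numpy as np

def shrink_toward_class_means(X, y):
    # X: (n_samples, n_features) feature matrix; y: per-sample labels.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xp = np.empty_like(X)
    for label in np.unique(y):
        mask = y == label
        mean = X[mask].mean(axis=0)              # class mean \bar{\bx}_{y_i}
        Xp[mask] = mean + (X[mask] - mean) / 2   # halve each deviation
    return Xp
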
%\begin{figure}[t]
% \begin{center}