author    Thibaut Horel <thibaut.horel@gmail.com>    2012-03-05 14:34:54 -0800
committer Thibaut Horel <thibaut.horel@gmail.com>    2012-03-05 14:34:54 -0800
commit    898a85e9d27cac39f403a9a499ea49578a856f4f (patch)
tree      2501910f1d4aec0123804d0fe5ab76714758f00f /experimental.tex
parent    75bdb4858889f2af6e074ed9448b6ded1a81cbc4 (diff)
download  kinect-898a85e9d27cac39f403a9a499ea49578a856f4f.tar.gz
Final changes
Diffstat (limited to 'experimental.tex')
-rw-r--r--    experimental.tex    27
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/experimental.tex b/experimental.tex
index 8085321..cc66af4 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -45,9 +45,9 @@ hallway. For each frame, the Kinect SDK performs figure detection to identify
regions of interest. Then, it fits a skeleton to the identified figures and
outputs a set of joints in real world coordinates. The view of the Kinect is
seen in \fref{fig:hallway}, showing the color image, the depth image with
-figures, and the fitted skeleton of a person in a single frame. Skeletons are
+detected figures, and the fitted skeleton of a person in a single frame. Skeletons are
fit from roughly 1-5 meters away from the Kinect. For each frame with a
-skeleton we record color image and the positions of the joints.
+skeleton we record the color image and the positions of the joints.
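
To make the capture loop concrete, here is a minimal Python sketch of the per-frame recording described above. The sensor object and its methods are hypothetical stand-ins (the real Kinect SDK is accessed from C#/C++); only the structure of the loop follows the text.

def record_frames(sensor, out):
    # Hypothetical sensor API standing in for the Kinect SDK pipeline:
    # figure detection -> skeleton fitting -> joints in world coordinates.
    while sensor.has_frames():
        frame = sensor.next_frame()
        for skeleton in frame.fitted_skeletons:  # empty if no figure found
            out.append({
                "color_image": frame.color_image,    # kept for labelling
                "joints": skeleton.joint_positions,  # {name: (x, y, z) in meters}
            })
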
\begin{figure}[t]
\begin{center}
@@ -78,7 +78,7 @@ Second, we reduce the number of features to nine by using the vertical symmetry
of the human body: if two body parts are symmetric about the vertical axis, we
bundle them into one feature by averaging their lengths. If only one of them is
present, we take its value. If neither of them is present, the feature is
-reported as missing for the frame. Finally, any frame with a missing feature is
+reported as missing for the frame. Any frame with a missing feature is
filtered out. The resulting nine features include the six arm, leg, and pelvis
measurements from \xref{sec:uniqueness}, and three additional measurements:
spine length, shoulder breadth, and head size. Here we list the nine features as
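
A minimal sketch of this symmetry bundling and filtering, assuming per-frame limb lengths arrive as a dict with None for undetected parts; the part names below are an illustrative subset, not the paper's exact nine-feature list.

def bundle(left, right):
    # Average a symmetric pair; fall back to whichever side is present;
    # None if both are missing.
    if left is not None and right is not None:
        return (left + right) / 2
    return left if left is not None else right

def reduce_frame(raw):
    # raw: per-frame part lengths in meters, None when not detected.
    pairs = [("upper_arm_l", "upper_arm_r"),
             ("forearm_l", "forearm_r"),
             ("thigh_l", "thigh_r"),
             ("shin_l", "shin_r")]
    singles = ["pelvis_width", "spine_length", "shoulder_breadth", "head_size"]
    feats = [bundle(raw.get(l), raw.get(r)) for l, r in pairs]
    feats += [raw.get(s) for s in singles]
    # A frame with any missing feature is filtered out.
    return None if any(f is None for f in feats) else feats
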
@@ -110,9 +110,9 @@ range of the camera, we only keep the frames of a run that are 2-3 meters away
from the Kinect.
Ground truth person identification is obtained by manually labelling each run
-based on the images captured by the color camera of the Kinect. For ease of
-labelling, only the runs with people walking toward the camera are kept. These
-are the runs where the average distance from the skeleton joints to the camera
+based on the images captured from the color stream of the Kinect. For ease of
+labelling, only the runs with people walking toward the Kinect are kept. These
+are the runs where the average distance from the skeleton joints to the Kinect
is decreasing.
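
As a sketch, assuming each run is a chronological list of frames whose joints are (x, y, z) positions in meters with the Kinect at the origin, the distance filter and direction test could look like this (function names are illustrative):

import math

def avg_dist(frame_joints):
    # Mean Euclidean distance (meters) of the joints from the sensor,
    # assuming the Kinect sits at the origin of the world coordinates.
    return sum(math.dist((0, 0, 0), j) for j in frame_joints) / len(frame_joints)

def filter_run(run):
    # run: chronological list of frames, each a list of (x, y, z) joints.
    frames = [f for f in run if 2.0 <= avg_dist(f) <= 3.0]  # keep 2-3 m band
    if len(frames) < 2:
        return None
    # Keep only runs approaching the Kinect: average distance decreasing.
    return frames if avg_dist(frames[-1]) < avg_dist(frames[0]) else None
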
We perform five experiments. First, we test the performance of
@@ -219,7 +219,7 @@ we reach 90\% accuracy at 60\% recall for a group size of 10 people.
In the second experiment, we evaluate skeleton recognition in an online
setting. Even though the previous evaluation is standard, it does not properly
-reflect reality. A real-life setting could be as follows. The camera is placed
+reflect reality. A real-world setting could be as follows. The camera is placed
at the entrance of a building. When a person enters the building, his identity
is detected based on the electronic key system and a new labeled run is added
to the dataset. The identification algorithm is then retrained on the
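
A minimal sketch of this online loop, with illustrative names and a scikit-learn-style fit/predict classifier interface assumed:

class OnlineIdentifier:
    def __init__(self, make_classifier):
        # Any classifier exposing fit/predict can be plugged in.
        self.make_classifier = make_classifier
        self.runs, self.labels = [], []
        self.model = None

    def enroll(self, features, identity):
        # A person enters: the electronic key system supplies the label,
        # and the model is retrained on the augmented dataset.
        self.runs.append(features)
        self.labels.append(identity)
        self.model = self.make_classifier()
        self.model.fit(self.runs, self.labels)

    def identify(self, features):
        return None if self.model is None else self.model.predict([features])[0]
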
@@ -279,9 +279,7 @@ recognition rates mostly above 90\% for group sizes of 3 and 5.
In the third experiment, we compare the performance of skeleton recognition
with the performance of face recognition as given by \textsf{face.com}. At the
time of writing, this is the best performing face recognition algorithm on the
-LFW dataset~\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
-The results show that face recognition has better accuracy than skeleton
-recognition, but not by a large margin.
+LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
We use the publicly available REST API of \textsf{face.com} to do face
recognition on our dataset. Due to the restrictions of the API, for this
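
Since the face.com service has since been shut down, its actual endpoints and parameters are not reproduced here; the sketch below only illustrates the shape of such a REST call, with a placeholder URL and fields.

import requests

API_URL = "https://api.example.com/faces/recognize"  # hypothetical placeholder

def recognize(image_path, api_key):
    # POST one color image to a recognize-style endpoint and return the
    # parsed JSON response (candidate identities with confidences).
    with open(image_path, "rb") as f:
        resp = requests.post(API_URL, params={"api_key": api_key},
                             files={"image": f})
    resp.raise_for_status()
    return resp.json()
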
@@ -381,13 +379,14 @@ observation $\bx_i$ is replaced by $\bx_i'$ defined by:
\begin{equation}
\bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2}
\end{equation}
-We believe that a reducing factor of 2 for the noise's variance is realistic
-given the relative low resolution of the Kinect's infrared camera.
+We believe that halving the noise's standard deviation is realistic
+given the relatively low resolution of the Kinect's infrared camera.
\fref{fig:var} compares the precision-recall curve of \fref{fig:offline:sht} to
the curve of the same experiment run on the newly obtained dataset. We observe
-a roughly 20\% increase in performace across most thresholds. Note that these
-results would significantly outperform face recognition.
+a roughly 20\% increase in performance across most thresholds. We
+believe these results would significantly outperform face recognition
+in a similar setting.
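
A sketch of the transform above, shrinking each observation halfway toward its class mean, which halves the noise's standard deviation; array names are illustrative.

import numpy as np

def shrink_toward_class_means(X, y):
    # X: (n_samples, n_features) feature matrix; y: per-sample labels.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xp = np.empty_like(X)
    for label in np.unique(y):
        mask = y == label
        mean = X[mask].mean(axis=0)              # class mean \bar{\bx}_{y_i}
        Xp[mask] = mean + (X[mask] - mean) / 2   # halve each deviation
    return Xp
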
%\begin{figure}[t]
% \begin{center}