author     Jon Whiteaker <jbw@berkeley.edu>  2012-09-06 16:41:55 -0700
committer  Jon Whiteaker <jbw@berkeley.edu>  2012-09-06 16:41:55 -0700
commit     6af39548a765109ca94ac8162eec1aee7828b8c3 (patch)
tree       856565c933f8cbd80fd6d2f547e524114e0dcde4 /experimental.tex
parent     bd894794b31290499656e67eb0c81bbed4bcbf56 (diff)
download   kinect-6af39548a765109ca94ac8162eec1aee7828b8c3.tar.gz
minor updates
Diffstat (limited to 'experimental.tex')
-rw-r--r--  experimental.tex  |  50
1 file changed, 27 insertions, 23 deletions
diff --git a/experimental.tex b/experimental.tex
index f59be72..2c537df 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -43,11 +43,11 @@ We collect data using the Kinect SDK over a period of a week in a research
laboratory setting. The Kinect is placed at the tee of a frequently used
hallway. For each frame, the Kinect SDK performs figure detection to identify
regions of interest. Then, it fits a skeleton to the identified figures and
-outputs a set of joints in real world coordinates. The view of the Kinect is
-seen in \fref{fig:hallway}, showing the color image, the depth image with
-detected figures, and the fitted skeleton of a person in a single frame. Skeletons are
-fit from roughly 1-5 meters away from the Kinect. For each frame with a
-skeleton we record the color image and the positions of the joints.
+outputs a set of joints in real world coordinates. The view from the Kinect
+SDK is seen in \fref{fig:hallway}, showing the color image, the depth image
+with detected figures, and the fitted skeleton of a person in a single frame.
+Skeletons are fit from roughly 1--5 meters away from the Kinect. For each frame
+with a skeleton we record the color image and the positions of the joints.
\begin{figure*}[t]
\begin{center}
@@ -64,6 +64,7 @@ another part of the body. In those cases, the coordinates of these joints are
either absent from the frame or present but tagged as \emph{Inferred} by the
Kinect SDK. Inferred means that even though the joint is not visible in the
frame, the skeleton-fitting algorithm attempts to guess the right location.
+Note that in the experiment design, we exclude inferred data points.
\subsection{Experiment design}
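The exclusion of \emph{Inferred} joints noted above can be sketched as follows. This is a minimal illustration only: the dictionary layout and the `state` strings are a hypothetical representation of the per-frame output recorded from the Kinect SDK, not the SDK's actual types.

```python
# Sketch: drop joints the skeleton-fitting algorithm only guessed at,
# keeping the joints that were actually observed in the frame.
# The frame/joint structure below is a hypothetical representation.

def tracked_joints(frame):
    """Keep only joints marked as Tracked (not Inferred, not absent)."""
    return {name: j for name, j in frame["joints"].items()
            if j["state"] == "Tracked"}

frame = {
    "joints": {
        "head":      {"state": "Tracked",  "xyz": (0.02, 1.61, 2.40)},
        "left_foot": {"state": "Inferred", "xyz": (0.11, 0.05, 2.35)},
    }
}

print(sorted(tracked_joints(frame)))  # only 'head' survives the filter
```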
@@ -269,19 +270,20 @@ with the performance of face recognition as given by \textsf{face.com}. At the
time of writing, this is the best performing face recognition algorithm on the
LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
-We use the publicly available REST API of \textsf{face.com} to do face
-recognition on our dataset. Due to the restrictions of the API, for this
-experiment we set $n_p = 5$ and train on one half of the data and test on the
-remaining half. For comparison, the MoG algorithm is run with the same
-training-testing partitioning of the dataset. In this setting, SHT is not
-relevant for the comparison, because \textsf{face.com} does not give the
-possibility to mark a sequence of frames as belonging to the same run. This
-additional information would be used by the SHT algorithm and would thus bias
-the experiment in favor of skeleton recognition. The results are shown in
-\fref{fig:face}. Face recognition outperforms skeleton recognition, but by
-less than 10\% at most thresholds.
-%These results are promising, given that
-%\textsf{face.com} is the state-of-the-art in face recognition.
+We use the REST API of \textsf{face.com} to do face recognition on our dataset.
+Due to the restrictions of the API, for this experiment we set $n_p = 5$ and
+train on one half of the data and test on the remaining half. For comparison,
+the MoG algorithm is run with the same training-testing partitioning of the
+dataset. In this setting, SHT is not relevant to the comparison, because
+\textsf{face.com} provides no way to mark a sequence of frames as belonging
+to the same run. This additional information would be used by the SHT algorithm
+and would thus bias the experiment in favor of skeleton recognition.
+The results are shown in \fref{fig:face}. Skeleton recognition performs
+within 10\% of face recognition at most thresholds.
+%outperforms
+%skeleton recognition, but by less than 10\% at most thresholds.
+%These results are promising, given that \textsf{face.com} is the
+%state-of-the-art in face recognition.
%However, this result does not take into account the disparity in the number of
%runs which face recognition and skeleton recognition can classify frames,
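The 50/50 evaluation protocol above can be sketched as follows. Everything here is synthetic and simplified: the data is made up, a single scalar feature stands in for the vector of limb lengths, and a single Gaussian per person stands in for the paper's MoG algorithm.

```python
import math
import random
import statistics

# Synthetic stand-in for the experiment: n_p = 5 people, each with a true
# skeletal "profile" (one scalar feature here for brevity), observed with
# Gaussian noise across 40 frames per person.
random.seed(0)
true_profiles = {p: 1.5 + 0.1 * p for p in range(5)}
data = {p: [random.gauss(mu, 0.05) for _ in range(40)]
        for p, mu in true_profiles.items()}

# Train on one half of each person's frames, test on the remaining half.
train = {p: xs[:20] for p, xs in data.items()}
test = {p: xs[20:] for p, xs in data.items()}

# One Gaussian per person -- a simplified stand-in for the MoG algorithm.
models = {p: (statistics.mean(xs), statistics.stdev(xs))
          for p, xs in train.items()}

def classify(x):
    """Return the person whose Gaussian assigns x the highest likelihood."""
    def neg_loglik(p):
        mu, sd = models[p]
        return math.log(sd) + ((x - mu) / sd) ** 2 / 2
    return min(models, key=neg_loglik)

correct = sum(classify(x) == p for p, xs in test.items() for x in xs)
total = sum(len(xs) for xs in test.values())
print(f"accuracy on held-out half: {correct / total:.2f}")
```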
@@ -337,7 +339,9 @@ datasets are completely disjoint, the SHT algorithm is not learning the same
profile for a person walking toward the camera and for a person walking away
from the camera. The third curve of \fref{fig:back} shows the precision-recall
curve when training and testing on the combined dataset of runs toward and away
-from the camera with similar performance.
+from the camera; performance is again similar. While we could not obtain
+enough labeled data for a full comparison in the dark, manual experiments
+show comparable performance when there is no visible light.
\subsection{Reducing the noise}
@@ -353,10 +357,10 @@ resolution by artificially reducing the noise from our Kinect dataset.
%the only source of classification error in our algorithm is the dispersion of
%the observed limbs' lengths away from the exact measurements.
To simulate a reduction of the noise level, the dataset is modified as follows:
-we measure the average skeletal profile of each person, and for each frame
-we divide the empirical variance from the average by 2. Formally, using the
-same notations as in Section~\ref{sec:mixture of Gaussians}, each observation
-$\bx_i$ is replaced by $\bx_i'$ defined by:
+we measure the average skeletal profile of each person across the entire
+dataset, and for each frame we halve the deviation of the observation from
+that average. Formally, using the same notations as in Section~\ref{sec:mixture of
+Gaussians}, each observation $\bx_i$ is replaced by $\bx_i'$ defined by:
\begin{equation}
\bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2}
\end{equation}
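The transformation in the equation above can be checked numerically; this sketch uses made-up limb lengths, not the paper's data. Note that halving each deviation from the mean divides the empirical variance by four while leaving the mean unchanged.

```python
import statistics

# Synthetic limb-length observations for one person (metres, made up).
x = [1.52, 1.48, 1.55, 1.45, 1.50]
x_bar = statistics.mean(x)

# x_i' = x_bar + (x_i - x_bar) / 2  -- halve each deviation from the mean.
x_prime = [x_bar + (xi - x_bar) / 2 for xi in x]

var_before = statistics.pvariance(x)
var_after = statistics.pvariance(x_prime)
print(var_before / var_after)  # halving deviations quarters the variance: 4.0
```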