author     Jon Whiteaker <jbw@berkeley.edu>    2012-09-06 16:41:55 -0700
committer  Jon Whiteaker <jbw@berkeley.edu>    2012-09-06 16:41:55 -0700
commit     6af39548a765109ca94ac8162eec1aee7828b8c3 (patch)
tree       856565c933f8cbd80fd6d2f547e524114e0dcde4 /experimental.tex
parent     bd894794b31290499656e67eb0c81bbed4bcbf56 (diff)
download   kinect-6af39548a765109ca94ac8162eec1aee7828b8c3.tar.gz
minor updates
Diffstat (limited to 'experimental.tex')
-rw-r--r--  experimental.tex  50
1 file changed, 27 insertions, 23 deletions
diff --git a/experimental.tex b/experimental.tex
index f59be72..2c537df 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -43,11 +43,11 @@
 We collect data using the Kinect SDK over a period of a week in a research
 laboratory setting. The Kinect is placed at the tee of a frequently used
 hallway. For each frame, the Kinect SDK performs figure detection to identify
 regions of interest. Then, it fits a skeleton to the identified figures and
-outputs a set of joints in real world coordinates. The view of the Kinect is
-seen in \fref{fig:hallway}, showing the color image, the depth image with
-detected figures, and the fitted skeleton of a person in a single frame. Skeletons are
-fit from roughly 1-5 meters away from the Kinect. For each frame with a
-skeleton we record the color image and the positions of the joints.
+outputs a set of joints in real world coordinates. The view from the Kinect
+SDK is seen in \fref{fig:hallway}, showing the color image, the depth image
+with detected figures, and the fitted skeleton of a person in a single frame.
+Skeletons are fit from roughly 1-5 meters away from the Kinect. For each frame
+with a skeleton we record the color image and the positions of the joints.

 \begin{figure*}[t]
 \begin{center}
@@ -64,6 +64,7 @@
 another part of the body. In those cases, the coordinates of these joints are
 either absent from the frame or present but tagged as \emph{Inferred} by the
 Kinect SDK. Inferred means that even though the joint is not visible in the
 frame, the skeleton-fitting algorithm attempts to guess the right location.
+Note that in the experiment design we exclude inferred data points.

 \subsection{Experiment design}

@@ -269,19 +270,20 @@
 with the performance of face recognition as given by \textsf{face.com}. At the
 time of writing, this is the best performing face recognition algorithm on the
 LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
-We use the publicly available REST API of \textsf{face.com} to do face
-recognition on our dataset. Due to the restrictions of the API, for this
-experiment we set $n_p = 5$ and train on one half of the data and test on the
-remaining half. For comparison, the MoG algorithm is run with the same
-training-testing partitioning of the dataset. In this setting, SHT is not
-relevant for the comparison, because \textsf{face.com} does not give the
-possibility to mark a sequence of frames as belonging to the same run. This
-additional information would be used by the SHT algorithm and would thus bias
-the experiment in favor of skeleton recognition. The results are shown in
-\fref{fig:face}. Face recognition outperforms skeleton recognition, but by
-less than 10\% at most thresholds.
-%These results are promising, given that
-%\textsf{face.com} is the state-of-the-art in face recognition.
+We use the REST API of \textsf{face.com} to do face recognition on our dataset.
+Due to the restrictions of the API, for this experiment we set $n_p = 5$ and
+train on one half of the data and test on the remaining half. For comparison,
+the MoG algorithm is run with the same training-testing partitioning of the
+dataset. In this setting, SHT is not relevant for the comparison, because
+\textsf{face.com} does not give the possibility to mark a sequence of frames as
+belonging to the same run. This additional information would be used by the SHT
+algorithm and would thus bias the experiment in favor of skeleton recognition.
+The results are shown in \fref{fig:face}. Skeleton recognition performs
+within 10\% of face recognition at most thresholds.
+%outperforms
+%skeleton recognition, but by less than 10\% at most thresholds.
+%These results are promising, given that \textsf{face.com} is the
+%state-of-the-art in face recognition.

 %However, this result does not take into account the disparity in the number of
 %runs which face recognition and skeleton recognition can classify frames,
@@ -337,7 +339,9 @@
 datasets are completely disjoint, the SHT algorithm is not learning the same
 profile for a person walking toward the camera and for a person walking away
 from the camera. The third curve of \fref{fig:back} shows the precision-recall
 curve when training and testing on the combined dataset of runs toward and away
-from the camera with similar performance.
+from the camera with similar performance. Note that while we could not obtain
+enough labeled data for a full comparison when it is dark, manual experiments
+show similar performance when there is no visible light.

 \subsection{Reducing the noise}

@@ -353,10 +357,10 @@
 resolution by artificially reducing the noise from our Kinect dataset.
 %the only source of classification error in our algorithm is the dispersion of
 %the observed limbs' lengths away from the exact measurements.
 To simulate a reduction of the noise level, the dataset is modified as follows:
-we measure the average skeletal profile of each person, and for each frame
-we divide the empirical variance from the average by 2. Formally, using the
-same notations as in Section~\ref{sec:mixture of Gaussians}, each observation
-$\bx_i$ is replaced by $\bx_i'$ defined by:
+we measure the average skeletal profile of each person across the entire
+dataset, and for each frame we divide the empirical variance from the average
+by 2. Formally, using the same notations as in Section~\ref{sec:mixture of
+Gaussians}, each observation $\bx_i$ is replaced by $\bx_i'$ defined by:
 \begin{equation}
 \bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2}
 \end{equation}
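The sentence added in the second hunk says inferred data points are excluded from the experiment. A minimal sketch of that filtering step, assuming a simple per-joint inferred flag; the Joint and Frame records here are hypothetical, not the paper's data structures:

```python
from dataclasses import dataclass

@dataclass
class Joint:
    x: float
    y: float
    z: float
    inferred: bool = False  # True when the Kinect SDK guessed a hidden joint

@dataclass
class Frame:
    joints: dict  # joint name -> Joint

def drop_inferred(frames):
    """Keep only joint positions the SDK actually observed. The diff does
    not say whether whole frames or single joints are discarded; this
    sketch drops the individual Inferred joints from each frame."""
    return [Frame(joints={name: j for name, j in f.joints.items()
                          if not j.inferred})
            for f in frames]
```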
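The third hunk describes the comparison protocol: $n_p = 5$ people, training on one half of the frames and testing on the other, with the MoG algorithm run on the same split. A rough sketch of that protocol, using a generic one-Gaussian-per-person likelihood classifier as a stand-in for the paper's MoG algorithm; the feature dimensions and data here are made up for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(X_train, y_train):
    """One Gaussian per person, fit on the training half."""
    models = {}
    for p in np.unique(y_train):
        Xp = X_train[y_train == p]
        models[p] = multivariate_normal(Xp.mean(axis=0),
                                        np.cov(Xp, rowvar=False),
                                        allow_singular=True)
    return models

def predict(models, X_test):
    """Assign each frame to the person with the highest log-likelihood."""
    people = sorted(models)
    scores = np.column_stack([models[p].logpdf(X_test) for p in people])
    return np.asarray(people)[scores.argmax(axis=1)]

# 50/50 train/test split over frames, as the face.com API limits required.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))     # fake limb-length features, 6-dim
y = rng.integers(0, 5, size=200)  # n_p = 5 people
perm = rng.permutation(len(X))
train, test = perm[:100], perm[100:]
models = fit_gaussians(X[train], y[train])
accuracy = (predict(models, X[test]) == y[test]).mean()
```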
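The final hunk's equation replaces each observation $\bx_i$ by $\bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i - \bar{\bx}_{y_i}}{2}$, halving each frame's deviation from that person's mean profile. A small numpy sketch of that per-person noise reduction, assuming frames are rows of a feature matrix X with person labels y:

```python
import numpy as np

def halve_noise(X, y):
    """Replace each x_i with xbar_{y_i} + (x_i - xbar_{y_i}) / 2, where
    xbar_{y_i} is that person's average skeletal profile over the whole
    dataset, shrinking the dispersion around the mean by half."""
    X2 = X.astype(float).copy()
    for person in np.unique(y):
        m = y == person
        xbar = X[m].mean(axis=0)          # \bar{x}_{y_i}
        X2[m] = xbar + (X[m] - xbar) / 2  # halve the deviation
    return X2
```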