authorJon Whiteaker <jbw@berkeley.edu>2012-03-01 16:20:18 -0800
committerJon Whiteaker <jbw@berkeley.edu>2012-03-01 16:20:18 -0800
commitcb8ffa38631e47d37002dc0528040d72ec34ccad (patch)
tree890d13136b0d45c52868b3593827803eef2e3c41
parentff62de7b794b5cae8af37135579888c27785ff6b (diff)
downloadkinect-cb8ffa38631e47d37002dc0528040d72ec34ccad.tar.gz
changing real-world section organization
-rw-r--r--	experimental.tex	129
-rw-r--r--	kinect.tex	2
-rw-r--r--	references.bib	28
3 files changed, 84 insertions, 75 deletions
diff --git a/experimental.tex b/experimental.tex
index 42d5965..cfdaf78 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -1,37 +1,35 @@
-\section{Experiment design}
+\section{Real-World Evaluation}
We conduct a real-life uncontrolled experiment using the Kinect to test the
-algorithm. First we discuss the signal outputs of the Kinect. Second we
-describe the environment in which we collect the data. Finally, we interpret
-the data.
+algorithm. First we present the manner and environment in which we perform
+data collection. Second we describe how the data is processed and classified.
+Finally, we discuss the results.
-\subsection{Kinect} The Kinect outputs three primary signals in real-time: a
-color image stream, a depth image stream, and microphone output. For our
-purposes, we focus on the depth image stream. As the Kinect was designed to
-interface directly with the Xbox 360~\cite{xbox}, the tools to interact with it
-on a PC are limited. Libfreenect~\cite{libfreenect} is a reverse engineered
-driver which gives access to the raw depth images from the Kinect. This raw
-data could be used to implement the algorithms \eg of
-Plagemann~\etal{}~\cite{plagemann:icra10}. Alternatively,
-OpenNI~\cite{openni}, a framework sponsored by PrimeSense~\cite{primesense},
-the company behind the technology of the Kinect, offers figure detection and
-skeleton fitting algorithms on top of raw access to the data streams. However,
-the skeleton fitting algorithm of OpenNI requires each individual to strike a
-specific pose for calibration. More recently, the Kinect for Windows
-SDK~\cite{kinect-sdk} was released, and its skeleton fitting algorithm operates
-in real-time without calibration. Given that the Kinect for Windows SDK is the
-state-of-the-art, we use it to perform our data collection.
+\subsection{Dataset}
-\subsection{Environment}
-We collect data using the Kinect SDK over a period of a week in a research
-laboratory setting. The Kinect is placed at the tee of a well traversed hallway.
-The view of the Kinect is seen in \fref{fig:hallway}, showing the color image,
-the depth image, and the fitted skeleton of a single frame.
+The Kinect outputs three primary signals in real-time: a color image stream, a
+depth image stream, and microphone output. For our purposes, we focus on the
+depth image stream. As the Kinect was designed to interface directly with the
+Xbox 360, the tools to interact with it on a PC are limited.
+Libfreenect~\cite{libfreenect} is a reverse engineered driver which gives
+access to the raw depth images from the Kinect. This raw data could be used to
+implement the algorithms \eg of Plagemann~\etal{}~\cite{plagemann:icra10}.
+Alternatively, OpenNI~\cite{openni}, a framework sponsored by
+PrimeSense~\cite{primesense}, the company behind the technology of the Kinect,
+offers figure detection and skeleton fitting algorithms on top of raw access to
+the data streams. However, the skeleton fitting algorithm of OpenNI requires
+each individual to strike a specific pose for calibration. More recently, the
+Kinect for Windows SDK~\cite{kinect-sdk} was released, and its skeleton fitting
+algorithm operates in real-time without calibration. Given that the Kinect for
+Windows SDK is the state-of-the-art, we use it to perform our data collection.
-\begin{itemize}
-\item 1 week
-\item 23 people
-\end{itemize}
+We collect data using the Kinect SDK over a period of a week in a research
+laboratory setting. The Kinect is placed at the tee of a well traversed
+hallway. The view of the Kinect is seen in \fref{fig:hallway}, showing the
+color image, the depth image, and the fitted skeleton of a person in a single
+frame. For each frame where a person is detected and a skeleton is fitted, we
+collect the 3D coordinates of 20 body joints and the color image recorded by
+the RGB camera.
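
The per-frame record just described could be represented as follows (a
hypothetical Python sketch; field and joint names are illustrative, not the
Kinect SDK's actual API):

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Frame:
    """One Kinect SDK frame with a fitted skeleton (illustrative sketch)."""
    skeleton_id: int                     # ID assigned by the figure-detection stage
    # maps each of up to 20 joint names to its (x, y, z) coordinates
    joints: Dict[str, Tuple[float, float, float]] = field(default_factory=dict)
    rgb_image: bytes = b""               # color image recorded by the RGB camera
```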
\begin{figure}
\begin{center}
@@ -41,55 +39,42 @@ the depth image, and the fitted skeleton of a single frame.
\label{fig:hallway}
\end{figure}
-\subsection{Data set}
+For some frames, one or several joints are out of the frame or are occluded by
+another part of the body. In those cases, the coordinates of these joints are
+either absent from the frame or present but tagged as \emph{Inferred} by the
+Kinect SDK. Inferred means that even though the joint is not visible in the
+frame, the skeleton-fitting algorithm attempts to estimate its location.
+
+Ground truth person identification is obtained by manually labelling each run
+based on the images captured by the RGB camera of the Kinect. For ease of
+labelling, only the runs with people walking toward the camera are kept. These
+are the runs where the average distance from the skeleton joints to the camera
+is decreasing.
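
The toward-the-camera test on a run can be sketched as follows (hypothetical
Python; a run is modeled as a chronological list of frames, each a list of
(x, y, z) joint coordinates with the camera at the origin, and walking toward
the sensor is taken to show up as a shrinking mean joint-to-camera distance):

```python
import math

def mean_camera_distance(joints):
    """Mean Euclidean distance from the camera (origin of the Kinect
    coordinate system) over one frame's joint positions."""
    return sum(math.dist((0.0, 0.0, 0.0), j) for j in joints) / len(joints)

def walking_toward_camera(run):
    """run: chronological list of frames, each a list of (x, y, z) joints.
    A person walking toward the sensor ends the run closer than they began."""
    return mean_camera_distance(run[-1]) < mean_camera_distance(run[0])
```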
-The original dataset consists of the sequence of all the frames where
-a skeleton was detected by the Kinect SDK. For each frames the
-following data is available:
-\begin{itemize}
-\item the 3D coordinates of 20 body joints,
-\item a color picture recorded by the video camera.
-\end{itemize}
-For some of frames, one or several joints are occluded by another part
-of the body. In those cases, the coordinates of these joints are
-either absent from the frame or present but tagged as \emph{Inferred}
-by the Kinect SDK. It means that even though the joint is not
-present on the frame, the skeleton-fitting algorithm is able to guess
-its location.
+\subsection{Experiment design}
-Each frame also has a skeleton ID number. If this numbers stays the
+Several reductions are then applied to the data set to extract \emph{features}
+from the raw data. First, the lengths of 15 body parts are computed from the
+joint coordinates. These are distances between two contiguous joints in the
+human body. If either joint of a body part is absent or merely inferred in a
+frame, the corresponding body part is reported as absent for that frame.
+Second, the number of features is reduced to 9 by using the vertical symmetry
+of the human body: if two body parts are symmetric about the vertical axis, we
+bundle them into one feature by averaging their lengths. If only one of them is
+present, we take the value of its counterpart. If none of them are present, the
+feature is reported as missing for the frame. The resulting nine features are:
+Head-ShoulderCenter, ShoulderCenter-Shoulder, Shoulder-Elbow, Elbow-Wrist,
+ShoulderCenter-Spine, Spine-HipCenter, HipCenter-HipSide, HipSide-Knee,
+Knee-Ankle. Finally, any frame with a missing feature is filtered out.
+
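The reduction from raw joints to the nine features can be sketched as follows
(hypothetical Python; joint names follow the Kinect SDK skeleton, and the exact
left/right pairings are an assumption based on the feature names above):

```python
import math

# Contiguous joint pairs defining the body parts; symmetric left/right parts
# are listed together so they can be averaged into one feature (assumed pairing).
SYMMETRIC_FEATURES = {
    "Head-ShoulderCenter": [("Head", "ShoulderCenter")],
    "ShoulderCenter-Shoulder": [("ShoulderCenter", "ShoulderLeft"),
                                ("ShoulderCenter", "ShoulderRight")],
    "Shoulder-Elbow": [("ShoulderLeft", "ElbowLeft"),
                       ("ShoulderRight", "ElbowRight")],
    "Elbow-Wrist": [("ElbowLeft", "WristLeft"),
                    ("ElbowRight", "WristRight")],
    "ShoulderCenter-Spine": [("ShoulderCenter", "Spine")],
    "Spine-HipCenter": [("Spine", "HipCenter")],
    "HipCenter-HipSide": [("HipCenter", "HipLeft"), ("HipCenter", "HipRight")],
    "HipSide-Knee": [("HipLeft", "KneeLeft"), ("HipRight", "KneeRight")],
    "Knee-Ankle": [("KneeLeft", "AnkleLeft"), ("KneeRight", "AnkleRight")],
}

def frame_features(joints):
    """joints: dict mapping joint name -> (x, y, z), containing only joints
    that are actually tracked (absent and Inferred joints omitted).
    Returns the 9-feature dict, or None if any feature is missing."""
    features = {}
    for name, pairs in SYMMETRIC_FEATURES.items():
        lengths = [math.dist(joints[a], joints[b])
                   for a, b in pairs if a in joints and b in joints]
        if not lengths:          # neither symmetric body part present:
            return None          # the frame is filtered out
        features[name] = sum(lengths) / len(lengths)
    return features
```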
+Each detected skeleton also has an ID number which identifies the figure
+it maps to from the figure detection stage. If this number stays the
same across several frames, it means that the skeleton-fitting
algorithm was able to detect the skeleton in a contiguous way. This
allows us to define the concept of a \emph{run}: a sequence of frames
with the same skeleton ID.
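
Splitting the frame sequence into runs by skeleton ID can be sketched as
follows (hypothetical Python; the frame representation is assumed):

```python
def split_into_runs(frames):
    """frames: chronological list of (skeleton_id, frame_data) tuples.
    Returns a list of runs, each a list of frame_data sharing one
    skeleton ID over consecutive frames."""
    runs = []
    prev_id = None
    for skel_id, data in frames:
        if skel_id != prev_id:   # new figure detected: start a new run
            runs.append([])
            prev_id = skel_id
        runs[-1].append(data)
    return runs
```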
-Ground truth person recognition is obtained by manually labelling each
-run based on the images captured by the video camera of the
-Kinect. For ease of labelling, only the runs with people walking
-toward the camera are kept. These are the runs where the average
-distance from the skeleton joints to the camera is increasing.
-
-Several reductions are then applied to the data set to extract
-\emph{features} from the raw data:
-\begin{itemize}
-\item From the joints coordinates, the lengths of 15 body parts are
- computed. These are distances between two contiguous joints in the
- human body. If one of the two joints of a body part is not present
- or inferred in a frame, the corresponding body part is reported as
- absent for the frame.
-\item The number of features is then reduced to 9 by using the
- vertical symmetry of the human body: if two body parts are symmetric
- about the vertical axis, we bundle them into one feature by
- averaging their lengths. If only one of them is present, we take the
- value of its counterpart. If none of them are present, the feature
- is reported as missing for the frame. The resulting nine features
- are: Head-ShoulderCenter, ShoulderCenter-Shoulder, Shoulder-Elbow,
- Elbow-Wrist, ShoulderCenter-Spine, Spine-HipCenter,
- HipCenter-HipSide, HipSide-Knee, Knee-Ankle.
-\item Finally, all the frames where any of the 9 features is missing
- are filtered out.
-\end{itemize}
-
\begin{table}
\begin{center}
\begin{tabular}{|l|r||r|r|r|}
@@ -108,7 +93,7 @@ in the ordering given by the number of frames.}
\label{tab:dataset}
\end{table}
-
+\subsection{Results}
diff --git a/kinect.tex b/kinect.tex
index 55c4a08..1e87cc8 100644
--- a/kinect.tex
+++ b/kinect.tex
@@ -34,7 +34,7 @@
\begin{document}
\pagestyle{headings}
\mainmatter
-\def\ECCV12SubNumber{***} % Insert your submission number here
+\def\ECCV12SubNumber{543} % Insert your submission number here
\title{On the Viability of Skeleton Recognition} % Replace with your title
diff --git a/references.bib b/references.bib
index 66cefb7..0f41363 100644
--- a/references.bib
+++ b/references.bib
@@ -118,11 +118,35 @@
@string{LANMAN = PROC # "IEEE Workshop on Local and Metropolitan Area Networks (LANMAN)"},
+@misc{openni,
+ key = {{OpenNI}},
+ title = {OpenNI},
+ howpublished = {\url{http://www.openni.org/}},
+}
+
+@misc{libfreenect,
+ key = {{OpenKinect: libfreenect}},
+ title = {OpenKinect: libfreenect},
+ howpublished = {\url{http://www.openkinect.org/}},
+}
+
+@misc{kinect-sdk,
+  key = {{Kinect for Windows}},
+  title = {Kinect for Windows SDK},
+  howpublished = {Microsoft Corp., Redmond, WA},
+}
+
+@misc{primesense,
+ key = {{PrimeSense}},
+ title = {PrimeSense: Natural Interaction},
+ howpublished = {\url{http://www.primesense.com/}},
+}
@misc{kinect,
+ key = {{Kinect for Xbox 360}},
title = {Kinect for Xbox 360},
- institution = {Microsoft Corp.},
- location = {Redmond, WA}
+ howpublished = {Microsoft Corp., Redmond, WA},
}
@misc{comon,