-rw-r--r-- | experimental.tex | 129
-rw-r--r-- | kinect.tex       |   2
-rw-r--r-- | references.bib   |  28
3 files changed, 84 insertions, 75 deletions
diff --git a/experimental.tex b/experimental.tex
index 42d5965..cfdaf78 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -1,37 +1,35 @@
-\section{Experiment design}
+\section{Real-World Evaluation}
 We conduct a real-life uncontrolled experiment using the Kinect to test the
-algorithm. First we discuss the signal outputs of the Kinect. Second we
-describe the environment in which we collect the data. Finally, we interpret
-the data.
+algorithm. First we present the manner and environment in which we perform
+data collection. Second we describe how the data is processed and classified.
+Finally, we discuss the results.
 
-\subsection{Kinect} The Kinect outputs three primary signals in real-time: a
-color image stream, a depth image stream, and microphone output. For our
-purposes, we focus on the depth image stream. As the Kinect was designed to
-interface directly with the Xbox 360~\cite{xbox}, the tools to interact with it
-on a PC are limited. Libfreenect~\cite{libfreenect} is a reverse engineered
-driver which gives access to the raw depth images from the Kinect. This raw
-data could be used to implement the algorithms \eg of
-Plagemann~\etal{}~\cite{plagemann:icra10}. Alternatively,
-OpenNI~\cite{openni}, a framework sponsored by PrimeSense~\cite{primesense},
-the company behind the technology of the Kinect, offers figure detection and
-skeleton fitting algorithms on top of raw access to the data streams. However,
-the skeleton fitting algorithm of OpenNI requires each individual to strike a
-specific pose for calibration. More recently, the Kinect for Windows
-SDK~\cite{kinect-sdk} was released, and its skeleton fitting algorithm operates
-in real-time without calibration. Given that the Kinect for Windows SDK is the
-state-of-the-art, we use it to perform our data collection.
+\subsection{Dataset}
 
-\subsection{Environment}
-We collect data using the Kinect SDK over a period of a week in a research
-laboratory setting. The Kinect is placed at the tee of a well traversed hallway.
-The view of the Kinect is seen in \fref{fig:hallway}, showing the color image,
-the depth image, and the fitted skeleton of a single frame.
+The Kinect outputs three primary signals in real-time: a color image stream, a
+depth image stream, and microphone output. For our purposes, we focus on the
+depth image stream. As the Kinect was designed to interface directly with the
+Xbox 360, the tools to interact with it on a PC are limited.
+Libfreenect~\cite{libfreenect} is a reverse-engineered driver which gives
+access to the raw depth images from the Kinect. This raw data could be used to
+implement the algorithms of, \eg, Plagemann~\etal{}~\cite{plagemann:icra10}.
+Alternatively, OpenNI~\cite{openni}, a framework sponsored by
+PrimeSense~\cite{primesense}, the company behind the technology of the Kinect,
+offers figure detection and skeleton fitting algorithms on top of raw access to
+the data streams. However, the skeleton fitting algorithm of OpenNI requires
+each individual to strike a specific pose for calibration. More recently, the
+Kinect for Windows SDK~\cite{kinect-sdk} was released, and its skeleton fitting
+algorithm operates in real-time without calibration. Given that the Kinect for
+Windows SDK is the state of the art, we use it to perform our data collection.
 
-\begin{itemize}
-\item 1 week
-\item 23 people
-\end{itemize}
+We collect data using the Kinect SDK over a period of a week in a research
+laboratory setting. The Kinect is placed at the tee of a well-traversed
+hallway. The view of the Kinect is seen in \fref{fig:hallway}, showing the
+color image, the depth image, and the fitted skeleton of a person in a single
+frame. For each frame where a person is detected and a skeleton is fitted, we
+collect the 3D coordinates of 20 body joints, and the color image recorded by
+the RGB camera.
 
 \begin{figure}
 \begin{center}
@@ -41,55 +39,42 @@ the depth image, and the fitted skeleton of a single frame.
 \label{fig:hallway}
 \end{figure}
-\subsection{Data set}
+For some frames, one or several joints are out of the frame or are occluded by
+another part of the body. In those cases, the coordinates of these joints are
+either absent from the frame or present but tagged as \emph{Inferred} by the
+Kinect SDK. Inferred means that even though the joint is not visible in the
+frame, the skeleton-fitting algorithm attempts to guess the right location.
+
+Ground truth person identification is obtained by manually labelling each
+run based on the images captured by the RGB camera of the Kinect. For ease of
+labelling, only the runs with people walking toward the camera are kept. These
+are the runs where the average distance from the skeleton joints to the camera
+is decreasing.
 
-The original dataset consists of the sequence of all the frames where
-a skeleton was detected by the Kinect SDK. For each frames the
-following data is available:
-\begin{itemize}
-\item the 3D coordinates of 20 body joints,
-\item a color picture recorded by the video camera.
-\end{itemize}
-For some of frames, one or several joints are occluded by another part
-of the body. In those cases, the coordinates of these joints are
-either absent from the frame or present but tagged as \emph{Inferred}
-by the Kinect SDK. It means that even though the joint is not
-present on the frame, the skeleton-fitting algorithm is able to guess
-its location.
+\subsection{Experiment design}
 
-Each frame also has a skeleton ID number. If this numbers stays the
+Several reductions are then applied to the data set to extract \emph{features}
+from the raw data. First, the lengths of 15 body parts are computed from the
+joint coordinates. These are distances between two contiguous joints in the
+human body. If one of the two joints of a body part is not present or inferred
+in a frame, the corresponding body part is reported as absent for the frame.
+Second, the number of features is reduced to 9 by using the vertical symmetry
+of the human body: if two body parts are symmetric about the vertical axis, we
+bundle them into one feature by averaging their lengths. If only one of them is
+present, we take the value of its counterpart. If none of them are present, the
+feature is reported as missing for the frame. The resulting nine features are:
+Head-ShoulderCenter, ShoulderCenter-Shoulder, Shoulder-Elbow, Elbow-Wrist,
+ShoulderCenter-Spine, Spine-HipCenter, HipCenter-HipSide, HipSide-Knee,
+Knee-Ankle. Finally, any frame with a missing feature is filtered out.
+
+Each detected skeleton also has an ID number which identifies which figure
+it maps to from the figure detection stage. When this number stays the
 same across several frames, it means that the skeleton-fitting
 algorithm was able to detect the skeleton in a contiguous way. This
 allows us to define the concept of a \emph{run}: a sequence of frames
 with the same skeleton ID.
 
-Ground truth person recognition is obtained by manually labelling each
-run based on the images captured by the video camera of the
-Kinect. For ease of labelling, only the runs with people walking
-toward the camera are kept. These are the runs where the average
-distance from the skeleton joints to the camera is increasing.
-
-Several reductions are then applied to the data set to extract
-\emph{features} from the raw data:
-\begin{itemize}
-\item From the joints coordinates, the lengths of 15 body parts are
-  computed. These are distances between two contiguous joints in the
-  human body. If one of the two joints of a body part is not present
-  or inferred in a frame, the corresponding body part is reported as
-  absent for the frame.
-\item The number of features is then reduced to 9 by using the
-  vertical symmetry of the human body: if two body parts are symmetric
-  about the vertical axis, we bundle them into one feature by
-  averaging their lengths. If only one of them is present, we take the
-  value of its counterpart. If none of them are present, the feature
-  is reported as missing for the frame. The resulting nine features
-  are: Head-ShoulderCenter, ShoulderCenter-Shoulder, Shoulder-Elbow,
-  Elbow-Wrist, ShoulderCenter-Spine, Spine-HipCenter,
-  HipCenter-HipSide, HipSide-Knee, Knee-Ankle.
-\item Finally, all the frames where any of the 9 features is missing
-  are filtered out.
-\end{itemize}
-
 \begin{table}
 \begin{center}
 \begin{tabular}{|l|r||r|r|r|}
@@ -108,7 +93,7 @@ in the ordering given by the number of frames.}
 \label{tab:dataset}
 \end{table}
 
-
+\subsection{Results}
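To make the reduction in the experimental.tex hunk above concrete, here is a minimal sketch in Python, assuming frames arrive as (skeleton ID, joint dictionary) pairs. The joint names follow the Kinect SDK's twenty-joint skeleton, but the frame layout, the function names, and the sensor-at-the-origin convention are assumptions of this sketch, not the pipeline's actual code.

from itertools import groupby
from math import dist  # Euclidean distance between two points (Python 3.8+)

# The 15 body parts, grouped into the 9 symmetric features. Each feature
# lists the contiguous joint pairs that realize it (one or two per feature).
FEATURES = {
    "Head-ShoulderCenter":     [("Head", "ShoulderCenter")],
    "ShoulderCenter-Shoulder": [("ShoulderCenter", "ShoulderLeft"),
                                ("ShoulderCenter", "ShoulderRight")],
    "Shoulder-Elbow":          [("ShoulderLeft", "ElbowLeft"),
                                ("ShoulderRight", "ElbowRight")],
    "Elbow-Wrist":             [("ElbowLeft", "WristLeft"),
                                ("ElbowRight", "WristRight")],
    "ShoulderCenter-Spine":    [("ShoulderCenter", "Spine")],
    "Spine-HipCenter":         [("Spine", "HipCenter")],
    "HipCenter-HipSide":       [("HipCenter", "HipLeft"),
                                ("HipCenter", "HipRight")],
    "HipSide-Knee":            [("HipLeft", "KneeLeft"),
                                ("HipRight", "KneeRight")],
    "Knee-Ankle":              [("KneeLeft", "AnkleLeft"),
                                ("KneeRight", "AnkleRight")],
}

def frame_features(joints):
    """joints: {name: (x, y, z)} holding only reliably tracked joints
    (absent and Inferred joints are omitted). Returns the 9 features,
    or None when any feature is missing, so the frame gets filtered out."""
    feats = {}
    for name, pairs in FEATURES.items():
        # A body part is present only when both of its joints are tracked.
        lengths = [dist(joints[a], joints[b])
                   for a, b in pairs if a in joints and b in joints]
        if not lengths:
            return None  # neither symmetric counterpart is present
        # Symmetric parts are bundled by averaging; a single present part
        # stands in for its missing counterpart.
        feats[name] = sum(lengths) / len(lengths)
    return feats

def runs(frames):
    """A run is a maximal block of consecutive frames sharing a skeleton ID.
    frames: iterable of (skeleton_id, joints) pairs in capture order."""
    return [list(group) for _, group in groupby(frames, key=lambda f: f[0])]

def toward_camera(run):
    """Keep runs whose mean joint-to-sensor distance shrinks over time,
    i.e. the person walks toward the Kinect (placed at the origin)."""
    mean_d = [sum(dist(p, (0.0, 0.0, 0.0)) for p in joints.values()) / len(joints)
              for _, joints in run]
    return mean_d[-1] < mean_d[0]

Under those assumptions, the dataset reduction reads: keep the frames where frame_features returns a value, group consecutive frames into runs, and keep for labelling only the runs where toward_camera holds.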
diff --git a/kinect.tex b/kinect.tex
--- a/kinect.tex
+++ b/kinect.tex
@@ -34,7 +34,7 @@ \begin{document}
 \pagestyle{headings}
 \mainmatter
-\def\ECCV12SubNumber{***} % Insert your submission number here
+\def\ECCV12SubNumber{543} % Insert your submission number here
 
 \title{On the Viability of Skeleton Recognition} % Replace with your title
diff --git a/references.bib b/references.bib
index 66cefb7..0f41363 100644
--- a/references.bib
+++ b/references.bib
@@ -118,11 +118,35 @@ @string{LANMAN = PROC # "IEEE Workshop on Local and Metropolitan Area Networks (LANMAN)"},
+@misc{openni,
+ key = {{OpenNI}},
+ title = {OpenNI},
+ howpublished = {\url{http://www.openni.org/}},
+}
+
+@misc{libfreenect,
+ key = {{OpenKinect: libfreenect}},
+ title = {OpenKinect: libfreenect},
+ howpublished = {\url{http://www.openkinect.org/}},
+}
+
+@misc{kinect-sdk,
+ key = {{Kinect for Windows}},
+ title = {Kinect for Windows},
+ howpublished = {Microsoft Corp., Redmond, WA},
+}
+
+@misc{primesense,
+ key = {{PrimeSense}},
+ title = {PrimeSense: Natural Interaction},
+ howpublished = {\url{http://www.primesense.com/}},
+}
@misc{kinect,
+ key = {{Kinect for Xbox 360}},
title = {Kinect for Xbox 360},
- institution = {Microsoft Corp.},
- location = {Redmond, WA}
+ howpublished = {Microsoft Corp., Redmond, WA},
}
@misc{comon,
