| author | Jon Whiteaker <jbw@berkeley.edu> | 2012-03-01 16:20:18 -0800 |
|---|---|---|
| committer | Jon Whiteaker <jbw@berkeley.edu> | 2012-03-01 16:20:18 -0800 |
| commit | cb8ffa38631e47d37002dc0528040d72ec34ccad | |
| tree | 890d13136b0d45c52868b3593827803eef2e3c41 /experimental.tex | |
| parent | ff62de7b794b5cae8af37135579888c27785ff6b | |
| download | kinect-cb8ffa38631e47d37002dc0528040d72ec34ccad.tar.gz | |
changing real-world section organization
Diffstat (limited to 'experimental.tex')
| -rw-r--r-- | experimental.tex | 129 |
1 file changed, 57 insertions(+), 72 deletions(-)
diff --git a/experimental.tex b/experimental.tex
index 42d5965..cfdaf78 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -1,37 +1,35 @@
-\section{Experiment design}
+\section{Real-World Evaluation}
 We conduct a real-life uncontrolled experiment using the Kinect to test the
-algorithm. First we discuss the signal outputs of the Kinect. Second we
-describe the environment in which we collect the data. Finally, we interpret
-the data.
+algorithm. First we present the manner and environment in which we perform
+data collection. Second we describe how the data is processed and classified.
+Finally, we discuss the results.
 
-\subsection{Kinect} The Kinect outputs three primary signals in real-time: a
-color image stream, a depth image stream, and microphone output. For our
-purposes, we focus on the depth image stream. As the Kinect was designed to
-interface directly with the Xbox 360~\cite{xbox}, the tools to interact with it
-on a PC are limited. Libfreenect~\cite{libfreenect} is a reverse engineered
-driver which gives access to the raw depth images from the Kinect. This raw
-data could be used to implement the algorithms \eg of
-Plagemann~\etal{}~\cite{plagemann:icra10}. Alternatively,
-OpenNI~\cite{openni}, a framework sponsored by PrimeSense~\cite{primesense},
-the company behind the technology of the Kinect, offers figure detection and
-skeleton fitting algorithms on top of raw access to the data streams. However,
-the skeleton fitting algorithm of OpenNI requires each individual to strike a
-specific pose for calibration. More recently, the Kinect for Windows
-SDK~\cite{kinect-sdk} was released, and its skeleton fitting algorithm operates
-in real-time without calibration. Given that the Kinect for Windows SDK is the
-state-of-the-art, we use it to perform our data collection.
+\subsection{Dataset}
 
-\subsection{Environment}
-We collect data using the Kinect SDK over a period of a week in a research
-laboratory setting. The Kinect is placed at the tee of a well traversed hallway.
-The view of the Kinect is seen in \fref{fig:hallway}, showing the color image,
-the depth image, and the fitted skeleton of a single frame.
+The Kinect outputs three primary signals in real-time: a color image stream, a
+depth image stream, and microphone output. For our purposes, we focus on the
+depth image stream. As the Kinect was designed to interface directly with the
+Xbox 360, the tools to interact with it on a PC are limited.
+Libfreenect~\cite{libfreenect} is a reverse-engineered driver which gives
+access to the raw depth images from the Kinect. This raw data could be used to
+implement the algorithms of, \eg, Plagemann~\etal{}~\cite{plagemann:icra10}.
+Alternatively, OpenNI~\cite{openni}, a framework sponsored by
+PrimeSense~\cite{primesense}, the company behind the technology of the Kinect,
+offers figure detection and skeleton fitting algorithms on top of raw access to
+the data streams. However, the skeleton fitting algorithm of OpenNI requires
+each individual to strike a specific pose for calibration. More recently, the
+Kinect for Windows SDK~\cite{kinect-sdk} was released, and its skeleton fitting
+algorithm operates in real-time without calibration. Given that the Kinect for
+Windows SDK is the state of the art, we use it to perform our data collection.
 
-\begin{itemize}
-\item 1 week
-\item 23 people
-\end{itemize}
+We collect data using the Kinect SDK over a period of a week in a research
+laboratory setting. The Kinect is placed at the tee of a well-traversed
+hallway. The view of the Kinect is seen in \fref{fig:hallway}, showing the
+color image, the depth image, and the fitted skeleton of a person in a single
+frame. For each frame where a person is detected and a skeleton is fitted, we
+collect the 3D coordinates of 20 body joints, and the color image recorded by
+the RGB camera.
 
 \begin{figure}
 \begin{center}
@@ -41,55 +39,42 @@ the depth image, and the fitted skeleton of a single frame.
 \label{fig:hallway}
 \end{figure}
 
-\subsection{Data set}
+For some frames, one or several joints are out of the frame or are occluded by
+another part of the body. In those cases, the coordinates of these joints are
+either absent from the frame or present but tagged as \emph{Inferred} by the
+Kinect SDK. \emph{Inferred} means that even though the joint is not visible in
+the frame, the skeleton-fitting algorithm attempts to guess the right location.
+
+Ground truth person identification is obtained by manually labelling each
+run based on the images captured by the RGB camera of the Kinect. For ease of
+labelling, only the runs with people walking toward the camera are kept. These
+are the runs where the average distance from the skeleton joints to the camera
+is increasing.
 
-The original dataset consists of the sequence of all the frames where
-a skeleton was detected by the Kinect SDK. For each frames the
-following data is available:
-\begin{itemize}
-\item the 3D coordinates of 20 body joints,
-\item a color picture recorded by the video camera.
-\end{itemize}
-For some of frames, one or several joints are occluded by another part
-of the body. In those cases, the coordinates of these joints are
-either absent from the frame or present but tagged as \emph{Inferred}
-by the Kinect SDK. It means that even though the joint is not
-present on the frame, the skeleton-fitting algorithm is able to guess
-its location.
+\subsection{Experiment design}
 
-Each frame also has a skeleton ID number. If this
+Several reductions are then applied to the data set to extract \emph{features}
+from the raw data. First, the lengths of 15 body parts are computed from the
+joint coordinates. These are distances between two contiguous joints in the
+human body. If one of the two joints of a body part is not present or inferred
+in a frame, the corresponding body part is reported as absent for the frame.
+Second, the number of features is reduced to 9 by using the vertical symmetry
+of the human body: if two body parts are symmetric about the vertical axis, we
+bundle them into one feature by averaging their lengths. If only one of them is
+present, we take the value of its counterpart. If none of them are present, the
+feature is reported as missing for the frame. The resulting nine features are:
+Head-ShoulderCenter, ShoulderCenter-Shoulder, Shoulder-Elbow, Elbow-Wrist,
+ShoulderCenter-Spine, Spine-HipCenter, HipCenter-HipSide, HipSide-Knee,
+Knee-Ankle. Finally, any frame with a missing feature is filtered out.
+
+Each detected skeleton also has an ID number which identifies which figure
+it maps to from the figure detection stage. When this ID
 number stays the
 same across several frames, it means that the skeleton-fitting
 algorithm was able to detect the skeleton in a contiguous way. This
 allows us to define the concept of a \emph{run}: a sequence of frames
 with the same skeleton ID.
 
-Ground truth person recognition is obtained by manually labelling each
-run based on the images captured by the video camera of the
-Kinect. For ease of labelling, only the runs with people walking
-toward the camera are kept. These are the runs where the average
-distance from the skeleton joints to the camera is increasing.
-
-Several reductions are then applied to the data set to extract
-\emph{features} from the raw data:
-\begin{itemize}
-\item From the joints coordinates, the lengths of 15 body parts are
-  computed. These are distances between two contiguous joints in the
-  human body. If one of the two joints of a body part is not present
-  or inferred in a frame, the corresponding body part is reported as
-  absent for the frame.
-\item The number of features is then reduced to 9 by using the
-  vertical symmetry of the human body: if two body parts are symmetric
-  about the vertical axis, we bundle them into one feature by
-  averaging their lengths. If only one of them is present, we take the
-  value of its counterpart. If none of them are present, the feature
-  is reported as missing for the frame. The resulting nine features
-  are: Head-ShoulderCenter, ShoulderCenter-Shoulder, Shoulder-Elbow,
-  Elbow-Wrist, ShoulderCenter-Spine, Spine-HipCenter,
-  HipCenter-HipSide, HipSide-Knee, Knee-Ankle.
-\item Finally, all the frames where any of the 9 features is missing
-  are filtered out.
-\end{itemize}
-
 \begin{table}
 \begin{center}
 \begin{tabular}{|l|r||r|r|r|}
@@ -108,7 +93,7 @@ in the ordering given by the number of frames.}
 \label{tab:dataset}
 \end{table}
-
+\subsection{Results}
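The feature extraction added by this commit reduces the 20 tracked joints per frame to 15 body-part lengths and then to 9 symmetry-averaged features, dropping any frame with a missing feature. A minimal Python sketch of that reduction follows. It is not the authors' code: the joint names follow the Kinect SDK skeleton joints, and the exact left/right grouping of the 15 parts is our reading of the feature list in the diff.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]  # x, y, z in skeleton space

# The 9 features, each derived from one body part or a left/right
# symmetric pair; 15 (joint, joint) body parts in total.
FEATURES: List[Tuple[str, List[Tuple[str, str]]]] = [
    ("Head-ShoulderCenter", [("Head", "ShoulderCenter")]),
    ("ShoulderCenter-Shoulder", [("ShoulderCenter", "ShoulderLeft"),
                                 ("ShoulderCenter", "ShoulderRight")]),
    ("Shoulder-Elbow", [("ShoulderLeft", "ElbowLeft"),
                        ("ShoulderRight", "ElbowRight")]),
    ("Elbow-Wrist", [("ElbowLeft", "WristLeft"),
                     ("ElbowRight", "WristRight")]),
    ("ShoulderCenter-Spine", [("ShoulderCenter", "Spine")]),
    ("Spine-HipCenter", [("Spine", "HipCenter")]),
    ("HipCenter-HipSide", [("HipCenter", "HipLeft"),
                           ("HipCenter", "HipRight")]),
    ("HipSide-Knee", [("HipLeft", "KneeLeft"),
                      ("HipRight", "KneeRight")]),
    ("Knee-Ankle", [("KneeLeft", "AnkleLeft"),
                    ("KneeRight", "AnkleRight")]),
]


def part_length(joints: Dict[str, Point], a: str, b: str) -> Optional[float]:
    """Length of one body part, or None when either joint is unusable.

    `joints` maps joint name to tracked 3D position; joints that were
    absent or tagged Inferred are assumed to have been removed upstream.
    """
    if a not in joints or b not in joints:
        return None
    return math.dist(joints[a], joints[b])


def frame_features(joints: Dict[str, Point]) -> Optional[Dict[str, float]]:
    """The 9 features for one frame, or None if any feature is missing."""
    feats: Dict[str, float] = {}
    for name, parts in FEATURES:
        lengths = [d for a, b in parts
                   if (d := part_length(joints, a, b)) is not None]
        if not lengths:  # neither side measurable: drop the whole frame
            return None
        # Symmetric parts are averaged; if only one side is present, its
        # value stands in for the missing counterpart.
        feats[name] = sum(lengths) / len(lengths)
    return feats
```

Mapping `frame_features` over every frame and keeping only the non-`None` results reproduces the final filtering step described in the diff.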
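The run segmentation and the labelling filter can be sketched the same way. The per-frame record shape (a `skeleton_id` plus a joint dictionary) is our assumption, the camera is taken to sit at the origin of the skeleton coordinate system, and the filter applies the increasing-average-distance criterion exactly as the diff states it.

```python
import math
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float, float]
# Assumed per-frame record: {"skeleton_id": int, "joints": {name: Point}}
Frame = Dict[str, Any]


def split_into_runs(frames: List[Frame]) -> List[List[Frame]]:
    """A run is a maximal sequence of consecutive frames whose skeleton
    ID stays the same."""
    runs: List[List[Frame]] = []
    for frame in frames:
        if runs and runs[-1][-1]["skeleton_id"] == frame["skeleton_id"]:
            runs[-1].append(frame)  # same ID: the current run continues
        else:
            runs.append([frame])    # new ID: a new run starts
    return runs


def mean_camera_distance(frame: Frame) -> float:
    """Average distance from the skeleton joints to the camera, with the
    camera assumed to be at the origin of the coordinate system."""
    positions = list(frame["joints"].values())
    return sum(math.dist(p, (0.0, 0.0, 0.0)) for p in positions) / len(positions)


def keep_runs_for_labelling(runs: List[List[Frame]]) -> List[List[Frame]]:
    """Keep only runs whose average joint-to-camera distance increases
    over the run (the criterion stated in the diff), judged here by
    comparing the end of the run against its start."""
    return [run for run in runs
            if len(run) > 1
            and mean_camera_distance(run[-1]) > mean_camera_distance(run[0])]
```

A fuller pipeline would probably smooth the per-frame distances before comparing, but the end-versus-start comparison is enough to illustrate the criterion.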