| author | Jon Whiteaker <jbw@berkeley.edu> | 2012-03-01 16:20:18 -0800 |
|---|---|---|
| committer | Jon Whiteaker <jbw@berkeley.edu> | 2012-03-01 16:20:18 -0800 |
| commit | cb8ffa38631e47d37002dc0528040d72ec34ccad | |
| tree | 890d13136b0d45c52868b3593827803eef2e3c41 /experimental.tex | |
| parent | ff62de7b794b5cae8af37135579888c27785ff6b | |
| download | kinect-cb8ffa38631e47d37002dc0528040d72ec34ccad.tar.gz | |
changing real-world section organization
Diffstat (limited to 'experimental.tex')
| -rw-r--r-- | experimental.tex | 129 |
1 file changed, 57 insertions(+), 72 deletions(-)
diff --git a/experimental.tex b/experimental.tex
index 42d5965..cfdaf78 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -1,37 +1,35 @@
-\section{Experiment design}
+\section{Real-World Evaluation}
 We conduct a real-life uncontrolled experiment using the Kinect to test the
-algorithm. First we discuss the signal outputs of the Kinect. Second we
-describe the environment in which we collect the data. Finally, we interpret
-the data.
+algorithm. First we present the manner and environment in which we perform
+data collection. Second we describe how the data is processed and classified.
+Finally, we discuss the results.
 
-\subsection{Kinect} The Kinect outputs three primary signals in real-time: a
-color image stream, a depth image stream, and microphone output. For our
-purposes, we focus on the depth image stream. As the Kinect was designed to
-interface directly with the Xbox 360~\cite{xbox}, the tools to interact with it
-on a PC are limited. Libfreenect~\cite{libfreenect} is a reverse engineered
-driver which gives access to the raw depth images from the Kinect. This raw
-data could be used to implement the algorithms \eg of
-Plagemann~\etal{}~\cite{plagemann:icra10}. Alternatively,
-OpenNI~\cite{openni}, a framework sponsored by PrimeSense~\cite{primesense},
-the company behind the technology of the Kinect, offers figure detection and
-skeleton fitting algorithms on top of raw access to the data streams. However,
-the skeleton fitting algorithm of OpenNI requires each individual to strike a
-specific pose for calibration. More recently, the Kinect for Windows
-SDK~\cite{kinect-sdk} was released, and its skeleton fitting algorithm operates
-in real-time without calibration. Given that the Kinect for Windows SDK is the
-state-of-the-art, we use it to perform our data collection.
+\subsection{Dataset}
 
-\subsection{Environment}
-We collect data using the Kinect SDK over a period of a week in a research
-laboratory setting. The Kinect is placed at the tee of a well traversed hallway.
-The view of the Kinect is seen in \fref{fig:hallway}, showing the color image,
-the depth image, and the fitted skeleton of a single frame.
+The Kinect outputs three primary signals in real-time: a color image stream, a
+depth image stream, and microphone output. For our purposes, we focus on the
+depth image stream. As the Kinect was designed to interface directly with the
+Xbox 360, the tools to interact with it on a PC are limited.
+Libfreenect~\cite{libfreenect} is a reverse-engineered driver which gives
+access to the raw depth images from the Kinect. This raw data could be used to
+implement the algorithms of, \eg, Plagemann~\etal{}~\cite{plagemann:icra10}.
+Alternatively, OpenNI~\cite{openni}, a framework sponsored by
+PrimeSense~\cite{primesense}, the company behind the technology of the Kinect,
+offers figure detection and skeleton fitting algorithms on top of raw access to
+the data streams. However, the skeleton fitting algorithm of OpenNI requires
+each individual to strike a specific pose for calibration. More recently, the
+Kinect for Windows SDK~\cite{kinect-sdk} was released, and its skeleton fitting
+algorithm operates in real-time without calibration. Given that the Kinect for
+Windows SDK is the state of the art, we use it to perform our data collection.
 
-\begin{itemize}
-\item 1 week
-\item 23 people
-\end{itemize}
+We collect data using the Kinect SDK over a period of a week in a research
+laboratory setting. The Kinect is placed at the tee of a well-traversed
+hallway. The view of the Kinect is seen in \fref{fig:hallway}, showing the
+color image, the depth image, and the fitted skeleton of a person in a single
+frame. For each frame where a person is detected and a skeleton is fitted, we
+collect the 3D coordinates of 20 body joints, and the color image recorded by
+the RGB camera.
 
 \begin{figure}
 \begin{center}
@@ -41,55 +39,42 @@ the depth image, and the fitted skeleton of a single frame.
 \label{fig:hallway}
 \end{figure}
 
-\subsection{Data set}
+For some frames, one or several joints are out of the frame or are occluded by
+another part of the body. In those cases, the coordinates of these joints are
+either absent from the frame or present but tagged as \emph{Inferred} by the
+Kinect SDK. \emph{Inferred} means that even though the joint is not visible in
+the frame, the skeleton-fitting algorithm attempts to guess the right location.
+
+Ground truth person identification is obtained by manually labelling each
+run based on the images captured by the RGB camera of the Kinect. For ease of
+labelling, only the runs with people walking toward the camera are kept. These
+are the runs where the average distance from the skeleton joints to the camera
+is increasing.
 
-The original dataset consists of the sequence of all the frames where
-a skeleton was detected by the Kinect SDK. For each frames the
-following data is available:
-\begin{itemize}
-\item the 3D coordinates of 20 body joints,
-\item a color picture recorded by the video camera.
-\end{itemize}
-For some of frames, one or several joints are occluded by another part
-of the body. In those cases, the coordinates of these joints are
-either absent from the frame or present but tagged as \emph{Inferred}
-by the Kinect SDK. It means that even though the joint is not
-present on the frame, the skeleton-fitting algorithm is able to guess
-its location.
+\subsection{Experiment design}
 
-Each frame also has a skeleton ID number. If this
+Several reductions are then applied to the data set to extract \emph{features}
+from the raw data. First, the lengths of 15 body parts are computed from the
+joint coordinates. These are distances between two contiguous joints in the
+human body. If one of the two joints of a body part is not present or inferred
+in a frame, the corresponding body part is reported as absent for the frame.
+Second, the number of features is reduced to 9 by using the vertical symmetry
+of the human body: if two body parts are symmetric about the vertical axis, we
+bundle them into one feature by averaging their lengths. If only one of them is
+present, we take the value of its counterpart. If none of them are present, the
+feature is reported as missing for the frame. The resulting nine features are:
+Head-ShoulderCenter, ShoulderCenter-Shoulder, Shoulder-Elbow, Elbow-Wrist,
+ShoulderCenter-Spine, Spine-HipCenter, HipCenter-HipSide, HipSide-Knee,
+Knee-Ankle. Finally, any frame with a missing feature is filtered out.
+
+Each detected skeleton also has an ID number which identifies which figure
+it maps to from the figure detection stage. When this ID
 number stays the
 same across several frames, it means that the skeleton-fitting
 algorithm was able to detect the skeleton in a contiguous way. This
 allows us to define the concept of a \emph{run}: a sequence of frames
 with the same skeleton ID.
 
-Ground truth person recognition is obtained by manually labelling each
-run based on the images captured by the video camera of the
-Kinect. For ease of labelling, only the runs with people walking
-toward the camera are kept. These are the runs where the average
-distance from the skeleton joints to the camera is increasing.
-
-Several reductions are then applied to the data set to extract
-\emph{features} from the raw data:
-\begin{itemize}
-\item From the joints coordinates, the lengths of 15 body parts are
-  computed. These are distances between two contiguous joints in the
-  human body. If one of the two joints of a body part is not present
-  or inferred in a frame, the corresponding body part is reported as
-  absent for the frame.
-\item The number of features is then reduced to 9 by using the
-  vertical symmetry of the human body: if two body parts are symmetric
-  about the vertical axis, we bundle them into one feature by
-  averaging their lengths. If only one of them is present, we take the
-  value of its counterpart. If none of them are present, the feature
-  is reported as missing for the frame. The resulting nine features
-  are: Head-ShoulderCenter, ShoulderCenter-Shoulder, Shoulder-Elbow,
-  Elbow-Wrist, ShoulderCenter-Spine, Spine-HipCenter,
-  HipCenter-HipSide, HipSide-Knee, Knee-Ankle.
-\item Finally, all the frames where any of the 9 features is missing
-  are filtered out.
-\end{itemize}
-
 \begin{table}
 \begin{center}
 \begin{tabular}{|l|r||r|r|r|}
@@ -108,7 +93,7 @@ in the ordering given by the number of frames.}
 \label{tab:dataset}
 \end{table}
-
+\subsection{Results}
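The feature extraction added by this commit reduces the 20 tracked joints per frame to 15 body-part lengths and then to 9 symmetry-averaged features, dropping any frame with a missing feature. A minimal Python sketch of that reduction follows. It is not the authors' code: the joint names follow the Kinect SDK skeleton joints, and the exact left/right grouping of the 15 parts is our reading of the feature list in the diff.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]  # x, y, z in skeleton space

# The 9 features, each derived from one body part or a left/right
# symmetric pair; 15 (joint, joint) body parts in total.
FEATURES: List[Tuple[str, List[Tuple[str, str]]]] = [
    ("Head-ShoulderCenter", [("Head", "ShoulderCenter")]),
    ("ShoulderCenter-Shoulder", [("ShoulderCenter", "ShoulderLeft"),
                                 ("ShoulderCenter", "ShoulderRight")]),
    ("Shoulder-Elbow", [("ShoulderLeft", "ElbowLeft"),
                        ("ShoulderRight", "ElbowRight")]),
    ("Elbow-Wrist", [("ElbowLeft", "WristLeft"),
                     ("ElbowRight", "WristRight")]),
    ("ShoulderCenter-Spine", [("ShoulderCenter", "Spine")]),
    ("Spine-HipCenter", [("Spine", "HipCenter")]),
    ("HipCenter-HipSide", [("HipCenter", "HipLeft"),
                           ("HipCenter", "HipRight")]),
    ("HipSide-Knee", [("HipLeft", "KneeLeft"),
                      ("HipRight", "KneeRight")]),
    ("Knee-Ankle", [("KneeLeft", "AnkleLeft"),
                    ("KneeRight", "AnkleRight")]),
]


def part_length(joints: Dict[str, Point], a: str, b: str) -> Optional[float]:
    """Length of one body part, or None when either joint is unusable.

    `joints` maps joint name to tracked 3D position; joints that were
    absent or tagged Inferred are assumed to have been removed upstream.
    """
    if a not in joints or b not in joints:
        return None
    return math.dist(joints[a], joints[b])


def frame_features(joints: Dict[str, Point]) -> Optional[Dict[str, float]]:
    """The 9 features for one frame, or None if any feature is missing."""
    feats: Dict[str, float] = {}
    for name, parts in FEATURES:
        lengths = [d for a, b in parts
                   if (d := part_length(joints, a, b)) is not None]
        if not lengths:  # neither side measurable: drop the whole frame
            return None
        # Symmetric parts are averaged; if only one side is present, its
        # value stands in for the missing counterpart.
        feats[name] = sum(lengths) / len(lengths)
    return feats
```

Mapping `frame_features` over every frame and keeping only the non-`None` results reproduces the final filtering step described in the diff.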
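The run segmentation and the labelling filter can be sketched the same way. The per-frame record shape (a `skeleton_id` plus a joint dictionary) is our assumption, the camera is taken to sit at the origin of the skeleton coordinate system, and the filter applies the increasing-average-distance criterion exactly as the diff states it.

```python
import math
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float, float]
# Assumed per-frame record: {"skeleton_id": int, "joints": {name: Point}}
Frame = Dict[str, Any]


def split_into_runs(frames: List[Frame]) -> List[List[Frame]]:
    """A run is a maximal sequence of consecutive frames whose skeleton
    ID stays the same."""
    runs: List[List[Frame]] = []
    for frame in frames:
        if runs and runs[-1][-1]["skeleton_id"] == frame["skeleton_id"]:
            runs[-1].append(frame)  # same ID: the current run continues
        else:
            runs.append([frame])    # new ID: a new run starts
    return runs


def mean_camera_distance(frame: Frame) -> float:
    """Average distance from the skeleton joints to the camera, with the
    camera assumed to be at the origin of the coordinate system."""
    positions = list(frame["joints"].values())
    return sum(math.dist(p, (0.0, 0.0, 0.0)) for p in positions) / len(positions)


def keep_runs_for_labelling(runs: List[List[Frame]]) -> List[List[Frame]]:
    """Keep only runs whose average joint-to-camera distance increases
    over the run (the criterion stated in the diff), judged here by
    comparing the end of the run against its start."""
    return [run for run in runs
            if len(run) > 1
            and mean_camera_distance(run[-1]) > mean_camera_distance(run[0])]
```

A fuller pipeline would probably smooth the per-frame distances before comparing, but the end-versus-start comparison is enough to illustrate the criterion.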