\section{Experiment design}

We conduct a real-life, uncontrolled experiment using the Kinect to test the algorithm. First, we discuss the signal outputs of the Kinect. Second, we describe the environment in which we collect the data. Finally, we describe the resulting data set and the features extracted from it.

\subsection{Kinect}

The Kinect outputs three primary signals in real time: a color image stream, a depth image stream, and microphone output. For our purposes, we focus on the depth image stream. As the Kinect was designed to interface directly with the Xbox 360~\cite{xbox}, the tools to interact with it on a PC are limited. Libfreenect~\cite{libfreenect} is a reverse-engineered driver which gives access to the raw depth images from the Kinect. This raw data could be used to implement the algorithms of, \eg, Plagemann~\etal{}~\cite{plagemann:icra10}. Alternatively, OpenNI~\cite{openni}, a framework sponsored by PrimeSense~\cite{primesense}, the company behind the technology of the Kinect, offers figure-detection and skeleton-fitting algorithms on top of raw access to the data streams. However, the skeleton-fitting algorithm of OpenNI requires each individual to strike a specific calibration pose. More recently, the Kinect for Windows SDK~\cite{kinect-sdk} was released; its skeleton-fitting algorithm operates in real time without calibration. Given that the Kinect for Windows SDK is the state of the art, we use it to perform our data collection.

\subsection{Environment}

The data collection has the following characteristics:
\begin{itemize}
\item duration: 1 week
\item participants: 23 people
\end{itemize}

\subsection{Data set}

The original data set consists of the sequence of all the frames in which a skeleton was detected by the Microsoft SDK. For each frame, the following data is available:
\begin{itemize}
\item the 3D coordinates of 20 body joints
\item a picture recorded by the color camera
\end{itemize}
For some frames, one or several joints are occluded by another part of the body. In those cases, the coordinates of these joints are either absent from the frame or present but tagged as \emph{Inferred} by the Microsoft SDK, meaning that even though the joint is not visible, the skeleton-fitting algorithm was able to guess its location.

Each frame also carries a skeleton ID number. If this number stays the same across several frames, the skeleton-fitting algorithm tracked the skeleton continuously. This allows us to define the concept of a \emph{run}: a sequence of consecutive frames with the same skeleton ID. Ground-truth person identities are obtained by manually labelling each run based on the images captured by the color camera of the Kinect. For ease of labelling, only the runs with people walking toward the camera are kept; these are the runs where the average distance from the skeleton joints to the camera is decreasing.
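To make this segmentation and filtering step concrete, the following Python sketch groups frames into runs by skeleton ID and keeps only the runs that approach the camera. The \texttt{Frame} record and the per-joint representation are simplifying assumptions made for this illustration; they are not part of the Microsoft SDK output format.

\begin{verbatim}
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical frame record: a skeleton ID plus one (x, y, z) position per
# tracked joint; None marks a joint that is absent or only Inferred.
@dataclass
class Frame:
    skeleton_id: int
    joints: Dict[str, Optional[Tuple[float, float, float]]]

def split_into_runs(frames: List[Frame]) -> List[List[Frame]]:
    """Group consecutive frames sharing the same skeleton ID into runs."""
    runs: List[List[Frame]] = []
    for frame in frames:
        if runs and runs[-1][-1].skeleton_id == frame.skeleton_id:
            runs[-1].append(frame)
        else:
            runs.append([frame])
    return runs

def mean_distance_to_camera(frame: Frame) -> float:
    """Average Euclidean distance of the available joints to the sensor,
    assuming camera-centered coordinates."""
    points = [p for p in frame.joints.values() if p is not None]
    return sum((x*x + y*y + z*z) ** 0.5 for x, y, z in points) / len(points)

def walks_toward_camera(run: List[Frame]) -> bool:
    """Keep a run only if the skeleton gets closer to the camera over time."""
    return mean_distance_to_camera(run[0]) > mean_distance_to_camera(run[-1])

# Runs kept for labelling: contiguous tracks that approach the camera.
# kept = [r for r in split_into_runs(frames) if walks_toward_camera(r)]
\end{verbatim}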
Several reductions are then applied to the data set to extract \emph{features} from the raw data (a Python sketch of these reductions is given below):
\begin{itemize}
\item From the joint coordinates, the lengths of 15 body parts are computed. These are the distances between two contiguous joints of the human body. If either of the two joints of a body part is absent or only inferred in a frame, the corresponding body part is reported as absent for that frame.
\item The number of features is then reduced to nine by exploiting the vertical symmetry of the human body: if two body parts are symmetric about the vertical axis, we bundle them into one feature by averaging their lengths. If only one of them is present, the feature takes the value of the available part. If neither is present, the feature is reported as missing for that frame. The resulting nine features are: Head-ShoulderCenter, ShoulderCenter-Shoulder, Shoulder-Elbow, Elbow-Wrist, ShoulderCenter-Spine, Spine-HipCenter, HipCenter-HipSide, HipSide-Knee, Knee-Ankle.
\item Finally, all the frames where at least one of the nine features is missing are filtered out.
\end{itemize}
Table~\ref{tab:dataset} summarizes statistics of the resulting data set.

\begin{table}
\begin{tabular}{cc}
\end{tabular}
\caption{Statistics of the resulting data set.}
\label{tab:dataset}
\end{table}
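To illustrate these reductions, the following Python sketch computes the nine features from the joint coordinates of a single frame. The joint names follow the Kinect for Windows SDK skeleton, while the data structures and helper functions (\texttt{part\_length}, \texttt{frame\_features}, the \texttt{FEATURES} table) are illustrative assumptions rather than part of the SDK or of our implementation.

\begin{verbatim}
import math
from typing import Dict, Optional, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) in camera coordinates

def part_length(joints: Dict[str, Optional[Joint]],
                a: str, b: str) -> Optional[float]:
    """Length of the body part between two contiguous joints, or None
    if either joint is absent or only Inferred in this frame."""
    pa, pb = joints.get(a), joints.get(b)
    if pa is None or pb is None:
        return None
    return math.dist(pa, pb)

# The nine features, each bundling one or two body parts (left/right
# pairs are merged using the vertical symmetry of the body).
FEATURES = {
    "Head-ShoulderCenter":     [("Head", "ShoulderCenter")],
    "ShoulderCenter-Shoulder": [("ShoulderCenter", "ShoulderLeft"),
                                ("ShoulderCenter", "ShoulderRight")],
    "Shoulder-Elbow":          [("ShoulderLeft", "ElbowLeft"),
                                ("ShoulderRight", "ElbowRight")],
    "Elbow-Wrist":             [("ElbowLeft", "WristLeft"),
                                ("ElbowRight", "WristRight")],
    "ShoulderCenter-Spine":    [("ShoulderCenter", "Spine")],
    "Spine-HipCenter":         [("Spine", "HipCenter")],
    "HipCenter-HipSide":       [("HipCenter", "HipLeft"),
                                ("HipCenter", "HipRight")],
    "HipSide-Knee":            [("HipLeft", "KneeLeft"),
                                ("HipRight", "KneeRight")],
    "Knee-Ankle":              [("KneeLeft", "AnkleLeft"),
                                ("KneeRight", "AnkleRight")],
}

def frame_features(joints: Dict[str, Optional[Joint]]
                   ) -> Optional[Dict[str, float]]:
    """Return the nine features of a frame, averaging symmetric body
    parts, or None if any feature is missing (frame filtered out)."""
    features = {}
    for name, parts in FEATURES.items():
        lengths = []
        for a, b in parts:
            length = part_length(joints, a, b)
            if length is not None:
                lengths.append(length)
        if not lengths:
            return None          # neither side available: frame dropped
        features[name] = sum(lengths) / len(lengths)
    return features
\end{verbatim}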