Diffstat (limited to 'experimental.tex')
-rw-r--r--  experimental.tex | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/experimental.tex b/experimental.tex
index 8085321..cc66af4 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -45,9 +45,9 @@ hallway.
 For each frame, the Kinect SDK performs figure detection to identify
 regions of interest. Then, it fits a skeleton to the identified figures and
 outputs a set of joints in real world coordinates. The view of the Kinect is
 seen in \fref{fig:hallway}, showing the color image, the depth image with
-figures, and the fitted skeleton of a person in a single frame. Skeletons are
+detected figures, and the fitted skeleton of a person in a single frame. Skeletons are
 fit from roughly 1-5 meters away from the Kinect. For each frame with a
-skeleton we record color image and the positions of the joints.
+skeleton we record the color image and the positions of the joints.
 \begin{figure}[t]
   \begin{center}
@@ -78,7 +78,7 @@ Second, we reduce the number of features to nine by using the vertical
 symmetry of the human body: if two body parts are symmetric about the vertical
 axis, we bundle them into one feature by averaging their lengths. If only one of
 them is present, we take its value. If neither of them is present, the feature is
-reported as missing for the frame. Finally, any frame with a missing feature is
+reported as missing for the frame. Any frame with a missing feature is
 filtered out. The resulting nine features include the six arm, leg, and pelvis
 measurements from \xref{sec:uniqueness}, and three additional measurements:
 spine length, shoulder breadth, and head size. Here we list the nine features as
@@ -110,9 +110,9 @@ range of the camera, we only keep the frames of a run that are 2-3 meters
 away from the Kinect.
 
 Ground truth person identification is obtained by manually labelling each run
-based on the images captured by the color camera of the Kinect. For ease of
-labelling, only the runs with people walking toward the camera are kept. These
-are the runs where the average distance from the skeleton joints to the camera
+based on the images captured from the color image stream of the Kinect. For ease of
+labelling, only the runs with people walking toward the Kinect are kept. These
+are the runs where the average distance from the skeleton joints to the Kinect
 is increasing.
 
 We perform five experiments. First, we test the performance of
@@ -219,7 +219,7 @@ we reach 90\% accuracy at 60\% recall for a group size of 10 people.
 
 In the second experiment, we evaluate skeleton recognition in an online
 setting. Even though the previous evaluation is standard, it does not properly
-reflect reality. A real-life setting could be as follows. The camera is placed
+reflect reality. A real-world setting could be as follows. The camera is placed
 at the entrance of a building. When a person enters the building, his identity
 is detected based on the electronic key system and a new labeled run is added
 to the dataset. The identification algorithm is then retrained on the
@@ -279,9 +279,7 @@ recognition rates mostly above 90\% for group sizes of 3 and 5.
 In the third experiment, we compare the performance of skeleton recognition
 with the performance of face recognition as given by \textsf{face.com}. At the
 time of writing, this is the best performing face recognition algorithm on the
-LFW dataset~\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
-The results show that face recognition has better accuracy than skeleton
-recognition, but not by a large margin.
+LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
 
 We use the publicly available REST API of \textsf{face.com} to do face
 recognition on our dataset. Due to the restrictions of the API, for this
@@ -381,13 +379,14 @@
 observation $\bx_i$ is replaced by $\bx_i'$ defined by:
 \begin{equation}
 \bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2}
 \end{equation}
-We believe that a reducing factor of 2 for the noise's variance is realistic
-given the relative low resolution of the Kinect's infrared camera.
+We believe that reducing the noise's variance by half is realistic
+given the relatively low resolution of the Kinect's infrared camera.
 \fref{fig:var} compares the precision-recall curve of \fref{fig:offline:sht} to
 the curve of the same experiment run on the newly obtained dataset. We observe
-a roughly 20\% increase in performace across most thresholds. Note that these
-results would significantly outperform face recognition.
+a roughly 20\% increase in performace across most thresholds. We
+believe these results would significantly outperform face recognition
+in a similar setting.
 
 %\begin{figure}[t]
 %  \begin{center}
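The left/right feature bundling described in the hunk at line 78 (average a symmetric pair, fall back to whichever measurement is present, report the feature missing otherwise) can be sketched as follows. This is an illustrative sketch, not the paper's code: the helper name `bundle_symmetric` and the use of `None` for a missing measurement are assumptions.

```python
# Sketch of the symmetric-feature bundling from the hunk at line 78.
# A body-part length that could not be measured in a frame is represented
# here as None; this convention and the function name are assumptions.
def bundle_symmetric(left, right):
    """Bundle a left/right pair of body-part lengths into one feature.

    Returns their average if both are present, the present value if only
    one is, and None if neither is (the frame is then filtered out).
    """
    if left is not None and right is not None:
        return (left + right) / 2
    return left if left is not None else right

print(bundle_symmetric(0.52, 0.48))  # both arms present: their average
print(bundle_symmetric(None, 0.48))  # only one present: its value
print(bundle_symmetric(None, None))  # neither present: None, frame dropped
```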

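The noise-reduction step rewritten in the last hunk, $\bx_i' = \bar{\bx}_{y_i} + (\bx_i - \bar{\bx}_{y_i})/2$, pulls each feature vector halfway toward the mean vector of its person (class $y_i$). A minimal sketch under stated assumptions — NumPy arrays, and the names `shrink_toward_class_means`, `X`, `y` are illustrative, not from the commit:

```python
# Sketch (assumed names) of the shrinkage x_i' = mean_{y_i} + (x_i - mean_{y_i}) / 2
# from the last hunk: each observation keeps only half its deviation
# from its class mean.
import numpy as np

def shrink_toward_class_means(X, y):
    """Replace each row of X by its class mean plus half its deviation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    out = np.empty_like(X)
    for label in np.unique(y):
        mask = (y == label)
        mean = X[mask].mean(axis=0)          # per-class mean feature vector
        out[mask] = mean + (X[mask] - mean) / 2
    return out

# One class with observations 0 and 4: the class mean is 2,
# so each value moves halfway toward it (0 -> 1, 4 -> 3).
X = np.array([[0.0], [4.0]])
y = np.array([1, 1])
print(shrink_toward_class_means(X, y))
```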