diff --git a/experimental.tex b/experimental.tex
index dee0626..57189ab 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -40,10 +40,10 @@ operates in real-time without calibration.
%that it is the state-of-the-art and does not require calibration.
We collect data using the Kinect SDK over a period of a week in a research
-laboratory setting. The Kinect is placed at the tee of a well traversed
+laboratory setting. The Kinect is placed at the tee of a frequently used
hallway. The view of the Kinect is seen in \fref{fig:hallway}, showing the
color image, the depth image, and the fitted skeleton of a person in a single
-frame. Skeletons are fit from \~1-5 meters away from the Kinect. For each
+frame. Skeletons are fit from roughly 1-5 meters away from the Kinect. For each
frame where a person is detected and a skeleton is fit, we capture the 3-D
coordinates of 20 body joints and the color image.
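As a rough illustration of what one such captured record might contain, here is a minimal Python sketch; the field and type names are ours, not the Kinect SDK's:

```python
# Hypothetical sketch of one captured record: 20 joints with (x, y, z)
# coordinates plus a reference to the color image. Field names are
# illustrative placeholders, not the Kinect SDK's.
from dataclasses import dataclass

N_JOINTS = 20  # the Kinect SDK fits 20 body joints per skeleton

@dataclass
class SkeletonFrame:
    timestamp: float
    joints: list          # 20 (x, y, z) tuples, coordinates in meters
    color_image: bytes    # raw color frame captured alongside the skeleton

    def __post_init__(self):
        # Every record carries the full set of 20 joint coordinates.
        assert len(self.joints) == N_JOINTS

frame = SkeletonFrame(0.0, [(0.0, 0.0, 2.0)] * N_JOINTS, b"")
```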
@@ -167,15 +167,25 @@ varying group size $n_p = \{3,5,10,25\}$.
\fref{fig:offline} shows the precision-recall plot as the threshold varies.
Both algorithms perform three times better than the majority class baseline of
-15\% with a recall of 100\% on all people. Several curves are obtained for
-different group sizes: people are ordered based on their frequency of
-appearance (\fref{fig:frames}), and all the frames belonging to people beyond a
-given rank in this ordering are removed. The decrease of performance when
-increasing the number of people in the dataset can be explained by the
-overlaps between skeleton profiles due to the noise, as discussed in
-Section~\ref{sec:uniqueness}, but also by the very few number of runs available
-for the least present people, as seen in \fref{fig:frames}, which does not
-permit a proper training of the algorithm.
+15\% with a recall of 100\% on all people. We make two main observations.
+First, as expected, SHT performs better than MoG because of temporal smoothing.
+Second, performance degrades as the group size grows. As we test
+against more people, there are more overlaps between skeleton profiles due to
+the noise, as discussed in Section~\ref{sec:uniqueness}. In addition, the
+least frequently seen people appear in only a few frames, as seen in
+\fref{fig:frames}, which may not provide enough data to train the algorithm
+properly. For 3 and 5
+people (typical family sizes), we see recognition rates mostly above 90\%, and
+we reach 90\% accuracy at 60\% recall for a group size of 10 people.
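The threshold sweep behind these precision-recall curves can be sketched as follows (a hypothetical Python illustration; the score, prediction, and label values are placeholders, not the paper's data). Below the confidence threshold the classifier abstains, so recall counts the fraction of frames answered and precision counts the fraction of answered frames classified correctly:

```python
# Hypothetical precision-recall sweep over a confidence threshold.
# scores[i]: classifier confidence for frame i; preds[i]: predicted identity;
# labels[i]: true identity. Frames below the threshold are left unanswered.

def precision_recall(scores, preds, labels, threshold):
    answered = [(p, y) for s, p, y in zip(scores, preds, labels)
                if s >= threshold]
    recall = len(answered) / len(labels)
    if not answered:
        return 1.0, 0.0  # nothing answered: vacuous precision, zero recall
    precision = sum(p == y for p, y in answered) / len(answered)
    return precision, recall

# Placeholder data: sweeping the threshold traces one curve.
scores = [0.9, 0.8, 0.6, 0.4]
preds  = ["a", "b", "a", "c"]
labels = ["a", "b", "b", "c"]
curve = [precision_recall(scores, preds, labels, t) for t in (0.0, 0.5, 0.7)]
```

Raising the threshold trades recall for precision, which is exactly the axis traversed by each curve in the figure.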
+
+%Several curves are obtained for
+%different group sizes: people are ordered based on their frequency of
+%appearance (\fref{fig:frames}), and all the frames belonging to people beyond a
+%given rank in this ordering are removed. The decrease of performance when
+%increasing the number of people in the dataset can be explained by the
+%overlaps between skeleton profiles due to the noise, as discussed in
+%Section~\ref{sec:uniqueness}, but also by the very few number of runs available
+%for the least present people, as seen in \fref{fig:frames}, which does not
+%permit a proper training of the algorithm.
\begin{figure*}[t]
\begin{center}
@@ -214,16 +224,21 @@ to the dataset. The identification algorithm is then retrained on the
augmented dataset, and the newly obtained classifier can be deployed in the
building.
-In this setting, the sequential hypothesis testing (SHT) algorithm is more
-suitable than the algorithm used in Section~\ref{sec:experiment:offline}, because it
-accounts for the fact that a person identity does not change across a
-run. The analysis is therefore performed by partitioning the dataset
-into 10 time sequences of equal size. For a given threshold, the algorithm
-is trained and tested incrementally: trained on the first $k$
-sequences (in the chronological order) and tested on the $(k+1)$-th
-sequence. \fref{fig:online} shows the prediction-recall
-curve when averaging the prediction rate of the 10 incremental
-experiments.
+%In this setting, the sequential hypothesis testing (SHT) algorithm is more
+%suitable than the algorithm used in Section~\ref{sec:experiment:offline}, because it
+%accounts for the fact that a person identity does not change across a
+%run.
+We evaluate only SHT in this setting, since it already takes consecutive
+frames into account and performed better than MoG in the offline setting
+(Section~\ref{sec:experiment:offline}). We partition the dataset into 10 time
+sequences of equal size. For a given threshold, the algorithm is trained and
+tested incrementally: train on the first $k$ sequences (in chronological
+order) and test on the $(k+1)$-th sequence. \fref{fig:online} shows the
+precision-recall curve obtained by averaging over the 10
+incremental experiments. Overall performance is worse than in
+\fref{fig:offline:sht}, since the system trains on less data than in
+Section~\ref{sec:experiment:offline} in all but the last step. We still see
+recognition rates mostly above 90\% for group sizes of 3 and 5.
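The incremental train/test schedule can be sketched as follows (a hypothetical Python illustration; `runs` is a placeholder for the chronologically ordered data):

```python
# Hypothetical sketch of the incremental evaluation: split the chronologically
# ordered runs into 10 sequences of equal size, train on the first k, test on
# the (k+1)-th, for k = 1..9.

def incremental_splits(runs, n_seq=10):
    size = len(runs) // n_seq
    seqs = [runs[i * size:(i + 1) * size] for i in range(n_seq)]
    for k in range(1, n_seq):
        train = [r for s in seqs[:k] for r in s]  # first k sequences
        test = seqs[k]                            # the (k+1)-th sequence
        yield train, test

runs = list(range(100))  # placeholder for chronologically ordered runs
splits = list(incremental_splits(runs))  # 9 train/test steps
```

Each step trains on strictly less data than the full offline experiment, which is consistent with the lower overall performance reported above.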
\begin{figure}[t]
%\subfloat[Mixture of Gaussians]{
@@ -268,14 +283,14 @@ recognition, but not by a large margin.
We use the publicly available REST API of \textsf{face.com} to do face
recognition on our dataset. Due to the restrictions of the API, for this
experiment we train on one half of the data and test on the remaining half. For
-comparison, MoG algorithm is run with the same training-testing partitioning of
+comparison, the MoG algorithm is run with the same training-testing partitioning of
the dataset. In this setting, SHT is not relevant for the comparison, because
\textsf{face.com} does not provide a way to mark a sequence of frames as
belonging to the same run. This additional information would be used by the SHT
algorithm and would thus bias the results in favor of skeleton recognition.
-However, this result does not take into account the disparity in the number of
-runs which face recognition and skeleton recognition can classify frames, which
-we discuss in the next experiment.
+%However, this result does not take into account the disparity in the number of
+%runs which face recognition and skeleton recognition can classify frames, which
+%we discuss in the next experiment.
\begin{figure}[t]
\parbox[t]{0.49\linewidth}{
@@ -306,7 +321,7 @@ we discuss in the next experiment.
In the next experiment, we include the runs in which people are walking away
from the Kinect that we could positively identify. Face
-recognition ourperforms skeleton recognition the previous setting. However,
+recognition outperforms skeleton recognition in the previous setting. However,
there are many cases where only skeleton recognition is possible. The most
obvious one is when people are walking away from the camera. Coming back to the
raw data collected during the experiment design, we manually label the runs of
@@ -327,15 +342,15 @@ well as when they are walking towards the camera.
% \label{fig:back}
%\end{figure}
-\fref{fig:back} compares the curve obtained in \xref{sec:experiment:offline}
-with people walking toward the camera, with the curve obtained by running the
+\fref{fig:back} compares the results obtained in \xref{sec:experiment:offline}
+with people walking toward the camera, with the results of the
same experiment on the dataset of runs of people walking away from the camera.
-The two curves are sensibly the same. However, one could argue that as the two
+The two results are similar. However, one could argue that as the two
datasets are completely disjoint, the SHT algorithm is not learning the same
profile for a person walking toward the camera and for a person walking away
from the camera. The third curve of \fref{fig:back} shows the precision-recall
curve when training and testing on the combined dataset of runs toward and away
-from the camera.
+from the camera; performance remains similar.
\subsection{Reducing the noise}
@@ -355,13 +370,13 @@ observation $\bx_i$ is replaced by $\bx_i'$ defined by:
\begin{equation}
\bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2}
\end{equation}
-We believe that a reducing factor of 2 for the noise's variance is
-realistic given the relative low resolution of the Kinect's infrared
-camera.
+We believe that a reduction factor of 2 for the noise variance is realistic
+given the relatively low resolution of the Kinect's infrared camera.
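The shrinkage above can be sketched per coordinate (a hypothetical Python illustration; `mean` stands for the per-person mean profile $\bar{\bx}_{y_i}$):

```python
# Hypothetical sketch of the variance-reduction step: each observation is
# pulled halfway toward the mean profile of its person, which halves the
# noise deviation around that mean.

def shrink(x, mean, factor=2.0):
    return [m + (xi - m) / factor for xi, m in zip(x, mean)]

# Placeholder values: a two-coordinate observation and its person's mean.
shrink([1.0, 3.0], [2.0, 2.0])  # -> [1.5, 2.5]
```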
-\fref{fig:var} compares the Precision-recall curve of
-\fref{fig:offline:sht} to the curve of the same experiment run on
-the newly obtained dataset.
+\fref{fig:var} compares the precision-recall curve of \fref{fig:offline:sht} to
+the curve of the same experiment run on the newly obtained dataset. We observe
+a roughly 20\% increase in performance across most thresholds. Note that these
+results would significantly outperform face recognition.
%\begin{figure}[t]
% \begin{center}