| author | Jon Whiteaker <jbw@berkeley.edu> | 2012-03-04 19:21:03 -0800 |
|---|---|---|
| committer | Jon Whiteaker <jbw@berkeley.edu> | 2012-03-04 19:21:03 -0800 |
| commit | ebcc22b552387a4ca9f7830aa118d95d047a20fb (patch) | |
| tree | 2f5a9d8da81106507afea242354d66f5dc691023 | |
| parent | ed0be68bfe1098830cc860a0bf3862ec8693aa2e (diff) | |
| download | kinect-ebcc22b552387a4ca9f7830aa118d95d047a20fb.tar.gz | |
pass on results section
| -rw-r--r-- | experimental.tex | 154 |
1 file changed, 86 insertions, 68 deletions
diff --git a/experimental.tex b/experimental.tex
index 6c8146c..a0c98ae 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -137,6 +137,7 @@ if the noise from the Kinect is reduced.
 \end{figure}
 
 \subsection{Offline learning setting}
+\label{sec:experiment:offline}
 
 In the first experiment, we study the accuracy of skeleton recognition using
 10-fold cross validation. The data set is partitioned into 10 continuous time
@@ -174,6 +175,7 @@ permit a proper training of the algorithm.
 \caption{Results with 10-fold cross validation for the top $n_p$ most present people}
 \label{fig:offline}
 \end{center}
+ \vspace{-1.5\baselineskip}
 \end{figure*}
 
 %\begin{figure}[t]
@@ -195,7 +197,7 @@ at the entrance of a building. When a person enters the building, his identity
 is detected based on the electronic key system and a new labeled run is added
 to the data set. The identification algorithm is then retrained on the
 augmented data set, and the newly obtained classifier can be deployed in the
-building. The results of
+building.
 
 In this setting, the sequential hypothesis testing (SHT) algorithm is more
 suitable than the algorithm used in the previous paragraph, because it
@@ -208,69 +210,86 @@ subsample. \fref{fig:online} shows the precision-recall curve when averaging
 the prediction rate of the 10 incremental experiments.
 
-\begin{figure*}[t]
+\begin{figure}[t]
+%\subfloat[Mixture of Gaussians]{
+%  \includegraphics[width=0.49\textwidth]{graphics/online-nb.pdf}
+%  \label{fig:online:nb}
+%}
+%\subfloat[Sequential hypothesis testing]{
+\parbox[t]{0.49\linewidth}{
 \begin{center}
-\subfloat[Mixture of Gaussians]{
-  \includegraphics[width=0.49\textwidth]{graphics/online-nb.pdf}
-  \label{fig:online:nb}
-}
-\subfloat[Sequential hypothesis testing]{
   \includegraphics[width=0.49\textwidth]{graphics/online-sht.pdf}
+\end{center}
   \label{fig:online:sht}
+  \vspace{-1.5\baselineskip}
+  \caption{Results for the online setting, where $n_p$ is the size of
+  the group as in Figure~\ref{fig:offline}}
+  \label{fig:online}
 }
-\caption{Results for the online setting, where $n_p$ is the size of
-the group as in Figure~\ref{fig:offline}}
-\label{fig:online}
-\end{center}
-\end{figure*}
+\parbox[t]{0.49\linewidth}{
+  \begin{center}
+  \includegraphics[width=0.49\textwidth]{graphics/face.pdf}
+  \end{center}
+  \vspace{-1.5\baselineskip}
+  \caption{Results for face recognition versus skeleton recognition}
+  \label{fig:face}
+}
+\end{figure}
 
 \subsection{Face recognition}
 
-We then compare the performance of skeleton recognition with the performance of
-face recognition as given by \textsf{face.com}. At the time of writing, this
-is the best performing face recognition algorithm on the LFW data set
-~\cite{face-com}.
+In the third experiment, we compare the performance of skeleton recognition
+with the performance of face recognition as given by \textsf{face.com}. At the
+time of writing, this is the best-performing face recognition algorithm on the
+LFW data set\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
+The results show that face recognition has better accuracy than skeleton
+recognition, but not by a large margin.
 
 We use the publicly available REST API of \textsf{face.com} to do face
 recognition on our data set.
 Due to the restrictions of the API, for this
 experiment we train on one half of the data and test on the remaining half. For
-comparison, the Gaussian mixture algorithm is run with the same
-training-testing partitioning of the data set. In this setting, the Sequential
-Hypothesis Testing algorithm is not relevant for the comparison, because
+comparison, the MoG algorithm is run with the same training-testing
+partitioning of the data set. In this setting, SHT is not relevant for the
+comparison, because
 \textsf{face.com} does not give the possibility to mark a sequence of frames as
 belonging to the same run. This additional information would be used by the SHT
 algorithm and would thus bias the results in favor of skeleton recognition.
+However, this result does not take into account the disparity in the number of
+runs in which face recognition and skeleton recognition can classify frames,
+which we discuss in the next experiment.
 
 \begin{figure}[t]
 \parbox[t]{0.49\linewidth}{
   \begin{center}
-  \includegraphics[width=0.49\textwidth]{graphics/face.pdf}
+  \includegraphics[width=0.49\textwidth]{graphics/back.pdf}
   \end{center}
   \vspace{-1.5\baselineskip}
-  \caption{Results for face recognition versus skeleton recognition}
-  \label{fig:face}
+  \caption{Results with people walking away from and toward the camera}
+  \label{fig:back}
 }
 \parbox[t]{0.49\linewidth}{
   \begin{center}
-  \includegraphics[width=0.49\textwidth]{graphics/back.pdf}
+  \includegraphics[width=0.49\textwidth]{graphics/var.pdf}
   \end{center}
   \vspace{-1.5\baselineskip}
-  \caption{Results with people walking away from and toward the camera}
-  \label{fig:back}
-}
+  \caption{Results with and without halving the variance of the noise}
+  \label{fig:var}
+}
 \end{figure}
 
 \subsection{Walking away}
 
-The performance of face recognition and skeleton recognition are comparable in
-the previous setting. However, there are many cases where only skeleton
-recognition is possible. The most obvious one is when people are walking away
-from the camera. Coming back to the raw data collected during the experiment
-design, we manually label the runs of people walking away from the camera. In
-this case, it is harder to get the ground truth classification and some of runs
-are dropped because it is not possible to recognize the person. Apart from
-that, the data set reduction is performed exactly as explained in
-Section~\ref{sec:experiment-design}.
+In the next experiment, we include the runs in which people are walking away
+from the Kinect that we could positively identify. Face recognition outperforms
+skeleton recognition in the previous setting. However, there are many cases
+where only skeleton recognition is possible. The most obvious one is when
+people are walking away from the camera. Coming back to the raw data collected
+during the experiment design, we manually label the runs of people walking away
+from the camera. In this case, it is harder to get the ground-truth
+classification, and some of the runs are dropped because it is not possible to
+recognize the person. Apart from that, the data set reduction is performed
+exactly as explained in Section~\ref{sec:experiment-design}. Our results show
+that we can identify people walking away from the camera just as well as when
+they are walking towards the camera.
 
 %\begin{figure}[t]
 %  \begin{center}
@@ -282,33 +301,32 @@ Section~\ref{sec:experiment-design}.
 %  \label{fig:back}
 %\end{figure}
 
-\fref{fig:back} compares the curve obtained in the online
-setting with people walking toward the camera, with the curve obtained
-by running the same experiment on the data set of runs of people
-walking away from the camera. The two curves are sensibly the
-same. However, one could argue that as the two data sets are
-completely disjoint, the SHT algorithm is not learning the same
-profile for a person walking toward the camera and for a person
-walking away from the camera. \fref{fig:back} shows the
-Precision-recall curve when training on runs toward the camera and
-testing on runs away from the camera.
+\fref{fig:back} compares the curve obtained in \xref{sec:experiment:offline}
+with people walking toward the camera with the curve obtained by running the
+same experiment on the data set of runs of people walking away from the camera.
+The two curves are essentially the same. However, one could argue that, as the
+two data sets are completely disjoint, the SHT algorithm is not learning the
+same profile for a person walking toward the camera and for a person walking
+away from the camera. The third curve of \fref{fig:back} shows the
+precision-recall curve when training and testing on the combined data set of
+runs toward and away from the camera.
 
 \subsection{Reducing the noise}
 
-Predicting potential improvements of the prediction rate of our algorithm is
-straightforward. The algorithm relies on 9 features only.
-\xref{sec:uniqueness} shows that 6 of these features alone are
-sufficient to perfectly distinguish two different skeletons at a low noise
-level. Therefore, the only source of classification error in our algorithm is
-the dispersion of the observed limbs' lengths away from the exact measurements.
-
-To simulate a possible reduction of the noise level, the data set is
-modified as follows: all the observations for a given person are
-homothetically contracted towards their average so as to divide their
-empirical variance by 2. Formally, if $\bx$ is an observation in the
-9-dimensional feature space for the person $i$, and if $\bar{\bx}$ is
-the average of all the observations available for this person in the
-data set, then $\bx$ is replaced by $\bx'$ defined by:
+For the final experiment, we study what happens when the noise from the Kinect
+is reduced.
+%Predicting potential improvements of the prediction rate of our
+%algorithm is straightforward. The algorithm relies on 9 features only.
+%\xref{sec:uniqueness} shows that 6 of these features alone are sufficient to
+%perfectly distinguish two different skeletons at a low noise level. Therefore,
+%the only source of classification error in our algorithm is the dispersion of
+%the observed limbs' lengths away from the exact measurements.
+To simulate a reduction of the noise level, the data set is modified as
+follows: we compute the average profile of each person, and contract each
+observation towards this average so as to divide the empirical variance by 2.
+Formally, if $\bx$ is an observation in the 9-dimensional feature space for
+person $i$, and if $\bar{\bx}$ is the average of all the observations available
+for this person in the data set, then $\bx$ is replaced by $\bx'$ defined by:
 \begin{equation}
 \bx' = \bar{\bx} + \frac{\bx-\bar{\bx}}{\sqrt{2}}
 \end{equation}
@@ -317,17 +335,17 @@ realistic given the relative low resolution of the Kinect's infrared camera.
 
 \fref{fig:var} compares the Precision-recall curve of
-\fref{fig:sequential} to the curve of the same experiment run on
+\fref{fig:offline:sht} to the curve of the same experiment run on
 the newly obtained data set.
 
-\begin{figure}[t]
-  \begin{center}
-  \includegraphics[width=0.49\textwidth]{graphics/var.pdf}
-  \end{center}
-  \vspace{-1.5\baselineskip}
-  \caption{Results with and without halving the variance of the noise}
-  \label{fig:var}
-\end{figure}
+%\begin{figure}[t]
+%  \begin{center}
+%  \includegraphics[width=0.49\textwidth]{graphics/var.pdf}
+%  \end{center}
+%  \vspace{-1.5\baselineskip}
+%  \caption{Results with and without halving the variance of the noise}
+%  \label{fig:var}
+%\end{figure}
 
 %%% Local Variables:
 %%% mode: latex
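The homothetic contraction in the final hunk can be sketched in Python. This is a minimal illustration, not code from the paper; the function name and the dict-of-arrays data layout are my own assumptions:

```python
import numpy as np

def halve_noise_variance(frames_by_person):
    """Contract each person's observations toward their average profile,
    implementing  x' = x_bar + (x - x_bar) / sqrt(2),
    which divides the empirical variance of every feature by 2.

    frames_by_person: hypothetical layout, a dict mapping a person id
    to an (n_frames, 9) array of limb-length features.
    """
    contracted = {}
    for person, frames in frames_by_person.items():
        x = np.asarray(frames, dtype=float)
        x_bar = x.mean(axis=0)  # average profile \bar{x} for this person
        contracted[person] = x_bar + (x - x_bar) / np.sqrt(2)
    return contracted
```

Because the map is affine around the empirical mean, the per-feature mean is unchanged and the per-feature variance is divided by exactly 2, matching the equation in the diff.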
