Diffstat (limited to 'experimental.tex')
 experimental.tex | 108
 1 file changed, 60 insertions(+), 48 deletions(-)
diff --git a/experimental.tex b/experimental.tex
index a0c98ae..d46270a 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -100,13 +100,14 @@ the same ID, it means that the skeleton-fitting algorithm was able to detect
the skeleton in a contiguous way. This allows us to define the concept of a
\emph{run}: a sequence of frames with the same skeleton ID.
-We perform five experiments. First, we test the performance of skeleton
-recognition using traditional 10-fold cross validation, to represent an offline
-setting. Second, we run our algorithms in an online setting by training and
-testing the data over time. Third, we pit skeleton recognition against the
-state-of-the-art in face recognition. Next, we test how our solution performs
-when people are walking away from the camera. Finally, we study what happens
-if the noise from the Kinect is reduced.
+We perform five experiments. First, we test the performance of
+skeleton recognition using traditional 10-fold cross validation, to
+represent an offline learning setting. Second, we run our algorithms
+in an online learning setting by training and testing the data over
+time. Third, we pit skeleton recognition against the state-of-the-art
+in face recognition. Next, we test how our solution performs when
+people are walking away from the camera. Finally, we study what
+happens if the noise from the Kinect is reduced.
%\begin{table}
%\begin{center}
@@ -131,7 +132,7 @@ if the noise from the Kinect is reduced.
\includegraphics[]{graphics/frames.pdf}
\end{center}
\vspace{-1.5\baselineskip}
- \caption{Distribution of the frame ratio of each individual in the
+ \caption{Distribution of the frequency of each individual in the
data set}
\label{fig:frames}
\end{figure}
@@ -139,22 +140,23 @@ if the noise from the Kinect is reduced.
\subsection{Offline learning setting}
\label{sec:experiment:offline}
-In the first experiment, we study the accuracy of skeleton recognition using
-10-fold cross validation. The data set is partitioned into 10 continuous time
-sequences of equal size. For a given recall threshold, the algorithm is trained
-on 9 continuous time sequences and trained on the last one. This is repeated
-for the 10 possible testing subsamples. Averaging the prediction rate over
-these 10 training-testing experiments yields the prediction rate for the chosen
-threshold. We test the mixture of Gaussians (MoG) and sequential hypothesis
-testing (SHT) models, and find that SHT generally performs better than MoG, and
-that accuracy increases as group size decreases.
+In the first experiment, we study the accuracy of skeleton recognition
+using 10-fold cross validation. The data set is partitioned into 10
+continuous time sequences of equal size. For a given recall threshold,
+the algorithm is trained on 9 sequences and tested on the last
+one. This is repeated for all 10 possible testing sequences. Averaging
+the prediction rate over these 10 training-testing experiments yields
+the prediction rate for the chosen threshold. We test the mixture of
+Gaussians (MoG) and sequential hypothesis testing (SHT) models, and
+find that SHT generally performs better than MoG, and that accuracy
+increases as group size decreases.
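The evaluation protocol described above differs from standard shuffled
10-fold cross validation in that each fold is one contiguous block of
frames in time order. A minimal sketch of that splitting scheme (the
frame count, fold count, and classifier are placeholders, not taken
from the paper's implementation):

```python
import numpy as np

def contiguous_kfold(n_frames, k=10):
    """Yield (train_idx, test_idx) pairs where each test fold is one
    contiguous time sequence, never a random shuffle of frames."""
    bounds = np.linspace(0, n_frames, k + 1, dtype=int)
    for i in range(k):
        test = np.arange(bounds[i], bounds[i + 1])
        train = np.concatenate([np.arange(0, bounds[i]),
                                np.arange(bounds[i + 1], n_frames)])
        yield train, test

# Frames stay in chronological order; folds are blocks, not random splits.
folds = list(contiguous_kfold(100, k=10))
```

Keeping folds contiguous matters here because consecutive frames of the
same run are highly correlated; shuffling them across folds would leak
test frames into training and inflate the measured accuracy.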
\fref{fig:offline} shows the precision-recall plot as the threshold varies.
-Both algrithms perform better than three times the majority class baseline of
+Both algorithms perform three times better than the majority class baseline of
15\% with a recall of 100\% on all people. Several curves are obtained for
different group sizes: people are ordered based on their frequency of
-appearance (\fref{fig:frames}, and all the frames belonging to people beyond a
+appearance (\fref{fig:frames}), and all the frames belonging to people beyond a
given rank in this ordering are removed. The decrease of performance when
increasing the number of people in the data set can be explained by the
overlaps between skeleton profiles due to the noise, as discussed in
@@ -168,7 +170,7 @@ permit a proper training of the algorithm.
\includegraphics[]{graphics/offline-nb.pdf}
\label{fig:offline:nb}
}
-\subfloat[Sequential Hypothesis Learning]{
+\subfloat[Sequential Hypothesis Testing]{
\includegraphics[]{graphics/offline-sht.pdf}
\label{fig:offline:sht}
}
@@ -200,13 +202,13 @@ augmented data set, and the newly obtained classifier can be deployed in the
building.
In this setting, the sequential hypothesis testing (SHT) algorithm is more
-suitable than the algorithm used in the previous paragraph, because it
+suitable than the algorithm used in Section~\ref{sec:experiment:offline}, because it
accounts for the fact that a person's identity does not change across a
run. The analysis is therefore performed by partitioning the dataset
-into 10 subsamples of equal size. For a given threshold, the algorithm
+into 10 time sequences of equal size. For a given threshold, the algorithm
is trained and tested incrementally: trained on the first $k$
-subsamples (in the chronological order) and tested on the $(k+1)$-th
-subsample. \fref{fig:online} shows the prediction-recall
+sequences (in chronological order) and tested on the $(k+1)$-th
+sequence. \fref{fig:online} shows the precision-recall
curve when averaging the prediction rate of the 10 incremental
experiments.
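The incremental protocol amounts to a prefix-based split: train on the
first $k$ chronological sequences, test on the $(k+1)$-th. A sketch,
with the classifier left abstract (whether an untrained model is also
scored on the very first sequence is not specified here, so this
version yields one split per non-empty training prefix):

```python
import numpy as np

def online_splits(n_frames, k=10):
    """Train on the first i chronological sequences, test on the
    (i+1)-th; i runs from 1 to k-1 (at least one training sequence)."""
    bounds = np.linspace(0, n_frames, k + 1, dtype=int)
    for i in range(1, k):
        train = np.arange(0, bounds[i])          # everything seen so far
        test = np.arange(bounds[i], bounds[i + 1])  # next time sequence
        yield train, test

splits = list(online_splits(100, k=10))
```

Unlike the offline folds, the training set here only ever grows forward
in time, matching the deployment scenario of retraining a classifier as
new labeled runs arrive.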
@@ -220,17 +222,22 @@ experiments.
\begin{center}
\includegraphics[width=0.49\textwidth]{graphics/online-sht.pdf}
\end{center}
- \label{fig:online:sht}
- \vspace{-1.5\baselineskip}
- \caption{Results for the online setting, where $n_p$ is the size of
- the group as in Figure~\ref{fig:offline}}
- \label{fig:online}
}
\parbox[t]{0.49\linewidth}{
\begin{center}
\includegraphics[width=0.49\textwidth]{graphics/face.pdf}
\end{center}
- \vspace{-1.5\baselineskip}
+}
+\end{figure}
+\begin{figure}
+\vspace{-1.5\baselineskip}
+\parbox[t]{0.48\linewidth}{
+ \caption{Results for the online setting, where $n_p$ is the size of
+ the group as in Figure~\ref{fig:offline}}
+ \label{fig:online}
+}
+\hspace{0.02\linewidth}
+\parbox[t]{0.48\linewidth}{
\caption{Results for face recognition versus skeleton recognition}
\label{fig:face}
}
@@ -259,21 +266,27 @@ we discuss in the next experiment.
\begin{figure}[t]
\parbox[t]{0.49\linewidth}{
- \begin{center}
- \includegraphics[width=0.49\textwidth]{graphics/back.pdf}
- \end{center}
- \vspace{-1.5\baselineskip}
- \caption{Results with people walking away from and toward the camera}
- \label{fig:back}
+\begin{center}
+ \includegraphics[width=0.49\textwidth]{graphics/back.pdf}
+\end{center}
}
\parbox[t]{0.49\linewidth}{
- \begin{center}
- \includegraphics[width=0.49\textwidth]{graphics/var.pdf}
- \end{center}
- \vspace{-1.5\baselineskip}
- \caption{Results with and without halving the variance of the noise}
- \label{fig:var}
- }
+\begin{center}
+ \includegraphics[width=0.49\textwidth]{graphics/var.pdf}
+\end{center}
+}
+\end{figure}
+\begin{figure}
+\vspace{-1.5\baselineskip}
+\parbox[t]{0.48\linewidth}{
+\caption{Results with people walking away from and toward the camera}
+\label{fig:back}
+}
+\hspace{0.02\linewidth}
+\parbox[t]{0.48\linewidth}{
+\caption{Results with and without halving the variance of the noise}
+\label{fig:var}
+}
\end{figure}
\subsection{Walking away}
@@ -323,12 +336,11 @@ the Kinect.
%the observed limbs' lengths away from the exact measurements.
To simulate a reduction of the noise level, the data set is modified as
follows: we compute the average profile of each person, and for each frame we
-divide the empirical variance from the average by 2. Formally, if $\bx$ is an
-observation in the 9-dimensional feature space for the person $i$, and if
-$\bar{\bx}$ is the average of all the observations available for this person in
-the data set, then $\bx$ is replaced by $\bx'$ defined by:
+shrink its deviation from this average so that the empirical variance
+is halved. Formally, using the same notation as in
+Section~\ref{sec:mixture of Gaussians}, each observation $\bx_i$ is
+replaced by $\bx_i'$ defined by:
\begin{equation}
- \bx' = \bar{\bx} + \frac{\bx-\bar{\bx}}{\sqrt{2}}
+ \bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{\sqrt{2}}
\end{equation}
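As a sanity check on this transformation: scaling each deviation from
the class mean by a factor $s$ scales the empirical variance by $s^2$,
so halving the variance corresponds to $s = 1/\sqrt{2}$. A small
numerical sketch on synthetic data (not the actual Kinect profiles):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)  # stand-in for one feature
x_bar = x.mean()

s = 1 / np.sqrt(2)                # deviation scale for a halved variance
x_half = x_bar + s * (x - x_bar)  # same mean, variance scaled by s^2

print(x.var(), x_half.var())      # second value is half the first
```

Note that dividing the deviations by 2 instead would scale the variance
by $1/4$, not the factor of 2 claimed for the reduced-noise setting.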
We believe that a reducing factor of 2 for the noise's variance is
realistic given the relative low resolution of the Kinect's infrared