1 files changed, 36 insertions, 39 deletions
diff --git a/uniqueness.tex b/uniqueness.tex
index 854e4c2..927421a 100644
--- a/uniqueness.tex
+++ b/uniqueness.tex
@@ -1,9 +1,9 @@
 \section{Skeleton uniqueness}
 \label{sec:uniqueness}
 
-The most obvious concern raised by trying to use skeletons as a recognizable
-biometric is their uniqueness. Are skeletons consistently and sufficiently
-distinct to use them for person recognition?
+The most obvious concern raised by trying to use skeleton measurements as a
+recognizable biometric is their uniqueness. Are skeletons consistently and
+sufficiently distinct to use them for person recognition?
 
 \subsection{Face recognition benchmark}
 
@@ -13,41 +13,39 @@ problem}. In this problem you are given two measurements of the metric
 and you want to decide whether they come from the same individual
 (matched pair) or from two different individuals (unmatched pair).
 
-The \emph{Labeled Faces in the Wild} \cite{lfw} database is
-specifically suited to study the face pair matching problem and has
-been used to benchmark several face recognition algorithms. Raw data
-of this benchmark is publicly available and has been derived as
-follows: the database is split into 10 subsets. From each of these
-subsets, 300 matched pairs and 300 unmatched pairs are randomly
-chosen. Each algorithm runs 10 separate leave-one-out cross-validation
-experiments on these sets of pairs. Averaging the number of true
-positives and false positives across the 10 experiments for a given
-threshold then yields one point on the receiver operating
-characteristic curve (ROC curve: this is the curve of the
-true-positive rate vs. the false-positive rate as the threshold of the
-algorithm varies). Note that in this benchmark the identity
-information of the individuals appearing in the pairs is not
-available, which means that the algorithms cannot form additional
-image pairs from the input data. This is referred to as the
-\emph{Image-restricted} setting in the LFW benchmark.
+The standard benchmark for this problem in face recognition uses the \emph{Labeled
+Faces in the Wild} (LFW) database \cite{lfw}. Raw data of this benchmark is publicly
+available and has been derived as follows: the database is split into 10
+subsets. From each of these subsets, 300 matched pairs and 300 unmatched pairs
+are randomly chosen. Each algorithm runs 10 separate leave-one-out
+cross-validation experiments on these sets of pairs. Averaging the number of
+true positives and false positives across the 10 experiments for a given
+threshold then yields one point on the receiver operating characteristic (ROC)
+curve, which plots the true-positive rate against the false-positive rate as
+the threshold of the algorithm varies. Note that in this benchmark the identity
+information of the individuals appearing in the pairs is not available, which
+means that the algorithms cannot form additional image pairs from the input
+data. This is referred to as the \emph{Image-restricted} setting in the LFW
+benchmark.
 
 \subsection{Experiment design}
 
 In order to run an experiment similar to the one used in the face pair-matching
 problem, we use the Goldman Osteological Dataset \cite{deadbodies}. This
-dataset consists of osteometric measurements of 1538 skeletons dating from
-throughout the Holocene. Given the way these data were collected, only
-a partial view of the skeleton is available, we keep six measurements: the lengths of four
-bones (radius, humerus, femur, and tibia) and the breadth and height of the pelvis.
-Because of missing values, this reduces the size of the dataset to 1191.
+dataset consists of skeletal measurements of 1538 skeletons uncovered around
+the world and dating from throughout the last several thousand years. Given the
+way these data were collected, only a partial view of the skeleton is
+available; we keep six measurements: the lengths of four bones (radius,
+humerus, femur, and tibia) and the breadth and height of the pelvis. Discarding
+skeletons with missing values reduces the size of the dataset to 1191.
 
 From this dataset, 1191 matched pairs and 1191 unmatched pairs are generated.
-With exact measurements, all skeletons are distinct and therefore every pair is
-correctly classified.  In practice, the exact measurements of the bones of
-living subjects are not directly accessible. Therefore, measurements are
-likely to have an error rate, whose variance depends on the method of collection 
-(\eg measuring limbs over clothing versus on bare skin). We simulate this error
-by adding independent random Gaussian noise to each measurement of the pairs.
+In practice, the exact measurements of the bones of living subjects are not
+directly accessible. Therefore, measurements are likely to carry an error,
+whose variance depends on the method of collection (\eg measuring limbs over
+clothing versus on bare skin). Since there is only one sample per skeleton, we
+simulate this error by adding independent random Gaussian noise to each
+measurement of the pairs.
 
 \subsection{Results}
 
@@ -70,7 +68,7 @@ defined as:
   \begin{center}
     \includegraphics[width=10cm]{graphics/roc.pdf}
   \end{center}
-  \caption{Receiver operating characteristic (true positive rate
+  \caption{ROC curve (true positive rate
   vs. false positive rate) for several standard deviations of the
   noise and for the state-of-the-art \emph{Associate-Predict} face
   detection algorithm}
@@ -90,23 +88,22 @@ than 1cm with 99.9\% probability. Even with a standard
 deviation of 5mm, it is still possible to detect 90\% of the matched
 pairs with a false positive rate of 6\%.
 
-\todo{We should unify the language here with that in the related work (and intro)}
 This experiment gives an idea of the noise variance level above which
-it is not possible to consistently distinguish skeletons. This noise
-level can be interpreted as follows in the person recognition
-problem. For this problem, a classifier can be built be first learning
+it is not possible to consistently distinguish skeletons.
+For the person recognition problem, a classifier can be built by first learning
 a \emph{skeleton profile} for each individual from all the
 measurements in the training set. Then, given a new skeleton
 measurement, the algorithm classifies it to the individual whose
 skeleton profile is closest to the new measurement. In this case,
 there are two distinct sources of noise:
 \begin{itemize}
-\item the absolute deviation of the estimator: how far is the
-  estimated profile from the exact skeleton profile of the person.
+\item the absolute deviation of the estimator: how far the estimated profile is
+  from the exact skeleton profile of the person due to figure position or
+  motion (\eg from walking).
 \item the noise of the new measurement: this comes from the device
   doing the measurement.
 \end{itemize}
-the combination of these two noises is what can be compared to the
+The combination of these two noise sources is what can be compared to the
 noise represented on the ROC curves. Section \label{sec:kinect} will
 give more insight on the structure of the noise.