
\section{Skeleton uniqueness}
\label{sec:uniqueness}

\begin{figure*}[t]
  \centering
  \includegraphics[width=0.99\textwidth]{graphics/limbs.pdf}
\vspace{-\baselineskip}
  \caption{Histograms of the differences between 9 skeleton measurements
  $x_k$ (Section~\ref{sec:experiment}) and their expectation given the
  class $y$. In red, the p.d.f. of a normal distribution with mean and
  variance equal to the empirical mean and variance of the measurement.}
  \label{fig:error marginals}
\end{figure*}

The most obvious concern raised by trying to use skeleton measurements as a
recognizable biometric is their uniqueness. Are skeletons consistently and
sufficiently distinct to use them for person recognition?

\subsection{Face recognition benchmark}
\label{sec:frb}

A good way to understand the uniqueness of a metric is to look at how
well an algorithm based on it performs in the \emph{pair-matching
problem}. In this problem you are given two measurements of the metric
and you want to decide whether they come from the same individual
(matched pair) or from two different individuals (unmatched pair).

This benchmark is standard for face recognition, which uses the \emph{Labeled
Faces in the Wild} \cite{lfw} database for the pairs.  Raw data of this
benchmark is publicly available and has been derived as follows: the database
is split into 10 subsets. From each of these subsets, 300 matched pairs and 300
unmatched pairs are randomly chosen. Each algorithm runs 10 separate
leave-one-out cross-validation experiments on these sets of pairs. The average
of the false-positive rates and the true-positive rates across the 10
experiments for a given threshold gives one operating point on the receiver
operating characteristic (ROC) curve (Figure~\ref{fig:roc}). Note that in this
benchmark, the identity information of the individuals appearing in the pairs
is not available, which means that the algorithms cannot form additional image
pairs from the input data. This is referred to as the \emph{image-restricted}
setting in the LFW benchmark.
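
Written out, with notation that is ours rather than the benchmark's, let
$\mathrm{FPR}_k(\delta)$ and $\mathrm{TPR}_k(\delta)$ denote the
false-positive and true-positive rates measured in the $k$-th experiment at
threshold $\delta$. The operating point reported for $\delta$ is then
\[
  \bigl(\overline{\mathrm{FPR}}(\delta),\,\overline{\mathrm{TPR}}(\delta)\bigr)
  = \Bigl(\frac{1}{10}\sum_{k=1}^{10}\mathrm{FPR}_k(\delta),\;
          \frac{1}{10}\sum_{k=1}^{10}\mathrm{TPR}_k(\delta)\Bigr),
\]
and sweeping $\delta$ over its range traces the full curve.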

\subsection{Experiment design}

In order to run an experiment similar to the one used in the face pair-matching
problem (Section~\ref{sec:frb}), we use the \emph{Goldman Osteological Dataset}
\cite{deadbodies}. This dataset consists of skeletal measurements of 1,538
skeletons uncovered around the world and dating from the modern geological era.
Given the way this data was collected, only a partial view of the skeleton is
available.  We keep six measurements: the lengths of four bones (radius,
humerus, femur, and tibia) and the breadth and height of the pelvis.  Because
of missing values, this reduces the size of the dataset to 1,191.

From this dataset, 1,191 matched pairs and 1,191 unmatched pairs are
generated.  In practice, the exact bone measurements of living
subjects are not directly accessible. Measurements therefore carry
an error whose variance depends on the method of
collection (\eg measuring limbs over clothing versus on bare
skin). Since each skeleton appears only once in the dataset, we
simulate this error by adding independent random Gaussian noise to
each measurement of the pairs.
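
As a sketch of one way to formalize this simulation (the notation, the
zero-mean noise with a single shared standard deviation $\sigma$, and the
exact pairing scheme are our assumptions for illustration), write $\bs$
and $\bs'$ for two distinct skeletons of the dataset. A matched and an
unmatched pair can then be modeled respectively as
\[
  (\bs + \mathbf{e}_1,\; \bs + \mathbf{e}_2)
  \quad\text{and}\quad
  (\bs + \mathbf{e}_1,\; \bs' + \mathbf{e}_2),
  \qquad
  \mathbf{e}_1, \mathbf{e}_2 \sim \mathcal{N}(0, \sigma^2 I_6)
  \text{ independent},
\]
so that the two members of a matched pair differ only through the noise.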

\subsection{Results}

We evaluate pair-matching performance on the
dataset using a nearest neighbor algorithm: for a given
threshold, a pair is classified as \emph{matched} if the
Euclidean distance between the two skeletons is lower than the
threshold, and \emph{unmatched} otherwise. Formally, let
$(\bs_1,\bs_2)$ be an input pair of the algorithm, where each
$\bs_i\in\mathbf{R}_+^{6}$ holds the six measurements, and let
$d$ denote the Euclidean distance. The
output of the algorithm for the threshold $\delta$ is defined as:
\begin{equation}
  A_\delta(\bs_1,\bs_2) = \begin{cases}
    1 & \text{if $d(\bs_1,\bs_2) < \delta$}\\
    0 & \text{otherwise}
  \end{cases}
\end{equation}
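
To see how the threshold interacts with the noise, suppose, as an
illustrative assumption, that $d$ is the Euclidean distance and that the
two members of a matched pair are the same skeleton perturbed by
independent noise vectors
$\mathbf{e}_1,\mathbf{e}_2\sim\mathcal{N}(0,\sigma^2 I_6)$. Then
$\mathbf{e}_1-\mathbf{e}_2\sim\mathcal{N}(0,2\sigma^2 I_6)$, and
\[
  d(\bs_1,\bs_2)^2 = \|\mathbf{e}_1-\mathbf{e}_2\|^2,
  \qquad
  \frac{d(\bs_1,\bs_2)^2}{2\sigma^2} \sim \chi^2_6,
  \qquad
  \mathbf{E}\bigl[d(\bs_1,\bs_2)^2\bigr] = 12\,\sigma^2.
\]
Under this assumption, the true-positive rate of $A_\delta$ is the
$\chi^2_6$ distribution function evaluated at $\delta^2/(2\sigma^2)$. For
$\sigma = 3$ millimeters, the root-mean-square distance between the two
members of a matched pair is $\sqrt{108}\approx 10.4$ millimeters.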

\begin{figure}[t]
  \begin{center}
    \includegraphics[width=0.49\textwidth]{graphics/roc.pdf}
  \end{center}
\vspace{-1.5\baselineskip}
  \caption{ROC curve for several standard deviations of the noise and
  for the state-of-the-art \emph{Associate-Predict} face recognition
  algorithm. The standard deviation $\sigma$ is shown in millimeters.}
  \label{fig:roc}
\end{figure}

Figure~\ref{fig:roc} shows the ROC curve of the nearest neighbor algorithm for
different values of the standard deviation of the noise.  The results show that
with a standard deviation of 3 millimeters, nearest neighbor performs quite
similarly to face recognition at low false-positive rates. At this noise level,
the error is smaller than 1 centimeter with 99.9\% probability. Even with a
standard deviation of 5 millimeters, it is still possible to detect 90\% of the
matched pairs with a false-positive rate of 6\%.
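
The 99.9\% figure can be checked directly, reading ``the error'' as the
noise $e$ added to a single measurement and assuming it is a zero-mean
Gaussian with $\sigma = 3$ millimeters. Writing $\Phi$ for the standard
normal distribution function and measuring $e$ in millimeters,
\[
  \Pr\bigl(|e| < 10\bigr)
  = 2\,\Phi\!\left(\tfrac{10}{3}\right) - 1
  \approx 2 \times 0.99957 - 1
  \approx 0.9991.
\]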

This experiment gives an idea of the noise variance level above which
it is not possible to consistently distinguish skeletons. If the noise
is small, a highly accurate classifier can be built by first learning
a \emph{skeleton profile} for each individual from all the
measurements in the training set. Then, given a new skeleton
measurement, the algorithm classifies it to the individual whose
skeleton profile is closest to the new measurement. In this case,
there are two distinct sources of noise:
\begin{itemize}
\item the absolute deviation of the estimator: how far the
  estimated profile is from the exact skeleton profile of the person, due
  to figure position or motion (\eg from walking);
\item the noise of the new measurement, which comes from the device
  doing the measurement.
\end{itemize}
In \xref{sec:experiment} we show that we can learn good models despite this
noise.
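
One simple instance of such a classifier, given here as a sketch under our
own assumptions rather than as the model used later, takes the profile of
individual $y$ to be the mean $\hat{\mu}_y$ of the $n_y$ training
measurements $\bs_i$ labeled $y_i = y$, and assigns a new measurement $\bs$
to the individual with the closest profile:
\[
  \hat{\mu}_y = \frac{1}{n_y}\sum_{i \,:\, y_i = y} \bs_i,
  \qquad
  \hat{y}(\bs) = \arg\min_{y}\, \|\bs - \hat{\mu}_y\|.
\]
In this form, the first source of noise above is the gap between
$\hat{\mu}_y$ and the exact profile of the individual, and the second is
the error carried by $\bs$ itself.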

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "kinect"
%%% End: