-rwxr-xr-x  data/pair-matching/roc.py   10
-rw-r--r--  experimental.tex            50
-rw-r--r--  graphics/roc.pdf            bin  18167 -> 18648 bytes
-rw-r--r--  related.tex                 12
-rw-r--r--  uniqueness.tex              36

5 files changed, 53 insertions, 55 deletions
diff --git a/data/pair-matching/roc.py b/data/pair-matching/roc.py
index c0d7d83..4b26dcb 100755
--- a/data/pair-matching/roc.py
+++ b/data/pair-matching/roc.py
@@ -47,11 +47,11 @@ def gen_pairs(var,sk_data):
 
 if __name__ == "__main__":
     plt.figure(figsize=(3,2.2))
-    ap = np.loadtxt("associatepredict.txt",delimiter=",")
-    indices = [i for i in range(ap.shape[0]) if ap[i,1]<0.1]
-    ap_false = ap[:,1][indices]
-    ap_true = ap[:,0][indices]
-    plt.plot(100*ap_false,100*ap_true,label="Face recognition")
+    #ap = np.loadtxt("associatepredict.txt",delimiter=",")
+    #indices = [i for i in range(ap.shape[0]) if ap[i,1]<0.1]
+    #ap_false = ap[:,1][indices]
+    #ap_true = ap[:,0][indices]
+    #plt.plot(100*ap_false,100*ap_true,label="Face recognition")
     plt.xlabel("False positive rate [%]")
     plt.ylabel("True positive rate [%]")
     np.random.seed()
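The five lines commented out above plotted the face.com baseline, loaded from associatepredict.txt as rows of (true positive rate, false positive rate). If the intent is only to drop the baseline from this figure, an alternative to deleting the code is to plot it only when the data file is present. A minimal sketch, assuming the same two-column CSV format as the removed lines; this is my suggestion, not part of the commit:

import os
import numpy as np
import matplotlib.pyplot as plt

# Sketch only: assumes associatepredict.txt holds comma-separated rows of
# (true positive rate, false positive rate) in [0, 1], as the removed lines imply.
if os.path.exists("associatepredict.txt"):
    ap = np.loadtxt("associatepredict.txt", delimiter=",")
    keep = ap[:, 1] < 0.1                      # keep points with FPR below 10%
    plt.plot(100 * ap[keep, 1], 100 * ap[keep, 0], label="Face recognition")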
diff --git a/experimental.tex b/experimental.tex
index f59be72..2c537df 100644
--- a/experimental.tex
+++ b/experimental.tex
@@ -43,11 +43,11 @@ We collect data using the Kinect SDK over a period of a week in a research
 laboratory setting. The Kinect is placed at the tee of a frequently used
 hallway. For each frame, the Kinect SDK performs figure detection to identify
 regions of interest. Then, it fits a skeleton to the identified figures and
-outputs a set of joints in real world coordinates. The view of the Kinect is
-seen in \fref{fig:hallway}, showing the color image, the depth image with
-detected figures, and the fitted skeleton of a person in a single frame. Skeletons are
-fit from roughly 1-5 meters away from the Kinect. For each frame with a
-skeleton we record the color image and the positions of the joints.
+outputs a set of joints in real world coordinates. The view from the Kinect
+SDK is seen in \fref{fig:hallway}, showing the color image, the depth image
+with detected figures, and the fitted skeleton of a person in a single frame.
+Skeletons are fit from roughly 1-5 meters away from the Kinect. For each frame
+with a skeleton we record the color image and the positions of the joints.
 
 \begin{figure*}[t]
 \begin{center}
@@ -64,6 +64,7 @@ another part of the body. In those cases, the coordinates of these joints are
 either absent from the frame or present but tagged as \emph{Inferred} by the
 Kinect SDK. Inferred means that even though the joint is not visible in the
 frame, the skeleton-fitting algorithm attempts to guess the right location.
+Note that in the experiment design we exclude inferred data points.
 
 \subsection{Experiment design}
 
@@ -269,19 +270,20 @@ with the performance of face recognition as given by \textsf{face.com}. At the
 time of writing, this is the best performing face recognition algorithm on the
 LFW dataset\footnote{\url{http://vis-www.cs.umass.edu/lfw/results.html}}.
 
-We use the publicly available REST API of \textsf{face.com} to do face
-recognition on our dataset. Due to the restrictions of the API, for this
-experiment we set $n_p = 5$ and train on one half of the data and test on the
-remaining half. For comparison, the MoG algorithm is run with the same
-training-testing partitioning of the dataset. In this setting, SHT is not
-relevant for the comparison, because \textsf{face.com} does not give the
-possibility to mark a sequence of frames as belonging to the same run. This
-additional information would be used by the SHT algorithm and would thus bias
-the experiment in favor of skeleton recognition. The results are shown in
-\fref{fig:face}. Face recognition outperforms skeleton recognition, but by
-less than 10\% at most thresholds.
-%These results are promising, given that
-%\textsf{face.com} is the state-of-the-art in face recognition.
+We use the REST API of \textsf{face.com} to do face recognition on our dataset.
+Due to the restrictions of the API, for this experiment we set $n_p = 5$ and
+train on one half of the data and test on the remaining half. For comparison,
+the MoG algorithm is run with the same training-testing partitioning of the
+dataset. In this setting, SHT is not relevant for the comparison, because
+\textsf{face.com} does not give the possibility to mark a sequence of frames as
+belonging to the same run. This additional information would be used by the SHT
+algorithm and would thus bias the experiment in favor of skeleton recognition.
+The results are shown in \fref{fig:face}. Skeleton recognition performs
+within 10\% of face recognition at most thresholds.
+%outperforms
+%skeleton recognition, but by less than 10\% at most thresholds.
+%These results are promising, given that \textsf{face.com} is the
+%state-of-the-art in face recognition.
 
 %However, this result does not take into account the disparity in the number of
 %runs which face recognition and skeleton recognition can classify frames,
@@ -337,7 +339,9 @@ datasets are completely disjoint, the SHT algorithm is not learning the same
 profile for a person walking toward the camera and for a person walking away
 from the camera. The third curve of \fref{fig:back} shows the precision-recall
 curve when training and testing on the combined dataset of runs toward and away
-from the camera with similar performance.
+from the camera with similar performance. Note that while we could not obtain
+enough labeled data for a full comparison when it is dark, manual experiments
+show similar performance when there is no visible light.
 
 \subsection{Reducing the noise}
 
@@ -353,10 +357,10 @@ resolution by artificially reducing the noise from our Kinect dataset.
 %the only source of classification error in our algorithm is the dispersion of
 %the observed limbs' lengths away from the exact measurements.
 To simulate a reduction of the noise level, the dataset is modified as follows:
-we measure the average skeletal profile of each person, and for each frame
-we divide the empirical variance from the average by 2. Formally, using the
-same notations as in Section~\ref{sec:mixture of Gaussians}, each observation
-$\bx_i$ is replaced by $\bx_i'$ defined by:
+we measure the average skeletal profile of each person across the entire
+dataset, and for each frame we divide the empirical variance from the average
+by 2. Formally, using the same notations as in Section~\ref{sec:mixture of
+Gaussians}, each observation $\bx_i$ is replaced by $\bx_i'$ defined by:
 \begin{equation}
 \bx_i' = \bar{\bx}_{y_i} + \frac{\bx_i-\bar{\bx}_{y_i}}{2}
 \end{equation}
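The last hunk above describes the noise-reduction simulation: each frame's deviation from that person's average skeletal profile is halved, so that x_i' = xbar_{y_i} + (x_i - xbar_{y_i})/2. A minimal numpy sketch of that transformation; the array names X (frames by limb lengths) and y (per-frame person labels) are my assumptions, not names from the paper's code:

import numpy as np

def halve_noise(X, y):
    """Pull every observation halfway toward its person's mean profile:
    x_i' = mean_{y_i} + (x_i - mean_{y_i}) / 2."""
    X_out = np.empty_like(X, dtype=float)
    for person in np.unique(y):
        mask = (y == person)
        mean_profile = X[mask].mean(axis=0)   # average skeletal profile of this person
        X_out[mask] = mean_profile + (X[mask] - mean_profile) / 2.0
    return X_out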
diff --git a/graphics/roc.pdf b/graphics/roc.pdf
index ae0c809..b27c23b 100644
--- a/graphics/roc.pdf
+++ b/graphics/roc.pdf
Binary files differ
diff --git a/related.tex b/related.tex
index a7fd7e3..1ac25d9 100644
--- a/related.tex
+++ b/related.tex
@@ -30,12 +30,12 @@ image~\cite{gait-body1,gait-body2}. Furthermore, behavioral traits typically
 are more characteristic as opposed to unique, and are subject to change with
 time and based on the observed activity~\cite{gait-survey2}.
 
-On the other hand, face recognition, as a physiological biometric, is a more
-static feature. Face recognition can be broken down into three parts: face
-detection, feature extraction, and classification; these three parts are
-studied both individually and together~\cite{face-survey}. The Xbox 360 uses
-face recognition with the Kinect as part of its user recognition algorithm, in
-addition to the height inferred from the Kinect~\cite{kinect-identity}.
+Face recognition, as a physiological biometric, is a more static feature. Face
+recognition can be broken down into three parts: face detection, feature
+extraction, and classification; these three parts are studied both individually
+and combined~\cite{face-survey}. The Xbox 360 uses face recognition with the
+Kinect as part of its user recognition algorithm, in addition to the height
+inferred from the Kinect~\cite{kinect-identity}.
 
 In this paper, we propose using skeleton measurements as a biometric separate
 from face and gait biometrics. According to Jain~\etal{}~\cite{bio-survey}, a
diff --git a/uniqueness.tex b/uniqueness.tex
index ca89e85..429e7de 100644
--- a/uniqueness.tex
+++ b/uniqueness.tex
@@ -30,15 +30,14 @@ setting in the LFW benchmark.
 
 \subsection{Experiment design}
 
-In order to run an experiment similar to the one used in the face
-pair-matching problem (Section~\ref{sec:frb}), we use the Goldman
-Osteological Dataset \cite{deadbodies}. This dataset consists of
-skeletal measurements of 1,538 skeletons uncovered around the world and dating
-from the modern geological era. Given the way this data was collected, only a
-partial view of the skeleton is available. We keep six measurements: the
-lengths of four bones (radius, humerus, femur, and tibia) and the breadth and
-height of the pelvis. Because of missing values, this reduces the size of the
-dataset to 1,191.
+In order to run an experiment similar to the one used in the face pair-matching
+problem (Section~\ref{sec:frb}), we use the \emph{Goldman Osteological Dataset}
+\cite{deadbodies}. This dataset consists of skeletal measurements of 1,538
+skeletons uncovered around the world and dating from the modern geological era.
+Given the way this data was collected, only a partial view of the skeleton is
+available. We keep six measurements: the lengths of four bones (radius,
+humerus, femur, and tibia) and the breadth and height of the pelvis. Because
+of missing values, this reduces the size of the dataset to 1,191.
 
 From this dataset, 1,191 matched pairs and 1,191 unmatched pairs are generated.
 In practice, the exact measurements of the bones of living
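The experiment design above builds 1,191 matched and 1,191 unmatched pairs from the osteological measurements and perturbs them with Gaussian noise to mimic measurement error. A rough sketch of one way such pairs could be generated; the signature echoes gen_pairs(var, sk_data) from data/pair-matching/roc.py, but treating var as the noise variance and the exact pairing scheme are assumptions on my part:

import numpy as np

def gen_pairs(var, sk_data, rng=None):
    """sk_data: (n_skeletons, 6) array of bone measurements in millimeters.
    Returns matched pairs (two noisy copies of the same skeleton) and
    unmatched pairs (noisy copies of two different skeletons)."""
    rng = rng or np.random.default_rng()
    noise = lambda: rng.normal(0.0, np.sqrt(var), size=sk_data.shape)
    matched = np.stack([sk_data + noise(), sk_data + noise()], axis=1)
    shifted = np.roll(sk_data, 1, axis=0)      # pair each skeleton with a different one
    unmatched = np.stack([sk_data + noise(), shifted + noise()], axis=1)
    return matched, unmatched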
@@ -76,18 +75,13 @@ output of the algorithm for the threshold $\delta$ is defined as:
 \label{fig:roc}
 \end{figure}
 
-Figure \ref{fig:roc} shows the ROC curve of the nearest neighbor
-algorithm for different values of the standard deviation of the noise,
-as well as the ROC of the best performing face detection algorithm in
-the image-restricted LFW benchmark: \emph{Associate-Predict}
-\cite{associate}.
-
-The results show that with a standard deviation of 3 millimeters, nearest
-neighbor performs quite similarly to face detection at low
-false-positive rate. At this noise level, the error is smaller than
-1 centimeter with 99.9\% probability. Even with a standard deviation of 5 millimeters, it
-is still possible to detect 90\% of the matched pairs with a false
-positive rate of 6\%.
+Figure \ref{fig:roc} shows the ROC curve of the nearest neighbor algorithm for
+different values of the standard deviation of the noise. The results show that
+with a standard deviation of 3 millimeters, nearest neighbor performs quite
+similarly to face detection at low false-positive rate. At this noise level,
+the error is smaller than 1 centimeter with 99.9\% probability. Even with a
+standard deviation of 5 millimeters, it is still possible to detect 90\% of the
+matched pairs with a false positive rate of 6\%.
 
 This experiment gives an idea of the noise variance level above which it is
 not possible to consistently distinguish skeletons. If the noise
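For reference, the ROC points discussed in the hunk above can be traced by sweeping a decision threshold over the distance between the two skeletons of each pair. The sketch below uses a plain distance threshold rather than the paper's nearest-neighbor rule, and the helper names are mine:

import numpy as np

def roc_points(matched, unmatched, thresholds):
    """matched, unmatched: (n_pairs, 2, n_measurements) arrays as returned by
    the gen_pairs sketch above. A pair is declared 'same skeleton' when the
    Euclidean distance between its two measurement vectors is below the threshold."""
    d_match = np.linalg.norm(matched[:, 0] - matched[:, 1], axis=1)
    d_unmatch = np.linalg.norm(unmatched[:, 0] - unmatched[:, 1], axis=1)
    fpr = np.array([(d_unmatch < t).mean() for t in thresholds])
    tpr = np.array([(d_match < t).mean() for t in thresholds])
    return fpr, tpr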
