CNNs for Face Detection and Recognition

Yicheng An, Department of Electrical Engineering, Stanford
Wu, Department of Electrical Engineering, Stanford
Yue, Department of Electrical Engineering, Stanford

Face detection is becoming an increasingly important technique in our social lives. From the face detection technology built into inexpensive cameras to intelligence agencies' sophisticated global Skynet surveillance system, such techniques are widely used in a large number of areas, and the market is still growing quickly. Face detection has been an active research area with many successful traditional and deep learning methods. In our project we introduce a new convolutional neural network method for face detection that maintains high accuracy while running in real time.

Problem Statement

In our imagined scenario, our technique could be used by companies and facilities with security requirements: customers could put all personnel's information into the dataset and then place the camera at a desired position, such as the office front door. If people coming in are not in the dataset, an alarm would sound; otherwise, everything should be fine. Our method has two main steps and one requirement. The first step is localization: the system localizes faces and circles them in photos or videos.

The second step is classification: once faces are circled, we need to tell whether the person belongs to our dataset and, if so, who he/she is. If not, we classify him/her as unknown and sound the alarm. Our largest requirement is to complete both localization and classification in real time; otherwise the system would be of no use as a security camera.

Plan

Based on the steps and requirement mentioned above, we split the whole project into several sections. The first section is finding the regions that potentially contain faces. Many methods do this job, but we need to come up with a new one ourselves to maintain real-time processing. The second section is classifying these candidate regions to obtain the identification numbers of the people in them. The third section is testing the detection and classification accuracy and the processing time, to make sure our algorithm can do the task in real time.

The final section is tuning the parameters to make the accuracy even higher, which was fairly time consuming.

Related Works

Face detection has been an active research area since the development of computer vision, and many classical and deep learning approaches have been applied in this field. As in many other fields of computer vision, deep learning approaches using neural networks have achieved significant success in tackling face detection as a subclass of object classification, localization, and detection. The evolution of face detection correlates closely with the development of object classification, localization and detection.

Sliding Window

In the early development of face detection, researchers tended to treat it as a repetitive object classification task, imposing sliding windows and performing object classification with a neural network on each window region. Vaillant et al. [6] proposed a method that trains a convolutional neural network to detect the presence or absence of a human inside a sliding window area of the image and scans the whole image with that network over all possible window locations.

Today's state-of-the-art techniques handle this task easily, but the method can be viewed as an early use of the sliding window technique to detect human faces. Face detection techniques then evolved to support rotation-invariant detection, using a network to estimate the face orientation so that the detector network matching that orientation can be applied [7]. The trend then shifted to convolutional neural networks after CNNs achieved significant breakthroughs in image classification and object detection [8], and the mainstream face detection methods have all turned to CNN-based object detection algorithms. Nevertheless, the sliding window approach still needs to apply a CNN to many different windows and is still a repetition of image classification on local regions; as a result, it is extremely computationally expensive, with many repeated computations.

Detection with Proposals

To handle the expensive computation problem, instead of running the CNN on every sliding window location, people tried to find a way to reduce the number of candidate window locations.
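To give a rough sense of why the number of window locations matters, the sketch below counts how many windows fit in an image for a given window size and stride, since each one costs a separate CNN evaluation. The 640 by 480 frame size is an assumption chosen only for illustration, and `count_windows` is a hypothetical helper, not part of the authors' code; the 48 pixel window and stride of 24 are the values reported later in this paper.

```python
def count_windows(image_w, image_h, window, stride):
    """Count square windows of side `window` that fit fully inside the image
    when the window is moved by `stride` pixels in each direction."""
    if window > image_w or window > image_h:
        return 0
    n_x = (image_w - window) // stride + 1  # positions along the width
    n_y = (image_h - window) // stride + 1  # positions along the height
    return n_x * n_y


if __name__ == "__main__":
    # Assumed 640x480 frame; one CNN evaluation is needed per window.
    for stride in (1, 24):
        print("stride", stride, "->", count_windows(640, 480, 48, stride), "windows")
```

Under these assumptions a dense scan (stride 1) needs 256,769 evaluations at a single scale, while a stride of 24 needs 475, which illustrates why reducing the candidate locations is attractive.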

As a result, the region proposal method was developed to find regions that have a high probability of containing objects [10], which reduces the number of candidate regions compared with the sliding window approach. One of the most significant breakthroughs in object detection with region proposals is R-CNN, developed by Girshick et al. [9]. R-CNN first generates approximately 2000 Regions of Interest (RoI) by applying the region proposal method to the input image. It then warps each RoI to the standard input size of the neural network and forwards it through CNNs dedicated to image classification and localization, which output the class category as well as the bounding box coordinates and sizes. However, R-CNN still has many problems even with region proposals. For example, training is slow, taking 84 hours on the PASCAL VOC dataset, and it consumes a large amount of memory [9]. Detection at test time is also slow; on average, it takes 47 seconds per image with the VGG16 model [11].
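The warping step described above for R-CNN can be sketched as a crop followed by a resize to a fixed size. This is only an illustration under stated assumptions: `warp_roi` is a hypothetical helper, the 64 by 64 output size and the nearest-neighbour sampling are choices made here for brevity, and none of it is taken from the R-CNN implementation.

```python
import numpy as np


def warp_roi(image, box, out_size=(64, 64)):
    """Crop box = (x, y, w, h) from an (H, W, C) image and resize the crop
    to out_size = (out_h, out_w) using nearest-neighbour sampling."""
    x, y, w, h = box
    img_h, img_w = image.shape[:2]
    # Clip the box to the image so the crop is always valid.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(img_w, x + w), min(img_h, y + h)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box does not overlap the image")
    crop = image[y0:y1, x0:x1]
    out_h, out_w = out_size
    # For every output row/column, pick the source row/column it maps to.
    rows = np.arange(out_h) * crop.shape[0] // out_h
    cols = np.arange(out_w) * crop.shape[1] // out_w
    return crop[rows][:, cols]


if __name__ == "__main__":
    image = np.zeros((100, 120, 3), dtype=np.uint8)
    # RoIs of different shapes all end up with the same network input size.
    for box in [(10, 20, 30, 40), (0, 0, 120, 100), (50, 50, 8, 8)]:
        print(box, "->", warp_roi(image, box).shape)
```

Because every RoI is warped and forwarded separately, the cost grows with the number of proposals, which is consistent with the slow test time reported above.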

Further advances based on R-CNN, such as Fast R-CNN [12] and Faster R-CNN [13], addressed the slow run time. Fast R-CNN forwards the whole image through the CNN at the beginning, so that this computation is performed only once instead of many times as in R-CNN [12]. Faster R-CNN performs RoI pooling and makes the CNN itself generate the region proposals, inserting a Region Proposal Network (RPN) as part of the layers of the CNN model to predict the objectness of each region [13].

Figure 1. The architecture of the R-CNNs with Region Proposals [9]

While these methods achieve a significant reduction in training and test time compared with R-CNN [12] [13], the runtime is still dominated by region proposals.

Detection without Proposals

Some detection methods were introduced without region proposals in order to achieve shorter training and testing times. YOLO [1] and SSD [5] are two such methods. The input image goes directly into one large convolutional neural network.

Inside the network, the input image is first divided into many grid cells, and the classification scores and the bounding box coordinates and scales are determined for each grid cell. The overall object classes and bounding boxes are then calculated from the results obtained for each grid cell. These two approaches further reduce the training and test time, but the accuracy is compromised compared with the methods using region proposals [14].

Comparison

Overall, based on the previous related work, we found that the sliding window technique is the easiest to implement, since it is essentially a repetition of the image classification task, but it is extremely slow in both training and testing, and its accuracy relies on the maturity of the network that performs the image classification. The region proposal method reduces training and testing time compared with the sliding window technique and helps increase detection accuracy; in fact, Faster R-CNN achieves the highest accuracy among the methods compared [14].

In terms of training and testing time, SSD is significantly faster than the other methods since it gets rid of region proposals, but at the cost of reduced accuracy compared with the methods that use region proposals.

Methods

For our project, we developed our face detection methods using the following approaches.

First, we developed a model called Two Stream CNN, which specializes in classification and localization for single face detection. Given an input image, the Two Stream CNN outputs whether the image contains a human face and, if it does, the identity of that person (classification) as well as the coordinates and size of the bounding box (localization).

We also try to perform multi-object detection with a cascade of a Region of Interest Network and the Two Stream CNN. The Region of Interest Network helps reduce the number of candidate regions.

Two Stream CNN

For our face detection problem, we first tried to reduce it to the simpler problem of single face detection. We construct a network that is capable of detecting a single human face and outputs the coordinates and size of the bounding box as well as the class (human identification or no human).

That is, we first simplify the face detection problem to a classification and localization problem. Our model uses a CNN to do classification and localization in a single evaluation. It consists of 6 primary modules; each module has one convolution layer, one max pooling layer and one leaky ReLU layer, except the very last module. We also added a few dropout layers between modules for regularization. To make a prediction, we feed the result of the last module into two sets of fully connected layers, one predicting the location and size (center, width, height) of the human face and the other predicting the identity of the person. The network can tell whether there are people in the scene and, if so, who the person is. Since the objective of the Two Stream CNN is to classify and localize a single human face, when multiple people show up the network selects the one nearest to the camera.

Cascade CNN

While our Two Stream CNN is dedicated to single face detection, it is essentially classification and localization of a single face only and cannot handle images with multiple faces.
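As a rough illustration of the single-face Two Stream CNN described in the previous section, the NumPy sketch below runs a forward pass through six modules (convolution, max pooling, leaky ReLU) and then two fully connected heads. Everything not stated in the text is an assumption made for the example: the 3x3 kernels, the channel counts, the 64 by 64 input, the leaky ReLU slope, the number of identities, and the random untrained weights. The text says the last module differs from the others without saying how, so this sketch treats all six identically, and it omits dropout, which is only active during training.

```python
import numpy as np


def conv2d_same(x, w, b):
    """x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Zero padding keeps H and W."""
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, height, width = x.shape
    out = np.zeros((w.shape[0], height, width))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + height, j:j + width]          # (C_in, H, W)
            out += np.tensordot(w[:, :, i, j], patch, axes=([1], [0]))
    return out + b[:, None, None]


def max_pool2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array."""
    c, height, width = x.shape
    x = x[:, :height - height % 2, :width - width % 2]
    return x.reshape(c, height // 2, 2, width // 2, 2).max(axis=(2, 4))


def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)


def init_params(rng, in_ch=3, size=64, channels=(4, 8, 8, 16, 16, 16), n_ids=5):
    """Random weights for six modules and two heads (all sizes are assumptions)."""
    modules, c = [], in_ch
    for c_out in channels:
        modules.append((rng.normal(0.0, 0.1, (c_out, c, 3, 3)), np.zeros(c_out)))
        c, size = c_out, size // 2                            # pooling halves H and W
    n_feat = c * size * size
    return {
        "modules": modules,
        "box_W": rng.normal(0.0, 0.1, (4, n_feat)), "box_b": np.zeros(4),
        # One extra class stands for "no human".
        "id_W": rng.normal(0.0, 0.1, (n_ids + 1, n_feat)), "id_b": np.zeros(n_ids + 1),
    }


def two_stream_forward(image, params):
    x = image
    for w, b in params["modules"]:
        x = leaky_relu(max_pool2(conv2d_same(x, w, b)))       # one module
    feat = x.ravel()
    box = params["box_W"] @ feat + params["box_b"]            # (center x, center y, width, height)
    logits = params["id_W"] @ feat + params["id_b"]           # identity scores
    return box, logits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    box, logits = two_stream_forward(rng.random((3, 64, 64)), init_params(rng))
    print(box.shape, logits.shape)
```

The two heads share all convolutional features, so localization and identification come out of a single evaluation of the network, which is the property the text relies on for speed.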

As a result, inspired by the region proposal method and the sliding window method, we duplicate this single face detection algorithm across candidate locations of the image.

Figure 2. The basic architecture of each module

First, we slide a window across the whole image; each window is 48 pixels by 48 pixels, as shown in Figure 4. We chose this size based on the sizes of the human faces in our dataset: a window of this size covers most of them. The window is slid across the whole image with a stride of 24. The content of each window is fed into a convolutional layer for binary classification, which detects whether or not the window contains a human face; at this stage, that is the only thing we care about. We set a threshold on the output score of the layer, and if the score of a window exceeds the threshold, we mark that window as a candidate region for the next stage. In the second stage, non-maximum suppression (NMS) is used to eliminate highly overlapping detection regions, in order to reduce repeated operations.
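The candidate selection described above (48 by 48 windows with stride 24, a score threshold, then NMS) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the scores would come from their binary face classifier, which is replaced here by an array supplied by the caller, the threshold values 0.5 and 0.3 are placeholders because the paper does not report them, and the helper names are hypothetical.

```python
import numpy as np


def sliding_windows(image_h, image_w, size=48, stride=24):
    """Return an (N, 4) array of window boxes as (x1, y1, x2, y2)."""
    boxes = [(x, y, x + size, y + size)
             for y in range(0, image_h - size + 1, stride)
             for x in range(0, image_w - size + 1, stride)]
    return np.array(boxes, dtype=float).reshape(-1, 4)


def iou(box, boxes):
    """Intersection over union between one box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return np.array(keep, dtype=int)


def candidate_regions(boxes, scores, score_threshold=0.5, iou_threshold=0.3):
    """Stage 1: keep windows whose face score exceeds the threshold.
    Stage 2: remove highly overlapping windows with NMS."""
    mask = scores > score_threshold
    boxes, scores = boxes[mask], scores[mask]
    keep = nms(boxes, scores, iou_threshold)
    return boxes[keep], scores[keep]


if __name__ == "__main__":
    boxes = sliding_windows(96, 96)               # 3 x 3 grid of windows
    # Placeholder scores standing in for the binary face classifier output.
    scores = np.array([0.9, 0.8, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1, 0.95])
    kept_boxes, kept_scores = candidate_regions(boxes, scores)
    print(kept_boxes, kept_scores)
```

With a stride of half the window size, horizontally or vertically adjacent windows have an IoU of 1/3, so the placeholder IoU threshold of 0.3 suppresses the weaker of two adjacent detections while keeping diagonal neighbours; a different threshold would change that trade-off.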
