
Robust Classification of Objects, Faces, and Flowers Using Natural Image Statistics


Robust Classification of Objects, Faces, and Flowers Using Natural Image Statistics
Christopher Kanan and Garrison Cottrell
Department of Computer Science and Engineering, University of California, San Diego
{ckanan,gary}@ucsd.edu



Transcription of Robust Classification of Objects, Faces, and Flowers Using Natural Image Statistics

1 Robust Classification of Objects, Faces, and Flowers Using Natural Image Statistics. Christopher Kanan and Garrison Cottrell, Department of Computer Science and Engineering, University of California, San Diego. Abstract: Classification of images in many category datasets has rapidly improved in recent years. However, systems that perform well on particular datasets typically have one or more limitations, such as a failure to generalize across visual tasks (e.g., requiring a face detector or extensive re-tuning of parameters), insufficient translation invariance, inability to cope with partial views and occlusion, or significant performance degradation as the number of classes is increased. Here we attempt to overcome these challenges using a model that combines sequential visual attention using fixations with sparse coding.

2 The model's biologically-inspired filters are acquired using unsupervised learning applied to natural image patches. Using only a single feature type, our approach achieves accuracy on Caltech-101 and on the 102 Flowers dataset when trained on 30 instances per class, and it achieves accuracy on the AR Face database with 1 training instance per person. The same features and parameters are used across these datasets to illustrate its robustness.

Introduction

Bestowing an artificial vision system with a fraction of the abilities we primates enjoy has been the goal of many computational vision researchers. While steady progress has been made toward this objective, the gap between the capabilities of the primate visual system and state-of-the-art object recognition systems remains vast.

3 Humans are capable of accurately recognizing thousands of object categories, and the primate visual system copes very well with translation, scale, and rotation variance [31]. This has motivated many vision researchers to study human vision in order to extract operating principles that may improve object recognition techniques [31, 24, 28, 29]. Aspects of these biologically inspired approaches, perhaps inadvertently, have been incorporated into state-of-the-art methods in object recognition.

Figure 1. Two images and their corresponding bottom-up saliency maps, which indicate interesting features in an image. Values with high salience are red and low are blue. Saliency maps can be used in object recognition by using them as interest point detectors.

4 For example, the computation of SIFT descriptors [21] begins by using difference of Gaussian (DOG) filters at multiple scales, which is similar to the sort of computation done in the vertebrate retina [10]. Another example is spatial pooling of features, or boxcar filtering [17, 31, 28, 29, 38], a computation performed by many neurons such as complex cells in primary visual cortex (V1) [31]. Sparse coding algorithms were pioneered in the computational neuroscience community with Independent Component Analysis (ICA) [4, 36] and efficient coding models [7], but they have recently been adopted by the computer vision community [33, 38, 30]. However, one aspect of primate vision that has been mostly ignored by computer vision researchers is visual attention [13].
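The two biologically inspired operations named above, center-surround DOG filtering and boxcar spatial pooling, can be sketched in a few lines. This is an illustrative sketch using scipy.ndimage, not the paper's implementation; the function names and filter widths are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def dog_filter(image, sigma_center=1.0, sigma_surround=2.0):
    """Difference-of-Gaussians: a center-surround response, similar to
    computations in the vertebrate retina."""
    return gaussian_filter(image, sigma_center) - gaussian_filter(image, sigma_surround)

def boxcar_pool(response, size=4):
    """Boxcar (uniform) spatial pooling, analogous to the pooling
    attributed to complex cells in V1."""
    return uniform_filter(response, size=size)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
dog = dog_filter(img)                 # band-pass, center-surround response
pooled = boxcar_pool(np.abs(dog))     # locally averaged magnitude
print(dog.shape, pooled.shape)        # (64, 64) (64, 64)
```

Note that a constant image produces a (near-)zero DOG response, since the center and surround Gaussians then agree everywhere; this is the band-pass property the text attributes to retinal processing.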

5 Because we cannot process an entire visual scene at once, we sequentially look at, or fixate, salient locations of an object or in a scene. The fixated region is analyzed, and then attention is redirected to other salient regions using saccades, ballistic eye movements that are an overt manifestation of attention. We actively move our eyes to direct our highest resolution of visual processing towards interesting things over 170,000 times per day, or about 3 saccades per second [32]. Saliency maps are a successful and biologically plausible technique for modeling visual attention (see figure 1). In this paper we propose an approach based upon two facets of the visual system: sparse visual features that capture the statistical regularities in natural scenes, and sequential fixation-based visual attention.

6 Our method is empirically validated using large object, face, and flower datasets.

Using Natural Image Statistics

Computer vision has traditionally used features that have been hand-designed, such as Haar wavelets, DOG filters [21, 39], Gabor filters [28, 29, 1], histogram of oriented gradient (HOG) descriptors [25], SIFT descriptors [21, 38, 17, 27], and many others. An alternative is to use self-taught learning [30]: unsupervised learning applied to unlabeled natural images to learn basis vectors/filters that are good for representing natural images. The training data is generally distinct from the datasets the system will be evaluated on. Self-taught learning works well because it represents natural scenes efficiently, while not overfitting to a particular dataset.
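One hypothetical instantiation of this self-taught learning step is to run ICA (mentioned above) on patches sampled from unlabeled images. In the sketch below, random patches stand in for real natural image patches, so only the shapes are meaningful; on genuine natural patches the learned filters would come out localized, oriented, and bandpass. The patch size, component count, and use of scikit-learn's FastICA are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Stand-in for natural image patches; self-taught learning would sample
# these from unlabeled natural images instead.
rng = np.random.default_rng(0)
patches = rng.standard_normal((500, 64))           # 500 patches, 8x8 pixels each
patches -= patches.mean(axis=1, keepdims=True)     # remove per-patch DC component

# ICA learns a bank of basis filters from the patches.
ica = FastICA(n_components=16, random_state=0, max_iter=500)
codes = ica.fit_transform(patches)   # per-patch coefficients, shape (500, 16)
filters = ica.components_            # learned filters, shape (16, 64)

print(codes.shape, filters.shape)
```

The learned `filters` would then be applied to new images as a fixed feature bank, independent of the evaluation dataset, which is the point of self-taught learning.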

7 The space of possible images is incredibly large. However, natural scenes make up a relatively small portion of this space, so an efficient system should use the statistics of natural scenes to its advantage [7]. In object recognition research, self-taught learning algorithms typically do this by employing sparse coding [30, 33]. A sparse code is one in which only a small fraction of the units (binary values, neurons, etc.) are active at any particular time on average [7]. This is in contrast to a local code, in which only a single unit is activated to indicate presence or absence, or a dense code, which represents a signal by having many highly active units. Sparse codes forge a compromise between these two approaches that has yielded many useful algorithms.

8 When sparse coding is applied to natural images, localized, oriented, and bandpass filters are typically learned (see figure 4). These properties are shared by neurons in V1, which exhibit very sparse activity [7].

Visual Attention

A saliency map is a topologically organized map that indicates interesting regions in an image based on the spatial organization of the features and an agent's current goal [13]. These maps can be entirely stimulus driven, or bottom-up, if the model lacks a specific goal. An example is provided in figure 1. There are numerous areas of the primate brain that contain putative saliency maps, such as the frontal eye fields, superior colliculus, and lateral intraparietal sulcus [9].

9 There are many computational models designed to produce saliency maps. Typically these algorithms produce maps that assign high saliency to regions with rare features or features that differ from their surroundings. What constitutes rare varies across the models. One way to represent rare features is to determine how frequently they occur. By fitting a distribution P(F), where F represents image features, rare features can be immediately found by computing P(F)^-1 for an image. Some of the best models for predicting human eye movements while viewing natural scenes use this approach [39, 3].

Sequential Object Recognition

While many algorithms for saliency maps have been used to predict the location of human eye movements, little work has been done on how they can be used to recognize individual objects.
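The rarity idea above can be sketched directly: fit P(F) over the image's features, then score each location by -log P(F), which increases monotonically with P(F)^-1. The independent-Gaussian density used here is an assumed stand-in for the richer feature distributions in the cited models, and the function names are illustrative.

```python
import numpy as np

def rarity_saliency(features):
    """Saliency from feature rarity. P(F) is modeled as an independent
    Gaussian per feature dimension, fit to this image's own features;
    each location is scored by -log P(F) (monotonic in 1/P(F))."""
    mu = features.mean(axis=0)
    var = features.var(axis=0) + 1e-8
    log_p = -0.5 * (((features - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(axis=1)
    return -log_p   # higher = rarer = more salient

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 4))   # 100 locations, 4 feature channels
feats[17] += 8.0                        # one location with rare (outlier) features
sal = rarity_saliency(feats)
print(int(np.argmax(sal)))              # 17: the rare location is the most salient
```

Reshaping `sal` back onto the image grid would give a saliency map like those in figure 1, with the rare-feature region lighting up.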

10 There are a few notable exceptions [1, 23, 27, 15], and these approaches have several similarities. All of them begin by extracting features across the image along with a saliency map to determine regions of interest. The saliency map may be based on the features used for classification, or it may be created using other features. A small window representing a fixation is extracted from the features at the location picked from the saliency map. The extracted fixation is then classified, and subsequent fixations are then made according to the saliency map. The mechanisms used to combine information across fixations vary across the models. Besides sharing similar frameworks, these approaches also have implementation similarities.
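The fixation-selection loop shared by these approaches can be sketched as a greedy pick from the saliency map with inhibition of return, so each saccade lands somewhere new. The classification of each fixated window is omitted here, and the suppression radius is an arbitrary assumption; none of this is a specific model from the cited papers.

```python
import numpy as np

def fixation_sequence(saliency, n_fixations=3, inhibit_radius=2):
    """Greedily pick fixation locations from a saliency map, suppressing
    each attended region (inhibition of return) before the next pick.
    In a full system, a feature window would be classified at each stop."""
    sal = saliency.copy()
    fixations = []
    for _ in range(n_fixations):
        y, x = map(int, np.unravel_index(np.argmax(sal), sal.shape))
        fixations.append((y, x))
        y0, y1 = max(0, y - inhibit_radius), y + inhibit_radius + 1
        x0, x1 = max(0, x - inhibit_radius), x + inhibit_radius + 1
        sal[y0:y1, x0:x1] = -np.inf   # suppress the just-attended region
    return fixations

sal = np.zeros((10, 10))
sal[2, 3], sal[8, 8], sal[5, 0] = 3.0, 2.0, 1.0   # three salient spots
print(fixation_sequence(sal))   # [(2, 3), (8, 8), (5, 0)]
```

Fixations come out in descending saliency order because each chosen peak is masked before the next argmax, which mirrors the "subsequent fixations are made according to the saliency map" step in the text.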

