Object Recognition from Local Scale-Invariant Features
David G. Lowe
Computer Science Department, University of British Columbia
Vancouver, V6T 1Z4
Proceedings of the International Conference on Computer Vision, Corfu (Sept. 1999)

An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space.
Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest-neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual least-squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially-occluded images with a computation time of under 2 seconds.

Introduction

Object recognition in cluttered real-world scenes requires local image features that are unaffected by nearby clutter or partial occlusion.
The features must be at least partially invariant to illumination, 3D projective transforms, and common object variations. On the other hand, the features must also be sufficiently distinctive to identify specific objects among many alternatives. The difficulty of the object recognition problem is due in large part to the lack of success in finding such image features. However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations.

This paper presents a new method for image feature generation called the Scale Invariant Feature Transform (SIFT).
This approach transforms an image into a large collection of local feature vectors, each of which is invariant to image translation, scaling, and rotation, and partially invariant to illumination changes and affine or 3D projection. Previous approaches to local feature generation lacked invariance to scale and were more sensitive to projective distortion and illumination change. The SIFT features share a number of properties in common with the responses of neurons in inferior temporal (IT) cortex in primate vision. This paper also describes improved approaches to indexing and model verification.

The scale-invariant features are efficiently identified by using a staged filtering approach.
The first stage identifies key locations in scale space by looking for locations that are maxima or minima of a difference-of-Gaussian function. Each point is used to generate a feature vector that describes the local image region sampled relative to its scale-space coordinate frame. The features achieve partial invariance to local variations, such as affine or 3D projections, by blurring image gradient locations. This approach is based on a model of the behavior of complex cells in the cerebral cortex of mammalian vision. The resulting feature vectors are called SIFT keys. In the current implementation, each image generates on the order of 1000 SIFT keys, a process that requires less than 1 second of computation time.

The SIFT keys derived from an image are used in a nearest-neighbour approach to indexing to identify candidate object models.
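The first filtering stage described above (finding maxima or minima of a difference-of-Gaussian function in scale space) can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the function names, the choice of scales, the 26-neighbour comparison, and the contrast threshold are all assumptions made for illustration.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
                 for j in range(-radius, radius + 1))
             for x in range(width)]
            for row in image]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable 2D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def dog_stack(image, sigmas):
    """Difference of adjacent blur levels (coarse minus fine; the sign is an
    arbitrary choice here because both maxima and minima are kept)."""
    blurred = [gaussian_blur(image, s) for s in sigmas]
    return [[[c - f for f, c in zip(fine_row, coarse_row)]
             for fine_row, coarse_row in zip(fine, coarse)]
            for fine, coarse in zip(blurred, blurred[1:])]

def scale_space_extrema(dog, threshold=0.05):
    """Return (x, y, level) where a DoG value is strictly larger or strictly
    smaller than all 26 neighbours in position and scale (assumed test)."""
    keys = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                value = dog[s][y][x]
                if abs(value) < threshold:
                    continue  # skip weak responses (hypothetical threshold)
                neighbours = [dog[s + ds][y + dy][x + dx]
                              for ds in (-1, 0, 1)
                              for dy in (-1, 0, 1)
                              for dx in (-1, 0, 1)
                              if (ds, dy, dx) != (0, 0, 0)]
                if value > max(neighbours) or value < min(neighbours):
                    keys.append((x, y, s))
    return keys

# Synthetic test image: one bright Gaussian blob centred at (12, 12).
image = [[math.exp(-((x - 12) ** 2 + (y - 12) ** 2) / (2 * 3.0 ** 2))
          for x in range(25)] for y in range(25)]
sigmas = [2 ** (i / 2) for i in range(6)]  # 1, 1.41, 2, 2.83, 4, 5.66
keys = scale_space_extrema(dog_stack(image, sigmas))
print(keys)
```

On this synthetic blob the detector reports a key at the blob centre, at a level whose scale is comparable to the blob size. A practical implementation would use an array library and a more economical sampling of scale space than this brute-force loop.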
Collections of keys that agree on a potential model pose are first identified through a Hough transform hash table, and then through a least-squares fit to a final estimate of model parameters. When at least 3 keys agree on the model parameters with low residual, there is strong evidence for the presence of the object. Since there may be dozens of SIFT keys in the image of a typical object, it is possible to have substantial levels of occlusion in the image and yet retain high levels of reliability.

The current object models are represented as 2D locations of SIFT keys that can undergo affine projection. Sufficient variation in feature location is allowed to recognize perspective projection of planar shapes at up to a 60 degree rotation away from the camera or to allow up to a 20 degree rotation of a 3D object.

Related research

Object recognition is widely used in the machine vision industry for the purposes of inspection, registration, and manipulation.
However, current commercial systems for object recognition depend almost exclusively on correlation-based template matching. While very effective for certain engineered environments, where object pose and illumination are tightly controlled, template matching becomes computationally infeasible when object rotation, scale, illumination, and 3D pose are allowed to vary, and even more so when dealing with partial visibility and large model databases.

An alternative to searching all image locations for matches is to extract features from the image that are at least partially invariant to the image formation process and to match only to those features.
Many candidate feature types have been proposed and explored, including line segments [6], groupings of edges [11, 14], and regions [2], among many other proposals. While these features have worked well for certain object classes, they are often not detected frequently enough or with sufficient stability to form a basis for reliable recognition.

There has been recent work on developing much denser collections of image features. One approach has been to use a corner detector (more accurately, a detector of peaks in local image variation) to identify repeatable image locations, around which local image properties can be measured. Zhang et al. [23] used the Harris corner detector to identify feature locations for epipolar alignment of images taken from differing viewpoints. Rather than attempting to correlate regions from one image against all possible regions in a second image, large savings in computation time were achieved by only matching regions centered at corner points in each image.

For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements.
These image descriptors were used for robust object recognition by looking for multiple matching descriptors that satisfied object-based orientation and location constraints. This work was impressive both for the speed of recognition in a large database and the ability to handle cluttered images.

The corner detectors used in these previous approaches have a major failing, which is that they examine an image at only a single scale. As the change in scale becomes significant, these detectors respond to different image points. Also, since the detector does not provide an indication of the object scale, it is necessary to create image descriptors and attempt matching at a large number of scales.
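The single-scale limitation noted above can be made concrete with a small closed-form sketch. It assumes an idealized isotropic Gaussian blob as a stand-in for an image structure and a difference-of-Gaussian response with scale ratio sqrt(2). The function names and the scale grid are hypothetical, chosen only for illustration. Under these assumptions, a detector fixed at one scale responds differently when the structure doubles in size, whereas the scale of peak response doubles along with it and so indicates the structure's size.

```python
import math

def dog_centre_response(blob_size, sigma, k=math.sqrt(2)):
    """Closed-form difference-of-Gaussian response at the centre of a unit-peak
    2D Gaussian blob with standard deviation blob_size (idealized assumption)."""
    def smoothed_peak(s):
        # Peak height of the blob after convolution with a normalized
        # 2D Gaussian of standard deviation s.
        return blob_size ** 2 / (blob_size ** 2 + s ** 2)
    return smoothed_peak(sigma) - smoothed_peak(k * sigma)

def best_scale(blob_size, sigmas):
    """Scale on the grid at which the centre response is largest."""
    return max(sigmas, key=lambda s: dog_centre_response(blob_size, s))

# Geometric grid of candidate scales from 0.5 to 32 (assumed sampling).
sigmas = [0.5 * 2 ** (i / 8) for i in range(49)]

small = best_scale(2.0, sigmas)   # peak scale for a blob of size 2
large = best_scale(4.0, sigmas)   # peak scale for the same blob, twice as big

# Response of a detector fixed at the small blob's best scale.
fixed_small = dog_centre_response(2.0, small)
fixed_large = dog_centre_response(4.0, small)

print(small, large, large / small)
print(fixed_small, fixed_large)
```

Because the response depends only on the ratio of detector scale to blob size, searching over scale recovers a peak that moves in proportion to the size of the structure. This is the information a single-scale corner detector does not supply.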