Example: confidence

Object Detection with Discriminatively Trained Part Based ...

1. Object Detection with Discriminatively Trained part Based Models Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester and Deva Ramanan Abstract We describe an Object Detection system Based on mixtures of multiscale deformable part models. Our system is able to represent highly variable Object classes and achieves state-of-the-art results in the PASCAL Object Detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL datasets.

ations in appearance, a single deformable model is often not expressive enough to represent a rich object category. Consider the problem of modeling the appearance of bi-cycles in photographs. People build bicycles of different types (e.g., mountain bikes, tandems, and 19th-century cycles with one big wheel and a small one) and view

Tags:

  With, Model, Part, Trained, Object, Detection, Object detection with discriminatively trained part, Discriminatively

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of Object Detection with Discriminatively Trained Part Based ...

1 1. Object Detection with Discriminatively Trained part Based Models Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester and Deva Ramanan Abstract We describe an Object Detection system Based on mixtures of multiscale deformable part models. Our system is able to represent highly variable Object classes and achieves state-of-the-art results in the PASCAL Object Detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL datasets.

2 Our system relies on new methods for discriminative training with partially labeled data. We combine a margin- sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.

3 Index Terms Object Recognition, Deformable Models, Pictorial Structures, Discriminative Training, Latent SVM. F. 1 I NTRODUCTION it has been difficult to establish their value in practice. Object recognition is one of the fundamental challenges On difficult datasets deformable part models are often in computer vision. In this paper we consider the prob- outperformed by simpler models such as rigid templates lem of detecting and localizing generic objects from [10] or bag-of-features [44]. One of the goals of our work categories such as people or cars in static images.

4 This is to address this performance gap. is a difficult problem because objects in such categories While deformable models can capture significant vari- can vary greatly in appearance. Variations arise not only ations in appearance, a single deformable model is often from changes in illumination and viewpoint, but also not expressive enough to represent a rich Object category. due to non-rigid deformations, and intraclass variability Consider the problem of modeling the appearance of bi- in shape and other visual properties.

5 For example, peo- cycles in photographs. People build bicycles of different ple wear different clothes and take a variety of poses types ( , mountain bikes, tandems, and 19th-century while cars come in a various shapes and colors. cycles with one big wheel and a small one) and view We describe an Object Detection system that represents them in various poses ( , frontal versus side views). highly variable objects using mixtures of multiscale de- The system described here uses mixture models to deal formable part models. These models are Trained using with these more significant variations.

6 A discriminative procedure that only requires bounding We are ultimately interested in modeling objects using boxes for the objects in a set of images. The resulting visual grammars . Grammar Based models ( [16], system is both efficient and accurate, achieving state-of- [24], [45]) generalize deformable part models by rep- the-art results on the PASCAL VOC benchmarks [11] resenting objects using variable hierarchical structures. [13] and the INRIA Person dataset [10]. Each part in a grammar Based model can be defined Our approach builds on the pictorial structures frame- directly or in terms of other parts.

7 Moreover, grammar work [15], [20]. Pictorial structures represent objects by Based models allow for, and explicitly model , structural a collection of parts arranged in a deformable configu- variations. These models also provide a natural frame- ration. Each part captures local appearance properties of work for sharing information and computation between an Object while the deformable configuration is charac- different Object classes. For example, different models terized by spring-like connections between certain pairs might share reusable parts.

8 Of parts. Although grammar Based models are our ultimate Deformable part models such as pictorial structures goal, we have adopted a research methodology under provide an elegant framework for Object Detection . Yet which we gradually move toward richer models while maintaining a high level of performance. Improving Felzenszwalb is with the Department of Computer Science, University performance by enriched models is surprisingly difficult. of Chicago. E-mail: Simple models have historically outperformed sophis- Girshick is with the Department of Computer Science, University of ticated models in computer vision, speech recognition, Chicago.

9 E-mail: D. McAllester is with the Toyota Technological Institute at Chicago. E- machine translation and information retrieval. For ex- mail: ample, until recently speech recognition and machine D. Ramanan is with the Department of Computer Science, UC Irvine. translation systems Based on n-gram language models E-mail: outperformed systems Based on grammars and phrase 2. structure. In our experience maintaining performance seems to require gradual enrichment of the model . One reason why simple models can perform better in practice is that rich models often suffer from difficulties in training.

10 For Object Detection , rigid templates and bag- of-features models can be easily Trained using discrimi- native methods such as support vector machines (SVM). Richer models are more difficult to train, in particular because they often make use of latent information. Consider the problem of training a part - Based model from images labeled only with bounding boxes around the objects of interest. Since the part locations are not labeled, they must be treated as latent (hidden) variables during training. More complete labeling might support better training, but it can also result in inferior training if the labeling used suboptimal parts.


Related search queries