Gradient-Based Learning Applied to Document Recognition

YANN LECUN, MEMBER, IEEE, LÉON BOTTOU, YOSHUA BENGIO, AND PATRICK HAFFNER

Invited Paper

Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task.
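To make the first claim above concrete, the sketch below trains a tiny multilayer network with back-propagation on XOR, a toy problem whose two classes no single straight line can separate. It is only an illustration under assumed settings: three tanh hidden units, a sigmoid output, mean squared error, and plain full-batch gradient descent with step size 0.5. None of these choices, nor the helper names `init_params`, `forward`, `gradients`, and `train`, come from the paper.

```python
import math
import random

# XOR: four points that no single straight line can separate.
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def init_params(n_hidden=3, seed=0):
    """Small random weights; layer size and seed are illustrative assumptions."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Hidden layer h = tanh(W1 x + b1); output y = sigmoid(W2 . h + b2)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    z = sum(w * hj for w, hj in zip(p["W2"], h)) + p["b2"]
    return h, 1.0 / (1.0 + math.exp(-z))


def loss(p, data):
    """Mean squared error between network outputs and targets."""
    return sum((forward(p, x)[1] - t) ** 2 for x, t in data) / len(data)


def gradients(p, data):
    """Back-propagation: the chain rule applied from the loss back to every weight."""
    n = len(p["b1"])
    g = {"W1": [[0.0, 0.0] for _ in range(n)], "b1": [0.0] * n, "W2": [0.0] * n, "b2": 0.0}
    for x, t in data:
        h, y = forward(p, x)
        dz = 2.0 * (y - t) * y * (1.0 - y) / len(data)   # dLoss/dz at the output unit
        g["b2"] += dz
        for j in range(n):
            g["W2"][j] += dz * h[j]
            da = dz * p["W2"][j] * (1.0 - h[j] ** 2)       # dLoss/d(pre-activation of h_j)
            g["b1"][j] += da
            for i in range(2):
                g["W1"][j][i] += da * x[i]
    return g


def train(p, data, lr=0.5, epochs=2000):
    """Full-batch gradient descent: move every weight against its gradient."""
    for _ in range(epochs):
        g = gradients(p, data)
        p["b2"] -= lr * g["b2"]
        for j in range(len(p["b1"])):
            p["W2"][j] -= lr * g["W2"][j]
            p["b1"][j] -= lr * g["b1"][j]
            for i in range(2):
                p["W1"][j][i] -= lr * g["W1"][j][i]
    return p
```

Calling `train(init_params(), XOR)` lowers the value returned by `loss`; whether the learned decision surface separates all four points depends on the random initialization. Comparing `gradients` with central finite differences of `loss` is a quick way to confirm that the back-propagated derivatives are correct.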

Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2-D) shapes, are shown to outperform all other techniques.

Document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTNs), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure.

Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks.

A graph transformer network for reading a bank check is also described.

It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.

Keywords: Convolutional neural networks, document recognition, finite state transducers, gradient-based learning, graph transformer networks, machine learning, neural networks, optical character recognition (OCR).

NOMENCLATURE
GT Graph transformer.
GTN Graph transformer network.
HMM Hidden Markov model.
HOS Heuristic oversegmentation.
K-NN K-nearest neighbor.
NN Neural network.
OCR Optical character recognition.
PCA Principal component analysis.
RBF Radial basis function.
RS-SVM Reduced-set support vector method.
SDNN Space displacement neural network.
SVM Support vector method.
TDNN Time delay neural network.
V-SVM Virtual support vector method.

Manuscript received November 1, 1997; revised April 17, 1998.
Y. LeCun, L. Bottou, and P. Haffner are with the Speech and Image Processing Services Research Laboratory, AT&T Labs-Research, Red Bank, NJ 07701.
Y. Bengio is with the Département d'Informatique et de Recherche Opérationelle, Université de Montréal, Montréal, Québec H3C 3J7.
Publisher Item Identifier S 0018-9219(98)

I. INTRODUCTION

Over the last several years, machine learning techniques, particularly when applied to NNs, have played an increasingly important role in the design of pattern recognition systems.

In fact, it could be argued that the availability of learning techniques has been a crucial factor in the recent success of pattern recognition applications such as continuous speech recognition and handwriting recognition. The main message of this paper is that better pattern recognition systems can be built by relying more on automatic learning and less on hand-designed heuristics. This is made possible by recent progress in machine learning and computer technology. Using character recognition as a case study, we show that hand-crafted feature extraction can be advantageously replaced by carefully designed learning machines that operate directly on pixel images.

Using document understanding as a case study, we show that the traditional way of building recognition systems by manually integrating individually designed modules can be replaced by a unified and well-principled design paradigm, called GTNs, which allows training all the modules to optimize a global performance criterion.

Since the early days of pattern recognition it has been known that the variability and richness of natural data, be it speech, glyphs, or other types of patterns, make it almost impossible to build an accurate recognition system entirely by hand. Consequently, most pattern recognition systems are built using a combination of automatic learning techniques and hand-crafted algorithms.

PROCEEDINGS OF THE IEEE, VOL. 86, NO. 11, NOVEMBER 1998
0018-9219/98 © 1998 IEEE

Fig. 1. Traditional pattern recognition is performed with two modules: a fixed feature extractor and a trainable classifier.

The usual method of recognizing individual patterns consists of dividing the system into two main modules shown in Fig. 1. The first module, called the feature extractor, transforms the input patterns so that they can be represented by low-dimensional vectors or short strings of symbols that: 1) can be easily matched or compared and 2) are relatively invariant with respect to transformations and distortions of the input patterns that do not change their nature.
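The two-module design of Fig. 1 can be sketched in a few lines. This is a hypothetical toy, not the paper's system: the fixed feature extractor (`extract_features`) maps a small binary image to two hand-chosen numbers, the largest share of ink in any single row and in any single column, and the trainable classifier is a nearest-class-mean rule fitted from labeled examples. The two features are low-dimensional and do not change when a stroke is shifted inside the image, which is the kind of invariance property 2) asks for.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]          # binary pixel image, 1 = ink
Features = Tuple[float, float]


def extract_features(img: Image) -> Features:
    """Fixed, hand-crafted feature extractor (hypothetical choice).

    Returns the largest share of ink found in a single row and in a single
    column; both numbers stay the same when a stroke is shifted in the image.
    """
    total = sum(sum(row) for row in img) or 1   # avoid dividing by zero on blank images
    row_peak = max(sum(row) for row in img) / total
    col_peak = max(sum(col) for col in zip(*img)) / total
    return (row_peak, col_peak)


class NearestMeanClassifier:
    """Trainable, general-purpose classifier: predicts the class whose mean
    feature vector is closest in squared Euclidean distance."""

    def fit(self, feats: Sequence[Features], labels: Sequence[str]) -> "NearestMeanClassifier":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for f, c in zip(feats, labels):
            acc = sums.setdefault(c, [0.0] * len(f))
            for k, v in enumerate(f):
                acc[k] += v
            counts[c] = counts.get(c, 0) + 1
        self.means = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, f: Features) -> str:
        return min(self.means,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(f, self.means[c])))


def bar(orientation: str, k: int, size: int = 5) -> Image:
    """Toy input: a full-length vertical or horizontal stroke at index k."""
    return [[1 if (orientation == "vertical" and j == k) or
                  (orientation == "horizontal" and i == k) else 0
             for j in range(size)] for i in range(size)]


# Train only the classifier; the feature extractor stays fixed.
train_imgs = [bar("vertical", 0), bar("vertical", 1), bar("horizontal", 0), bar("horizontal", 1)]
train_labels = ["vertical", "vertical", "horizontal", "horizontal"]
clf = NearestMeanClassifier().fit([extract_features(x) for x in train_imgs], train_labels)
```

Because all of the task knowledge sits in `extract_features`, recognizing a different kind of pattern would mean designing new features by hand, which is the limitation the following paragraph describes.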

The feature extractor contains most of the prior knowledge and is rather specific to the task. It is also the focus of most of the design effort, because it is often entirely hand crafted. The classifier, on the other hand, is often general purpose and trainable. One of the main problems with this approach is that the recognition accuracy is largely determined by the ability of the designer to come up with an appropriate set of features. This turns out to be a daunting task which, unfortunately, must be redone for each new problem. A large amount of the pattern recognition literature is devoted to describing and comparing the relative merits of different feature sets for particular tasks.

Historically, the need for appropriate feature extractors was due to the fact that the learning techniques used by the classifiers were limited to low-dimensional spaces with easily separable classes [1].

A combination of three factors has changed this vision over the last decade. First, the availability of low-cost machines with fast arithmetic units allows for reliance on more brute-force "numerical" methods than on algorithmic refinements. Second, the availability of large databases for problems with a large market and wide interest, such as handwriting recognition, has enabled designers to rely more on real data and less on hand-crafted feature extraction to build recognition systems. The third and very important factor is the availability of powerful machine learning techniques that can handle high-dimensional inputs and can generate intricate decision functions when fed with these large data sets.

It can be argued that the recent progress in the accuracy of speech and handwriting recognition systems can be attributed in large part to an increased reliance on learning techniques and large training data sets. As evidence of this fact, a large proportion of modern commercial OCR systems use some form of multilayer NN trained with back-propagation.

In this study, we consider the tasks of handwritten character recognition (Sections I and II) and compare the performance of several learning techniques on a benchmark data set for handwritten digit recognition (Section III). While more automatic learning is beneficial, no learning technique can succeed without a minimal amount of prior knowledge about the task.

In the case of multilayer NNs, a good way to incorporate knowledge is to tailor their architecture to the task. Convolutional NNs [2], introduced in Section II, are an example of specialized NN architectures which incorporate knowledge about the invariances of two-dimensional (2-D) shapes by using local connection patterns and by imposing constraints on the weights. A comparison of several methods for isolated handwritten digit recognition is presented in Section III. To go from the recognition of individual characters to the recognition of words and sentences in documents, the idea of combining multiple modules trained to reduce the overall error is introduced in Section IV.
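The two architectural constraints just mentioned, local connection patterns and constrained (shared) weights, can be illustrated with a plain 2-D convolution. In this pure-Python sketch (helper names `conv2d_valid` and `shift_right` are hypothetical, and the image and kernel are arbitrary), every output unit looks only at a small window of the input, and all units reuse the same kernel. As is common in neural network code, the operation is written as a cross-correlation (the kernel is not flipped), which makes no difference when the kernel is learned.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """'Valid' 2-D cross-correlation of one image with one shared kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]       # only a local kh x kw window
                 for a in range(kh) for b in range(kw))    # same weights at every (i, j)
             for j in range(out_w)]
            for i in range(out_h)]


def shift_right(image: Grid) -> Grid:
    """Translate the image one pixel to the right, filling the left column with zeros."""
    return [[0] + row[:-1] for row in image]


# A 6x6 toy image and a 3x3 vertical-edge kernel (both illustrative).
img = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(6)]
k = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
fmap = conv2d_valid(img, k)                       # 4x4 feature map
fmap_shifted = conv2d_valid(shift_right(img), k)  # feature map of the shifted image
```

Shifting the input by one pixel shifts the feature map by one pixel (`fmap_shifted[i][j + 1] == fmap[i][j]` wherever both are defined), so the same detector responds to a feature wherever it appears. The 3×3 kernel also uses 9 weights regardless of image size, whereas a fully connected layer mapping this 6×6 input to a 4×4 output would need 36 × 16 = 576.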
