Transcription of Supervised Sequence Labelling with Recurrent Neural …
Technische Universität München
Fakultät für Informatik
Lehrstuhl VI: Echtzeitsysteme und Robotik

Supervised Sequence Labelling with Recurrent Neural Networks
Alex Graves

Complete copy of the dissertation approved by the Fakultät für Informatik of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.).

Neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. Long short-term memory is an especially promising recurrent architecture, able to bridge long time delays between relevant input and output events, and thereby access long range context.
The aim of this thesis is to advance the state-of-the-art in supervised sequence labelling with recurrent networks in general, and long short-term memory in particular. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of long short-term memory to multidimensional data, such as images and video sequences. Experimental results are presented on speech recognition, online and offline handwriting recognition, keyword spotting, image segmentation and image classification, demonstrating the advantages of advanced recurrent networks over other sequential algorithms, such as hidden Markov models.

I would like to thank my supervisor Jürgen Schmidhuber for his guidance and support.
I would also like to thank my co-authors Santi, Tino, Nicole and Doug, and everyone else at IDSIA for making it a stimulating and creative place to work. Thanks to Tom Schaul for proofreading the thesis, and Marcus Hutter for his mathematical assistance during the connectionist temporal classification chapter. I am grateful to Marcus Liwicki and Horst Bunke for their expert collaboration on handwriting recognition. A special mention goes to Fred and Matteo and all the other Idsiani who helped me find the good times in Lugano. Most of all, I would like to thank my family and my wife Alison for their constant encouragement, love and support.

This research was supported in part by the Swiss National Foundation, under grants 200020-100249, 200020-107534/1 and 200021-111968.

Contents

List of Tables
List of Figures
List of Algorithms
1 Introduction
  Contributions
  Overview of Thesis
2 Supervised Sequence Labelling
  Supervised Learning
  Pattern Classification
  Classification
  Probabilistic Classifiers
  and Discriminative Models
  Sequence Labelling
  Taxonomy of Sequence Labelling Tasks
  Classification
  Classification
  Classification
3 Neural Networks
  Multilayer Perceptrons
  Pass
  Layers
  Functions
  Pass
  Recurrent Neural Networks
  Pass
  Pass
  RNNs
  Jacobian
  Network Training
  Descent Algorithms
  Representation
  Initialisation
4 Long Short-Term Memory
  The LSTM Architecture
  Influence of Preprocessing
  Gradient Calculation
  Architectural Enhancements
  LSTM Equations
  Pass
  Pass
5 Framewise Phoneme Classification
  Experimental Setup
  Network Architectures
  Complexity
  of Context
  Layers
  Network Training
  Results
  with Previous Work
  of Increased Context
  Error
6 Hidden Markov Model Hybrids
  Background
  Experiment: Phoneme Recognition
  Setup
7 Connectionist Temporal Classification
  Motivation
  From Outputs to Labellings
  of the Blank Labels
  CTC Forward-Backward Algorithm
  Scale
  CTC Objective Function
  Decoding
  Path Decoding
  Search Decoding
  Decoding
  Experiments
  Recognition
  Recognition with Reduced Label Set
  Spotting
  Handwriting Recognition
  Handwriting Recognition
  Discussion
8 Multidimensional Recurrent Neural Networks
  Background
  The MDRNN Architecture
  MDRNNs
  Long Short-Term Memory
  Experiments
  Freight Data
  Data
9 Conclusions and Future Work
Bibliography

List of Tables

  Phoneme classification error rate on TIMIT
  Comparison of BLSTM with previous neural network
  Phoneme error rate on TIMIT
  Phoneme error rate on TIMIT
  Folding the 61 phonemes in TIMIT onto 39 categories
  Reduced phoneme error rate on TIMIT
  Keyword error rate on Verbmobil
  CTC character error rate on IAM-OnDB
  Word error rate on IAM-OnDB
  Word error rate on IAM-DB
  Image error rate on MNIST

List of Figures

  Sequence labelling
  Taxonomy of sequence labelling tasks
  Importance of context in segment classification
  Multilayer perceptron
  Neural network activation functions
  Recurrent neural network
  Standard and bidirectional RNNs
  Sequential Jacobian for a bidirectional RNN
  Overfitting on training data
  Vanishing gradient problem for RNNs
  LSTM memory block with one cell
  Preservation of gradient information by LSTM
  Various networks classifying an excerpt from TIMIT
  Framewise phoneme classification results on TIMIT
  Learning curves on TIMIT
  BLSTM network classifying the utterance "one oh five"
  CTC and framewise classification
  CTC forward-backward algorithm
  Evolution of the CTC error signal during training
  Problem with best path decoding
  Prefix search decoding
  CTC outputs for keyword spotting on Verbmobil
  Sequential Jacobian for keyword spotting on Verbmobil
  CTC network labelling an excerpt from IAM-OnDB
  CTC sequential Jacobian from IAM-OnDB
  CTC sequential Jacobian from IAM-OnDB
  2D RNN forward pass
  2D RNN backward pass
  Sequence ordering of 2D data
  Context available to a 2D RNN with a single hidden layer
  Axes used by the hidden layers in a multidirectional 2D RNN
  Context available to a multidirectional 2D RNN
  Frame from the Air Freight database
  MNIST image before and after deformation
  2D RNN applied to an image from the Air Freight database
  Sequential Jacobian of a 2D RNN for an image from MNIST

List of Algorithms

  BRNN Forward Pass
  BRNN Backward Pass
  Prefix Search Decoding Algorithm
  CTC Token Passing Algorithm
  MDRNN Forward Pass
  MDRNN Backward Pass
  Multidirectional MDRNN Forward Pass
  Multidirectional MDRNN Backward Pass