Introduction to Digital Speech Processing

Lawrence R. Rabiner and Ronald W. Schafer

Introduction to Digital Speech Processing highlights the central role of DSP techniques in modern speech communication research and applications. It presents a comprehensive overview of digital speech processing that ranges from the basic nature of the speech signal, through a variety of methods of representing speech in digital form, to applications in voice communication and automatic synthesis and recognition of speech. Introduction to Digital Speech Processing provides the reader with a practical introduction to the wide range of important concepts that comprise the field of digital speech processing. It serves as an invaluable reference for students embarking on speech research as well as the experienced researcher already working in the field, who can utilize the book as a reference.

This book is originally published as Foundations and Trends® in Signal Processing, Volume 1, Issue 1-2 (2007).

Lawrence R. Rabiner
Rutgers University and University of California, Santa Barbara, USA

Ronald W. Schafer
Hewlett-Packard Laboratories, Palo Alto, CA, USA

Foundations and Trends® in Signal Processing

Published, sold and distributed by:
now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA

Outside North America:
now Publishers Inc., PO Box 179, 2600 AD Delft, The Netherlands
Tel. +31-6-51115274

The preferred citation for this publication is L. R. Rabiner and R. W. Schafer, Introduction to Digital Speech Processing, Foundations and Trends® in Signal Processing, vol 1, no 1-2, pp 1-194, 2007.

ISBN: 978-1-60198-070-0
© 2007 L. R. Rabiner and R. W. Schafer

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers.

In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The services for users can be found on the internet. For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale.

In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1-781-871-0245.

now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder.

Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands.

Foundations and Trends® in Signal Processing
Volume 1 Issue 1-2, 2007

Editorial Board

Editor-in-Chief:
Robert M. Gray
Dept of Electrical Engineering
Stanford University
350 Serra Mall
Stanford, CA

Alwan (UCLA)
John Apostolopoulos (HP Labs)
Pamela Cosman (UCSD)
Michelle Effros (California Institute of Technology)
Yonina Eldar (Technion)
Yariv Ephraim (George Mason University)
Sadaoki Furui (Tokyo Institute of Technology)
Vivek Goyal (MIT)
Sinan Gunturk (Courant Institute)
Christine Guillemot (IRISA)
Sheila Hemami (Cornell)
Lina Karam (Arizona State University)
Nick Kingsbury (Cambridge University)
Alex Kot (Nanyang Technical University)
Jelena Kovacevic (CMU)
Manjunath (UCSB)
Urbashi Mitra (USC)
Thrasos Pappas (Northwestern University)
Mihaela van der Shaar (UCLA)
Luis Torres (Technical University of Catalonia)
Michael Unser (EPFL)
Vaidyanathan (California Institute of Technology)
Rabab Ward (University of British Columbia)
Susie Wee (HP Labs)
Clifford J. Weinstein (MIT Lincoln Laboratories)
Min Wu (University of Maryland)
Josiane Zerubia (INRIA)

Editorial Scope

Foundations and Trends® in Signal Processing will publish survey and tutorial articles on the foundations, algorithms, methods, and applications of signal processing including the following topics:

Adaptive signal processing
Audio signal processing
Biological and biomedical signal processing
Complexity in signal processing
Digital and multirate signal processing
Distributed and network signal processing
Image and video processing
Linear and nonlinear filtering
Multidimensional signal processing
Multimodal signal processing
Multiresolution signal processing
Nonlinear signal processing
Randomized algorithms in signal processing
Sensor and multiple source signal processing, source separation
Signal decompositions, subband and transform methods, sparse representations
Signal processing for communications
Signal processing for security and forensic analysis, biometric signal processing
Signal quantization, sampling, analog-to-digital conversion, coding and compression
Signal reconstruction, digital-to-analog conversion, enhancement, decoding and inverse problems
Speech/audio/image/video compression
Speech and spoken language processing
Statistical/machine learning
Statistical signal processing
Classification and detection
Estimation and regression
Tree-structured methods

Information for Librarians

Foundations and Trends® in Signal Processing, 2007, Volume 1, 4 issues. ISSN paper version 1932-8346. ISSN online version 1932-8354. Also available as a combined paper and online subscription.

Abstract

Since even before the time of Alexander Graham Bell's revolutionary invention, engineers and scientists have studied the phenomenon of speech communication with an eye on creating more efficient and effective systems of human-to-human and human-to-machine communication.

Starting in the 1960s, digital signal processing (DSP) assumed a central role in speech studies, and today DSP is the key to realizing the fruits of the knowledge that has been gained through decades of research. Concomitant advances in integrated circuit technology and computer architecture have aligned to create a technological environment with virtually limitless opportunities for innovation in speech communication applications. In this text, we highlight the central role of DSP techniques in modern speech communication research and applications. We present a comprehensive overview of digital speech processing that ranges from the basic nature of the speech signal, through a variety of methods of representing speech in digital form, to applications in voice communication and automatic synthesis and recognition of speech. The breadth of this subject does not allow us to discuss any aspect of speech processing to great depth; hence our goal is to provide a useful introduction to the wide range of important concepts that comprise the field of digital speech processing.

A more comprehensive treatment will appear in the forthcoming book, Theory and Application of Digital Speech Processing [101].

Contents

1 The Speech Applications of Digital Speech Our Goal for this Text
2 The Speech Phonetic Representation of Models for Speech More Refined Models
3 Hearing and Auditory The Human Perception of Critical Pitch Auditory Complete Model of Auditory Processing
4 Short-Time Analysis of Short-Time Energy and Zero-Crossing Short-Time Autocorrelation Function (STACF) Short-Time Fourier Transform (STFT) Sampling the STFT in Time and The Speech Relation of STFT to Short-Time Fourier Short-Time Analysis is Fundamental to our Thinking
5 Homomorphic Speech Definition of the Cepstrum and Complex The Short-Time Computation of the Short-Time Homomorphic Filtering of Application to Pitch Applications to Pattern The Role of the Cepstrum
6 Linear Predictive Linear Prediction and the Speech Computing the Prediction The Levinson-Durbin LPC Equivalent The Role of Linear Prediction
7 Digital Speech Sampling and Quantization of Speech (PCM) Digital Speech Closed-Loop Open-Loop Frequency-Domain Evaluation of Coders
8 Text-to-Speech Synthesis Text Evolution of Speech Synthesis Unit Selection TTS TTS Future Needs
9 Automatic Speech Recognition (ASR) The Problem of Automatic Speech Building a Speech Recognition The Decision Processes in Representative Recognition Challenges in ASR Technology
Conclusion
Acknowledgments
References
Supplemental References

1 Introduction

The fundamental purpose of speech is communication, i.e., the transmission of messages. According to Shannon's information theory [116], a message represented as a sequence of discrete symbols can be quantified by its information content in bits, and the rate of transmission of information is measured in bits/second (bps). In speech production, as well as in many human-engineered electronic communication systems, the information to be transmitted is encoded in the form of a continuously varying (analog) waveform that can be transmitted, recorded, manipulated, and ultimately decoded by a human listener. In the case of speech, the fundamental analog form of the message is an acoustic waveform, which we call the speech signal.
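To make the notion of information content in bits concrete, the following minimal Python sketch estimates the Shannon entropy of a sequence of discrete symbols and multiplies it by a symbol rate to obtain a rate in bits/second. It is an illustration only, not a method taken from the text: the function names, the use of empirical symbol frequencies as probabilities, and the sample message and symbol rate are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Empirical Shannon entropy of a symbol sequence, in bits per symbol.

    Assumption: symbol probabilities are estimated from relative
    frequencies within the sequence itself.
    """
    total = len(symbols)
    if total == 0:
        return 0.0
    counts = Counter(symbols)
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def information_rate_bps(symbols, symbols_per_second):
    """Information rate in bits/second for a given symbol rate."""
    return entropy_bits_per_symbol(symbols) * symbols_per_second


if __name__ == "__main__":
    # Hypothetical message and symbol rate, chosen only for illustration.
    message = "should we chase"
    rate = information_rate_bps(message, symbols_per_second=10)
    print(f"{entropy_bits_per_symbol(message):.3f} bits/symbol, {rate:.1f} bps")
```

A sequence in which four distinct symbols occur equally often yields 2 bits per symbol, so at 10 symbols per second the estimated rate would be 20 bps.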

Speech signals, as illustrated in the figure, can be converted to an electrical waveform by a microphone, further manipulated by both analog and digital signal processing, and then converted back to acoustic form by a loudspeaker, a telephone handset or headphone, as desired. This form of speech processing is, of course, the basis for Bell's telephone invention as well as today's multitude of devices for recording, transmitting, and manipulating speech and audio signals. Although Bell made his invention without knowing the fundamentals of information theory, these ideas have assumed great importance in the design of sophisticated modern communications systems. Therefore, even though our main focus will be mostly on the speech waveform and its representation in the form of parametric models, it is nevertheless useful to begin with a discussion of how information is encoded in the speech waveform.

Fig. A speech waveform with phonetic labels for the text message "Should we chase."

The Speech Chain

The accompanying figure shows the complete process of producing and perceiving speech from the formulation of a message in the brain of a talker, to the creation of the speech signal, and finally to the understanding of the message by a listener.
