
An Overview of HTK V3 - ukspeech.org.uk



Transcription of An Overview of HTK V3 - ukspeech.org.uk

An Overview of HTK
Phil Woodland & the Cambridge HTK team
Cambridge University Engineering Department
UK Speech Meeting, UEA, 3rd July 2015

Overview:
- What is HTK?
- Speech recognition architecture
- HTK main features
- Deep neural network acoustic models
- Recurrent neural network language models
- HTK for deep neural network acoustic models
- Lattice rescoring with recurrent neural network language models
- Overview of key features
- Some recent ASR systems built with HTK:
  - BOLT Mandarin conversational telephone speech
  - MGB challenge (multi-genre broadcast data)

- Summary and plans

Contributors:
- The HTK book has the following authors: Steve Young, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying (Andrew) Liu, Gareth Moore, Julian Odell, Dave Ollason, Dan Povey, Valtcho Valtchev, Phil Woodland
- Major additions in HTK V3.5 will be primarily due to:
  - Chao Zhang (HTK-ANN extension)
  - Xunying Liu (language model interface / RNNLM decoding)
- Additional input from Anton Ragni, Kate Knill, Mark Gales, Jeff Chen and many others at Cambridge

See also: C. Zhang & P. Woodland, "A General Artificial Neural Network Extension for HTK", to appear, Interspeech 2015.

What is HTK?
- Hidden Markov Model Toolkit
- A set of tools for training and evaluating HMMs: primarily speech recognition, but also speech synthesis (HTS)
- Implemented in ANSI C
- Approx. 400-page manual with tutorial and system-build examples
- Modular structure simplifies extensions

History (1989-):
- Initially developed at Cambridge University (up to ...), then at Entropic

- Since 2000, back at Cambridge (V3 onwards)
- Free to download from the web; more than 100,000 registered users
- Latest released version is ... (in 2009)
- Used extensively for research (and teaching) at CU
- Built large vocabulary systems for NIST evaluations using HTK

ASR system:
- Statistical speech models using context-dependent hidden Markov models
- Decision tree state tying
- Gaussian mixture models (or neural networks)
- Probabilities of word sequences (N-gram)
- Estimate the models from a large amount of data
- Find the most probable word sequence using the models: the search (decoding) problem

Standard approach: create statistical models of speech.

- Acoustic variations of individual sounds (hidden Markov models: generator models)
- Probabilities of word sequences (N-gram)
- Estimate the models from a large amount of data
- Find the most probable word sequence using the models: the search (decoding) problem

Source-channel models:
Both ASR and SMT can be formulated using a source-channel model.

ASR: input is an utterance A, output is a transcription W*:

    W* = argmax_W P(W|A)
       = argmax_W P(A|W) P(W) / P(A)
       = argmax_W P(A|W) P(W)

where P(A|W) is the acoustic model and P(W) is the source language model.

Translation: input is a foreign sentence F, output is an English sentence E*:

    E* = argmax_E P(E|F)
       = argmax_E P(F|E) P(E) / P(F)
       = argmax_E P(F|E) P(E)
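A minimal sketch of the MAP decision rule above (toy numbers, not HTK): score each hypothesis W by log P(A|W) + log P(W) and take the argmax; log probabilities avoid underflow, as in real decoders.

```python
# Hypothetical log-scores for one utterance A (made-up numbers for illustration).
log_acoustic = {"recognise speech": -12.0, "wreck a nice beach": -11.5}  # log P(A|W)
log_lm       = {"recognise speech": -2.0,  "wreck a nice beach": -6.0}   # log P(W)

def map_decode(hypotheses):
    """Return argmax_W log P(A|W) + log P(W); P(A) is constant and ignored."""
    return max(hypotheses, key=lambda w: log_acoustic[w] + log_lm[w])

print(map_decode(log_acoustic))  # recognise speech (-14.0 beats -17.5)
```

Note how the language model overrules the slightly better acoustic score of the wrong hypothesis, which is exactly why the source term matters.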

where P(F|E) is the translation model and P(E) is the source language model. Both formulations rely on searching for the maximum a posteriori probability string, using models estimated from data.

Architecture:
HTK includes components for all stages of the speech recognition process.

[Figure: typical ASR architecture for training and test. Training: a speech corpus with transcriptions feeds feature extraction and acoustic training; a text corpus feeds normalisation and language modelling. Recognition: speech input passes through feature extraction to the recognition search, which combines the acoustic models, lexicon and language model (plus adaptation) to produce text output.]
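The "feature extraction" stage in the figure can be illustrated with a toy framing routine (pure Python, not HTK's frontend code; the sample rate and window sizes are conventional choices, not taken from this deck): speech is cut into short overlapping frames before spectral features such as MFCCs or PLPs are computed.

```python
def frame_signal(samples, frame_len, frame_shift):
    """Cut a waveform into overlapping frames, dropping any tail shorter
    than a full frame. A typical setup is a 25 ms window every 10 ms."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_shift)]

# At 16 kHz: 25 ms = 400 samples, 10 ms = 160 samples.
samples = [0.0] * 1600  # 100 ms of a (silent) stand-in waveform
frames = frame_signal(samples, frame_len=400, frame_shift=160)
print(len(frames))  # 8
```

A real frontend then applies windowing, an FFT, mel filterbanks and so on per frame; HTK's MFCC/PLP frontends cover those steps.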


HTK features:
- LPC, mel filterbank, MFCC and PLP frontends
- Cepstral mean/variance normalisation and vocal tract length normalisation
- Discrete and (semi-)continuous HMMs
- Diagonal and full covariance models
- Cross-word triphones and decision tree state clustering
- (Embedded) Baum-Welch training
- Viterbi recognition and forced alignment
- Support for N-grams and finite state grammars
- Includes N-gram generation tools for large datasets
- N-best and lattice generation/manipulation
- (C)MLLR speaker/channel adaptation and adaptive training (SAT)
- Large vocabulary decoder HDecode (separate licence)
- Discriminative training tools, MMI and MPE (HMMIRest)
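The "Viterbi recognition" feature above can be sketched as a small dynamic-programming routine (a self-contained toy in Python with made-up probabilities, not HTK's C implementation): it finds the most likely HMM state sequence for an observation sequence.

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most likely state path for obs under an HMM, computed in log space.

    log_init[s]       : log P(first state = s)
    log_trans[s1][s2] : log P(s2 | s1)
    log_emit[s][o]    : log P(o | s)
    """
    # delta[s] = best log-score of any path ending in state s; psi backtracks.
    delta = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    psi = []
    for o in obs[1:]:
        prev, step, delta = delta, {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] + log_trans[p][s])
            step[s] = best_prev
            delta[s] = prev[best_prev] + log_trans[best_prev][s] + log_emit[s][o]
        psi.append(step)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: delta[s])]
    for step in reversed(psi):
        path.append(step[path[-1]])
    return list(reversed(path))

# Toy two-state HMM with hypothetical parameters.
log_init = {"s1": math.log(0.6), "s2": math.log(0.4)}
log_trans = {"s1": {"s1": math.log(0.7), "s2": math.log(0.3)},
             "s2": {"s1": math.log(0.4), "s2": math.log(0.6)}}
log_emit = {"s1": {"a": math.log(0.9), "b": math.log(0.1)},
            "s2": {"a": math.log(0.2), "b": math.log(0.8)}}
path = viterbi(["a", "a", "b"], ["s1", "s2"], log_init, log_trans, log_emit)
print(path)  # ['s1', 's1', 's2']
```

In HTK, the analogous search over networks of context-dependent HMMs is performed by tools such as HVite; the same recursion, run with a known transcription, gives forced alignment.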

Architecture:
- HTK is structured as a set of libraries and a set of tools
- Tools share common I/O through the library modules (e.g. HLabel for labels)

Applications of HTK:
- Build small HMM systems for teaching/tutorials; ideal for practicals and lab exercises about HMMs
- Small/medium vocabulary speech recognition
- Training and development for LVCSR
- Speaker identification

HTK features (summary):
- LPC, MFCC and PLP frontends
- Supports discrete and (semi-)continuous HMMs
- Context-dependent cross-word triphones
- Decision tree clustering for state tying
- (Embedded) Baum-Welch training
- Viterbi recognition and forced alignment
- Support for N-grams and finite state grammars
- N-best and lattice generation
- Cepstral mean/variance normalisation
- Vocal tract length normalisation
- MLLR speaker/channel adaptation

Typical usage: tools used in a development/evaluation cycle.

Tools in the train/test/analyse cycle:
- Data preparation (transcriptions, speech): HLEd, HLStats, HSLab, HCopy, HList, HQuant
- Train: HCompV, HInit, HRest; HERest, HSmooth, HHEd
- Networks and dictionaries: HDMan, HBuild, HParse
- Test: HVite (produces transcriptions)
- Analyse: HResults

Example: HMMs are described in text files (easy to manipulate), and tying is possible at many different levels (e.g. ~u means, ~v variances, ~m mixtures, ~w stream weights, ~d durations, ~s states, ~t transition matrices {a_ij}, per stream). Example model definition (abridged):

    ~o <STREAMINFO> 1 39 <VECSIZE> 39 <PLP_D_A_Z_0> <DIAGC>
    ~h "m"
    <BEGINHMM>
      <NUMSTATES> 5
      <STATE> 2 ~s "m_2"
      <STATE> 3
        <MEAN> 39 ~u "m_mu_1"
        <VARIANCE> 39 ~v "var_1"
      <STATE> 4 ~s "m_2"
      <TRANSP> ...
    <ENDHMM>

    ~s "m_2"
      <NUMMIXES> 2
      <MIXTURE> ... <MEAN> ... <VARIANCE> ...
      <MIXTURE> ... <MEAN> ...

Grammar fragment (with a sent-end marker): $digit = ONE | TWO | THREE | ...
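The $digit fragment glimpsed above is in HTK's HParse grammar notation. As a sketch in the style of the HTK book tutorial (the full word list here is illustrative, not from this deck), a complete digit-loop grammar might look like:

```
$digit = ONE | TWO | THREE | FOUR | FIVE |
         SIX | SEVEN | EIGHT | NINE | ZERO;

( SENT-START < $digit > SENT-END )
```

HParse compiles such a file into a word network that HVite can decode against; the angle brackets denote one or more repetitions of the enclosed expression.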

