
Hung-yi Lee 李宏毅 - National Taiwan University




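Sophisticated Input: when the input is a vector, the model outputs a scalar or a class. When the input is a set of vectors (the number of vectors may change), the model outputs scalars or classes.

Vector Set as Input (text): a sentence such as "this is a cat" is a set of word vectors. With one-hot encoding, apple = [1 0 0 0 0 ...], bag = [0 1 0 0 0 ...], cat = [0 0 1 0 0 ...], dog = [0 0 0 1 0 ...], elephant = [0 0 0 0 1 ...]. With word embedding, related words lie close together (e.g. dog, cat, rabbit; jump, run; flower, tree). To learn more: (in Mandarin).

Vector Set as Input (speech): the audio is cut into frames. Each frame is a 25 ms window, i.e. 400 sample points at 16 kHz, and is represented as a 39-dim MFCC or an 80-dim filter bank output. The window moves 10 ms at a time, so 1 s of audio gives 100 frames.

Vector Set as Input (graph): a graph is also a set of vectors (consider each node as a vector). In a social network, each person's profile is a vector. In a molecule, each atom is a node given by a one-hot vector: H = [1 0 0 0 0 ...], C = [0 1 0 0 0 ...], O = [0 0 1 0 0 ...], and so on.

What is the output? There are three cases.

(1) Each vector has a label. Example applications: POS tagging ("I saw a saw" → N V DET N); HW2, where each speech frame gets a phoneme label (a a b b); in a social network, predicting "buy" or "not buy" for each person.

(2) The whole sequence has a label. Example applications: sentiment analysis ("this is good" → positive); speaker (HW4); hydrophilicity of a molecule.

(3) The model decides the number of labels itself (seq2seq), e.g. translation (HW5).

The focus of this lecture is case (1), where each vector has a label.

Sequence Labeling: a fully-connected network (FC) can be applied to each vector on its own, but then the two occurrences of "saw" in "I saw a saw" get the same output. Is it possible to consider the context? The FC can consider the neighbors by taking a window of vectors as input. How, then, to consider the whole sequence? A window that covers the whole sequence would have to be as long as the longest sequence, which needs many FC parameters.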
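To make the context problem concrete, here is a minimal plain-Python sketch under loudly stated assumptions: the 2-dim word vectors in `EMB` and the weights `w1` and `w3` are made-up toy values, and `fc` is a hypothetical single linear unit standing in for the FC network. Applied to each vector alone, the two "saw" tokens get the same output; with one neighbor on each side concatenated (zero-padded at the edges), they differ.

```python
# Toy 2-dim word vectors: hypothetical values chosen for illustration only.
EMB = {"I": [1.0, 0.0], "saw": [0.0, 1.0], "a": [0.5, 0.5]}


def fc(x, weights, bias=0.0):
    """Hypothetical FC layer with one output unit: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def window(seq, i, radius):
    """Concatenate the vectors at positions i-radius .. i+radius, zero-padding past the ends."""
    dim = len(seq[0])
    out = []
    for j in range(i - radius, i + radius + 1):
        out.extend(seq[j] if 0 <= j < len(seq) else [0.0] * dim)
    return out


sentence = ["I", "saw", "a", "saw"]
vectors = [EMB[word] for word in sentence]

# FC applied to each vector alone: both "saw" tokens receive the same output.
w1 = [0.3, -0.7]
per_vector = [fc(v, w1) for v in vectors]

# FC applied to a window of 3 vectors (6 inputs): the neighbors now change the output.
w3 = [0.2, -0.1, 0.3, -0.7, 0.4, 0.9]
windowed = [fc(window(vectors, i, 1), w3) for i in range(len(vectors))]

print(per_vector)  # entries 1 and 3 (the two "saw") are equal
print(windowed)    # entries 1 and 3 differ
```

The number of weights in `w3` is (2 × radius + 1) × dim, so a window wide enough to cover the longest sequence needs correspondingly many weights.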

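Self-attention: a self-attention layer takes the whole sequence and outputs one vector for each input vector, and every output already considers the whole sequence ("with context"); an FC is then applied to each output. Self-attention and FC layers can be alternated: self-attention, FC, self-attention, FC. Reference: "Attention is all you …".

The input of self-attention, $a^1, \dots, a^4$, can be either the input or the output of a hidden layer, and it produces $b^1, \dots, b^4$. To produce $b^1$, find the vectors in the sequence that are relevant to $a^1$; the relevance between two vectors is a score $\alpha$.

Two ways to compute $\alpha$ between vectors $a^i$ and $a^j$. Dot-product: $q = W^q a^i$, $k = W^k a^j$, $\alpha = q \cdot k$. Additive: $\alpha = W \tanh(q + k)$.

Computing $b^1$ with dot-product scores: the query is $q^1 = W^q a^1$, the keys are $k^i = W^k a^i$, and the attention scores are $\alpha_{1,i} = q^1 \cdot k^i$ for $i = 1, \dots, 4$ (including $\alpha_{1,1}$, the score of $a^1$ with itself). A soft-max normalizes them: $\alpha'_{1,i} = \exp(\alpha_{1,i}) / \sum_j \exp(\alpha_{1,j})$. Information is then extracted based on the attention scores: with values $v^i = W^v a^i$, $b^1 = \sum_i \alpha'_{1,i} v^i$.

The outputs $b^1, \dots, b^4$ do not depend on each other, so they can be computed in parallel. For example, $b^2 = \sum_i \alpha'_{2,i} v^i$, where $\alpha'_{2,i}$ comes from $q^2 = W^q a^2$ and the same keys.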
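The computation of $b^1$ described above (dot-product scores, soft-max, weighted sum of values) can be sketched in plain Python as follows. The input vectors `a` and the matrices `Wq`, `Wk`, `Wv` are hypothetical toy values (in a real model they are learned), and `attend` is an illustrative helper name. Subtracting the maximum score inside `softmax` is a standard numerical-stability step that does not change the result; it is not on the slides. Python indices start at 0, so `attend(0, ...)` computes $b^1$.

```python
import math


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Soft-max; subtracting the max is a stability trick that leaves the result unchanged."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(i, inputs, Wq, Wk, Wv):
    """Return (b^i, [alpha'_{i,j} for all j]) for query position i using dot-product scores."""
    q = matvec(Wq, inputs[i])                  # query  q^i = W^q a^i
    keys = [matvec(Wk, x) for x in inputs]     # keys   k^j = W^k a^j
    values = [matvec(Wv, x) for x in inputs]   # values v^j = W^v a^j
    scores = [dot(q, k) for k in keys]         # alpha_{i,j} = q^i . k^j
    weights = softmax(scores)                  # alpha'_{i,j}
    dim = len(values[0])
    b = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return b, weights


# Hypothetical toy inputs a^1..a^4 and weight matrices (learned in a real model).
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.0], [0.0, 0.5]]
Wv = [[1.0, 2.0], [0.0, 1.0]]

b1, alpha1 = attend(0, a, Wq, Wk, Wv)  # b^1 and alpha'_{1,1..4}
# Each b^i needs only its own query, so these calls are independent of each other.
outputs = [attend(i, a, Wq, Wk, Wv)[0] for i in range(len(a))]
```

Because every call reuses the same keys and values and differs only in the query, the list comprehension at the end could be run in parallel.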

Self-attention in matrix form: stack the input vectors as the columns of $I = [a^1\ a^2\ a^3\ a^4]$. Then $Q = W^q I$, $K = W^k I$ and $V = W^v I$, whose columns are $q^i$, $k^i$ and $v^i$. Since $\alpha_{1,i} = k^i \cdot q^1$, the scores for the first query are $[\alpha_{1,1}\ \alpha_{1,2}\ \alpha_{1,3}\ \alpha_{1,4}]^T = K^T q^1$. For all queries at once, $A = K^T Q$, whose column $j$ holds $\alpha_{j,1}, \dots, \alpha_{j,4}$; applying soft-max to each column gives $A'$. The output is $O = [b^1\ b^2\ b^3\ b^4] = V A'$.

In summary: $Q = W^q I$, $K = W^k I$, $V = W^v I$; $A = K^T Q$ and $A' = \mathrm{softmax}(A)$, the attention matrix; $O = V A'$. The only parameters to be learned are $W^q$, $W^k$ and $W^v$.

Multi-head Self-attention (2 heads as example): first $q^i = W^q a^i$, then $q^{i,1} = W^{q,1} q^i$ and $q^{i,2} = W^{q,2} q^i$; $k^{i,1}, k^{i,2}$ and $v^{i,1}, v^{i,2}$ are obtained in the same way. Head 1 computes $b^{i,1}$ using only $q^{i,1}$, $k^{j,1}$ and $v^{j,1}$, and head 2 computes $b^{i,2}$ using only the second set, so the heads can capture different types of relevance. Finally, $b^i = W^O [b^{i,1}; b^{i,2}]$.
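A plain-Python sketch of the matrix form, $O = V\,\mathrm{softmax}(K^T Q)$ with soft-max applied to each column, followed by the two-head variant. Matrices are lists of rows; `I`, `Wq`, `Wk`, `Wv`, the per-head matrices in `heads` and `Wo` are hypothetical toy values, and `self_attention` and `multi_head` are illustrative names, not a library API.

```python
import math


def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def softmax_columns(A):
    """Apply soft-max to each column of A separately."""
    cols = []
    for col in zip(*A):
        m = max(col)
        exps = [math.exp(x - m) for x in col]
        total = sum(exps)
        cols.append([e / total for e in exps])
    return transpose(cols)


def self_attention(I, Wq, Wk, Wv):
    """O = V A' with A' = softmax(K^T Q); the columns of I are a^1..a^n."""
    Q, K, V = matmul(Wq, I), matmul(Wk, I), matmul(Wv, I)
    A = matmul(transpose(K), Q)      # A[j][i] = k^j . q^i = alpha_{i,j}
    A_prime = softmax_columns(A)     # attention matrix A'
    return matmul(V, A_prime)        # column i of O is b^i


def multi_head(I, Wq, Wk, Wv, head_weights, Wo):
    """q^{i,h} = W^{q,h} q^i (same for k, v); b^i = W^O [b^{i,1}; b^{i,2}; ...]."""
    Q, K, V = matmul(Wq, I), matmul(Wk, I), matmul(Wv, I)
    head_outputs = []
    for Wq_h, Wk_h, Wv_h in head_weights:
        Q_h, K_h, V_h = matmul(Wq_h, Q), matmul(Wk_h, K), matmul(Wv_h, V)
        A_h = softmax_columns(matmul(transpose(K_h), Q_h))
        head_outputs.append(matmul(V_h, A_h))               # column i is b^{i,h}
    stacked = [row for O_h in head_outputs for row in O_h]  # column i is [b^{i,1}; b^{i,2}]
    return matmul(Wo, stacked)                              # column i is b^i


# Hypothetical toy values: a^1..a^4 as the columns of I (2 x 4).
I = transpose([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]])
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.0], [0.0, 0.5]]
Wv = [[1.0, 2.0], [0.0, 1.0]]
O = self_attention(I, Wq, Wk, Wv)    # 2 x 4

# Two heads with made-up 2 x 2 projections; W^O maps the 4-dim concatenation back to 2 dims.
heads = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
    ([[0.0, 1.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 2.0]], [[0.5, 0.5], [0.5, -0.5]]),
]
Wo = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
O_multi = multi_head(I, Wq, Wk, Wv, heads, Wo)
```

In this toy setup head 1 uses identity projections and `Wo` keeps only head 1's rows, so `O_multi` equals `O`; that makes it a quick sanity check of the stacking $[b^{i,1}; b^{i,2}]$. A learned `Wo` would mix both heads.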

Positional Encoding: there is no position information in self-attention. Each position $i$ is therefore given a unique positional vector $e^i$, which is added to the input vector: $e^i + a^i$. The positional vectors can be hand-crafted or learned from data. (Figure: each column represents a positional vector.)

Self-attention has many applications; for example, the Transformer is used in Natural Language Processing (NLP).

Self-attention for Speech: speech is a very long vector sequence. If the input sequence has length $L$, the attention matrix is $L \times L$. Truncated self-attention only computes attention within a range.

Self-attention for Image: an image can also be considered as a vector set, treating each pixel as a vector. Examples: GAN; DEtection Transformer (DETR).

Self-attention vs. CNN: a CNN is self-attention that can only attend within a receptive field, while self-attention is a CNN with a learnable receptive field. CNN is simplified self-attention.
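Two ideas from this page in a short plain-Python sketch. For the hand-crafted positional vectors it assumes the sinusoidal scheme of the original Transformer paper (one possible choice; the slides also allow learned vectors). `truncated_attention_weights` and its `radius` parameter are hypothetical names illustrating attention restricted to a range: for query $i$, scores with $|i - j| >$ radius receive zero weight after soft-max. Here `scores[i][j]` is $\alpha_{i,j}$, so soft-max runs along each row.

```python
import math


def sinusoidal_positions(length, dim):
    """Hand-crafted positional vectors e^1..e^L (sinusoidal scheme of the original Transformer)."""
    pe = []
    for pos in range(length):
        e = []
        for d in range(dim):
            angle = pos / (10000 ** ((d - d % 2) / dim))
            e.append(math.sin(angle) if d % 2 == 0 else math.cos(angle))
        pe.append(e)
    return pe


def add_positions(inputs):
    """Return a^i + e^i for every input vector a^i."""
    pe = sinusoidal_positions(len(inputs), len(inputs[0]))
    return [[x + p for x, p in zip(a, e)] for a, e in zip(inputs, pe)]


def truncated_attention_weights(scores, radius):
    """Row-wise soft-max of scores[i][j] = alpha_{i,j}, allowing only |i - j| <= radius.

    Positions outside the range get weight 0, as if their scores were -infinity.
    """
    weights = []
    for i, row in enumerate(scores):
        allowed = [j for j in range(len(row)) if abs(i - j) <= radius]
        m = max(row[j] for j in allowed)
        exps = {j: math.exp(row[j] - m) for j in allowed}
        total = sum(exps.values())
        weights.append([exps[j] / total if j in exps else 0.0 for j in range(len(row))])
    return weights


positions = sinusoidal_positions(4, 6)            # 4 positions, 6-dim vectors
x = add_positions([[1.0] * 6 for _ in range(4)])  # hypothetical constant inputs
w = truncated_attention_weights([[0.0] * 5 for _ in range(5)], radius=1)
```

With all-zero scores, each row of `w` spreads its weight evenly over the positions within the range and gives 0 elsewhere, so the attention matrix is nonzero only in a band around the diagonal.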

Self-attention is the complex version of CNN (see "On the Relationship between Self-Attention and Convolutional …"). According to "… Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", CNN is good for less data, while self-attention is good for more data.

Self-attention vs. RNN: an RNN processes the vectors one after another (nonparallel), and a far-away vector is hard to consider because its information must be kept in memory. Self-attention processes all vectors in parallel and can easily consider any of them. See "… are RNNs: Fast Autoregressive Transformers with Linear Attention". To learn more about Recurrent Neural Network (RNN): (in Mandarin) (in English).

Self-attention for Graph: consider the edges, so that each node attends only to the nodes connected to it (figure: an 8-node graph and its 8 × 8 attention matrix). This is one type of Graph Neural Network (GNN). To learn more about GNN: (in Mandarin) (in Mandarin).

To Learn More: "… Transformers: A Survey"; "Long Range Arena: A Benchmark for Efficient …".
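For self-attention on a graph, the soft-max is the same as before but restricted by the edges: a node attends only to the nodes connected to it. This minimal sketch assumes, for brevity, that each node's feature vector serves directly as its query, key and value (no $W^q$, $W^k$, $W^v$), that edges are undirected, and that a node does not attend to itself unless `include_self=True`. `graph_attention` is a hypothetical helper illustrating the masking idea, not a specific GNN architecture.

```python
import math


def graph_attention(nodes, edges, include_self=False):
    """Dot-product self-attention where node i attends only to nodes connected to it.

    nodes: list of feature vectors, one per node; edges: undirected (i, j) pairs.
    Returns (outputs, weights); weights[i][j] is 0 for unconnected pairs, and an
    isolated node gets all-zero weights and a zero output.
    """
    n = len(nodes)
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    if include_self:
        for i in range(n):
            neighbors[i].add(i)
    dim = len(nodes[0])
    outputs, weights = [], []
    for i in range(n):
        row = [0.0] * n
        allowed = sorted(neighbors[i])
        if allowed:
            scores = {j: sum(a * b for a, b in zip(nodes[i], nodes[j])) for j in allowed}
            m = max(scores.values())
            exps = {j: math.exp(s - m) for j, s in scores.items()}
            total = sum(exps.values())
            for j in allowed:
                row[j] = exps[j] / total
        weights.append(row)
        outputs.append([sum(row[j] * nodes[j][d] for j in range(n)) for d in range(dim)])
    return outputs, weights


nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]  # hypothetical node features
edges = [(0, 1), (1, 2)]                                  # node 3 has no edges
outputs, weights = graph_attention(nodes, edges)
```

In matrix terms this is the same as setting the entries of the attention matrix for unconnected node pairs to $-\infty$ before the soft-max.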

