Multimodal Deep Learning

Jiquan Ngiam (1), Aditya Khosla (1), Mingyu Kim (1), Juhan Nam (1), Honglak Lee (2), Andrew Y. Ng (1)

1. Computer Science Department, Stanford University, Stanford, CA 94305, USA.
2. Computer Science and Engineering Division, University of Michigan, Ann Arbor, MI 48109, USA.

Abstract

Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities.

... information on the place of articulation and muscle movements (Summerfield, 1992) which can often help to disambiguate between speech with similar acoustics (e.g., the unvoiced consonants /p/ and /k/).

Multimodal learning involves relating information from multiple sources. For example, images and 3-d ...

To train a multimodal model, a direct approach is to train an RBM over the concatenated audio and video data (Figure 2c). While this approach jointly models the distribution of the audio and video data, it is limited as a shallow model. In particular, since the correlations between the audio and video data are highly ...
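The direct approach described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes binary visible and hidden units, one-step contrastive divergence (CD-1) as the training rule, and random toy arrays standing in for audio and video features. The class name `ConcatRBM`, the layer sizes, and the learning rate are all hypothetical choices made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ConcatRBM:
    """Binary RBM trained with CD-1 (hypothetical helper for illustration)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.05):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one reconstruction step.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        # Mean squared reconstruction error, for monitoring only.
        return float(np.mean((v0 - v1) ** 2))


# Toy stand-ins for per-example audio and video features (assumed binary).
rng = np.random.default_rng(1)
audio = (rng.random((64, 20)) < 0.5).astype(float)
video = (rng.random((64, 30)) < 0.5).astype(float)

# The "direct approach": one RBM over the concatenated modalities.
joint = np.concatenate([audio, video], axis=1)
rbm = ConcatRBM(n_visible=joint.shape[1], n_hidden=16)
errors = [rbm.cd1_step(joint) for _ in range(50)]

# Shared hidden representation computed from both modalities at once.
features = rbm.hidden_probs(joint)
```

Because every hidden unit connects directly to both the audio and the video inputs through a single weight matrix, the model has only one layer with which to capture cross-modal structure, which is the shallowness the passage refers to.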

