Transcription of Distributed Representations of Sentences and Documents
Distributed Representations of Sentences and Documents
Quoc Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043
Abstract
Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document.
unique vector, represented by a column in matrix W. The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use concatenation as the method to combine the vectors. More formally, the only change in this model compared to the word vector framework is in equation 1, where h is
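The combination step described in this fragment can be illustrated with a minimal sketch. It assumes that h is built from one paragraph vector plus the context word vectors, and that the next word is scored with a softmax over a linear layer. The names `combine`, `predict_next_word`, `U`, `b`, and `paragraph_vec`, the toy dimensions, and the random initialization are all hypothetical and are not taken from the paper. Word vectors are stored as rows here for convenience, whereas the text describes them as columns of W.

```python
import math
import random


def combine(paragraph_vec, word_vecs, mode="concat"):
    """Build the feature vector h from a paragraph vector and context word vectors."""
    vectors = [paragraph_vec] + list(word_vecs)
    if mode == "concat":
        # Concatenation keeps every vector intact, so h grows with the context size.
        return [x for vec in vectors for x in vec]
    if mode == "average":
        # Averaging keeps h at the embedding dimension regardless of context size.
        n = len(vectors)
        return [sum(column) / n for column in zip(*vectors)]
    raise ValueError(f"unknown mode: {mode}")


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_next_word(h, U, b):
    """Return a probability distribution over the vocabulary, softmax(b + U h)."""
    scores = [b_i + sum(u * x for u, x in zip(row, h)) for row, b_i in zip(U, b)]
    return softmax(scores)


# Toy setup (all sizes are assumptions chosen for illustration).
rng = random.Random(0)
dim, vocab_size = 4, 5
W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
paragraph_vec = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
context_ids = [0, 2, 3]  # indices of the context words in the vocabulary

h = combine(paragraph_vec, [W[i] for i in context_ids], mode="concat")
U = [[rng.uniform(-0.5, 0.5) for _ in range(len(h))] for _ in range(vocab_size)]
b = [0.0] * vocab_size
probs = predict_next_word(h, U, b)
```

With concatenation, the length of h is the embedding dimension times the number of combined vectors (here 4 × 4 = 16), so the output layer must be sized to match. With averaging, h stays at the embedding dimension.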