Distributed Representations of Sentences and Documents
Quoc Le, Tomas Mikolov
Google Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043

Abstract

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents.
In a learned word-vector space, by contrast, “powerful” and “strong” are close to each other, whereas “powerful” and “Paris” are more distant.
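
The two bag-of-words weaknesses named in the abstract can be checked with a small sketch. It is not code from the paper: the vocabulary, the sentences, and the helper names `bag_of_words` and `euclidean` are hypothetical and chosen only for illustration, and Euclidean distance is assumed as the distance measure.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text, vocab):
    """Return a count vector over a fixed vocabulary; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy vocabulary (an assumption, not taken from the paper).
vocab = ["powerful", "strong", "paris", "the", "dog", "bit", "man"]

# Weakness 1: sentences with different meanings share one representation,
# because only word counts are kept and the ordering is lost.
s1 = bag_of_words("the dog bit the man", vocab)
s2 = bag_of_words("the man bit the dog", vocab)
same_vector = s1 == s2

# Weakness 2: a single word becomes a one-hot vector, so any two distinct
# words are the same distance apart regardless of their meaning.
d_powerful_strong = euclidean(bag_of_words("powerful", vocab),
                              bag_of_words("strong", vocab))
d_powerful_paris = euclidean(bag_of_words("powerful", vocab),
                             bag_of_words("paris", vocab))

print(same_vector)                            # True
print(d_powerful_strong == d_powerful_paris)  # True
```

Both distances equal the square root of 2 here, which is the "equally distant" behavior the abstract describes. A learned representation is meant to break this tie by placing related words closer together.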