Transcription of Latent Dirichlet Allocation
Journal of Machine Learning Research 3 [...] CA 94720, USA. Andrew Y. [...] University, Stanford, CA 94305, USA. Michael I. [...] CA 94720, USA. Editor: John Lafferty
Abstract: We describe latent Dirichlet allocation (LDA), a generative [...] is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over [...], in turn, [...], the topic probabilities provide an explicit representation of a [...] report results in document modeling, text classification, and collaborative filtering, comparing to a [...] collection that enable efficient processing of large collections while preserving the essential statistical relationships that are useful for basic tasks such as classification, novelty detection, summarization, [...] (IR) (Baeza-Yates and Ribeiro-Neto, 1999). The basic methodology proposed by IR researchers for text corpora, a methodology successfully deployed in modern Internet search engines, reduces each document in the corpus to a vector of real numbers, [...] (Salton and McGill, 1983), a basic vocabulary of "words" or "terms" is chosen, and, for each document in the corpus, a count is [...], this term frequency count is compared to an inverse document frequency [...]
3. Latent Dirichlet allocation
Latent Dirichlet allocation (LDA) is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. LDA assumes the following generative process for each document w in a corpus D:
1. [...]
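The idea of documents as random mixtures over latent topics can be illustrated with a small simulation. The numbered steps are cut off in this transcription, so the sketch below assumes the standard process from the published LDA paper: draw a document length from a Poisson distribution, draw topic proportions from a Dirichlet distribution, then for each word draw a topic from those proportions and a word from that topic's word distribution. The vocabulary, topic distributions, parameter values, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw from Poisson(mean) using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_dirichlet(rng, alpha):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, alpha, topics, vocabulary, mean_length):
    """Return (theta, tokens) where tokens is a list of (topic index, word)."""
    n_words = sample_poisson(rng, mean_length)  # assumed: length ~ Poisson
    theta = sample_dirichlet(rng, alpha)        # per-document topic mixture
    tokens = []
    for _ in range(n_words):
        # Pick a topic for this word position from the document's mixture.
        z = rng.choices(range(len(alpha)), weights=theta)[0]
        # Pick the word from the chosen topic's distribution over words.
        w = rng.choices(vocabulary, weights=topics[z])[0]
        tokens.append((z, w))
    return theta, tokens


# Hypothetical toy setup: two topics over a six-word vocabulary.
VOCABULARY = ["gene", "dna", "cell", "ball", "team", "score"]
TOPICS = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # topic 0 only emits the first three words
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],  # topic 1 only emits the last three words
]
ALPHA = [0.5, 0.5]  # assumed symmetric Dirichlet parameter

if __name__ == "__main__":
    rng = random.Random(0)
    theta, tokens = generate_document(rng, ALPHA, TOPICS, VOCABULARY, mean_length=8)
    print("topic mixture:", [round(p, 3) for p in theta])
    print("words:", [w for _, w in tokens])
```

Because each document draws its own topic mixture, repeated calls yield documents that lean toward different topics, which is the sense in which a document is a random mixture rather than a member of a single cluster.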