Transcription of Latent Dirichlet Allocation
{{id}} {{{paragraph}}}
JournalofMachineLearningResearch3 CA94720,USAA ndrewY. UniversityStanford, CA94305,USAM ichaelI. CA94720,USAE ditor:JohnLaffertyAbstractWe describelatentDirichletallocation(LDA),a generative is a three-level hierarchicalBayesianmodel,inwhicheachite mofa collectionis modeledasa finitemixtureover ,inturn, ,thetopicprobabilitiesprovideanexplicitr epresentationofa reportresultsindocumentmodeling,textclas sification,andcollaborative filtering,comparingtoa collectionthatenableefficientprocessingo flargecollectionswhilepreservingtheessen tialstatisticalrelationshipsthatareusefu lforbasictaskssuchasclassification,novel tydetection,summarization, (IR)(Baeza-YatesandRibeiro-Neto,1999).Th ebasicmethodologyproposedbyIRresearchers fortextcorpora amethodologysuccessfullydeployedinmodern Internetsearchengines reduceseachdocumentinthecorpustoa vectorofrealnumbers, (SaltonandMcGill,1983),a basicvocabularyof words or terms is chosen,and,foreachdocumentinthecorpus,a countis ,thistermfrequency countiscomparedtoaninversedocumentfreque ncy count,whichmeasuresthenumberofoccurrence sofac ,Andrew Y.
Andrew Y. Ng ANG@CS.STANFORD.EDU Computer Science Department Stanford University Stanford, CA 94305, USA Michael I. Jordan JORDAN@CS.BERKELEY.EDU Computer Science Division and Department of Statistics University of California Berkeley, CA 94720, USA Editor: John Lafferty Abstract
Domain:
Source:
Link to this page:
Please notify us if you found a problem with this document:
{{id}} {{{paragraph}}}