
DMQL: A Data Mining Query Language for …


DMQL: A Data Mining Query Language for Relational Databases. Jiawei Han, Yongjian Fu, Wei Wang, Krzysztof Koperski, Osmar Zaiane. Database Systems Resea




DMQL: A Data Mining Query Language for Relational Databases … DMQL, for mining di… … field with flourishing R&D activities and successful systems reported recently [17, 5]. Since the tasks and applications of data mining are broad and diverse, it is expected that various kinds of flexible, … different graphical user interfaces in commercial relational systems, its underlying "core" relational query language sets a solid foundation for research and development of relational systems, facilitates information exchange and technology transfer, and promotes commercialization, broad application, … the success of the relational systems should be credited in part to the standardization of relational query languages, which was done at the early stage in the development of the field [20].

The recent standardization activities in database systems, such as the work related to SQL-3, OMG and ODMG [3], show again the importance of a standard database …

Research was supported in part by the grant NSERC-A3723 from the Natural Sciences and Engineering Research Council of Canada, the grant NCE:IRIS/PRECARN-HMI-5 from the Networks of Centres of Excellence of Canada, and grants from BC/Advanced Systems Institute, the MPR Teltech Ltd., …

… from data summarization to mining association rules, data classification, or finding some speci… … there are many graphical user interfaces for di… … we feel that it is important to understand the underlying mechanisms of di… … we examine the general philosophies which influence the design of such a data mining query language and present step-by-step a tentatively designed Data Mining Query Language.

Philosophy

The philosophy of data mining may strongly in… … a data miner should be able to work on any speci… … a superset of the data can be collected, … a data mining language should take a query language as its subtask in the speci… … since mining can be performed in many different ways on any specific set of data, huge amounts and different kinds of knowledge may be generated by unguided, autonomous discovery, whereas much of such discovered knowledge could be out of user's … we propose command-driven data mining, which speci… (such as conceptual hierarchy information, etc.) … the availability of relatively strong background knowledge not only improves the efficiency of a discovery process but also expresses user's preference for guided generalization, which may lead to an e… … discovered knowledge is expressed in terms of primitive data (data stored in the databases)

… often in the form of functional or multivalued dependency rules, primitive level association rules, … with concept generalization, discovered knowledge can be expressed in terms of concise, expressive, and high-level or multiple-level abstraction, in the form of generalized rules or generalized constraints, … …ed flexibly to … flexibly and/or interactively specify various kinds of thresholds which can be used to select desired, interesting rules and … different kinds of rules

Based on the above considerations, a data mining query language, DMQL, … specifications of four major primitives in data mining: (1) the set of data in relevance to a data mining process, (2) the kind of knowledge to be discovered, (3) the background knowledge, and (4) the justification of the interestingness of the knowledge (… thresholds).

The first primitive, the set of relevant data, can be specified in a way similar to that of a relational query, … the kind of knowledge to be discovered, may include generalized relations, characteristic rules, discriminant rules, classification rules, association rules, etc., … different kinds of rules or be viewed at high concept levels from di… …ed by all or most of the examples in the class undergoing examination (called the target class). For example, the symptoms of a speci… (the target class) from other classes (called contrasting classes). For example, to distinguish one disease from others, … classification rule is a set of rules which classifies the relevant set of data, which is usually obtained by first classifying the data (… obtaining a preferred classification scheme) … (patterns).

For example, … the background knowledge, … the interestingness or significance of the knowledge to be discovered can be specified as a set of different mining thresholds depending on the kinds of rules to be mined, … different kinds of rules

DMQL adopts an SQL-like syntax to facilitate high level data mining and natural integration with relational query language, … defined in an extended BNF grammar, where "[ ]" represents 0 or one occurrence, "{ }" represents 0 or more occurrences, and words in sans serif font represent keywords, …

  <DMQL> ::= use database <database_name>
             {use hierarchy <hierarchy_name> for <attribute>}
             <rule_spec>
             related to <attr_or_agg_list>
             from <relation(s)>
             [where <condition>]
             [order by <order_list>]
             {with [<kinds_of>] threshold = <threshold_value> [for <attribute(s)>]}

In <DMQL>, "use database <database_name>" directs the mining task to a specific database "<database_name>", and the optional statement, "use hierarchy <hierarchy> for <attribute>", assigns <hierarchy> to a particular attribute <attribute> (otherwise, a default hierarchy is used).
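The clause order of the grammar above can be made concrete with a small sketch. It is not part of DMQL or of any existing implementation: the class names `DMQLQuery` and `Threshold`, the `render` method, the underscore spelling `university_database`, and the threshold values 0.05 and 0.7 are all assumptions chosen for illustration. The sketch only assembles query text in the order the grammar prescribes; it does not parse or execute anything.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Threshold:
    """One 'with [<kinds_of>] threshold = <value> [for <attribute(s)>]' clause."""
    value: float
    kind: Optional[str] = None        # e.g. "support", "confidence", "noise"
    attributes: Optional[str] = None  # optional "for <attribute(s)>" part


@dataclass
class DMQLQuery:
    """Hypothetical container for the clauses of one DMQL statement."""
    database: str
    rule_spec: str                    # e.g. "find association rules"
    related_to: List[str]
    relations: List[str]
    hierarchies: List[Tuple[str, str]] = field(default_factory=list)  # (hierarchy, attribute)
    where: Optional[str] = None
    order_by: Optional[str] = None
    thresholds: List[Threshold] = field(default_factory=list)

    def render(self) -> str:
        """Emit the clauses in the order given by the grammar."""
        lines = [f"use database {self.database}"]
        for hierarchy, attribute in self.hierarchies:
            lines.append(f"use hierarchy {hierarchy} for {attribute}")
        lines.append(self.rule_spec)
        lines.append("related to " + ", ".join(self.related_to))
        lines.append("from " + ", ".join(self.relations))
        if self.where:
            lines.append(f"where {self.where}")
        if self.order_by:
            lines.append(f"order by {self.order_by}")
        for t in self.thresholds:
            kind = f"{t.kind} " if t.kind else ""
            suffix = f" for {t.attributes}" if t.attributes else ""
            lines.append(f"with {kind}threshold = {t.value}{suffix}")
        return "\n".join(lines)


# Illustrative query; the threshold values are made up, not taken from the paper.
example = DMQLQuery(
    database="university_database",
    rule_spec="find association rules",
    related_to=["gpa", "birthplace", "address"],
    relations=["student"],
    where='major = "cs" and birthplace = "Canada"',
    thresholds=[Threshold(0.05, "support"), Threshold(0.7, "confidence")],
)
print(example.render())
```

Keeping the optional clauses as optional fields mirrors the "[ ]" parts of the grammar, and the two lists mirror the "{ }" repetitions.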

The statement, <rule_spec>, is the speci… … the rule specification should be in different formats, … "related to <attr_or_agg_list>", … "from" and "where" clauses, "from <relation(s)> [where <condition>]", … "order by" clause simply speci… … "with-threshold" statement specification, …

  … ::= generalize data [into <relation_name>]
  … ::= find characteristic rules [as <rule_name>]
  … ::= find discriminant rules [as <rule_name>]
        for <class_1> with <condition_1> from <relation(s)_1>
        in contrast to <class_2> with <condition_2> from <relation(s)_2>
        {in contrast to <class_i> with <condition_i> from relation(s) …

…cation and mining classi…

  … ::= find classification rules [as <rule_name>] [according to <attributes>]
  … ::= find association rules [as <rule_name>]

…cation of interestingness and thresholds

A data mining task may need to specify a set of thresholds to control its data mining process, including guiding an induction process, constraining search for interesting knowledge, testing the interestingness or significance of the discovered knowledge, … a set of data mining thresholds, … different kinds of rule mining may need to specify different kinds of thresholds which can be categorized into at least three classes, … (support) … this threshold is called the minimum support [1], and the patterns passing this support threshold are called large (or frequent) data items; whereas in mining characteristic rules, it is called noise threshold [7], … →

B, … Prob{B|A}, must pass this threshold to make sure that the implication relationship is reasonably strong [1]. … [19]. The syntax of the threshold specification is as below,

  with [<kinds_of>] threshold = <threshold_value> [for <attribute(s)>]

where <kinds_of> can be support, confidence, noise, redundancy, etc., …

  … (name, sno, status, major, gpa, birthdate, birthplace, address)
  course (cno, title, department)
  grading (sno, cno, instructor, semester, grade)

… "use database university_database" … (q1), is to find the general characteristics of the graduate students in computing science in relevance to attributes gpa, birthplace and address, for the students born in Canada.

  (q1): find characteristic rule
        related to gpa, birthplace, address, count(*)%
        from student
        where status = "graduate" and major = "cs" and birthplace = "Canada"
        with noise threshold = …

… first retrieve data from the database using a transformed SQL query, where the high level constants "Canada" and "graduate" are transformed into low level primitive concepts in the database according to the provided (default) … finding characteristic rules [7] … birthplace and address are presented, associated with the corresponding count(*)% (… the count of tuples in the corresponding group in proportion to the total number of tuples).
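The retrieval step described for (q1) can be sketched with an in-memory SQLite table. Everything concrete here is an assumption made for illustration: the `HIERARCHY` mapping, the stored values such as "M.Sc." and "Vancouver", the sample rows, and the helper `expand`. The sketch covers only the transformation of high level constants into stored values and the count(*)% computation; it does not perform the generalization used for finding characteristic rules [7].

```python
import sqlite3

# Hypothetical concept hierarchies: high level constant -> primitive stored values.
HIERARCHY = {
    "birthplace": {"Canada": ["Vancouver", "Toronto", "Montreal"]},
    "status": {"graduate": ["M.Sc.", "Ph.D."]},
}


def expand(attribute, constant):
    """Map a high level constant to the low level values stored in the table."""
    return HIERARCHY.get(attribute, {}).get(constant, [constant])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (name TEXT, status TEXT, major TEXT, gpa REAL, birthplace TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "M.Sc.", "cs", 3.8, "Vancouver"),
        ("Bob", "Ph.D.", "cs", 3.6, "Vancouver"),
        ("Cy", "Ph.D.", "cs", 3.9, "Toronto"),
        ("Di", "M.Sc.", "cs", 3.2, "Montreal"),
        ("Ed", "junior", "cs", 3.0, "Toronto"),      # not a graduate: filtered out
        ("Flo", "M.Sc.", "math", 3.7, "Vancouver"),  # not cs: filtered out
        ("Gus", "Ph.D.", "cs", 3.5, "Seattle"),      # not born in Canada: filtered out
    ],
)

# Transform the high level constants of (q1) into primitive values.
statuses = expand("status", "graduate")
places = expand("birthplace", "Canada")

# Transformed SQL query: one placeholder per primitive value.
sql = (
    "SELECT birthplace, COUNT(*) FROM student "
    f"WHERE major = ? AND status IN ({','.join('?' * len(statuses))}) "
    f"AND birthplace IN ({','.join('?' * len(places))}) "
    "GROUP BY birthplace ORDER BY birthplace"
)
rows = conn.execute(sql, ["cs", *statuses, *places]).fetchall()

# count(*)%: tuples in each group in proportion to the total number of tuples.
total = sum(count for _, count in rows)
percentages = {place: 100.0 * count / total for place, count in rows}
print(percentages)
```

Grouping here is by the raw birthplace only; a fuller treatment would first generalize gpa and address to higher concept levels before counting.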

(q2) is to find the discriminant features to compare graduate students versus undergraduate students in computing science in relevance to attributes gpa, birthplace and address, for the students born in Canada.

  (q2): find discriminant rule
        for cs_grads with status = "graduate"
        in contrast to cs_undergrads with status = "undergraduate"
        related to gpa, birthplace, address, count(*)%
        from student
        where major = "cs" and birthplace = "Canada"

This data mining query will first retrieve data into two classes, "cs_grads" and "cs_undergrads", using a transformed SQL query which maps the high level constants in (q2) … finding discriminant rules [7] …

(q3) is to classify students according to their gpa's and find their classification rules for those majoring in computing science and born in Canada, with the attributes birthplace and address in consideration.

  (q3): find classification rules for cs_students
        according to gpa
        related to birthplace, address
        from student
        where major = "cs" and birthplace = "Canada"

This query will first collect the relevant set of data, and then execute some data classification algorithm, such as [14, 21] to classify students according to their gpa' …

(q4) is to find strong association relationships for those students majoring in computing science and born in Canada, in relevance to the attributes gpa, birthplace and address.

  (q4): find association rules
        related to gpa, birthplace, address
        from student
        where major = "cs" and birthplace = "Canada"
        with support threshold = …
        … confidence threshold = …

… first collect the relevant set of data and then execute an association mining algorithm, such as [1] or [8], to … confidence thresholds are specified (otherwise using default values) …

…cation of concept hierarchies

Concept hierarchy (or lattice) provides useful background knowledge for expressing data mining results in concise, … …ed based on database attribute relationships, particular grouping operations, … address(num; street; city; province; country) … date(day; month.)
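As an illustration of how the support and confidence thresholds of (q4) act on a candidate rule A → B, the following sketch computes both measures over a handful of already generalized tuples. The function names, the sample tuples, and the threshold values 0.3 and 0.6 are assumptions for illustration only. Confidence is computed as the conditional probability Prob{B|A}, as in the threshold discussion; the sketch checks a single given rule and is not one of the association mining algorithms cited as [1] or [8].

```python
from typing import Dict, List

Row = Dict[str, str]


def matches(row: Row, pattern: Row) -> bool:
    """True if the row carries every attribute value in the pattern."""
    return all(row.get(attr) == value for attr, value in pattern.items())


def support(rows: List[Row], pattern: Row) -> float:
    """Fraction of all rows that satisfy the pattern."""
    if not rows:
        return 0.0
    return sum(matches(r, pattern) for r in rows) / len(rows)


def confidence(rows: List[Row], a: Row, b: Row) -> float:
    """Prob{B|A}: among rows satisfying A, the fraction also satisfying B."""
    n_a = sum(matches(r, a) for r in rows)
    if n_a == 0:
        return 0.0
    n_ab = sum(matches(r, {**a, **b}) for r in rows)
    return n_ab / n_a


def is_strong(rows: List[Row], a: Row, b: Row,
              min_support: float, min_confidence: float) -> bool:
    """A rule A -> B passes only if both thresholds are met."""
    return (support(rows, {**a, **b}) >= min_support
            and confidence(rows, a, b) >= min_confidence)


# Made-up tuples, already generalized to high level concepts.
data = [
    {"birthplace": "B.C.", "gpa": "excellent"},
    {"birthplace": "B.C.", "gpa": "excellent"},
    {"birthplace": "B.C.", "gpa": "good"},
    {"birthplace": "Ontario", "gpa": "excellent"},
    {"birthplace": "Ontario", "gpa": "good"},
]
A = {"birthplace": "B.C."}
B = {"gpa": "excellent"}

print(support(data, {**A, **B}), confidence(data, A, B))
print(is_strong(data, A, B, min_support=0.3, min_confidence=0.6))
```

Raising the hypothetical support threshold to 0.5 would reject the same rule, which is how the thresholds narrow the output to the rules a user considers interesting.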