
DMQL: A Data Mining Query Language for …


DMQL: A Data Mining Query Language for Relational Databases
Jiawei Han, Yongjian Fu, Wei Wang, Krzysztof Koperski, Osmar Zaiane
Database Systems Resea


Transcription of DMQL: A Data Mining Query Language for …

DMQL: A Data Mining Query Language for Relational Databases

[...] DMQL, for mining di[...] field with flourishing R&D activities and successful systems reported recently [17, 5]. Since the tasks and applications of data mining are broad and diverse, it is expected that various kinds of flexible, [...] different graphical user interfaces in commercial relational systems, its underlying "core" relational query language sets a solid foundation for research and development of relational systems, facilitates information exchange and technology transfer, and promotes commercialization, broad application, [...] the success of the relational systems should be credited in part to the standardization of relational query languages, which was done at the early stage in the development of the field [20]. The recent standardization activities in database systems, such as the work related to SQL-3, OMG and ODMG [3], show again the importance of a standard database [...]

Research was supported in part by the grant NSERC-A3723 from the Natural Sciences and Engineering Research Council of Canada, the grant NCE:IRIS/PRECARN-HMI-5 from the Networks of Centres of Excellence of Canada, and grants from BC/Advanced Systems Institute, the MPR Teltech Ltd.

[...] from data summarization to mining association rules, data classification, or finding some speci[...] there are many graphical user interfaces for di[...] we feel that it is important to understand the underlying mechanisms of di[...] we examine the general philosophies which influence the design of such a data mining query language and present step-by-step a tentatively designed Data Mining Query Language, [...]

[...]: philosophy

The philosophy of data mining may strongly in[...] a data miner should be able to work on any speci[...] a superset of the data can be collected, [...] a data mining language should take a query language as its subtask in the speci[...] since mining can be performed in many different ways on any specific set of data, huge amounts and different kinds of knowledge may be generated by unguided, autonomous discovery, whereas much of such discovered knowledge could be out of user's [...] we propose command-driven data mining, which speci[...] (such as conceptual hierarchy information, etc.)

[...] the availability of relatively strong background knowledge not only improves the efficiency of a discovery process but also expresses user's preference for guided generalization, which may lead to an e[...] discovered knowledge is expressed in terms of primitive data (data stored in the databases), often in the form of functional or multivalued dependency rules, primitive level association rules, [...] with concept generalization, discovered knowledge can be expressed in terms of concise, expressive, and high-level or multiple-level abstraction, in the form of generalized rules or generalized constraints, [...]ed flexibly to [...] flexibly and/or interactively specify various kinds of thresholds which can be used to select desired, interesting rules and [...]

[...] different kinds of rules

Based on the above considerations, a data mining query language, DMQL, [...]cations of four major primitives in data mining: (1) the set of data in relevance to a data mining process, (2) the kind of knowledge to be discovered, (3) the background knowledge, and (4) the justification of the interestingness of the knowledge ([...] thresholds).
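As a rough illustration of the four primitives listed above, the following Python sketch bundles them into one record. The class name `MiningRequest`, its field names, and the sample values are assumptions made for this example only; the paper expresses the same information through DMQL statements, not through a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class MiningRequest:
    """Hypothetical container for the four primitives of a mining task."""
    relevant_data: str                               # (1) selection of the data to mine
    knowledge_kind: str                              # (2) e.g. "characteristic", "association"
    background: dict = field(default_factory=dict)   # (3) attribute -> concept hierarchy name
    thresholds: dict = field(default_factory=dict)   # (4) threshold kind -> value


# Sample values are invented for illustration.
request = MiningRequest(
    relevant_data='select gpa, birthplace from student where major = "cs"',
    knowledge_kind="association",
    thresholds={"support": 0.05, "confidence": 0.7},
)
```

Keeping the four parts separate mirrors the way each DMQL clause addresses exactly one primitive.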

The first primitive, the set of relevant data, can be specified in a way similar to that of a relational query, [...] the kind of knowledge to be discovered, may include generalized relations, characteristic rules, discriminant rules, classification rules, association rules, etc., [...] different kinds of rules or be viewed at high concept levels from di[...] [...]ed by all or most of the examples in the class undergoing examination (called the target class). For example, the symptoms of a speci[...] (the target class) from other classes (called contrasting classes). For example, to distinguish one disease from others, [...] classification rule is a set of rules which classifies the relevant set of data, which is usually obtained by first classifying the data ([...] obtaining a preferred classification scheme) [...] (patterns). For example, [...] the background knowledge, [...] the interestingness or significance of the knowledge to be discovered can be specified as a set of different mining thresholds depending on the kinds of rules to be mined, [...]

[...] different kinds of rules

DMQL adopts an SQL-like syntax to facilitate high level data mining and natural integration with relational query language, [...] defined in an extended BNF grammar, where "[ ]" represents 0 or one occurrence, "{ }" represents 0 or more occurrences, and words in sans serif font represent keywords.

⟨DMQL⟩ ::= use database ⟨database name⟩
           {use hierarchy ⟨hierarchy name⟩ for ⟨attribute⟩}
           ⟨rule spec⟩
           related to ⟨attr or agg list⟩
           from ⟨relation(s)⟩
           [where ⟨condition⟩]
           [order by ⟨order list⟩]
           {with [⟨kinds of⟩] threshold = ⟨threshold value⟩ [for ⟨attribute(s)⟩]}

In ⟨DMQL⟩, "use database ⟨database name⟩" directs the mining task to a specific database "⟨database name⟩", and the optional statement, "use hierarchy ⟨hierarchy⟩ for ⟨attribute⟩", assigns ⟨hierarchy⟩ to a particular attribute ⟨attribute⟩ (otherwise, a default hierarchy is used). The statement, ⟨rule spec⟩, is the speci[...] the rule specification should be in different formats, [...] "related to ⟨attr or agg list⟩", [...] "from" and "where" clauses, "from ⟨relation(s)⟩ [where ⟨condition⟩]", [...] "order by" clause simply speci[...] "with-threshold" statement speci[...] [...]cation, [...]

[...] ::= generalize data [into ⟨relation name⟩]

[...] ::= find characteristic rules [as ⟨rule name⟩]

[...] ::= find discriminant rules [as ⟨rule name⟩]
          for ⟨class 1⟩ with ⟨condition 1⟩ from ⟨relation(s) 1⟩
          in contrast to ⟨class 2⟩ with ⟨condition 2⟩ from ⟨relation(s) 2⟩
          {in contrast to ⟨class i⟩ with ⟨condition i⟩ from relation(s) [...]

[...]cation and mining classi[...]

[...] ::= find classification rules [as ⟨rule name⟩] [according to ⟨attributes⟩].
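To make the clause structure of the top-level grammar more concrete, here is a minimal Python sketch that splits one DMQL statement into its clauses. The function name `parse_dmql`, the regular expression, the returned keys, the single-word database name, and the threshold value in the sample query are all assumptions made for illustration. The paper does not describe a parser, and this sketch ignores the optional "use hierarchy" and "order by" clauses and accepts at most one threshold.

```python
import re

# Hypothetical pattern for: use database ... <rule spec> related to ... from ...
# [where ...] [with <kind> threshold = <value>]
DMQL_PATTERN = re.compile(
    r"use database (?P<database>\w+)\s+"
    r"(?P<rule_spec>find \w+ rules?|generalize data)\s+"
    r"related to (?P<attrs>.+?)\s+"
    r"from (?P<relations>.+?)"
    r"(?:\s+where (?P<condition>.+?))?"
    r"(?:\s+with (?P<kind>\w+) threshold = (?P<value>[\d.]+))?\s*$",
    re.IGNORECASE,
)


def parse_dmql(query: str) -> dict:
    """Split a single DMQL statement into a dictionary of clauses."""
    # Collapse line breaks and repeated blanks so the pattern sees one line.
    match = DMQL_PATTERN.match(" ".join(query.split()))
    if match is None:
        raise ValueError("not a DMQL statement this sketch understands")
    parts = match.groupdict()
    parts["attrs"] = [a.strip() for a in parts["attrs"].split(",")]
    parts["relations"] = [r.strip() for r in parts["relations"].split(",")]
    if parts["value"] is not None:
        parts["value"] = float(parts["value"])
    return parts


# Sample statement; the database name spelling and the 0.05 are invented.
parsed = parse_dmql(
    """use database university_database
       find characteristic rules
       related to gpa, birthplace, address
       from student
       where major = "cs"
       with noise threshold = 0.05"""
)
```

A real implementation would more likely use a proper grammar-based parser; a single regular expression is used here only because the top-level production is a fixed sequence of clauses.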

[...] ::= find association rules [as ⟨rule name⟩]

[...]cation of interestingness and thresholds

A data mining task may need to specify a set of thresholds to control its data mining process, including guiding an induction process, constraining search for interesting knowledge, testing the interestingness or significance of the discovered knowledge, [...] a set of data mining thresholds, [...] different kinds of rule mining may need to specify different kinds of thresholds which can be categorized into at least three classes, [...] (support) [...] this threshold is called the minimum support [1], and the patterns passing this support threshold are called large (or frequent) data items; whereas in mining characteristic rules, it is called noise threshold [7], [...] A → B, [...] Prob{B|A}, must pass this threshold to make sure that the implication relationship is reasonably strong [1]. [...] [19]. The syntax of the threshold specification is as below,

with [⟨kinds of⟩] threshold = ⟨threshold value⟩ [for ⟨attribute(s)⟩]

where ⟨kinds of⟩ can be support, confidence, noise, redundancy, etc.
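As a small worked illustration of the support and confidence thresholds, the following Python sketch computes both measures for a rule A → B over a toy list of tuples. The data, the function name `support_confidence`, and the two threshold values are invented for the example. Support is taken as the fraction of tuples containing both A and B, and confidence as Prob{B|A}, which is the usual association-rule reading.

```python
def support_confidence(transactions, a, b):
    """Return (support, confidence) of the rule a -> b."""
    a, b = set(a), set(b)
    n_a = sum(1 for t in transactions if a <= t)          # tuples containing A
    n_ab = sum(1 for t in transactions if (a | b) <= t)   # tuples containing A and B
    support = n_ab / len(transactions)
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence


# Toy data: each tuple is a set of attribute=value items (invented).
transactions = [
    {"major=cs", "gpa=excellent"},
    {"major=cs", "gpa=excellent"},
    {"major=cs", "gpa=good"},
    {"major=math", "gpa=excellent"},
]

support, confidence = support_confidence(
    transactions, {"major=cs"}, {"gpa=excellent"}
)

# Hypothetical thresholds: the rule is kept only if it passes both.
passes = support >= 0.4 and confidence >= 0.6
```

Here two of the four tuples contain both sides of the rule and three contain the left side, so the rule passes both sample thresholds.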

student(name, sno, status, major, gpa, birthdate, birthplace, address)
course(cno, title, department)
grading(sno, cno, instructor, semester, grade)

[...] "use database university database" [...] (q1), is to find the general characteristics of the graduate students in computing science in relevance to attributes gpa, birthplace and address, for the students born in Canada.

(q1): find characteristic rule
      related to gpa, birthplace, address, count(*)%
      from student
      where status = "graduate" and major = "cs" and birthplace = "Canada"
      with noise threshold = [...]

[...] first retrieve data from the database using a transformed SQL query, where the high level constants "Canada" and "graduate" are transformed into low level primitive concepts in the database according to the provided (default) [...] finding characteristic rules [7] [...] birthplace and address are presented, associated with the corresponding count(*)% ([...] the count of tuples in the corresponding group in proportion to the total number of tuples).
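The retrieval step for (q1) can be sketched with Python's standard `sqlite3` module. Everything concrete below is an assumption for illustration: the sample rows, the mapping of "graduate" and "Canada" to stored low-level values, the reduced column list, and the noise threshold of 0.3, which is read here as the minimum share a group needs in order to be reported. The generalization of gpa and address that the mining algorithm would perform is left out.

```python
import sqlite3

# Hypothetical mappings from high-level constants to stored low-level values.
GRADUATE = ("M.Sc.", "Ph.D.")
CANADA = ("Vancouver", "Burnaby", "Toronto")
NOISE_THRESHOLD = 0.3  # invented value

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (name TEXT, status TEXT, major TEXT, gpa REAL, birthplace TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "M.Sc.", "cs", 3.9, "Vancouver"),
        ("Bob", "Ph.D.", "cs", 3.8, "Vancouver"),
        ("Cid", "M.Sc.", "cs", 3.7, "Burnaby"),
        ("Dee", "Ph.D.", "cs", 3.2, "Toronto"),
        ("Eve", "M.Sc.", "cs", 3.6, "Seattle"),     # not born in Canada
        ("Fay", "junior", "cs", 3.5, "Vancouver"),  # not a graduate student
    ],
)

# Transformed SQL query: high-level constants become IN-lists of stored values.
sql = (
    "SELECT birthplace, COUNT(*) FROM student "
    "WHERE major = 'cs' "
    f"AND status IN ({','.join('?' * len(GRADUATE))}) "
    f"AND birthplace IN ({','.join('?' * len(CANADA))}) "
    "GROUP BY birthplace"
)
rows = conn.execute(sql, GRADUATE + CANADA).fetchall()

# count(*)%: each group's count in proportion to the total number of tuples.
total = sum(count for _, count in rows)
result = {
    place: count / total
    for place, count in rows
    if count / total >= NOISE_THRESHOLD
}
```

With these sample rows, four tuples satisfy the condition and only the Vancouver group (two of the four) reaches the sample threshold.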

[...] (q2) is to find the discriminant features to compare graduate students versus undergraduate students in computing science in relevance to attributes gpa, birthplace and address, for the students born in Canada.

(q2): find discriminant rule
      for cs grads with status = "graduate"
      in contrast to cs undergrads with status = "undergraduate"
      related to gpa, birthplace, address, count(*)%
      from student
      where major = "cs" and birthplace = "Canada"

This data mining query will first retrieve data into two classes, "cs grads" and "cs undergrads", using a transformed SQL query which maps the high level constants in (q2) [...] finding discriminant rules [7] [...]

[...] (q3) is to classify students according to their gpa's and find their classification rules for those majoring in computing science and born in Canada, with the attributes birthplace and address in consideration.

(q3): find classification rules
      for cs students
      according to gpa
      related to birthplace, address
      from student
      where major = "cs" and birthplace = "Canada"

This query will first collect the relevant set of data, and then execute some data classification algorithm, such as [14, 21] to classify students according to their gpa's [...]

[...] (q4) is to find strong association relationships for those students majoring in computing science and born in Canada, in relevance to the attributes gpa, birthplace and address.

(q4): find association rules
      related to gpa, birthplace, address
      from student
      where major = "cs" and birthplace = "Canada"
      with support threshold = [...]
      [...] confidence threshold = [...]

[...] first collect the relevant set of data and then execute an association mining algorithm, such as [1] or [8], to [...] confidence thresholds are specified (otherwise using default values) [...]

[...]cation of concept hierarchies

Concept hierarchy (or lattice) provides useful background knowledge for expressing data mining results in concise, [...]ed based on database attribute relationships, particular grouping operations, [...] address(num; street; city; province; country) [...] date(day; month; year) [...]

⟨define concept hierarchy⟩ ::= define hierarchy for ⟨attr name⟩ [(⟨hier name⟩)]: ⟨attr set⟩ < [...]

define hierarchy for address: {city, province, country} < {province, [...]

⟨define concept hierarchy⟩ ::= define hierarchy for ⟨attr name⟩ [(⟨hier name⟩)]: ⟨constant set⟩ < [...]

[...] Alberta, Manitoba, Saskatchewan} < {Western Canada}
define hierarchy for address: {Western Canada, Central Canada, Maritime Provinces} < [...]

[...] ::= insert ⟨concept name⟩ under ⟨concept name⟩ to hierarchy [(⟨hier name⟩)] for ⟨attr name⟩
        | delete ⟨concept name⟩ under ⟨concept name⟩ from hierarchy [(⟨hier name⟩)] [...]

[...] concept hierarchies may not always be provided by speci[...] it is often necessary to provide primitives to choose a hierarchy other than the default one from a set of available ones, dynamically adjust a hierarchy.
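The define, insert, and delete primitives for concept hierarchies can be sketched in Python as operations on a child-to-parent map. The class `ConceptHierarchy` and its method names are hypothetical, the top concept "Canada" and the city "Edmonton" are assumptions added for the example, and the sketch models only a tree, not the more general lattice mentioned in the text.

```python
class ConceptHierarchy:
    """Child -> parent map for one attribute (a tree, for simplicity)."""

    def __init__(self):
        self.parent = {}

    def define(self, children, parent):
        # Mirrors: define hierarchy for <attr>: {children} < {parent}
        for child in children:
            self.parent[child] = parent

    def insert(self, concept, under):
        # Mirrors: insert <concept> under <concept> to hierarchy ...
        self.parent[concept] = under

    def delete(self, concept, under):
        # Mirrors: delete <concept> under <concept> from hierarchy ...
        if self.parent.get(concept) == under:
            del self.parent[concept]

    def generalize(self, concept, levels=1):
        # Climb at most `levels` steps, stopping at the top of the hierarchy.
        for _ in range(levels):
            if concept not in self.parent:
                break
            concept = self.parent[concept]
        return concept


address = ConceptHierarchy()
address.define(["Alberta", "Manitoba", "Saskatchewan"], "Western Canada")
# "Canada" as the top concept is an assumption; the source elides it.
address.define(["Western Canada", "Central Canada", "Maritime Provinces"], "Canada")
address.insert("Edmonton", under="Alberta")  # hypothetical city-level concept

two_up = address.generalize("Edmonton", levels=2)
```

Storing only the parent link keeps insertion and deletion trivial, at the cost of making it slower to list all children of a concept.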

[...] ::= use hierarchy ⟨hier name⟩ for ⟨attr name⟩
        | display hierarchy [⟨hier name⟩] for ⟨attr name⟩
        | dynamically adjust hierarchy [⟨hier name⟩] for ⟨attr name⟩
        | generate hierarchy [⟨hier name⟩] [...]

[...]ed flexibly using the primitives discussed above, it is di[...] interactive refining of a mining task or mining results becomes essential for e[...] refining of a mining task often requires easy modification of a query condition, thresholds, relevant attributes, selected hierarchies, or letting a hierarchy be dynamically adjusted, [...]ed (but not so conveniently) [...] refining of data mining results, one should display the results using rule visualization tools [12] or in different output forms, including generalized relations, projected statistical tables, bar charts, pie charts, curves, surfaces, quantitative rules, [...]

[...] ::= display in ⟨result form⟩

where the ⟨result form⟩ could be projected statistical tables, bar charts, curves, [...] with the availability of concept hierarchies, knowledge can be expressed at different concept levels and from di[...] "roll-up" and "drill-down" operations [11].
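The "roll-up" and "drill-down" operations mentioned above can be sketched in Python over a table of counts and a one-level concept hierarchy. The mapping `REGION`, the sample counts, and the function names `roll_up` and `drill_down` are assumptions made for this example; the source names the operations but does not specify how they are implemented.

```python
from collections import Counter

# Hypothetical one-level hierarchy: province -> region.
REGION = {
    "Alberta": "Western Canada",
    "Manitoba": "Western Canada",
    "Ontario": "Central Canada",
}


def roll_up(counts, hierarchy):
    """Aggregate counts one level up; concepts without a parent are kept."""
    rolled = Counter()
    for concept, n in counts.items():
        rolled[hierarchy.get(concept, concept)] += n
    return rolled


def drill_down(concept, counts, hierarchy):
    """Return the lower-level counts that roll up into `concept`."""
    return {c: n for c, n in counts.items() if hierarchy.get(c) == concept}


# Invented counts of students by province of birth.
by_province = Counter({"Alberta": 3, "Manitoba": 1, "Ontario": 4})
by_region = roll_up(by_province, REGION)
western_detail = drill_down("Western Canada", by_province, REGION)
```

Rolling up trades detail for conciseness, and drilling down recovers the detail for one selected concept, which matches the interactive refinement of results described in the text.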

