
Data Mining Classification: Basic Concepts and Techniques




Lecture Notes for Chapter 3, Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar (2/1/2021)

Classification: Definition

- Given a collection of records (training set)
  - Each record is characterized by a tuple (x, y), where x is the attribute set and y is the class label
    - x: attribute, predictor, independent variable, input
    - y: class, response, dependent variable, output
- Task: Learn a model that maps each attribute set x into one of the predefined class labels y

Examples of Classification Task

Task                         Attribute set, x                                           Class label, y
Categorizing email messages  Features extracted from email message header and content   spam or non-spam
Identifying tumor cells      Features extracted from x-rays or MRI scans                malignant or benign cells
Cataloging galaxies          Features extracted from telescope images                   Elliptical, spiral, or irregular-shaped galaxies

[Figure: General Approach for Building Classification Model]

Classification Techniques

- Base Classifiers
  - Decision Tree based Methods
  - Rule-based Methods
  - Nearest-neighbor
  - Naïve Bayes and Bayesian Belief Networks
  - Support Vector Machines
  - Neural Networks, Deep Neural Nets
- Ensemble Classifiers
  - Boosting, Bagging, Random Forests

Example of a Decision Tree

Training data:

ID  Home Owner  Marital Status  Annual Income  Defaulted Borrower
1   Yes         Single          125K           No
2   No          Married         100K           No
3   No          Single          70K            No
4   Yes         Married         120K           No
5   No          Divorced        95K            Yes
6   No          Married         60K            No
7   Yes         Divorced        220K           No
8   No          Single          85K            Yes
9   No          Married         75K            No
10  No          Single          90K            Yes

Model: decision tree (splitting attributes: Home Owner, MarSt, Income)

Home Owner?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> Income?
                                 < 80K -> NO
                                 > 80K -> YES

Apply Model to Test Data

Test data:

Home Owner  Marital Status  Annual Income  Defaulted Borrower
No          Married         80K            ?

Start from the root of the tree and, at each node, follow the branch that matches the test record: Home Owner = No leads to the MarSt node, and MarSt = Married leads to a leaf.

Assign Defaulted to "No".

Another Example of Decision Tree

MarSt?
  Married          -> NO
  Single, Divorced -> Home Owner?
                        Yes -> NO
                        No  -> Income?
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!

Training data: the same ten records as in the first decision tree example.
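The two example trees can be written out as plain functions to check the claims on these slides. This is a minimal sketch, not part of the lecture notes: the function names are hypothetical, the branch structure is a reading of the slide figures, and sending an income of exactly 80K down the "> 80K" branch is an assumption, since the slides only label the branches "< 80K" and "> 80K".

```python
# Training data from the slides:
# (home_owner, marital_status, annual_income_in_K, defaulted_borrower)
TRAINING = [
    ("Yes", "Single",   125, "No"),
    ("No",  "Married",  100, "No"),
    ("No",  "Single",    70, "No"),
    ("Yes", "Married",  120, "No"),
    ("No",  "Divorced",  95, "Yes"),
    ("No",  "Married",   60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No",  "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"),
    ("No",  "Single",    90, "Yes"),
]


def tree_home_owner_root(home_owner, marital_status, income):
    """First example tree: Home Owner is tested at the root."""
    if home_owner == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Assumption: exactly 80K is treated like the "> 80K" branch.
    return "No" if income < 80 else "Yes"


def tree_marst_root(home_owner, marital_status, income):
    """Second example tree: MarSt is tested at the root."""
    if marital_status == "Married":
        return "No"
    if home_owner == "Yes":
        return "No"
    # Same assumption about the 80K boundary as above.
    return "No" if income < 80 else "Yes"


def fits(tree, records):
    """True if the tree predicts the recorded class for every record."""
    return all(tree(*record[:3]) == record[3] for record in records)


# Test record from the slides: Home Owner = No, Married, 80K.
test_record = ("No", "Married", 80)
print(tree_home_owner_root(*test_record))
print(fits(tree_home_owner_root, TRAINING), fits(tree_marst_root, TRAINING))
```

Under these assumptions both trees reproduce the class label of all ten training records, and the test record never reaches the Income test because it stops at the Married leaf.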

Decision Tree Classification Task

A decision tree model is learned from the training set (Learn Model) and then applied to the test set (Apply Model).

Training set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Decision Tree Induction

- Many Algorithms:
  - Hunt's Algorithm (one of the earliest)
  - CART
  - ID3, SLIQ, SPRINT

General Structure of Hunt's Algorithm

- Let Dt be the set of training records that reach a node t
- General Procedure:

  - If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt
  - If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.

Hunt's Algorithm

[Figure: Hunt's algorithm applied step by step to the ten training records (Home Owner, Marital Status, Annual Income, Defaulted Borrower). Class counts shown at the nodes, as (Defaulted = No, Defaulted = Yes): (7,3); then (3,0), (4,3); then (3,0), (1,3), (3,0); then (3,0), (1,0), (0,3), (3,0).]
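The general procedure can be sketched as a short recursive function. This is an illustration under stated assumptions rather than the algorithm as any particular tool implements it: the slides do not say how the attribute test is chosen, so the hypothetical `TESTS` list below simply fixes a hand-picked order (Home Owner, then MarSt, then Income at 80K), and the majority-class fallback for records that cannot be split further is also an assumption.

```python
from collections import Counter

# Training data from the slides:
# (home_owner, marital_status, annual_income_in_K, defaulted_borrower)
RECORDS = [
    ("Yes", "Single",   125, "No"),
    ("No",  "Married",  100, "No"),
    ("No",  "Single",    70, "No"),
    ("Yes", "Married",  120, "No"),
    ("No",  "Divorced",  95, "Yes"),
    ("No",  "Married",   60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No",  "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"),
    ("No",  "Single",    90, "Yes"),
]

# Hypothetical attribute tests, tried in this fixed order (an assumption).
# Each test maps a record to the name of the branch it follows.
TESTS = [
    ("Home Owner", lambda r: r[0]),
    ("MarSt", lambda r: "Married" if r[1] == "Married" else "Single, Divorced"),
    ("Income", lambda r: "< 80K" if r[2] < 80 else ">= 80K"),
]


def hunt(records, tests):
    """Return a class label (leaf) or a (test_name, {branch: subtree}) node."""
    counts = Counter(r[-1] for r in records)

    # Case 1: all records in Dt belong to the same class -> leaf node.
    if len(counts) == 1:
        return next(iter(counts))

    # Case 2: more than one class -> split with an attribute test and recurse.
    for i, (name, test) in enumerate(tests):
        groups = {}
        for r in records:
            groups.setdefault(test(r), []).append(r)
        if len(groups) > 1:  # the test actually separates the records
            remaining = tests[:i] + tests[i + 1:]
            return (name, {branch: hunt(subset, remaining)
                           for branch, subset in groups.items()})

    # Assumption: if no remaining test separates the records,
    # fall back to a leaf labeled with the majority class.
    return counts.most_common(1)[0][0]


tree = hunt(RECORDS, TESTS)
print(tree)
```

With this test order the recursion visits subsets with the same (No, Yes) class counts as the figure: (7,3) at the root, (3,0) and (4,3) after the Home Owner test, (3,0) and (1,3) after MarSt, and (1,0) and (0,3) after Income.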


Design Issues of Decision Tree Induction

- How should training records be split?

  - Method for expressing test condition
    - depending on attribute types
  - Measure for evaluating the goodness of a test condition
- How should the splitting procedure stop?
  - Stop splitting if all the records belong to the same class or have identical attribute values
  - Early termination

Methods for Expressing Test Conditions

- Depends on attribute types
  - Binary
  - Nominal
  - Ordinal
  - Continuous

Test Condition for Nominal Attributes

- Multi-way split:
  - Use as many partitions as distinct values.
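A multi-way split on a nominal attribute can be sketched as a grouping step. The helper name `multiway_split` is hypothetical, and the records are the Marital Status and Defaulted Borrower columns of the training data used earlier in these notes.

```python
from collections import defaultdict

# (marital_status, defaulted_borrower) for the ten training records
RECORDS = [
    ("Single", "No"), ("Married", "No"), ("Single", "No"), ("Married", "No"),
    ("Divorced", "Yes"), ("Married", "No"), ("Divorced", "No"),
    ("Single", "Yes"), ("Married", "No"), ("Single", "Yes"),
]


def multiway_split(records, attribute_index):
    """Partition records into one subset per distinct attribute value."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[attribute_index]].append(record)
    return dict(partitions)


parts = multiway_split(RECORDS, 0)
for value, subset in parts.items():
    print(value, len(subset))
# Single 4
# Married 4
# Divorced 2
```

Marital Status has three distinct values here, so the multi-way split produces three partitions.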

