Correlation-based Feature Selection for Machine Learning

Department of Computer Science
Hamilton, New Zealand

Mark A. Hall

This thesis is submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at The University of Waikato.

1999

© 1999 Mark A. Hall

Abstract

A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets.
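The abstract names an evaluation formula without stating it. As a rough sketch only, assuming the standard test-theory expression for the correlation between a composite and an outside variable, the merit of a k-feature subset can be taken as k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. The names `cfs_merit` and `forward_select`, the toy correlation values, and the plain greedy forward search (standing in for whatever heuristic search the thesis uses) are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cfs_merit(subset, class_corr, feat_corr):
    """Merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation over the subset.
    r_cf = sum(class_corr[f] for f in subset) / k
    # Mean feature-feature correlation over all pairs (0 for a single feature).
    pairs = list(combinations(sorted(subset), 2))
    r_ff = sum(feat_corr[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def forward_select(class_corr, feat_corr):
    """Greedy forward search: keep adding the feature that most improves merit."""
    selected, best = [], 0.0
    remaining = set(class_corr)
    while remaining:
        merit, feature = max(
            (cfs_merit(selected + [f], class_corr, feat_corr), f)
            for f in sorted(remaining)
        )
        if merit <= best:
            break  # no candidate improves the current subset
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected, best


# Made-up correlations: "a" and "b" are predictive but redundant with each
# other, "c" is moderately predictive and independent, "d" is irrelevant.
class_corr = {"a": 0.8, "b": 0.7, "c": 0.5, "d": 0.02}
feat_corr = {
    ("a", "b"): 0.95, ("a", "c"): 0.05, ("a", "d"): 0.0,
    ("b", "c"): 0.05, ("b", "d"): 0.0, ("c", "d"): 0.0,
}
selected, merit = forward_select(class_corr, feat_corr)
print(selected, round(merit, 3))
```

In this toy run the redundant feature `b` and the irrelevant feature `d` are both left out, because adding either one lowers the merit of the subset `a`, `c`.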

Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper, a well known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets.
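To make the wrapper idea concrete, here is a rough sketch of how a wrapper might score one candidate feature subset: it reruns the target learner under cross-validation and uses the estimated accuracy as the score. A 1-nearest-neighbour classifier stands in for an instance based learner and leave-one-out stands in for the cross-validation scheme. Both choices, the function names, and the deliberately extreme toy data are assumptions for illustration, not the experimental setup of the thesis.

```python
def one_nn_predict(train, query, features):
    """Predict the label of `query` with 1-nearest-neighbour on `features` only."""
    def dist(example):
        x, _label = example
        return sum((x[f] - query[f]) ** 2 for f in features)
    return min(train, key=dist)[1]


def wrapper_score(data, features):
    """Leave-one-out accuracy of the learner restricted to `features`."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out example i
        hits += one_nn_predict(train, x, features) == y
    return hits / len(data)


# Toy data: feature 0 determines the class, feature 1 is large-scale noise.
data = [
    ((0.0, 5.0), 0), ((0.1, 1.0), 0), ((0.2, 9.0), 0),
    ((1.0, 5.2), 1), ((1.1, 0.8), 1), ((0.9, 9.1), 1),
]
for subset in ([0], [1], [0, 1]):
    print(subset, wrapper_score(data, subset))
```

Every call to `wrapper_score` repeats the learner's work on the whole dataset, and a search over subsets makes many such calls. A filter that scores subsets from precomputed correlations avoids that repeated cost.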

CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.

Acknowledgements

First and foremost I would like to acknowledge the tireless and prompt help of my supervisor, Lloyd Smith. Lloyd has always allowed me complete freedom to define and explore my own directions in research. While this proved difficult and somewhat bewildering to begin with, I have come to appreciate the wisdom of his way; it encouraged me to think for myself, something that is unfortunately all too easy to avoid as an [...] and the Department of Computer Science have provided me with much appreciated financial support during my degree.

They have kindly provided teaching assistantship positions and travel funds to attend [...]. I thank Geoff Holmes, Ian Witten and Bill Teahan for providing valuable feedback and reading parts of this thesis. Stuart Inglis (super-combo!), Len Trigg, and Eibe Frank deserve thanks for their technical assistance and helpful comments. Len convinced me (rather emphatically) not to use MS Word for writing a thesis. Thanks go to Richard Littin and David McWha for kindly providing the University of Waikato thesis style and assistance with [...] thanks must also go to my family and my partner Bernadette. They have provided unconditional support and encouragement through both the highs and lows of my time in graduate [...].

Contents

List of Figures
List of Tables

1 [...]
    [...] statement
    Overview
2 Supervised Machine Learning: Concepts and [...]
    Classification Task
    Representation
    Algorithms
    Naive Bayes
    C4.5 Decision Tree Generator

    [...] based Learner
    Evaluation
    Discretization
    [...] of Discretization
3 Feature Selection for Machine Learning
    Feature Selection in Statistics and Pattern Recognition
    [...] of Feature Selection Algorithms
    Search
    Filters
    [...] Driven Filters
    Feature Selection Through Discretization
    [...] One Learning Algorithm as a Filter for Another
    [...] Information Theoretic Feature Filter
    [...] Instance based Approach to Feature Selection
    Wrappers
    Wrappers for Decision Tree Learners
    Wrappers for Instance based Learning
    Wrappers for Bayes Classifiers
    [...] of Improving the Wrapper
    Weighting Algorithms
    Summary
4 Correlation-based Feature Selection
    [...] Nominal Features
    Symmetrical Uncertainty
    [...] in Correlation Measures between Nominal Features
    [...] Measurement of Bias
    [...] the Level of Attributes
    [...] the Sample Size
    [...] Correlation-based Feature Selector
    Summary
5 Datasets Used in [...]
    Methodology
6 Evaluating CFS with 3 ML [...]
    Domains

    [...] Attributes
    [...] Attributes
    [...]'s problems
    Domains
    Summary
7 Comparing CFS to the [...]
    [...] Feature Selection
    Summary
8 Extending CFS: Higher Order [...]
    [...] Work
    Features
    [...] RELIEF into CFS
9 [...]
    Work

Appendices
A Graphs for Chapter 4
B Curves for Concept A3 with Added Redundant Attributes
C Results for CFS-UC, CFS-MDL, and CFS-Relief on 12 Natural Domains
D 5x2cv Paired t test Results
E CFS Merit Versus Accuracy
F CFS Applied to 37 UCI Domains

Bibliography

List of Figures

A decision tree for the Golf dataset. Branches correspond to the values of attributes; leaves indicate classifications.
Filter and wrapper feature selectors.
Feature subset space for the golf dataset.
The effects on the correlation between an outside variable and a composite variable (r_zc) of the number of components (k), the inter-correlations among the components (r_ii), and the correlations between the components and the outside variable (r_zi).

Effects of varying the attribute and class level on symmetrical uncertainty (a & b), symmetrical relief (c & d), and normalized symmetrical MDL (e & f) when attributes are informative (graphs on the left) and non-informative (graphs on the right). Curves are shown for 2, 5, and 10 classes.
The effect of varying the training set size on symmetrical uncertainty (a & b), symmetrical relief (c & d), and normalized symmetrical MDL (e & f) when attributes are informative and non-informative. The number of classes is 2; curves are shown for 2, 10, and 20 valued attributes.
The components of CFS. Training and testing data is reduced to contain only the features selected by CFS. The dimensionally reduced data can then be passed to a machine learning algorithm for induction and prediction.
[...] of CFS feature selection on accuracy of naive Bayes [...] show results that are statistically significant.
Learning curve for IB1 on the dataset A2 with 17 added irrelevant attributes.

Number of irrelevant attributes selected on concept A1 (with added irrelevant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Number of relevant attributes selected on concept A1 (with added irrelevant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Learning curves for IB1, CFS-UC-IB1, CFS-MDL-IB1, and CFS-Relief-IB1 on concept A1 (with added irrelevant features).
Number of irrelevant attributes selected on concept A2 (with added irrelevant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size. Note: CFS-UC and CFS-Relief produce the same [...].
Number of relevant attributes selected on concept A2 (with added irrelevant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Learning curves for IB1, CFS-UC-IB1, CFS-MDL-IB1, and CFS-Relief-IB1 on concept A2 (with added irrelevant features).
Number of irrelevant attributes selected on concept A3 (with added irrelevant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.

Number of relevant attributes selected on concept A3 (with added irrelevant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Number of irrelevant multi-valued attributes selected on concept A3 (with added irrelevant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Learning curves for IB1, CFS-UC-IB1, CFS-MDL-IB1, and CFS-Relief-IB1 on concept A3 (with added irrelevant features).
Number of redundant attributes selected on concept A1 (with added redundant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Number of relevant attributes selected on concept A1 (with added redundant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Number of multi-valued attributes selected on concept A1 (with added redundant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Number of noisy attributes selected on concept A1 (with added redundant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.

Learning curves for nbayes (naive Bayes), CFS-UC-nbayes, CFS-MDL-nbayes, and CFS-Relief-nbayes on concept A1 (with added redundant features).
Number of redundant attributes selected on concept A2 (with added redundant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Number of relevant attributes selected on concept A2 (with added redundant features) by CFS-UC, CFS-MDL, and CFS-Relief as a function of training set size.
Learning curves for nbayes (naive Bayes), CFS-UC-nbayes, CFS-MDL-nbayes, and CFS-Relief-nbayes on concept A2 (with added redundant features).
Learning curves for nbayes (naive Bayes), CFS-UC-nbayes, CFS-MDL-nbayes, and CFS-Relief-nbayes on concept A3 (with added redundant features).
Number of natural domains for which CFS improved accuracy (left) and degraded accuracy (right) for naive Bayes (a), IB1 (b), and C4.5 (c).
Effect of feature selection on the size of the trees induced by C4.5 on the natural domains.
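Several of the captions above refer to symmetrical uncertainty as a correlation measure between nominal attributes. As a minimal sketch, assuming the usual entropy-based definition SU(X, Y) = 2·[H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)], it can be computed from value counts as follows. The function names and the toy columns are hypothetical, and the zero-entropy guard is a choice made here rather than something taken from the thesis.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """2 * information gain / (H(x) + H(y)); ranges from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both columns are constant; treat as uncorrelated
    hxy = entropy(list(zip(x, y)))  # joint entropy H(x, y)
    gain = hx + hy - hxy
    return 2.0 * gain / (hx + hy)


outlook = ["sunny", "sunny", "rain", "rain"]
copy_of_outlook = [0, 0, 1, 1]   # fully determined by outlook
unrelated = [0, 1, 0, 1]         # independent of outlook
print(symmetrical_uncertainty(outlook, copy_of_outlook))
print(symmetrical_uncertainty(outlook, unrelated))
```

Dividing by the sum of the two entropies keeps the measure symmetric and bounded, so it can be compared across attributes that have different numbers of values.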
