Data Mining: Concepts and Techniques

Second Edition

Jiawei Han and Micheline Kamber
University of Illinois at Urbana-Champaign

Amsterdam, Boston, Heidelberg, London, New York, Oxford, Paris, San Diego, San Francisco, Singapore, Sydney, Tokyo

Publisher: Diane Cerra
Publishing Services Manager: Simon Crump
Editorial Assistant: Asma Stephan
Composition: diacriTech
Technical Illustration: Dartmouth Publishing
Proofreader: Multiscience Press
Indexer: Multiscience Press
Interior printer: Maple-Vail Book Manufacturing Group
Cover printer: Phoenix Color

Morgan Kaufmann Publishers is an imprint of Elsevier.
Sansome Street, Suite 400, San Francisco, CA 94111

This book is printed on acid-free paper.

© 2006 by Elsevier Inc. All rights reserved.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, scanning, or otherwise) without prior written permission of the publisher.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333. You may also complete your request on-line via the Elsevier homepage by selecting Customer Support and then Obtaining Permissions.

Library of Congress Cataloging-in-Publication Data
Application submitted
ISBN 13: 978-1-55860-901-3
ISBN 10: 1-55860-901-6

For information on all Morgan Kaufmann publications, visit our Web site.
Printed in the United States of America
06 07 08 09 10    5 4 3 2 1

Dedication
To Y. Dora and Lawrence for your love and Erik, Kevan, Kian, and Mikael for your love and

Contents

the Author
Foreword
Preface

Chapter 1 Introduction
Motivated Data Mining? Why Is It Important?
What Is Data Mining?
Mining: On What Kind of Data?
Relational Databases
Data Warehouses
Transactional Databases
Advanced Data and Information Systems and Advanced Applications
Mining Functionalities: What Kinds of Patterns Can Be Mined?
Concept/Class Description: Characterization and Discrimination
Mining Frequent Patterns, Associations, and Correlations
Classification and Prediction
Cluster Analysis
Outlier Analysis
Evolution Analysis
All of the Patterns Interesting?
Of Data Mining Systems
Mining Task Primitives
of a Data Mining System with a Database or Data Warehouse System
Issues in Data Mining
Exercises
Bibliographic Notes

Chapter 2 Data Preprocessing
Preprocess the Data?
Data Summarization
Measuring the Central Tendency
Measuring the Dispersion of Data
Graphic Displays of Basic Descriptive Data Summaries
Cleaning
Missing Values
Noisy Data
Data Cleaning as a Process
Integration and Transformation
Data Integration
Data Transformation
Reduction
Data Cube Aggregation
Attribute Subset Selection
Dimensionality Reduction
Numerosity Reduction
Discretization and Concept Hierarchy Generation
Discretization and Concept Hierarchy Generation for Numerical Data
Concept Hierarchy Generation for Categorical Data
Exercises
Bibliographic Notes

Chapter 3 Data Warehouse and OLAP Technology: An Overview
Is a Data Warehouse?
Differences between Operational Database Systems and Data Warehouses
But, Why Have a Separate Data Warehouse?
Multidimensional Data Model
From Tables and Spreadsheets to Data Cubes
Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
Examples for Defining Star, Snowflake, and Fact Constellation Schemas
Measures: Their Categorization and Computation
Concept Hierarchies
OLAP Operations in the Multidimensional Data Model
A Starnet Query Model for Querying Multidimensional Databases
Warehouse Architecture
Steps for the Design and Construction of Data Warehouses
A Three-Tier Data Warehouse Architecture
Data Warehouse Back-End Tools and Utilities
Metadata Repository
Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP
Warehouse Implementation
Efficient Computation of Data Cubes
Indexing OLAP Data
Efficient Processing of OLAP Queries
Data Warehousing to Data Mining
Data Warehouse Usage
From On-Line Analytical Processing to On-Line Analytical Mining
Exercises
Bibliographic Notes

Chapter 4 Data Cube Computation and Data Generalization
Methods for Data Cube Computation
A Road Map for the Materialization of Different Kinds of Cubes
Multiway Array Aggregation for Full Cube Computation
BUC: Computing Iceberg Cubes from the Apex Cuboid Downward
Star-cubing: Computing Iceberg Cubes Using a Dynamic Star-tree Structure
Precomputing Shell Fragments for Fast High-Dimensional OLAP
Computing Cubes with Complex Iceberg Conditions
Development of Data Cube and OLAP Technology
Discovery-Driven Exploration of Data Cubes
Complex Aggregation at Multiple Granularity: Multifeature Cubes
Constrained Gradient Analysis in Data Cubes
Induction: An Alternative Method for Data Generalization and Concept Description
Attribute-Oriented Induction for Data Characterization
Efficient Implementation of Attribute-Oriented Induction
Presentation of the Derived Generalization
Mining Class Comparisons: Discriminating between Different Classes
Class Description: Presentation of Both Characterization and Comparison
Exercises
Bibliographic Notes

Chapter 5 Mining Frequent Patterns, Associations, and Correlations
Concepts and a Road Map
Market Basket Analysis: A Motivating Example
Frequent Itemsets, Closed Itemsets, and Association Rules
Frequent Pattern Mining: A Road Map
and Scalable Frequent Itemset Mining Methods
The Apriori Algorithm: Finding Frequent Itemsets Using Candidate Generation
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
Mining Frequent Itemsets without Candidate Generation
Mining Frequent Itemsets Using Vertical Data Format
Mining Closed Frequent Itemsets
Various Kinds of Association Rules
Mining Multilevel Association Rules
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
Association Mining to Correlation Analysis
Strong Rules Are Not Necessarily Interesting: An Example
From Association Analysis to Correlation Analysis
Association Mining
Metarule-Guided Mining of Association Rules
Constraint Pushing: Mining Guided by Rule Constraints
Exercises
Bibliographic Notes

Chapter 6 Classification and Prediction
Is Classification? What Is Prediction?
Regarding Classification and Prediction
Preparing the Data for Classification and Prediction
Comparing Classification and Prediction Methods
by Decision Tree Induction
Decision Tree Induction
Attribute Selection Measures
Tree Pruning
Scalability and Decision Tree Induction
Classification
Bayes' Theorem
Naïve Bayesian Classification
Bayesian Belief Networks
Training Bayesian Belief Networks
Classification
Using IF-THEN Rules for Classification
Rule Extraction from a Decision Tree
Rule Induction Using a Sequential Covering Algorithm
by Backpropagation
A Multilayer Feed-Forward Neural Network
Defining a Network Topology
Backpropagation
Inside the Black Box: Backpropagation and Interpretability
Vector Machines
The Case When the Data Are Linearly Separable
The Case When the Data Are Linearly Inseparable
Classification: Classification by Association Rule Analysis
Learners (or Learning from Your Neighbors)
Classifiers
Case-Based Reasoning
Classification Methods
Genetic Algorithms
Rough Set Approach
Fuzzy Set Approaches
Linear Regression
Nonlinear Regression
Other Regression-Based Methods
and Error Measures
Classifier Accuracy Measures
Predictor Error Measures
the Accuracy of a Classifier or Predictor
Holdout Method and Random Subsampling
Cross-validation
Bootstrap
Methods: Increasing the Accuracy
Bagging
Boosting
Selection
Estimating Confidence Intervals
ROC Curves
Exercises
Bibliographic Notes

Chapter 7 Cluster Analysis
Is Cluster Analysis?
Of Data in Cluster Analysis
Interval-Scaled Variables
Binary Variables
Categorical, Ordinal, and Ratio-Scaled Variables
Variables of Mixed Types
Vector Objects
Categorization of Major Clustering Methods
Methods
Classical Partitioning Methods: k-Means and k-Medoids
Partitioning Methods in Large Databases: From k-Medoids to CLARANS
Methods
Agglomerative and Divisive Hierarchical Clustering
BIRCH: Balanced Iterative Reducing and Clustering Using Hierarchies
ROCK: A Hierarchical Clustering Algorithm for Categorical Attributes
Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling
Methods
DBSCAN: A Density-Based Clustering Method Based on Connected Regions with Sufficiently High Density
OPTICS: Ordering Points to Identify the Clustering Structure
DENCLUE: Clustering Based on Density Distribution Functions
Methods
STING: STatistical INformation Grid
WaveCluster: Clustering Using Wavelet Transformation
Clustering Methods
Expectation-Maximization
Conceptual Clustering
Neural Network Approach
High-Dimensional Data
CLIQUE: A Dimension-Growth Subspace Clustering Method
PROCLUS
