DIGITAL NOTES ON DATA WAREHOUSING AND DATA …

DIGITAL NOTES ON DATA WAREHOUSING AND DATA MINING
III YEAR - II SEM (2018-19)

DEPARTMENT OF INFORMATION TECHNOLOGY
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution - UGC, Govt. of India)
(Affiliated to JNTUH, Hyderabad, Approved by AICTE - Accredited by NBA & NAAC A Grade - ISO 9001:2015 Certified)
Maisammaguda, Dhulapally (Post Via. Hakimpet), Secunderabad 500100, Telangana State, INDIA.

III Year B. Tech. IT - II Sem
L    T/P/D    C
5    -/-/-    4

(R15A0526) DATA WAREHOUSING AND DATA MINING

Objectives:
Understand the fundamental processes, concepts and techniques of data mining and develop an appreciation for the inherent complexity of the data-mining task.

Characterize the kinds of patterns that can be discovered by association rule mining.
Evaluate methodological issues underlying the effective application of data mining.
Advance research skills through the investigation of data-mining literature.

UNIT I
Data Warehouse: Introduction to Data Warehouse, Difference between Operational Database Systems and Data Warehouses, Data Warehouse Characteristics, Data Warehouse Architecture and its Components, Extraction-Transformation-Loading, Logical (Multi-Dimensional) Data Modeling, Schema Design, Star and Snowflake Schema, Fact Constellation, Fact Table, Fully Additive, Semi-Additive, Non-Additive Measures; Fact-less Facts, Dimension Table Characteristics, OLAP Cube, OLAP Operations, OLAP Server Architecture - ROLAP, MOLAP and HOLAP.

UNIT II
Introduction: Fundamentals of Data Mining, Data Mining Functionalities, Classification of Data Mining Systems, Data Mining Task Primitives, Integration of a Data Mining System with a Database or a Data Warehouse System, Major Issues in Data Mining.
Data Preprocessing: Need for Preprocessing the Data, Data Cleaning, Data Integration and Transformation, Data Reduction, Discretization and Concept Hierarchy Generation.

UNIT III
Association Rules: Problem Definition, Frequent Item Set Generation, The Apriori Principle, Support and Confidence Measures, Association Rule Generation; Apriori Algorithm, The Partition Algorithms, FP-Growth Algorithms, Compact Representation of Frequent Item Set - Maximal Frequent Item Set, Closed Frequent Item Set.

UNIT IV
Classification: Problem Definition, General Approaches to Solving a Classification Problem, Evaluation of Classifiers, Classification Techniques, Decision Trees - Decision Tree Construction, Methods for Expressing Attribute Test Conditions, Measures for Selecting the Best Split, Algorithm for Decision Tree Induction; Naive-Bayes Classifier, Bayesian Belief Networks; K-Nearest Neighbor Classification - Algorithm and Characteristics.
Prediction: Accuracy and Error Measures, Evaluating the Accuracy of a Classifier or a Predictor, Ensemble Methods.

UNIT V
Cluster Analysis: Types of Data in Cluster Analysis, A Categorization of Major Clustering Methods, Partitioning Methods, Hierarchical Methods, Density-Based Methods, Grid-Based Methods, Model-Based Clustering Methods, Outlier Analysis.

TEXT BOOKS:
1. Data Mining - Concepts and Techniques - Jiawei Han & Micheline Kamber. Morgan Kaufmann Publishers, 2nd Edition, 2006.

REFERENCE BOOKS:
1. Data Mining: Introductory and Advanced Topics - Margaret H. Dunham. Pearson Education.
2. Data Mining Techniques - Arun K. Pujari. University Press.
3. Data Warehousing in the Real World - Sam Anahory & Dennis Murray. Pearson Edn Asia.
4. Data Warehousing Fundamentals - Paulraj Ponniah. Wiley Student Edition.
5. The Data Warehouse Lifecycle Toolkit - Ralph Kimball. Wiley Student Edition.

Outcomes:
At the end of this course the student should be able to:
Acquire knowledge about different data mining models and techniques.
Explore various data mining and data warehousing application areas.

Demonstrate an appreciation of the importance of paradigms from the fields of Artificial Intelligence and Machine Learning to data mining.

INDEX
S. No   Unit   Topic
1       I      Introduction to Knowledge Discovery in Databases (KDD)
2       I      A Three Tier Data Warehouse Architecture
3       I      Data Warehouse Models
4       II     Introduction to Data Mining
5       II     Architecture Data Mining
6       II     Classification Data Mining
7       II     Major Issues of Data Mining
8       III    Association Rules Mining
9       III    Efficient Frequent Itemset Mining Methods
10      III    Approaches for Mining Multilevel Associations
11      IV     Classification and Prediction
12      IV     Classification by Decision Tree
13      IV     Bayesian Classification
14      IV     K-Nearest Neighbor Classifier
15      V      Cluster Analysis
16      V      Classical Partitioning Methods
17      V      Outlier Analysis

UNIT-I

Knowledge Discovery in Databases (KDD)
Some people treat data mining as a synonym for knowledge discovery, while others view data mining as an essential step in the process of knowledge discovery. Here is the list of steps involved in the knowledge discovery process:
Data Cleaning - In this step, noise and inconsistent data are removed.
Data Integration - In this step, multiple data sources are combined.
Data Selection - In this step, data relevant to the analysis task are retrieved from the database.
Data Transformation - In this step, data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
Data Mining - In this step, intelligent methods are applied in order to extract data patterns.
Pattern Evaluation - In this step, data patterns are evaluated.
Knowledge Presentation - In this step, the mined knowledge is presented to the user.

The following diagram shows the knowledge discovery process:
[Figure: Architecture of KDD]

Data Warehouse:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.

Data Warehouse Design Process:
A data warehouse can be built using a top-down approach, a bottom-up approach, or a combination of both. The top-down approach starts with the overall design and planning. It is useful in cases where the technology is mature and well known, and where the business problems that must be solved are clear and well understood. The bottom-up approach starts with experiments and prototypes. This is useful in the early stage of business modeling and technology development.
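
The time-variant and non-volatile properties described above, using the customer address example, can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not how any particular warehouse product works: the class names TransactionSystem and AddressWarehouse, the AddressRecord fields and the sample addresses are all hypothetical and do not come from these notes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded record cannot be modified afterwards
class AddressRecord:
    customer_id: int
    address: str
    valid_from: date


class TransactionSystem:
    """Hypothetical operational store: keeps only the latest address."""

    def __init__(self):
        self._current = {}

    def update_address(self, customer_id, address):
        self._current[customer_id] = address  # the old value is overwritten

    def address_of(self, customer_id):
        return self._current[customer_id]


class AddressWarehouse:
    """Hypothetical warehouse table: rows are appended, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, record):
        self._rows.append(record)  # non-volatile: no update or delete path

    def history(self, customer_id):
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return sorted(rows, key=lambda r: r.valid_from)

    def address_as_of(self, customer_id, day):
        # time-variant: answer a question about any past date
        known = [r for r in self.history(customer_id) if r.valid_from <= day]
        return known[-1].address if known else None


oltp = TransactionSystem()
dw = AddressWarehouse()
for rec in [
    AddressRecord(1, "12 Hill Road", date(2018, 1, 10)),
    AddressRecord(1, "7 Lake View", date(2018, 9, 1)),
]:
    oltp.update_address(rec.customer_id, rec.address)
    dw.load(rec)

print(oltp.address_of(1))                      # 7 Lake View
print([r.address for r in dw.history(1)])      # ['12 Hill Road', '7 Lake View']
print(dw.address_as_of(1, date(2018, 6, 30)))  # 12 Hill Road
```

The transaction system can only report the customer's current address, while the warehouse can list every address and answer what the address was on a given date, because it only ever adds rows.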

