Transcription of Data Mining: Concepts and Techniques - Elsevier
Data Mining: Concepts and Techniques
2nd Edition
Solution Manual

Jiawei Han and Micheline Kamber
The University of Illinois at Urbana-Champaign
© Morgan Kaufmann, 2006

Note: For instructors' reference only. Do not copy! Do not distribute!

Contents
1 Introduction
   Exercises
2 Data Preprocessing
   Exercises
3 Data Warehouse and OLAP Technology: An Overview
   Exercises
4 Data Cube Computation and Data Generalization
   Exercises
5 Mining Frequent Patterns, Associations, and Correlations
   Exercises
6 Classification and Prediction
   Exercises
7 Cluster Analysis
   Exercises
8 Mining Stream, Time-Series, and Sequence Data
   Exercises
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining
   Exercises
10 Mining Object, Spatial, Multimedia, Text, and Web Data
   Exercises
11 Applications and Trends in Data Mining
   Exercises

Chapter 1. Introduction

What is data mining? In your answer, address the following:
(a) Is it another hype?
(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
(c) Explain how the evolution of database technology led to data mining.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.

Answer:
Data mining refers to the process or method that extracts or mines interesting knowledge or patterns from large amounts of data.

(a) Is it another hype?
Data mining is not another hype. Instead, the need for data mining has arisen due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. Thus, data mining can be viewed as the result of the natural evolution of information technology.

(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
No. Data mining is more than a simple transformation of technology developed from databases, statistics, and machine learning. Instead, data mining involves an integration, rather than a simple transformation, of techniques from multiple disciplines such as database technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis.

(c) Explain how the evolution of database technology led to data mining.
Database technology began with the development of data collection and database creation mechanisms that led to the development of effective mechanisms for data management, including data storage and retrieval, and query and transaction processing.
The large number of database systems offering query and transaction processing eventually and naturally led to the need for data analysis and understanding. Hence, data mining began its development out of this necessity.

(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
- Data cleaning, a process that removes or transforms noise and inconsistent data
- Data integration, where multiple data sources may be combined
- Data selection, where data relevant to the analysis task are retrieved from the database
- Data transformation, where data are transformed or consolidated into forms appropriate for mining
- Data mining, an essential process where intelligent and efficient methods are applied in order to extract patterns
- Pattern evaluation, a process that identifies the truly interesting patterns representing knowledge based on some interestingness measures
- Knowledge presentation, where visualization and knowledge representation techniques are used to present the mined knowledge to the user

Present an example where data mining is crucial to the success of a business.
What data mining functions does this business need? Can they be performed alternatively by data query processing or simple statistical analysis?

Answer:
A department store, for example, can use data mining to assist with its target marketing mail campaign. Using data mining functions such as association, the store can use the mined strong association rules to determine which products bought by one group of customers are likely to lead to the buying of certain other products. With this information, the store can then mail marketing materials only to those kinds of customers who exhibit a high likelihood of purchasing additional products. Data query processing is used for data or information retrieval and does not have the means for finding association rules. Similarly, simple statistical analysis cannot handle large amounts of data such as those of customer records in a department store.

Suppose your task as a software engineer at Big-University is to design a data mining system to examine their university course database, which contains the following information: the name, address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and their cumulative grade point average (GPA).
Describe the architecture you would choose. What is the purpose of each component of this architecture?

Answer:
A data mining architecture that can be used for this application would consist of the following major components:
- A database, data warehouse, or other information repository, which consists of the set of databases, data warehouses, spreadsheets, or other kinds of information repositories containing the student and course information.
- A database or data warehouse server, which fetches the relevant data based on the user's data mining requests.
- A knowledge base that contains the domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns. For example, the knowledge base may contain concept hierarchies and metadata (e.g., describing data from multiple heterogeneous sources).
- A data mining engine, which consists of a set of functional modules for tasks such as association, classification, cluster analysis, and evolution and deviation analysis.
- A pattern evaluation module that works in tandem with the data mining modules by employing interestingness measures to help focus the search towards interesting patterns.
- A graphical user interface that provides the user with an interactive approach to the data mining system.

How is a data warehouse different from a database? How are they similar?

Answer:
- Differences between a data warehouse and a database: A data warehouse is a repository of information collected from multiple sources, over a history of time, stored under a unified schema, and used for data analysis and decision support, whereas a database is a collection of interrelated data that represents the current status of the stored data.
  There could be multiple heterogeneous databases where the schema of one database may not agree with the schema of another. A database system supports ad hoc query and on-line transaction processing. Additional differences are detailed in the section Differences between Operational Database Systems and Data Warehouses.
- Similarities between a data warehouse and a database: Both are repositories of information, storing huge amounts of persistent data.

Briefly describe the following advanced database systems and applications: object-relational databases, spatial databases, text databases, multimedia databases, the World Wide Web.

Answer:
- An object-oriented database is designed based on the object-oriented programming paradigm, where data are a large number of objects organized into classes and class hierarchies. Each entity in the database is considered as an object.
  The object contains a set of variables that describe the object, a set of messages that the object can use to communicate with other objects or with the rest of the database system, and a set of methods where each method holds the code to implement a message.
- A spatial database contains spatial-related data, which may be represented in the form of raster or vector data. Raster data consists of n-dimensional bit maps or pixel maps, and vector data are represented by lines, points, polygons, or other kinds of processed primitives. Some examples of spatial databases include geographical (map) databases, VLSI chip designs, and medical and satellite image databases.
- A text database is a database that contains text documents or other word descriptions in the form of long sentences or paragraphs, such as product specifications, error or bug reports, warning messages, summary reports, notes, or other documents.
- A multimedia database stores images, audio, and video data, and is used in applications such as picture content-based retrieval, voice-mail systems, video-on-demand systems, the World Wide Web, and speech-based user interfaces.
- The World Wide Web provides rich, world-wide, on-line information services, where data objects are linked together to facilitate interactive access. Some examples of distributed information services associated with the World Wide Web include America Online, Yahoo!, and AltaVista, among others.

Define each of the following data mining functionalities: characterization, discrimination, association and correlation analysis, classification, prediction, clustering, and evolution analysis. Give examples of each data mining functionality, using a real-life database that you are familiar with.

Answer:
- Characterization is a summarization of the general characteristics or features of a target class of data.
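
As an illustration only, characterization can be sketched in a few lines of Python. The sketch assumes a small, made-up table of student records loosely modeled on the Big-University example. The `characterize` helper, the attribute names, and the data are all hypothetical and do not come from the manual. It picks a target class (graduate students) and summarizes it by value proportions for categorical attributes and by the mean for numeric ones.

```python
from collections import Counter
from statistics import mean

# Hypothetical student records; names, majors, and GPAs are made up.
students = [
    {"name": "Ann", "status": "graduate", "major": "CS", "gpa": 3.8},
    {"name": "Bob", "status": "graduate", "major": "CS", "gpa": 3.6},
    {"name": "Cho", "status": "graduate", "major": "Physics", "gpa": 3.4},
    {"name": "Dev", "status": "undergraduate", "major": "CS", "gpa": 3.0},
    {"name": "Eve", "status": "undergraduate", "major": "Math", "gpa": 2.9},
]


def characterize(records, is_target, categorical, numeric):
    """Summarize the general features of the records in the target class."""
    target = [r for r in records if is_target(r)]
    summary = {"count": len(target)}
    if not target:
        return summary  # nothing to summarize for an empty target class
    for attr in categorical:
        # Proportion of the target class taking each value of the attribute.
        counts = Counter(r[attr] for r in target)
        summary[attr] = {value: n / len(target) for value, n in counts.items()}
    for attr in numeric:
        # Average value of the attribute within the target class.
        summary[attr] = mean(r[attr] for r in target)
    return summary


# Target class: graduate students.
profile = characterize(
    students,
    lambda r: r["status"] == "graduate",
    categorical=["major"],
    numeric=["gpa"],
)
print(profile)
```

A real system would typically generalize attributes along concept hierarchies and work on far larger data, but the idea is the same: select the target class, then describe it with compact summary statistics.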