
R and Data Mining: Examples and Case Studies

Yanchang Zhao
[...] 20, 2015

© 2012-2015 Yanchang Zhao. Published by Elsevier in December 2012. All rights reserved.

From the Author

Case studies: The case studies are not included in this online version. They are reserved exclusively for a book version published by Elsevier in December 2012.

Latest version: The latest online version is available at links below. See the websites also for an R Reference Card for Data Mining.

R code, data and FAQs: R code, data and FAQs are provided at links below.

To add: topic modelling and stream graph; spatial data analysis; performance evaluation of classification/prediction models (with ROC and AUC); parallel computing and big data.

[...] be broken into six major phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment, as defined by CRISP-DM (Cross Industry Standard Process for Data Mining). This book focuses on the modeling phase, with data exploration and model evaluation involved in some chapters.
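As a rough illustration of how the middle CRISP-DM phases (data understanding, data preparation, modeling and evaluation) can look in R, the sketch below uses the built-in iris dataset and a decision tree from package rpart. The 70/30 split, the random seed and all variable names are assumptions chosen for this example; they are not taken from the book.

```r
# A minimal sketch of four CRISP-DM phases on the iris data.
# The split ratio, seed and object names are illustrative assumptions.
library(rpart)

# Data understanding: inspect structure and basic statistics
data(iris)
str(iris)
summary(iris)

# Data preparation: random split into training and test sets (about 70/30)
set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[ind == 1, ]
test <- iris[ind == 2, ]

# Modeling: fit a classification tree predicting Species from all other columns
fit <- rpart(Species ~ ., data = train, method = "class")

# Evaluation: confusion matrix and accuracy on the held-out test set
pred <- predict(fit, newdata = test, type = "class")
print(table(predicted = pred, actual = test$Species))
accuracy <- mean(pred == test$Species)
print(accuracy)
```

Business understanding and deployment are left out because they depend on the project rather than on code.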



Please let me know if some topics are interesting to you but not yet covered.

Feedback: If you have any questions or comments, or come across any problems with this document or its book version, please feel free to post them to the RDataMining group below or email them to me.

Forum: Please join our discussions on R and data mining at the RDataMining group (16,000+ members, as of October 2015) on LinkedIn.

Twitter: Follow @RDataMining on Twitter (2,200+ followers, as of October 2015).

A sister book: See a new edited book titled Data Mining Applications with R at links below, which features 15 real-world applications on data mining with R.

Contents

List of Figures
List of Abbreviations

1 [...]
    Data Mining
    R
    Basics
    Datasets
    Iris Dataset
    Bodyfat Dataset

2 Data Import and [...]
    Save and Load R Data
    Import from and Export [...]
    Import Data from SAS
    Import/Export via ODBC
    [...] from Databases
    [...] to and Input from EXCEL Files
    Read and Write EXCEL files with package xlsx
    Further Readings

3 Data Exploration and [...]
    Have a Look at Data
    Explore Individual Variables
    Explore Multiple Variables
    More Explorations
    Save Charts into Files
    Further Readings

4 Decision Trees and Random [...]
    Decision Trees with Package party
    Decision Trees with Package rpart
    Random Forest

5 [...]
    Linear Regression
    Logistic Regression
    Generalized Linear Regression
    Non-linear Regression

6 [...]
    The k-Means Clustering
    The k-Medoids Clustering
    Hierarchical Clustering
    Density-based Clustering

7 Outlier [...]
    Univariate Outlier Detection
    Outlier Detection with LOF
    Outlier Detection by Clustering
    Outlier Detection from Time Series
    Discussions

8 Time Series Analysis and [...]
    Time Series Data in R
    Time Series Decomposition
    Time Series Forecasting
    Time Series Clustering
    Time Warping
    Control Chart Time Series Data
    Clustering with Euclidean Distance
    Clustering with DTW Distance
    Time Series Classification
    [...] with Original Data
    [...] with Extracted Features
    Classification
    Discussions
    Further Readings

9 Association [...]
    Basics of Association Rules
    The Titanic Dataset
    Association Rule Mining
    Removing Redundancy
    Interpreting Rules
    Visualizing Association Rules
    Further Readings

10 Text [...]
    Retrieving Text from Twitter
    Transforming Text
    Stemming Words
    Building a Term-Document Matrix
    Frequent Terms and Associations
    Word Cloud
    Clustering Words
    Clustering Tweets
    Clustering Tweets with the k-means Algorithm
    Clustering Tweets with the k-medoids Algorithm
    Packages, Further Readings and Discussions

11 Social Network [...]
    Network of Terms
    Network of Tweets
    Two-Mode Network
    Discussions and Further Readings

12 Case Study I: Analysis and Forecasting of House Price Indices

13 Case Study II: Customer Response Prediction and Profit Optimization

14 Case Study III: Predictive Modeling of Big Data with Limited Memory

15 Online [...]
    R Reference Cards
    R
    Data Mining
    Data Mining with R
    Classification/Prediction with R
    Time Series Analysis with R
    Association Rule Mining with R
    Spatial Data Analysis with R
    Text Mining with R
    Network Analysis with R
    Cleansing and Transformation with R
    [...] Data and Parallel Computing with R

Bibliography
General Index
Package Index
Function Index
Appendix: Book Promotion - Data Mining Applications with R

List of Figures

    RStudio
    Histogram
    Density
    Pie Chart
    Bar Chart
    Boxplot
    Scatter Plot
    Scatter Plot with Jitter
    Smooth Scatter Plot
    A Matrix of Scatter Plots
    3D Scatter plot
    Heat Map
    Level Plot
    Contour
    3D Surface
    Parallel Coordinates
    Parallel Coordinates with Package lattice
    Scatter Plot with Package ggplot2
    Decision Tree
    Decision Tree (Simple Style)
    Decision Tree with Package rpart
    Selected Decision Tree
    Prediction Result
    Error Rate of Random Forest
    Variable Importance
    Margin of Predictions
    Australian CPIs in Year 2008 to 2010
    Prediction with Linear Regression Model - 1
    A 3D Plot of the Fitted Model
    Prediction of CPIs in 2011 with Linear Regression Model
    Prediction with Generalized Linear Regression Model
    Results of k-Means Clustering
    Clustering with the k-medoids Algorithm - I
    Clustering with the k-medoids Algorithm - II
    Cluster Dendrogram
    Density-based Clustering - I
    Density-based Clustering - II
    Density-based Clustering - III
    Prediction with Clustering Model
    Univariate Outlier Detection with Boxplot
    Outlier Detection - I
    Outlier Detection - II
    Density of outlier factors
    Outliers in a Biplot of First Two Principal Components
    Outliers in a Matrix of Scatter Plots
    Outliers with k-Means Clustering
    Outliers in Time Series Data
    A Time Series of AirPassengers
    Seasonal Component
    Time Series Decomposition
    Time Series Forecast
    Alignment with Dynamic Time Warping
    Six Classes in Synthetic Control Chart Time Series
    Hierarchical Clustering with Euclidean Distance
    Hierarchical Clustering with DTW Distance
    Decision Tree
    Decision Tree with DWT
    A Scatter Plot of Association Rules

