
DIGITAL NOTES ON BIG DATA ANALYTICS B.TECH IV YEAR - I …

DIGITAL NOTES ON BIG DATA ANALYTICS
IV YEAR - I SEM (2019-20)

DEPARTMENT OF INFORMATION TECHNOLOGY
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution - UGC, Govt. of India)
(Affiliated to JNTUH, Hyderabad, Approved by AICTE - Accredited by NBA & NAAC A Grade - ISO 9001:2015 Certified)
Maisammaguda, Dhulapally (Post Via. Hakimpet), Secunderabad 500100, Telangana State, INDIA.

SYLLABUS
(R15A0530) BIG DATA ANALYTICS (ASSOCIATE ANALYTICS II) (Elective III)

Unit I: Data Management (NOS 2101): Design data architecture and manage the data for analysis, understand various sources of data like sensors/signal/GPS etc.




Data Management, Data Quality (noise, outliers, missing values, duplicate data) and Data Pre-processing. Export all the data onto Cloud, e.g. AWS/Rackspace etc.

Maintain Healthy, Safe & Secure Working Environment (NOS 9003): Introduction, workplace safety, Report Accidents & Emergencies, Protect health & safety as you work, course conclusion, assessment.

Unit II: Big Data Tools (NOS 2101): Introduction to Big Data tools like Hadoop, Spark, Impala etc., Data ETL process, Identify gaps in the data and follow up for decision making.

Provide Data/Information in Standard Formats (NOS 9004): Introduction, Knowledge Management, Standardized reporting & compliances, Decision Models, course conclusion, assessment.

Unit III: Big Data Analytics: Run descriptives to understand the nature of the available data, collate all the data sources to suffice business requirement, Run descriptive statistics for all the variables and observe the data ranges, Outlier detection and elimination.

Unit IV: Machine Learning Algorithms (NOS 9003): Hypothesis testing and determining the multiple analytical methodologies, Train model on 2/3 sample data using various statistical/machine learning algorithms, Test model on 1/3 sample for prediction etc.

Unit V (NOS 9004): Data Visualization (NOS 2101): Prepare the data for visualization, Use tools like Tableau, QlikView and D3, Draw insights out of visualization tool. Product Implementation.

TEXT BOOK
1. Student's Handbook for Associate Analytics.

REFERENCE BOOKS
1. Introduction to Data Mining, Tan, Steinbach and Kumar, Addison Wesley, 2006.
2. Data Mining Analysis and Concepts, M. Zaki and W. Meira (the authors have kindly made an online version available):
3. Mining of Massive Datasets, Jure Leskovec (Stanford Univ.), Anand Rajaraman (Milliway Labs), Jeffrey D. Ullman (Stanford Univ.).
4. ( :__Big_Data_Analysis)

INDEX

S. No  Unit  Topic
1      I     Design data architecture and manage the data for analysis
2      I     Understand various sources of data like sensors/signal/GPS etc.
3      I     Data Management, Data Quality (noise, outliers, missing values, duplicate data)
4      I     Data Pre-processing
5      I     Export all the data onto Cloud, e.g. AWS/Rackspace etc.
7      I     Introduction, workplace safety, Report Accidents & Emergencies, Protect health & safety as you work, course conclusion, assessment
8      II    Introduction to Big Data tools like Hadoop, Spark, Impala etc., Data ETL process, Identify gaps in the data and follow up for decision making
9      II    Provide Data/Information in Standard Formats
10     II    Knowledge Management
11     II    Standardized reporting & compliances
12     II    Decision Models, course conclusion, assessment
13     III   Run descriptives to understand the nature of the available data
14     III   Collate all the data sources to suffice business requirement
15     III   Run descriptive statistics for all the variables and observe the data ranges
16     III   Outlier detection and elimination
17     IV    Hypothesis testing and determining the multiple analytical methodologies
18     IV    Train model on 2/3 sample data using various statistical/machine learning algorithms
19     IV    Test model on 1/3 sample for prediction etc.
20     V     Prepare the data for visualization
21     V     Use tools like Tableau, QlikView and D3
22     V     Draw insights out of visualization tool. Product Implementation

UNIT I
Data Management (NOS 2101)

Design data architecture and manage the data for analysis

Data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations. Data is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture.

Various constraints and influences will have an effect on data architecture design. These include enterprise requirements, technology drivers, economics, business policies and data processing needs.

Enterprise requirements
These will generally include such elements as economical and effective system expansion, acceptable performance levels (especially system access speed), transaction reliability, and transparent data management.

In addition, the conversion of raw data such as transaction records and image files into more useful information forms, through such features as data warehouses, is also a common organizational requirement, since this enables managerial decision making and other organizational processes. One of the architecture techniques is the split between managing transaction data and (master) reference data. Another is splitting data capture systems from data retrieval systems (as done in a data warehouse).

Technology drivers
These are usually suggested by the completed data architecture and database architecture designs. In addition, some technology drivers will derive from existing organizational integration frameworks and standards, organizational economics, and existing site resources (e.g., previously purchased software licensing).

Economics
These are also important factors that must be considered during the data architecture phase. It is possible that some solutions, while optimal in principle, may not be potential candidates due to their cost. External factors such as the business cycle, interest rates, market conditions, and legal considerations could all have an effect on decisions relevant to data architecture.

Business policies
Business policies that also drive data architecture design include internal organizational policies, rules of regulatory bodies, professional standards, and applicable governmental laws that can vary by applicable agency. These policies and rules help describe the manner in which the enterprise wishes to process its data.

Data processing needs
These include accurate and reproducible transactions performed in high volumes, data warehousing for the support of management information systems (and potential data mining), repetitive periodic reporting, ad hoc reporting, and support of various organizational initiatives as required (e.g., annual budgets, new product development).

The general approach is based on designing the architecture at three levels of specification:
- The Logical Level
- The Physical Level
- The Implementation Level

Understand various sources of the data

Data can be generated from two types of sources, namely primary and secondary.

Sources of primary data
The sources of generating primary data are:
- Observation Method
- Survey Method
- Experimental Method

Experimental Method
There are a number of experimental designs that are used in carrying out an experiment. However, market researchers have used 4 experimental designs most frequently. These are:

CRD - Completely Randomized Design

RBD - Randomized Block Design - The term Randomized Block Design has originated from agricultural research.

In this design, several treatments of variables are applied to different blocks of land to ascertain their effect on the yield of the crop. Blocks are formed in such a manner that each block contains as many plots as the number of treatments, so that one plot from each block is selected at random for each treatment. The production of each plot is measured after the treatment is given. These data are then interpreted, and inferences are drawn using the analysis of variance technique to determine the effect of various treatments, such as different doses of fertilizers, different types of irrigation etc.

LSD - Latin Square Design - A Latin square is an experimental design that has a balanced two-way classification scheme, for example a 4 x 4 arrangement.
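The two layouts described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the treatment labels "A" to "D" and the seed are hypothetical, and the Latin square is built with a simple cyclic shift, which is one valid construction rather than the procedure used in the notes. In practice the rows and columns of a Latin square are usually also shuffled before use.

```python
import random


def randomized_block_design(treatments, n_blocks, seed=None):
    """Assign treatments to plots block by block.

    Each block has as many plots as there are treatments, and every
    treatment is placed on exactly one randomly chosen plot per block.
    """
    rng = random.Random(seed)  # seed is optional, for reproducibility
    layout = []
    for _ in range(n_blocks):
        plots = list(treatments)  # one plot per treatment
        rng.shuffle(plots)        # random order of treatments within the block
        layout.append(plots)
    return layout


def latin_square(treatments):
    """Build an n x n Latin square by cyclic shifting (assumed construction).

    Every treatment appears exactly once in each row and each column,
    which gives the balanced two-way classification.
    """
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)]
            for row in range(n)]


if __name__ == "__main__":
    fertilizers = ["A", "B", "C", "D"]  # hypothetical treatment labels

    for number, block in enumerate(
            randomized_block_design(fertilizers, n_blocks=3, seed=0), start=1):
        print(f"Block {number}: {block}")

    for row in latin_square(fertilizers):
        print(" ".join(row))
```

With four treatments, `latin_square` returns a 4 x 4 arrangement like the one mentioned above, while `randomized_block_design` returns one independently shuffled row of treatments per block.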

