
An example of statistical data analysis using the R ...

Tutorial: An example of statistical data analysis using the R environment for statistical computing

D G Rossiter

Version …; May 6, 2017

[Cover figures: "Subsoil vs. topsoil clay, by zone" (Topsoil clay % vs. Subsoil clay %, zones 1-4, with per-zone slopes); "Regression Residuals vs. Fitted Values, subsoil clay %" (Fitted vs. Residual); "GLS 2nd order trend surface, subsoil clay %" (E vs. N)]

Copyright D G Rossiter 2008-2010, 2014, 2017. All rights reserved. Reproduction and dissemination of the work as a whole (not parts) freely permitted if this original copyright notice is included. Sale or placement on a web site where payment must be made to access this document is strictly prohibited. To adapt or translate please contact the author.

Contents

1 Introduction
2 Example data set
    Loading the dataset




    A normalized database structure*
3 Research questions
4 Univariate analysis
    Univariate exploratory data analysis
    Point estimation; inference of the mean
    Answers
5 Bivariate correlation and regression
    Conceptual issues in correlation and regression
    Bivariate exploratory data analysis
    Bivariate correlation analysis
    Fitting a regression line
    Bivariate linear regression
    Bivariate regression analysis from scratch*
    Regression diagnostics
    … to observed data
    … residuals
    … of residuals
    Prediction
    Robust regression*
    Structural analysis*
    Structural analysis by principal components*
    A more difficult case
    Non-parametric correlation
    Answers
6 One-way analysis of variance (ANOVA)
    Exploratory data analysis
    One-way ANOVA
    ANOVA as a linear model*
    Means separation*
    One-way ANOVA from scratch*
    Answers
7 Multivariate correlation and regression
    Multiple correlation analysis
    … simple correlations
    … partial correlations
    Multiple regression analysis
    Comparing regression models
    … regression models with the adjusted R²
    … regression models with the AIC
    … regression models with ANOVA
    Stepwise multiple regression*
    Combining discrete and continuous predictors
    Diagnosing multicollinearity
    Visualising parallel regression*
    Interactions*
    … analysis of covariance*
    Design matrices for combined models*
    Answers
8 Factor analysis
    Principal components analysis
    … synthetic variables*
    Factor analysis*

    Answers
9 …
    Postplots
    Trend surfaces
    Higher-order trend surfaces
    Local spatial dependence and Ordinary Kriging
    … objects
    … of local spatial structure
    … by Ordinary Kriging
    Answers
10 Going further
References
Index of R concepts
A Derivation of the hat matrix
    Influence of values on prediction

1 Introduction

This tutorial presents a data analysis sequence which may be applied to environmental datasets, using a small but typical data set of multivariate point observations. It is aimed at students in geo-information application fields who have some experience with basic statistics, but not necessarily with statistical computing. Five aspects are emphasised:

1. Placing statistical analysis in the framework of research questions;
2. Moving from simple to complex methods: first exploration, then selection of promising modelling approaches;
3. Visualising as well as computing;
4. Making correct inferences;
5. …

The statistical computation and analysis is carried out in the R environment for statistical computing and visualisation [16], which is an open-source dialect of the S statistical computing language. It is free, runs on most computing platforms, and contains contributions from top computational statisticians. If you are unfamiliar with R, see the monograph "Introduction to the R Project for Statistical Computing for use at ITC" [30], the R Project's introduction to R [28], or one of the many tutorials available via the R web site.

Help is available for all R methods using the ?method syntax at the command prompt; for example, ?lm opens a window with help for the lm ("fit linear models") method.

These notes use R rather than one of the many commercial statistics programs because R is a complete statistical computing environment, based on a modern computing language (accessible to the user), and with packages contributed by leading computational statisticians.
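As a small sketch of the ?method help syntax described above, the R code below looks up the help page for lm in two ways. At an interactive prompt, typing ?lm or help("lm") displays the page; in a non-interactive script the help object is only displayed when printed, so that part is guarded by interactive(). The search phrase "linear models" is an arbitrary illustration, not taken from the tutorial.

```r
# Interactive use: either form displays the help page for lm()
# ("fit linear models"); ?lm is shorthand for help("lm").
if (interactive()) {
  print(help("lm"))
  # ??"linear models" is shorthand for a keyword search of all help pages
  print(help.search("linear models"))
}

# Non-interactive use: locate the help page without displaying it.
h <- help("lm")
length(h)           # number of matching help files found
names(formals(lm))  # the arguments that the lm help page documents
```

The same pattern works for any function name, for example help("mean") or ?mean.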

R allows unlimited flexibility and sophistication. "Press the button and fill in the box" is certainly faster, but as with Windows word processors, "what you see is all you get". With R it may be a bit harder at first to do simple things, but you are not limited. R is completely free, can be freely distributed, runs on all desktop computing platforms, is regularly updated, is well documented both by the developers and users, is the subject of several good statistical computing texts, and has an active user community.

An introductory textbook with similar intent to these notes, but with a wider set of examples, is by Dalgaard [7]. A more advanced text, with many interesting applications, is by Venables and Ripley [35]. Fox [12] is an extensive explanation of regression modelling; the companion Fox and Weisberg [14] shows how to use R for this, mostly with social sciences examples.

This tutorial follows a data analysis problem typical of earth sciences, natural and water resources, and agriculture, proceeding from visualisation and exploration through univariate point estimation, bivariate correlation and regression analysis, multivariate factor analysis, analysis of variance, and finally some spatial analysis. In each section there are some tasks, for which a possible solution is shown as some R code to be typed at the console (or cut-and-pasted from the PDF version of this document, or loaded from the code files).
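Code files like those mentioned above can be run with base R's source() function rather than typed line by line. The sketch below is illustrative only: it writes a two-line script to a temporary file, standing in for one of the tutorial's code files (whose names are not given here), then runs it with echo = TRUE so that each command is shown before its output, much as if it had been typed at the console.

```r
# Stand-in for a tutorial code file: a hypothetical two-line script
# written to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(10, 20, 30)",
             "m <- mean(x)"), script)

# source() parses and evaluates every expression in the file, here in the
# global workspace; echo = TRUE prints each command before evaluating it.
source(script, echo = TRUE)

# Objects created by the script are now available at the console.
print(m)
```

With the default echo = FALSE the script runs silently, and only explicit print() output appears.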

Each task is followed by some questions to answer, based on the output of the task. Sample answers are found at the end of each section.

Some readers may want to skip more advanced sections or those that explain the mathematics behind the methods in more detail; these are marked with an asterisk * in the section title and in the table of contents.

These notes only scratch the surface of R's capabilities. In particular, the reader is encouraged to consult the on-line help as necessary to understand all the options of the methods used. Neither do these notes pretend to teach statistical inference; the reader should refer to a statistics reference as necessary. Some good choices, depending on your background and the application, are Brownlee [3], Bulmer [4], and Dalgaard [7] (general); Davis [9] (geology); Wilks [39] (meteorology); Snedecor and Cochran [31] and Steel et al. [34] (agriculture); Legendre and Legendre [17] (ecology); and Webster and Oliver [38] (soil science). See also §10, "Going further", at the end of these notes.

2 Example data set

This data set, fully described in Yemefack [40] and summarized in Yemefack et al. [41], contains 147 soil profile observations from the research area of the Tropenbos Cameroon Programme (TCP), representative of the humid forest region of southwestern Cameroon and adjacent areas of Equatorial Guinea and …. Three fixed soil layers (0-10 cm, 10-20 cm, and 30-50 cm) were sampled.

The data set is from two sources. First, 45 representative soil profiles were described and sampled by genetic horizon; soil characteristics for each of the three fixed layers were then computed as averages weighted by genetic horizon thickness. Second, 102 plots from various land use/land cover types were sampled directly at the three fixed depths.
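The thickness-weighted averaging used for the 45 described profiles can be sketched with base R's weighted.mean(). This is an illustration under stated assumptions, not necessarily the exact procedure used to build this data set: the horizon depths and clay values below are hypothetical, and each horizon is assumed to be weighted by the thickness of the part of it that falls inside the fixed layer. The helper name layer_mean is likewise hypothetical.

```r
# Hypothetical genetic horizons of one profile: upper and lower depths (cm)
# and clay content (%). These values are invented for illustration.
hz <- data.frame(top    = c(0, 8, 25, 45),
                 bottom = c(8, 25, 45, 80),
                 clay   = c(20, 28, 40, 46))

# Hypothetical helper: thickness-weighted mean of variable v over the fixed
# layer [layer_top, layer_bottom]. Each horizon's weight is the thickness of
# its overlap with the layer (0 if the horizon lies outside the layer).
layer_mean <- function(hz, layer_top, layer_bottom, v) {
  overlap <- pmax(0, pmin(hz$bottom, layer_bottom) - pmax(hz$top, layer_top))
  weighted.mean(hz[[v]], w = overlap)
}

# The three fixed layers of the TCP data set: 0-10, 10-20 and 30-50 cm.
layers <- data.frame(top = c(0, 10, 30), bottom = c(10, 20, 50))
layers$clay <- mapply(function(t, b) layer_mean(hz, t, b, "clay"),
                      layers$top, layers$bottom)
print(layers)
```

For the 0-10 cm layer this gives (8 × 20 + 2 × 28) / 10 = 21.6 % clay, since the first horizon contributes 8 cm and the second 2 cm of that layer.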

Each of these plot samples was a bulked composite of five sub-samples taken with an auger along the plot diagonal. For both data sets, samples were located purposively and subjectively to represent soil and land use types. Laboratory analysis was by standard local methods [23].

For this exercise, we have selected three soil properties:

1. Clay content (code Clay), weight % of the mineral fine earth (< 2 mm);
2. Cation exchange capacity (code CEC), cmol+ (kg soil)^-1;
3. Organic carbon (code OC), volume % of the fine earth.

These three variables are related; in particular, we know from theory and many detailed studies that the CEC of a soil depends on reactive sites, either on clay colloids or on organic complexes such as humus, where cations (such as K+ and Ca++) can be easily adsorbed and desorbed [22, 32].

The CEC is important for soil management, since it controls how much added artificial or natural fertiliser or liming material will be retained by the soil for a long-lasting effect on crop growth.

Heavy doses of fertiliser on soils with low CEC will be wasted, since the extra nutrients will not be retained.

In addition, for each observation the following site information was recorded:

- East and North coordinates, UTM Zone 32N, WGS84 datum, in meters (codes e and n)
- Elevation in meters above sea level (code elev)
- Agro-ecological zone, arbitrary code (code zone)
- Reference soil group, arbitrary code (code wrb1)
- Land cover type (code LC)

The soil group codes refer to Reference Groups of the World Reference Base for Soil Resources (WRB), the international soil classification system [11]. These are presented in the text file as integer codes which correspond to three of the 31 Reference Groups identified worldwide, and which differ substantially in their properties and response to management [10]:

1. Acrisols (from the Haplic, Ferralic, and Plinthic subgroups)
2.

