Statistics and Computing - Academia.dk

Statistics and Computing
Series Editors: J. Chambers, D. Hand, W. Härdle

Statistics and Computing
Brusco/Stahl: Branch and Bound Applications in Combinatorial Data Analysis
Chambers: Software for Data Analysis: Programming with R
Dalgaard: Introductory Statistics with R, 2nd ed.
…: Elements of Computational Statistics
Gentle: Numerical Linear Algebra for Applications in Statistics
Gentle: Random Number Generation and Monte Carlo Methods, 2nd ed.
Härdle/Klinke/Turlach: XploRe: An Interactive Statistical Computing Environment
Hörmann/Leydold/Derflinger: Automatic Nonuniform Random Variate Generation
Krause/Olson: The Basics of S-PLUS, 4th ed.
…: Numerical Analysis for Statisticians
Lemmon/Schafer: Developing Statistical Software in Fortran 95
Loader: Local Regression and Likelihood
Marasinghe/Kennedy: SAS for Data Analysis: Intermediate Statistical Methods
Ó Ruanaidh/Fitzgerald: Numerical Bayesian Methods Applied to Signal Processing
Pannatier: VARIOWIN: Software for Spatial Data Analysis in 2D
Pinheiro/Bates: Mixed-Effects Models in S and S-PLUS
Unwin/Theus/Hofmann: Graphics of Large Datasets: Visualizing a Million
Venables/Ripley: Modern Applied Statistics with S, 4th ed.
…: S Programming
Wilkinson: The Grammar of Graphics, 2nd ed.

Peter Dalgaard
Introductory Statistics with R
Second Edition

Peter Dalgaard
Department of Biostatistics
University of Copenhagen

ISBN: 978-0-387-79053-4
e-ISBN: 978-0-387-79054-1
DOI: …
Library of Congress Control Number: 2008932040

© 2008 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

To Grete, for putting up with me for so long

Preface

R is a statistical computer program made available through the Internet under the General Public License (GPL).

That is, it is supplied with a license that allows you to use it freely, distribute it, or even sell it, as long as the receiver has the same rights and the source code is freely available. It exists for Microsoft Windows XP or later, for a variety of Unix and Linux platforms, and for Apple Macintosh OS X.

R provides an environment in which you can perform statistical analysis and produce graphics. It is actually a complete programming language, although that is only marginally described in this book. Here we content ourselves with learning the elementary concepts and seeing a number of cookbook examples.

R is designed in such a way that it is always possible to do further computations on the results of a statistical procedure. Furthermore, the design for graphical presentation of data allows both no-nonsense methods, for example plot(x,y), and the possibility of fine-grained control of the output's appearance. The fact that R is based on a formal computer language gives it tremendous flexibility. Other systems present simpler interfaces in terms of menus and forms, but often the apparent user-friendliness turns into a hindrance in the longer run.
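A minimal base-R sketch of the two design points described in the preface: the object returned by a statistical procedure (here a linear fit from lm) remains available for further computation, and graphics range from a plain plot(x, y) to finer control through extra arguments and added layers. The simulated data and the graphical parameters (pch, col, lty) are assumptions chosen for illustration, not examples taken from the book.

```r
# Simulated data: an assumption for illustration, not taken from the book
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)

# A statistical procedure returns an object...
fit <- lm(y ~ x)

# ...whose contents can be used in further computations
b <- coef(fit)                 # named vector: (Intercept) and x
res <- residuals(fit)          # one residual per observation
rss <- sum(res^2)              # residual sum of squares
predict(fit, newdata = data.frame(x = c(25, 30)))  # predictions at new x values

# No-nonsense graphics: a default scatterplot
plot(x, y)

# Finer control: plotting symbol, colour, labels, and an added fitted line
plot(x, y, pch = 19, col = "darkgray", xlab = "x", ylab = "y",
     main = "Simulated data with fitted line")
abline(fit, lty = 2)           # dashed regression line from the lm object
```

The same pattern applies to other procedures: for example, the object returned by t.test has components such as p.value and conf.int that can be extracted with $ and used in later calculations.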

Although elementary statistics is often presented as a collection of fixed procedures, analysis of moderately complex data requires ad hoc statistical model building, which makes the added flexibility of R highly desirable.

R owes its name to typical Internet humour. You may be familiar with the programming language C (whose name is a story in itself). Inspired by this, Becker and Chambers chose in the early 1980s to call their newly developed statistical programming language S. This language was further developed into the commercial product S-PLUS, which by the end of the decade was in widespread use among statisticians of all kinds. Ross Ihaka and Robert Gentleman from the University of Auckland, New Zealand, chose to write a reduced version of S for teaching purposes, and what was more natural than choosing the immediately preceding letter? Ross and Robert's initials may also have played a role.

In 1995, Martin Maechler persuaded Ross and Robert to release the source code for R under the GPL.

This coincided with the upsurge in Open Source software spurred by Linux. R turned out to fill a gap for people like me who intended to use Linux for statistical computing but had no statistical package available at the time. A mailing list was set up for the communication of bug reports and discussions of the development of R.

In August 1997, I was invited to join an extended international core team whose members collaborate via the Internet and that has controlled the development of R since then. The core team was subsequently expanded several times and currently includes 19 members. On February 29, 2000, version … was released. As of this writing, the current version is …

This book was originally based upon a set of notes developed for the course in Basic Statistics for Health Researchers at the Faculty of Health Sciences of the University of Copenhagen. The course had a primary target of students for the degree in medicine. However, the material has been substantially revised, and I hope that it will be useful for a larger audience, although some biostatistical bias remains, particularly in the choice of examples.

In later years, the course in Statistical Practice in Epidemiology, which has been held yearly in Tartu, Estonia, has been a major source of inspiration and experience in introducing young statisticians and epidemiologists to R.

This book is not a manual for R.

The idea is to introduce a number of basic concepts and techniques that should allow the reader to get started with practical statistics.

In terms of the practical methods, the book covers a reasonable curriculum for first-year students of theoretical statistics as well as for engineering students. These groups will eventually need to go further and study more complex models as well as general techniques involving actual programming in R.

For fields where elementary statistics is taught mainly as a tool, the book goes somewhat further than what is commonly taught at the undergraduate level. Multiple regression methods or analysis of multifactorial experiments are rarely taught at that level but may quickly become essential for practical research. I have collected the simpler methods near the beginning to make the book readable also at the elementary level. However, in order to keep technical material together, Chapters 1 and 2 do include material that some readers will want to skip.

The book is thus intended to be useful for several groups, but I will not pretend that it can stand alone for any of them.

I have included brief theoretical sections in connection with the various methods, but more than as teaching material, these should serve as reminders or perhaps as appetizers for readers who are new to the world of statistics.

Notes on the 2nd edition

The original first chapter was expanded and broken into two chapters, and a chapter on more advanced data handling tasks was inserted after the coverage of simpler statistical methods. There are also two new chapters on statistical methodology, covering Poisson regression and nonlinear curve fitting, and a few items have been added to the section on descriptive statistics. The original methodological chapters have been quite minimally revised, mainly to ensure that the text matches the actual output of the current version of R. The exercises have been revised, and solution sketches now appear in Appendix …

This book would not have been possible without the efforts of my friends and colleagues on the R Core Team, the authors of contributed packages, and many of the correspondents of the e-mail discussion lists. I am deeply grateful for the support of my colleagues and co-teachers Lene Theil Skovgaard, Bendix Carstensen, Birthe Lykke Thomsen, Helle Rootzen, Claus Ekstrøm, and Thomas Scheike, and, from the Tartu course, Krista Fischer, Esa Läärä, Martyn Plummer, Mark Myatt, and Michael Hills, as well as the feedback from several students.

In addition, several people, including Bill Venables, Brian Ripley, and David James, gave valuable advice on early drafts of the book. Finally, profound thanks are due to the free software community at large. R would not have been possible without their effort. For the typesetting of this book, TeX, LaTeX, and the consolidating efforts of the LaTeX2e project have been indispensable.

Peter Dalgaard
Copenhagen
April 2008

Contents

Preface
1 …: First steps; An overgrown calculator; Assignments; Vectorized arithmetic; Standard procedures; Graphics; … essentials; Expressions and objects; Functions and arguments; Vectors; Quoting and escape sequences; Missing values; Functions that create vectors; Matrices and arrays; Factors; Lists; Data frames; Indexing; Conditional selection; Indexing of data frames; Grouped data and data frames; Implicit loops; Sorting; Exercises
2 …: Session management; The workspace; Textual output; Scripting; Getting help; Packages; Built-in data; …, transform, and within; The graphics subsystem; Plot layout; Building a plot from pieces; Using par; Combining plots; Flow control; Classes and generic functions; Data entry; Reading from a text file; Further details; The data editor; Interfacing to other programs; Exercises
3 Probability and distributions: Random sampling; Probability calculations and combinatorics; Discrete distributions; Continuous distributions; The built-in distributions in R; Densities; Cumulative distribution functions; Quantiles; Random numbers; Exercises
4 Descriptive statistics and graphics: Summary statistics for a single group; Graphical display of distributions; Histograms; Empirical cumulative distribution; Q–Q plots; Boxplots; Summary statistics by groups; Graphics for grouped data; Histograms; Parallel boxplots; Stripcharts; Tables; Generating tables; Marginal tables and relative frequency; Graphical display of tables; Barplots; Dotcharts; Piecharts; Exercises
5 One- and two-sample tests: One-sample t test; Wilcoxon signed-rank test; Two-sample t test; Comparison of variances; Two-sample Wilcoxon test; The paired t test; The matched-pairs Wilcoxon test; Exercises
6 Regression and correlation: Simple linear regression; Residuals and fitted values; Prediction and confidence bands; Correlation; Pearson correlation; Spearman's ρ; Kendall's τ; Exercises
7 Analysis of variance and the Kruskal–Wallis test: One-way analysis of variance; Pairwise comparisons and multiple testing; Relaxing the variance assumption; Graphical presentation

