
PAST: Paleontological Statistics Software Package for ...

Palaeontologia Electronica

Hammer, Øyvind, Harper, David A. T., and Paul D. Ryan, 2001. PAST: Paleontological Statistics Software Package for Education and Data Analysis. Palaeontologia Electronica, vol. 4, issue 1, art. 4: 9pp., 178kb.

PAST: PALEONTOLOGICAL STATISTICS SOFTWARE PACKAGE FOR EDUCATION AND DATA ANALYSIS

Øyvind Hammer, David A. T. Harper, and Paul D. Ryan

Øyvind Hammer. Paleontological Museum, University of Oslo, Sars gate 1, 0562 Oslo, Norway
David A. T. Harper. Geological Museum, Øster Voldgade 5-7, University of Copenhagen, DK-1350 Copenhagen K, Denmark
Paul D. Ryan. Department of Geology, National University of Ireland, Galway, Ireland

ABSTRACT

A comprehensive, but simple-to-use software package for executing a range of standard numerical analysis and operations used in quantitative paleontology has been developed. The program, called PAST (Paleontological Statistics), runs on standard Windows computers and is available free of charge. PAST integrates spreadsheet-type data entry with univariate and multivariate statistics, curve fitting, time-series analysis, data plotting, and simple phylogenetic analysis.

Hierarchical clustering routines produce a dendrogram showing how and where data points can be clustered (Davis 1986, Harper 1999). Clustering is one of the most commonly used methods of multivariate data analysis in paleontology. Both R-mode clustering (groupings of taxa), and Q-mode clustering (grouping variables or associations) can be carried


Transcription of PAST: Paleontological Statistics Software Package for ...


Many of the functions are specific to paleontology and ecology, and these functions are not found in standard, more extensive, statistical packages. PAST also includes fourteen case studies (data files and exercises) illustrating use of the program for paleontological problems, making it a complete educational package for courses in quantitative paleontology.

KEY WORDS: software, data analysis, education

Copyright: Palaeontological Association, 22 June 2001
Submission: 28 February 2001 Acceptance: 13 May 2001

INTRODUCTION

Even a cursory glance at the recent paleontological literature should convince anyone that quantitative methods in paleontology have arrived at last. Nevertheless, many paleontologists still hesitate in applying such methods to their own data. One of the reasons for this has been the difficulty in acquiring and using appropriate data-analysis software. The PALSTAT program was developed in the 1980s in order to minimize such obstacles and provide students with a coherent, easy-to-use package that supported a wide range of algorithms while allowing hands-on experience with quantitative methods.

The first PALSTAT version was programmed for the BBC microcomputer (Harper and Ryan 1987), while later revisions were made for the PC (Ryan et al. 1995). Incorporating univariate and multivariate statistics and other plotting and analytical functions specific to paleontology and ecology, PALSTAT gained a wide user base among both paleontologists and ecologists.

After some years of service, however, it was becoming clear that PALSTAT had to undergo major revision. The DOS-based user interface and an architecture designed for computers with minuscule memories (by modern standards) were becoming an obstacle for most users. The field of quantitative paleontology has also changed and expanded considerably in the last 15 years, requiring the implementation of many new algorithms. Therefore, in 1999 we decided to redesign the program totally, keeping the general concept but without concern for the original source code.

The new program, called PAST (Paleontological Statistics), takes full advantage of the Windows operating system, with a modern, spreadsheet-based user interface and extensive graphics. Most PAST algorithms produce graphical output automatically, and the high-quality figures can be printed or pasted into other programs. The functionality has been extended substantially with inclusion of important algorithms in the standard PAST toolbox. Functions found in PAST that were not available in PALSTAT include (but are not limited to) parsimony analysis with cladogram plotting, detrended correspondence analysis, principal coordinates analysis, time-series analysis (spectral and autocorrelation), geometrical analysis (point distribution and Fourier shape analysis), rarefaction, modelling by nonlinear functions (e.g., logistic curve, sum-of-sines) and quantitative biostratigraphy using the unitary associations method.

We believe that the functions we have implemented reflect the present practice of paleontological data analysis, with the exception of some functionality that we hope to include in future versions (e.g., morphometric analysis with landmark data and more methods for the validation and correction of diversity curves).

One of the main ideas behind PAST is to include many functions in a single program package while providing for a consistent user interface. This minimizes time spent on searching for, buying, and learning a new program each time a new method is approached. Similar projects are being undertaken in other fields (e.g., systematics and morphometry). One example is Wayne Maddison's Mesquite package ( ).

An important aspect of PALSTAT was the inclusion of case studies, including data sets designed to illustrate possible uses of the algorithms. Working through these examples allowed the student to obtain a practical overview of the different methodologies in a very efficient way. Some of these case studies have been adjusted and included in PAST, and new case studies have been added in order to demonstrate the new features.

The case studies are primarily designed as student exercises for courses in paleontological data analysis. The PAST program, documentation, and case studies are available free of charge at ~

PLOTTING AND BASIC STATISTICS

Graphical plotting functions (see ~ohammer/ ) in PAST include different types of graph, histogram, and scatter plots. The program can also produce ternary (triangle) plots and survivorship curves.

Descriptive statistics (see ~ohammer/ ) include minimum, maximum, and mean values, population variance, sample variance, population and sample standard deviations, median, skewness, and kurtosis.

For associations or paleocommunity data, several diversity statistics can be computed: number of taxa, number of individuals, dominance, Simpson index, Shannon index (entropy), Menhinick's and Margalef's richness indices, equitability, and Fisher's α (Harper 1999).
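As an illustration of the diversity statistics listed above, the following minimal sketch computes most of them from a list of per-taxon counts. It uses common textbook definitions (dominance as the sum of squared proportions, Simpson index as one minus dominance, equitability as Shannon entropy divided by the log of the number of taxa); these definitions and the function name `diversity_indices` are assumptions made for this example and are not taken from the PAST source. Fisher's α is left out because it requires an iterative solution.

```python
import math

def diversity_indices(counts):
    """Diversity statistics for one sample, given the count of each taxon.

    Textbook definitions are assumed; a given program may define them differently.
    """
    counts = [c for c in counts if c > 0]      # ignore absent taxa
    n = sum(counts)                            # number of individuals
    s = len(counts)                            # number of taxa
    props = [c / n for c in counts]            # relative abundances
    dominance = sum(p * p for p in props)
    shannon = -sum(p * math.log(p) for p in props)
    return {
        "taxa": s,
        "individuals": n,
        "dominance": dominance,
        "simpson": 1.0 - dominance,
        "shannon": shannon,
        "menhinick": s / math.sqrt(n),
        "margalef": (s - 1) / math.log(n) if n > 1 else 0.0,
        "equitability": shannon / math.log(s) if s > 1 else 0.0,
    }

if __name__ == "__main__":
    # Hypothetical sample with four equally abundant taxa.
    for name, value in diversity_indices([10, 10, 10, 10]).items():
        print(name, value)
```

For a perfectly even sample such as the hypothetical one above, the equitability is 1 and the dominance equals one divided by the number of taxa.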

Rarefaction (Krebs 1989) is a method for estimating the number of taxa in a small sample, when abundance data for a larger sample are given. With this method, the number of taxa in samples of different sizes can be compared. An example application of rarefaction in paleontology is given by Adrain et al. (2000).

The program also includes standard statistical tests (see ~ohammer/ ) for univariate data, including: tests for normality (chi-squared and Shapiro-Wilk), the F and t tests, one-way ANOVA, χ² for comparing binned samples, Mann-Whitney's U test and Kolmogorov-Smirnov association test (non-parametric), and both Spearman's ρ and Kendall's τ non-parametric rank-order tests. Dice and Jaccard similarity indices are used for comparing associations limited to absence/presence data. The Raup-Crick randomization method for comparing associations (Raup and Crick 1979) is also implemented. Finally, the program can also compute correlation matrices and perform contingency-table analysis.

MULTIVARIATE ANALYSIS

Paleontological data sets, whether based on fossil occurrences or morphology, often have high dimensionality.
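Looking back at the rarefaction and similarity measures described earlier, a minimal sketch may help make them concrete. The rarefaction function below uses the standard hypergeometric formula for the expected number of taxa in a random subsample; whether PAST uses exactly this formula is an assumption. The Dice and Jaccard functions follow their usual set-based definitions. All function names are hypothetical.

```python
from math import comb

def rarefied_taxa(counts, n):
    """Expected number of taxa in a random subsample of n individuals.

    counts: number of individuals of each taxon in the full sample.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must lie between 0 and the sample size")
    denom = comb(total, n)
    # For each taxon: 1 minus the probability that the subsample misses it.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)

def jaccard(a, b):
    """Jaccard similarity of two associations given as sets of taxon names."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def dice(a, b):
    """Dice similarity of two associations given as sets of taxon names."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

if __name__ == "__main__":
    # Hypothetical abundance data and taxon lists.
    print(rarefied_taxa([5, 5], 2))
    print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))
    print(dice({"a", "b", "c"}, {"b", "c", "d"}))
```

Evaluating `rarefied_taxa` over a range of subsample sizes gives a rarefaction curve, which is what allows samples of different sizes to be compared at a common size.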

PAST includes several methods for multivariate data analysis (see ~ohammer/ ), including methods that are specific to paleontology and ecology.

Principal components analysis (PCA) is a procedure for finding hypothetical variables (components) that account for as much of the variance in a multidimensional data set as possible (Davis 1986, Harper 1999). These new variables are linear combinations of the original variables. PCA is a standard method for reducing the dimensionality of morphometric and ecological data. The PCA routine finds the eigenvalues and eigenvectors of the variance-covariance matrix or the correlation matrix. The eigenvalues, giving a measure of the variance accounted for by the corresponding eigenvectors (components), are displayed together with the percentages of variance accounted for by each of these components. A scatter plot of these data projected onto the principal components is provided, along with the option of including the Minimal Spanning Tree, which is the shortest possible set of connected lines joining all points.
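The PCA steps described above (variance-covariance matrix, eigenvalues and eigenvectors, percentages of variance, projected scores) can be sketched in plain Python as follows. The eigen-decomposition here uses cyclic Jacobi rotations purely to keep the example dependency-free; this choice, and all function names, are assumptions of this sketch and say nothing about how PAST computes its eigenvectors.

```python
import math

def covariance_matrix(rows):
    """Sample variance-covariance matrix; rows are observations, columns variables."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return means, cov

def jacobi_eigen(matrix, max_sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi rotations)."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                          # off-diagonal part has vanished
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):             # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):             # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):             # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors

def pca(rows):
    """Eigenvalues, percent variance, components and scores (assumes non-constant data)."""
    means, cov = covariance_matrix(rows)
    values, vectors = jacobi_eigen(cov)
    total = sum(values)
    percent = [100.0 * val / total for val in values]
    scores = [[sum((r[j] - means[j]) * vec[j] for j in range(len(means)))
               for vec in vectors] for r in rows]
    return values, percent, vectors, scores

if __name__ == "__main__":
    # Hypothetical measurements: three specimens, two perfectly correlated variables.
    values, percent, vectors, scores = pca([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
    print(values, percent)
```

With the hypothetical data in the usage example, the two variables are perfectly correlated, so the first component accounts for essentially all of the variance. Passing a correlation matrix instead of the covariance matrix to `jacobi_eigen` would correspond to the alternative mentioned in the text.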

The minimal spanning tree may be used as a visual aid in grouping close points (Harper 1999). The component loadings can also be plotted. Bruton and Owen (1988) describe a typical morphometrical application of PCA.

Principal coordinates analysis (PCO) is another ordination method, somewhat similar to PCA. The PCO routine finds the eigenvalues and eigenvectors of a matrix containing the distances between all data points, measured with the Gower distance or the Euclidean distance. The PCO algorithm used in PAST was taken from Davis (1986), which also includes a more detailed description of the method and example analysis.

Correspondence analysis (CA) is a further ordination method, somewhat similar to PCA, but for counted or discrete data. Correspondence analysis can compare associations containing counts of taxa or counted taxa across associations. Also, CA is more suitable if it is expected that species have unimodal responses to the underlying parameters, that is, they favor a certain range of the parameter and become rare for lower and higher values (this is in contrast to PCA, which assumes a linear response). The CA algorithm employed in PAST is taken from Davis (1986), which also includes a more detailed description of the method and example analysis. Ordination of both samples and taxa can be plotted in the same CA coordinate system, whose axes will normally be interpreted in terms of environmental parameters (e.g., water depth, type of substrate, temperature).

The Detrended Correspondence Analysis (DCA) module uses the same reciprocal averaging algorithm as the program Decorana (Hill and Gauch 1980). It is specialized for use on ecological data sets with abundance data (taxa in rows, localities in columns), and it has become a standard method for studying gradients in such data. Detrending is a type of normalization procedure in two steps. The first step involves an attempt to straighten out points lying along an arch-like pattern (= Kendall's Horseshoe).
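The reciprocal averaging idea mentioned above can be sketched for the first ordination axis only, without any detrending or rescaling. The sketch alternates between scoring taxa as abundance-weighted averages of sample scores and scoring samples as abundance-weighted averages of taxon scores, standardising the sample scores after each pass. The function names, the samples-in-rows layout (the transpose of the layout described in the text), and the fixed iteration count are assumptions of this example; it is not the Decorana or PAST implementation, and it assumes every row and column has a non-zero total.

```python
import math

def standardise(scores, weights):
    """Centre scores to weighted mean 0 and scale to weighted variance 1.

    Returns the standardised scores and the weighted standard deviation
    of the centred input.
    """
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, scores)) / total
    centred = [x - mean for x in scores]
    spread = math.sqrt(sum(w * x * x for w, x in zip(weights, centred)) / total)
    return [x / spread for x in centred], spread

def reciprocal_averaging(table, iterations=200):
    """First correspondence-analysis axis of a samples-by-taxa abundance table."""
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Arbitrary starting scores: the row index.
    site, _ = standardise([float(i) for i in range(n_rows)], row_tot)
    taxa, eigenvalue = [0.0] * n_cols, 0.0
    for _ in range(iterations):
        # Taxon score = weighted average of the scores of samples containing it.
        taxa = [sum(table[i][j] * site[i] for i in range(n_rows)) / col_tot[j]
                for j in range(n_cols)]
        # Sample score = weighted average of the scores of the taxa it contains.
        raw = [sum(table[i][j] * taxa[j] for j in range(n_cols)) / row_tot[i]
               for i in range(n_rows)]
        # The shrinkage of the scores in one full pass estimates the eigenvalue.
        site, eigenvalue = standardise(raw, row_tot)
    return site, taxa, eigenvalue

if __name__ == "__main__":
    # Hypothetical gradient: three samples (rows) by three taxa (columns).
    table = [[10, 5, 0],
             [5, 10, 5],
             [0, 5, 10]]
    site, taxa, eigenvalue = reciprocal_averaging(table)
    print(site, taxa, eigenvalue)
```

For the hypothetical table in the usage example, the samples come out ordered along the axis with the middle sample at zero, reflecting the gradual turnover of taxa from the first sample to the last.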

