
A Layered Grammar of Graphics - Hadley Wickham



Transcription of A Layered Grammar of Graphics - Hadley Wickham

Please see the online version of this article for supplementary materials.

A Layered Grammar of Graphics

Hadley WICKHAM

A grammar of graphics is a tool that enables us to concisely describe the components of a graphic. Such a grammar allows us to move beyond named graphics (e.g., the "scatterplot") and gain insight into the deep structure that underlies statistical graphics. This article builds on Wilkinson, Anand, and Grossman (2005), describing extensions and refinements developed while building an open source implementation of the grammar of graphics for R, ggplot2. The topics in this article include an introduction to the grammar by working through the process of creating a plot, and discussing the components that we need. The grammar is then presented formally and compared to Wilkinson's grammar, highlighting the hierarchy of defaults, and the implications of embedding a graphical grammar into a programming language. The power of the grammar is illustrated with a selection of examples that explore different components and their interactions, in more detail.

The article concludes by discussing some perceptual issues, and thinking about how we can build on the grammar to learn how to create graphical "poems." Supplemental materials are available online.

Key Words: Grammar of graphics; Statistical graphics.

INTRODUCTION

What is a graphic? How can we succinctly describe a graphic? And how can we create the graphic that we have described? These are important questions for the field of statistical graphics. One way to answer these questions is to develop a grammar: "the fundamental principles or rules of an art or science" (OED Online 1989). A good grammar will allow us to gain insight into the composition of complicated graphics, and reveal unexpected connections between seemingly different graphics (Cox 1978). A grammar provides a strong foundation for understanding a diverse range of graphics. A grammar may also help guide us on what a well-formed or correct graphic looks like, but there will still be many grammatically correct but nonsensical graphics. This is easy to see by analogy to the English language: good grammar is just the first step in creating a good sentence.

Hadley Wickham is Assistant Professor of Statistics, Rice University, Houston, TX 77030. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Journal of Computational and Graphical Statistics, Volume 19, Number 1, Pages 3-28.

The most important modern work in graphical grammars is "The Grammar of Graphics" by Wilkinson, Anand, and Grossman (2005).

This work built on earlier work by Bertin (1983) and proposed a grammar that can be used to describe and construct a wide range of statistical graphics. This article proposes an alternative parameterization of the grammar, based around the idea of building up a graphic from multiple layers of data. The grammar differs from Wilkinson's in its arrangement of the components, the development of a hierarchy of defaults, and in that it is embedded inside another programming language. These three aspects form the core of the article, comparing and contrasting the layered grammar to Wilkinson's grammar. These sections are followed by a discussion of some of the implications of the grammar, and of how we might use it to build higher level tools.

The ideas presented in this article have been implemented in the open-source R package, ggplot2, available from CRAN. More details about the grammar and implementation, including a comprehensive set of examples, can be found on the package website http:// ggplot2. The code used to produce the figures in this article is available online in the supplemental materials.

HOW TO BUILD A PLOT

When creating a plot we start with data.

For this example, to focus on the essence of drawing a graphic, without getting distracted by more complicated manipulations of the data, we will use the trivial dataset shown in Table 1. It has four variables, A, B, C, and D, and four observations.

Table 1. Simple dataset.

A    B    C    D
2    3    4    a
1    2    1    a
4    5    15   b
9    10   80   b

A BASIC PLOT

Let us draw a scatterplot of A versus C. What exactly is a scatterplot? One way to describe it is that we are going to draw a point for each observation, and we will position the point horizontally according to the value of A, and vertically according to C. For this example, we will also map categorical variable D to the shape of the points.

The first step in making this plot is to create a new dataset that reflects the mapping of x-position to A, y-position to C, and shape to D. x-position, y-position, and shape are examples of aesthetics, things that we can perceive on the graphic. We will also remove all other variables that do not appear in the plot. This is shown in Table 2.

Table 2. Simple dataset with variables named according to the aesthetic that they use.

x    y    Shape
2    4    a
1    1    a
4    15   b
9    80   b

We can create many different types of plots using this same basic specification.

For example, if we were to draw lines instead of points, we would get a line plot. If we used bars, we would get a bar plot. Bars, lines, and points are all examples of geometric objects.

The next thing we need to do is to convert these numbers measured in data units to numbers measured in physical units, things that the computer can display. To do that we need to know that we are going to use linear scales and a Cartesian coordinate system. We can then convert the data units to aesthetic units, which have meaning to the underlying drawing system. For example, to convert from a continuous data value to a horizontal pixel coordinate, we need a function like the following:

$$\operatorname{floor}\left(\frac{x - \min(x)}{\operatorname{range}(x)} \times \text{screen width}\right)$$

In this example, we will scale the x-position to [0, 200] and the y-position to [0, 300]. The procedure is similar for other aesthetics, such as shape: here we map a to a circle, and b to a square. The results of these scalings are shown in Table 3. These transformations are the responsibility of scales, described in detail later in the article.

In general, there is another step that we have skipped in this simple example: a statistical transformation.
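The mapping and scaling steps described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ggplot2 is implemented: the names `map_aesthetics`, `scale_linear`, and `SHAPE_SCALE` are hypothetical, the data are the four observations of Table 1, and `range(x)` in the formula is read as `max(x) - min(x)`.

```python
import math

# The trivial dataset from Table 1: four observations of A, B, C, and D.
data = [
    {"A": 2, "B": 3, "C": 4, "D": "a"},
    {"A": 1, "B": 2, "C": 1, "D": "a"},
    {"A": 4, "B": 5, "C": 15, "D": "b"},
    {"A": 9, "B": 10, "C": 80, "D": "b"},
]


def map_aesthetics(rows, mapping):
    """Keep only the mapped variables, renamed after their aesthetic."""
    return [{aes: row[var] for aes, var in mapping.items()} for row in rows]


def scale_linear(values, width):
    """Apply floor((x - min(x)) / range(x) * width) to every value.

    Assumption: range(x) means max(x) - min(x), and the values are not
    all identical (otherwise the division is undefined).
    """
    lo, hi = min(values), max(values)
    return [math.floor((v - lo) / (hi - lo) * width) for v in values]


# Hypothetical discrete scale for the shape aesthetic.
SHAPE_SCALE = {"a": "circle", "b": "square"}

# Step 1: aesthetic mapping (x <- A, y <- C, shape <- D); B is dropped.
mapped = map_aesthetics(data, {"x": "A", "y": "C", "shape": "D"})

# Step 2: scale data units to aesthetic units.
xs = scale_linear([r["x"] for r in mapped], 200)
ys = scale_linear([r["y"] for r in mapped], 300)
shapes = [SHAPE_SCALE[r["shape"]] for r in mapped]
scaled = [{"x": x, "y": y, "shape": s} for x, y, s in zip(xs, ys, shapes)]
```

With these inputs the x-positions come out as 25, 0, 75, 200 and the y-positions as 11, 0, 53, 300, with the two `a` rows drawn as circles and the two `b` rows as squares.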

Here we are using the identity transformation, but there are many others that are useful, such as binning or aggregating. Statistical transformations, or stats, are described in detail later in the article.

Finally, we need to render these data to create the graphical objects that are displayed on screen or paper. To create a complete plot we need to combine graphical objects from three sources: the data, represented by the point geom; the scales and coordinate system, which generate axes and legends so that we can read values from the graph; and the plot annotations, such as the background and plot title. These components are shown in Figure 1. Combining and displaying these graphical objects produces the final plot, as in Figure 2.

Table 3. Simple dataset with variables mapped into aesthetic units.

Figure 1. Graphical objects produced by (from left to right): geometric objects, scales and coordinate system, plot annotations.

Figure 2. The final graphic, produced by combining the pieces in Figure 1.

A MORE COMPLICATED PLOT

Now that you are acquainted with drawing a simple plot, we will create a more complicated plot that uses faceting.

Faceting is a more general case of the techniques known as conditioning, trellising, and latticing, and produces small multiples showing different subsets of the data. If we facet the previous plot by D we will get a plot that looks like Figure 3, where each value of D is displayed in a different panel.

Figure 3. A more complicated plot, faceted by variable D. Here the faceting uses the same variable that is mapped to shape so that there is some redundancy in our visual representation. This allows us to easily see how the data have been broken into panels.

Faceting splits the original dataset into a dataset for each subset, so the data that underlie Figure 3 look like Table 4.

Table 4. Simple dataset faceted into subsets.

Facet  x    y    Shape
a      2    4    circle
a      1    1    circle
b      4    15   square
b      9    80   square

The first steps of plot creation proceed as before, but new steps are necessary when we get to the scales. Scaling actually occurs in three parts: transforming, training, and mapping.

Scale transformation occurs before statistical transformation so that statistics are computed on the scale-transformed data. This ensures that a plot of log(x) versus log(y) on linear scales looks the same as x versus y on log scales. More details are given later in the article. Transformation is only necessary for nonlinear scales, because all statistics are location-scale invariant.

After the statistics are computed, each scale is trained on every faceted dataset (a plot can contain multiple datasets, e.g., raw data and predictions from a model). The training operation combines the ranges of the individual datasets to get the range of the complete data. If scales were applied locally, comparisons would only be meaningful within a facet. This is shown in Table 5.

Table 5. Local scaling, where data are scaled independently within each facet. Note that each facet occupies the full range of positions, and only uses one shape. Comparisons across facets are not meaningful.

Finally the scales map the data values into aesthetic values. This gives Table 6, which is essentially identical to Table 2 apart from the structure of the datasets. Given that we end up with an essentially identical structure, you might wonder why we do not simply split up the final result. There are several reasons for this. It makes writing statistical transformation functions easier, as they only need to operate on a single facet of data, and some need to operate on a single subset, for example, calculating a percentage. Also, in practice we may have a more complicated training scheme for the position scales so that different columns or rows can have different scales.

Table 6. Faceted data correctly mapped to aesthetics. Note the similarity to Table 3.

COMPONENTS OF THE LAYERED GRAMMAR

In the examples above, we have seen some of the components that make up a plot: data and aesthetic mappings, geometric objects, scales, and facet specification. We have also touched on two other components: statistical transformations, and the coordinate system. Together, the data, mappings, statistical transformation, and geometric object form a layer. A plot may have multiple layers, for example, when we overlay a scatterplot with a smoothed line.

To be precise, the layered grammar defines the components of a plot as:

- a default dataset and set of mappings from variables to aesthetics,
- one or more layers, with each layer having one geometric object, one statistical transformation, one position adjustment, and optionally, one dataset and set of aesthetic mappings,
- one scale for each aesthetic mapping used,
- a coordinate system,
- the facet specification.

These high-level components are quite similar to those of Wilkinson's grammar, as shown in Figure 4. In both grammars, the components are independent, meaning that we can generally change a single component in isolation. There are more differences within the individual components, which are described in the details that follow.

Figure 4. Mapping between components of Wilkinson's grammar (left) and the layered grammar (right). TRANS has no correspondence in ggplot2: its role is played by built-in R features.

The layer component is particularly important as it determines the physical representation of the data, with the combination of stat and geom defining many familiar named graphics: the scatterplot, histogram, contourplot, and so on.
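One way to make this component list concrete is to write it down as a plain data structure, and then replay the faceting and scale-training steps from the more complicated plot on top of it. The sketch below is in Python rather than R and does not reflect ggplot2's API or internals; every name in it (`Layer`, `Plot`, `facet_split`, `train_range`, `map_position`) is hypothetical, and only the x-position scale is handled.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    geom: str                        # geometric object, e.g. "point"
    stat: str = "identity"           # statistical transformation
    position: str = "identity"       # position adjustment
    data: Optional[list] = None      # optional: overrides the plot default
    mapping: Optional[dict] = None   # optional: overrides the plot default


@dataclass
class Plot:
    data: list                       # default dataset
    mapping: dict                    # default aesthetic -> variable mapping
    layers: list = field(default_factory=list)
    scales: dict = field(default_factory=dict)  # one scale per aesthetic
    coord: str = "cartesian"         # coordinate system
    facet: Optional[str] = None      # variable to facet by, if any


def facet_split(rows, var):
    """Split a dataset into one dataset per value of the faceting variable."""
    panels = {}
    for row in rows:
        panels.setdefault(row[var], []).append(row)
    return panels


def train_range(panels, var):
    """Training: combine the ranges of the individual datasets."""
    values = [row[var] for rows in panels.values() for row in rows]
    return min(values), max(values)


def map_position(value, limits, width):
    """Mapping: linear scale from data units to [0, width]."""
    lo, hi = limits
    return math.floor((value - lo) / (hi - lo) * width)


rows = [
    {"A": 2, "C": 4, "D": "a"}, {"A": 1, "C": 1, "D": "a"},
    {"A": 4, "C": 15, "D": "b"}, {"A": 9, "C": 80, "D": "b"},
]
plot = Plot(data=rows, mapping={"x": "A", "y": "C", "shape": "D"},
            layers=[Layer(geom="point")], facet="D")

panels = facet_split(plot.data, plot.facet)
x_var = plot.mapping["x"]

# Shared scale: trained once on every panel, then applied to each panel.
x_limits = train_range(panels, x_var)
shared = {k: [map_position(r[x_var], x_limits, 200) for r in v]
          for k, v in panels.items()}

# Local scaling, for contrast: each panel is trained on its own data only.
local = {k: [map_position(r[x_var], train_range({k: v}, x_var), 200) for r in v]
         for k, v in panels.items()}
```

Here `shared` is `{'a': [25, 0], 'b': [75, 200]}`, so positions remain comparable across panels, whereas `local` is `{'a': [200, 0], 'b': [0, 200]}`: each panel fills the full range and cross-panel comparison is lost, which is the situation the local-scaling table describes.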

