Transcription of Analysing Spatial Data in R: Worked example: …
1 Analysing Spatial data in R: Worked example :point patternsRoger BivandDepartment of EconomicsNorwegian School of Economics and Business AdministrationBergen, Norway31 August 2007 OutlineIWhat we see on a map is a pattern, or perhaps some patternsmixed is not easy to work back from map pattern to the process orprocesses that generated a variety of approaches, we can explore and analysepoint patterns , also reviewing an important chapter in thedevelopment of quantitative , we will also see how we can try out differentapproaches, and how their assumptions affect our O Sullivan and David Unwin (2003)GeographicalInformation Analysis, Wiley, chapter 4, plus chapter 3 for thecurious;IIan Smalley and David Unwin (1968) The formation andshape of drumlins and their distribution and orientation indrumlin fields,Journal of Glaciology, 7, pp.
2 377 390; Alan (1973) The distribution of drumlins in County Down,Ireland,Annals, AAG, 63 (2). pp. 226 geographers may also like Trevor Bailey and AnthonyGatrell (1995)Interactive Spatial data analysis, Longman,chapter , PolandData> library(rgdal)> drumlins <- readOGR(".", "drumlins")OGR data source with driver: ESRI ShapefileSource: ".", layer: "drumlins"with 232 rows and 4 columns> drumlins_SP <- as(drumlins,+ "SpatialPoints")A data set similar to the one referedto by O Sullivan and Unwin on is available inspatialinR (associated with Venables andRipley (2002) Modern AppliedStatistics withS) it is the oneused by Upton and Fingleton, codedby Ripley. We have here copied thepoints to a , County Down, Ireland(Hill, 1973)(Upton & Fingleton, 1985)Usingspatstatwithsp> library(maptools)> library(spatstat)> drumlins_ppp <- as(drumlins_SP,+ "ppp")> drumlins_pppplanar point pattern: 232pointswindow: rectangle = [ , ] x [ , ]unitsAlthoughspatstatand thespclasseshave developed independently, theyhave a good deal in common, andpoint patterns , images and polygonwindows can be exchangedEdges and plotPoint pattern objects need boundingwindows to show where thepopulation of data points werecollected.
3 The default window is thebounding box of the points , butothers are available.> bb <- (drumlins_ppp)> ch <- (drumlins_ppp)> rr <- ripras(drumlins_ppp)> drumlins_rr <- ppp(drumlins_ppp$x,+ drumlins_ppp$y, window = rr)drumlins_pppllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllQuadrat analysisOne legacy approach to pointpatterns, avoiding the drudge ofmeasuring inter-point distances, hasbeen to divide the study area intoquadrats, and count the numbers ofpoints falling into each quadrat. Thiscan take the form of a 2D histogram,or be displayed as an image plot.> qc <- quadratcount(drumlins_ppp)02468681012 Quadrat counts461312417714104141289101310571315111310 Quadrat testsChi-squared tests for CompleteSpatial Randomness using quadratcounts may seem attractive, butsuffer from the same problems as dohistogram bins:> (drumlins_ppp)Chi-squared test of CSR usingquadrat countsdata: drumlins_pppX-squared = , df = 24,p-value = adding one more row andcolumn of quadrats, or switchingwindows, changes our conclusion:> (drumlins_ppp, nx = 6)Chi-squared test of CSR usingquadrat countsdata: drumlins_pppX-squared = , df = 35,p-value = > (drumlins_rr)Chi-squared test of CSR usingquadrat countsdata.
4 Drumlins_rrX-squared = , df = 24,p-value = plotsDensity plots use a 2D kernel, inspatstata Gaussian kernel, tocreate smoothed histograms avoiding the problems of quadratcounts. The key argument to pass to the density method for pointpatterm objects issigma=, which determines the bandwidth of thekernel. Since we can coerce the image objects output by themethod to anspclass, we use this to cumulate density values fordifferent values of sigma.> k025 <- density(drumlins_rr, sigma = , xy = crds)> SG <- as(k025, "SpatialGridDataFrame")> k050 <- density(drumlins_rr, sigma = , xy = crds)> SG <- cbind(SG, as(k050, "SpatialGridDataFrame"))> k075 <- density(drumlins_rr, sigma = , xy = crds)> SG <- cbind(SG, as(k075, "SpatialGridDataFrame"))> k100 <- density(drumlins_rr, sigma = 1, xy = crds)> SG <- cbind(SG, as(k100, "SpatialGridDataFrame"))> names(SG) <- c("k025", "k050", "k075", "k100")Kernel density surfacesk025k050k075k1000246810 Impact of bandwidth on surfaceNarrower bandwidths yield more extreme values, broaderbandwidths narrow the interquartile range.
5 From this table, we cansee how the change in the bandwidth is affecting the relativedifferences in our view of the local estimates of intensity.> summary(as(SG, " ")[, 1:4])k025 k050 k075 k100 Min. Min. Min. Min. Qu. +00 1st Qu. 1st Qu. 1st Qu. +00 Median Median Median +00 Mean Mean Mean Qu. +00 3rd Qu. 3rd Qu. 3rd Qu. +01 Max. Max. Max. perspective plotsWe can also display the local estimates of point intensity as aquasi-3D surface using therglpackage, and examine itinteractively. This can be manipulated by changing the viewpointwith and , and equivalent parameters for the lighting vertical scale is that of the local intensity.> library(rgl)> z <- as(SG["k025"], "matrix")> z[ (z)] <- 0> x <- 1:nrow(z)> y <- 1:ncol(z)> (color = "white")> ()> (theta = 350, phi = 35)> (x, y, z)> (" ")Perspective plot ( bandwidth)Nearest-neighbour distancesWe can find and plot nearestneighbour distances, finding themwithnndist plotting the empiricalcumulative distribution function ofthe nearest neighbour distances isinteresting:> nns <- nndist(drumlins_rr)> summary(nns)Min.
6 1st Qu. Median Qu. > plot(ecdf(nns)) (nns)xFn(x)lllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllll llllllllUsing G empirical cumulative distribution functionThe Gmeasure turns out to be justthe ECDF of the nearest neighbourdistances, plotted by default with theexpected CSR line;Gestreturnsbinned values for a range of distancebins best chosen by the function:> plot(ecdf(nns), xlim = c(0,+ ))> plot(Gest(drumlins_ppp), add = TRUE,+ lwd = 3) (nns)xFn(x)lllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllS imulating Gfor CSRIf we generate many simulated CSRpoint patterns for the currentwindow, we can use theenvelopemethod to explore whether theobserved Gmeasures lie in relationto the simulated ones:> n <- drumlins_rr$n> ex <- expression(runifpoint(n,+ win = rr))> res <- envelope(drumlins_rr,+ Gest, nsim = 99, simulate = ex,+ verbose = FALSE, saveall = TRUE)> plot(res, xlim = c(0, ))> for (i in 2.)
7 100) lines(attr(res,+ "savedata")[[1]], attr(res,+ "savedata")[[i]], col = "grey")> plot(res, add = TRUE, xlim = c(0,+ )) G(r)Clark/EvansRstatisticsWe can also compute the nearest neighbour based Clark/EvansRstatistic :> source(" ")> clarkevans(drumlins_ppp) > clarkevans(drumlins_rr, correction = "none") > clarkevans(drumlins_rr, correction = "guard", clipregion = (rr,+ r = 1)) seem to indicate that the observed and CSR expecteddistances are similar, but perhaps more evenly spaced CSR a good idea?From what we have seen, it appears thethe drumlin summit points are moreregularly than randomly distributed. If wethink, however, the absence of short nearestneighbour distance may mean that theypusheach other apart (in fact this is aboutpoints not being a good way of representingellipses) so we can try to simulate from aSimple Sequential Inhibition (SSI) processwith a 180m inhibition radius instead ofCSR:> ex <- expression(rSSI( ,+ n, win = rr))> res <- envelope(drumlins_rr,+ Gest, nsim = 99, simulate = ex,+ verbose = FALSE, saveall = TRUE) G(r) Kwith CSR simulationAs we know, Guses nearestneighbour distances to expresssummary features of a point Kfunction uses point intensitiesin rings spreading out from thepoints, and so uses more of the datato examine what is driving theprocess (reported here as L).
8 > ex <- expression(runifpoint(n,+ win = rr))> res <- envelope(drumlins_rr,+ Kest, nsim = 99, simulate = ex,+ verbose = FALSE, saveall = TRUE) simulationrL(r) Kwith SSI simulationFrom what we already know,drumlins represented as pointsappear to inhibit each other undera distance of about 200m, so runningthe Kmeasure with an SSI processshould show more of what is goingon:> ex <- expression(rSSI( ,+ n, win = rr))> res <- envelope(drumlins_rr,+ Kest, nsim = 99, simulate = ex,+ verbose = FALSE, saveall = TRUE) simulationrL(r)Conclusions about drumlins (so far)IComparing the SSI with the CSR results indicates that we cannot only reject the CSR process as being that driving drumlinpoint locations, but we have good grounds to suppose that aspatial inhibition process is operatingIthe minimum drumlin footprint is such that drumlins cannotbe closer to each other than a certain minimum distanceIAt short range, the estimated Lvalues are lower than thelower envelope bounds, but only less than distance units this corresponds to Spatial inhibitionIAs the Lsimulation values for the SSI process indicate,drumlins are not well represented by points , because they havearea, even volume, and rarely overlap but is there perhapsa trace of clustering at greater distances?
9 Conclusions about point patternsIPoint patterns are often observed: have we observed all thepoints?IAre points a reasonable representation of the object we areinterested in?IWhat are the boundaries of our study area does changingthem make a difference?IWhat kinds of processes can be generating our observedpattern can we model them?IIs our study area representative of the class of phenomena inmore general terms?