R - PASI

PASI

Pan-American Advanced Study Institute on Spatio-Temporal Statistics

June 16-26, 2014

Búzios, RJ, Brazil

Practicals - geoR

Paulo Justiniano Ribeiro Jr (I),

(I) LEG: Statistics and Geoinformation Lab), DEST: Department of Statistics, UFPR: Federal University of Paraná

Table of contents

A (very) basic geoR session
Loading and exploring the data
Model estimation
Model prediction
Bayesian inferance

This is a brief overview of some functionalities in geoR.
We refer to the geoR “Tutorials” page for further examples.

A (very) basic geoR session

##-----------------------------------------------------------------------------
##
##        Pratical session: geostatistical analysis with geoR package
## 
##                       Paulo Justiniano Ribeiro Jr
## 
##-----------------------------------------------------------------------------
## Required package.
require(geoR)

## Loading a data-set included in the package
data(s100)

A quick data display with exploratory checks on spatial patterns, trends on the coordinates and distribution of the data.

plot(s100, lowess=TRUE)

plot of chunk s100plot

Another function allows for mode options for displaying the data.

points(s100)
args(points.geodata)

## function (x, coords = x$coords, data = x$data, data.col = 1, 
##     borders = x$borders, pt.divide = c("data.proportional", "rank.proportional", 
##         "quintiles", "quartiles", "deciles", "equal"), lambda = 1, 
##     trend = "cte", abs.residuals = FALSE, weights.divide = "units.m", 
##     cex.min, cex.max, cex.var, pch.seq, col.seq, add.to.plot = FALSE, 
##     x.leg, y.leg = NULL, dig.leg = 2, round.quantiles = FALSE, 
##     permute = FALSE, ...) 
## NULL

plot of chunk s100points

The brief summary include information on the distances between data points.

summary(s100)

## $n
## [1] 100
## 
## $coords.summary
##      Coord.X Coord.Y
## min 0.005638 0.01091
## max 0.983921 0.99125
## 
## $distances.summary
##      min      max 
## 0.007641 1.278175 
## 
## $data.summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -1.170   0.273   1.100   0.931   1.610   2.870 
## 
## $others
## [1] "cov.model" "nugget"    "cov.pars"  "kappa"     "lambda"   
## 
## attr(,"class")
## [1] "summary.geodata"

There are several options for computing the empirical variogram. Here we use default options except for the argument setting the maximum distance to be considered between pairs of points.

s100.v <- variog(s100, max.dist=1)
plot(s100.v)

plot of chunk s100vario

Turning to the parameter estimation we fit the model with a constant mean, isotropic exponential correlation function and allowing for estimating the variance of the conditionally independent observations (nugget variance)

(s100.ml <- likfit(s100, ini=c(1, 0.15)))
summary(s100.ml)

## likfit: estimated model parameters:
##    beta   tausq sigmasq     phi 
## "0.777" "0.000" "0.752" "0.183" 
## Practical Range with cor=0.05 for asymptotic range: 0.5474
## 
## likfit: maximised log-likelihood = -83.6

## Summary of the parameter estimation
## -----------------------------------
## Estimation method: maximum likelihood 
## 
## Parameters of the mean component (trend):
##  beta 
## 0.777 
## 
## Parameters of the spatial component:
##    correlation function: exponential
##       (estimated) variance parameter sigmasq (partial sill) =  0.752
##       (estimated) cor. fct. parameter phi (range parameter)  =  0.183
##    anisotropy parameters:
##       (fixed) anisotropy angle = 0  ( 0 degrees )
##       (fixed) anisotropy ratio = 1
## 
## Parameter of the error component:
##       (estimated) nugget =  0
## 
## Transformation parameter:
##       (fixed) Box-Cox parameter = 1 (no transformation)
## 
## Practical Range with cor=0.05 for asymptotic range: 0.5474
## 
## Maximised Likelihood:
##    log.L n.params      AIC      BIC 
##  "-83.6"      "4"    "175"    "186" 
## 
## non spatial model:
##    log.L n.params      AIC      BIC 
##   "-126"      "2"    "256"    "261" 
## 
## Call:
## likfit(geodata = s100, ini.cov.pars = c(1, 0.15))

Finally, we start the spatial prediction defining a grid of points. The kriging function by default performs ordinary kriging It minimaly requires the data, prediction locations and estimated model parameters.

s100.gr <- expand.grid((0:100)/100, (0:100)/100)
s100.kc <- krige.conv(s100, locations=s100.gr, krige=krige.control(obj.model=s100.ml))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

names(s100.kc)

## [1] "predict"      "krige.var"    "beta.est"     "distribution" "message"     
## [6] "call"

If the locations forms a grid the predictions can be visualised as image, contour or persp plots. Image plots shows the predicted values (left) and the respective std errors.

par(mfrow=c(1,2), mar=c(3.5, 3.5, 0.5, 0.5))
image(s100.kc, col=gray(seq(1, 0.2, l=21)))
contour(s100.kc, nlevels=11, add=TRUE)
image(s100.kc, val = sqrt(krige.var), col=gray(seq(1, 0.2, l=21)))

plot of chunk s100kc

It is worth notice that under the Gaussian model the prediction errors depends on the coordinates of the data, and not to their values (only through model parameter estimates).

image(s100.kc, val = sqrt(krige.var), col=gray(seq(1, 0.2, l=21)), coords.data=TRUE)

plot of chunk s100kcerr

Back to table of contents

Loading and exploring the data

There are some datasets included in geoR which can be listed as follows. We use some of them to illustrate further options in the functions.

##-----------------------------------------------------------------------------
## Datasets included in geoR.
data(package="geoR")

Some of them are of the class “geodata” which is convenient (but not compulsory!) to use with geoR' functions. Others are simply data.frames from which “geodata” can be created. “geodata” (S3) objects are just lists with a class “geodata” and must have elements “coords” and “data”. Optional elements includes “covariates”, “units.m” among others (see help(as.geodata)).

class(s100)
class(parana)
class(Ksat)
class(camg)
head(camg)
ctc020 <- as.geodata(camg, coords.col=c(1,2), data.col=7, covar.col=c(3,4))

## [1] "geodata"

## [1] "geodata"

## [1] "geodata"

## [1] "data.frame"

##   east north elevation region ca020 mg020 ctc020 ca2040 mg2040 ctc2040
## 1 5710  4829      6.10      3    52    18  106.0     40     16    86.3
## 2 5727  4875      6.05      3    57    20  131.1     49     21   123.1
## 3 5745  4922      6.30      3    72    22  114.6     63     22   101.7
## 4 5764  4969      6.60      3    74    11  114.4     74     12    95.6
## 5 5781  5015      6.60      3    68    34  124.4     44     36   106.3
## 6 5799  5062      5.75      3    45    27  132.9     27     22   105.3

The function calls shown above mostly use default options and it is worth examine the arguments (and the help files).

args(plot.geodata)
args(points.geodata)
args(variog)

## function (x, coords = x$coords, data = x$data, borders = x$borders, 
##     trend = "cte", lambda = 1, col.data = 1, weights.divide = "units.m", 
##     lowess = FALSE, scatter3d = FALSE, density = TRUE, rug = TRUE, 
##     qt.col, ...) 
## NULL

## function (x, coords = x$coords, data = x$data, data.col = 1, 
##     borders = x$borders, pt.divide = c("data.proportional", "rank.proportional", 
##         "quintiles", "quartiles", "deciles", "equal"), lambda = 1, 
##     trend = "cte", abs.residuals = FALSE, weights.divide = "units.m", 
##     cex.min, cex.max, cex.var, pch.seq, col.seq, add.to.plot = FALSE, 
##     x.leg, y.leg = NULL, dig.leg = 2, round.quantiles = FALSE, 
##     permute = FALSE, ...) 
## NULL

## function (geodata, coords = geodata$coords, data = geodata$data, 
##     uvec = "default", breaks = "default", trend = "cte", lambda = 1, 
##     option = c("bin", "cloud", "smooth"), estimator.type = c("classical", 
##         "modulus"), nugget.tolerance, max.dist, pairs.min = 2, 
##     bin.cloud = FALSE, direction = "omnidirectional", tolerance = pi/8, 
##     unit.angle = c("radians", "degrees"), angles = FALSE, messages, 
##     ...) 
## NULL

We point in particular to three arguments:

borders: a matrix with coordinates of a polygon limiting the area
lambda: parameter of the BoxCox transformation
trend: a matrix, formula or special words (“1st”, “2nd”) specifying a mean model

Borders are conveniently dealt with if included in the geodata object. As an example lets add borders to the s100 data

s100$borders <- matrix(c(0,0,1,0,1,1,0,1,0,0), ncol=2, byrow=TRUE)
points(s100)

plot of chunk s100borders

If you don't have the coordinates of the border consider setting them manually by using “locator()” over the “points()” plot.

The dataset Ksat with data on the saturated soil conductivity has a clear asymetric behaviour and the BoxCox transformation suggests to log transform the data.

data(Ksat)
require(MASS)
summary(Ksat)
boxcox(Ksat)

## $n
## [1] 32
## 
## $coords.summary
##        x   y
## min  2.5 0.8
## max 21.0 9.1
## 
## $distances.summary
##     min     max 
##  0.5099 18.5529 
## 
## $data.summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0200  0.0922  0.3050  0.8720  1.1400  4.3000 
## 
## $borders.summary
##        x  y
## min  0.0  0
## max 22.5 13
## 
## attr(,"class")
## [1] "summary.geodata"

plot of chunk ksatboxcox

Differences on exploratory plots are clear.

plot(Ksat, lowess=TRUE)

plot of chunk ksat-plot

plot(Ksat, lowess=TRUE, lambda=0)

plot of chunk ksat0-plot

We now use the Paraná rainfall data to illustrate the usage and effect of trend argument. Notice also the data also include the borders of the state

data(parana)
summary(parana)

## $n
## [1] 143
## 
## $coords.summary
##      east  north
## min 150.1  70.36
## max 768.5 461.97
## 
## $distances.summary
##   min   max 
##   1.0 619.5 
## 
## $data.summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     163     234     270     274     318     414 
## 
## $borders.summary
##      east  north
## min 138.0  46.77
## max 798.6 507.93
## 
## $others
## [1] "loci.paper"
## 
## attr(,"class")
## [1] "summary.geodata"

It is clear from the exploratory plot that there is a gradient of the response along the L-W and N-S coordinates.

plot(parana, lowess=TRUE)

plot of chunk parana-plot

If the argument trend is provided a (ordinary) regression is fitted and the residuals are plotted. The options trend=“1st” defines a linear trend on the coordinates and is equivalent to trend=~coords. More generally a formula on a set of covariates can be used.

plot(parana, trend = "1st", lowess=TRUE)

plot of chunk parana-trend-plot

The effect on the trend in the variograms are very much noticeable in this example.

par(mfrow=c(1,2))
parana.vario <- variog(parana, max.dist=400)
plot(parana.vario)
parana.variot <- variog(parana, trend="1st",  max.dist=400)
plot(parana.variot)

plot of chunk parana-trend-vario

Next we show a simple Monte Carlo test using the empirical variogram. The envelope reflects the assumption of no spatial correlation and is given by permutations of the data across the sampling locations.

s100.v <- variog(s100, max.dist=1)
s100.v.env <- variog.mc.env(s100, obj.variog=s100.v)
plot(s100.v, env=v.env)

plot of chunk s100.vario.env

To close this section we show a helper function which facilitate building empirical variograms on different directions which may be used to investigate anisotropy.

s100.v4 <- variog4(s100, max.dist=1)
plot(s100.v4, env=s100.v.env, omni=TRUE)

plot of chunk s100.vario4

Back to table of contents

Model estimation

The are four methods for estimating the (covariance) model parameters implemented in geoR.

visual variogram fitting: using the function eyefit().
variogram fitting: using the function variofit() basically fitting a nnon-linear model to the empirical variogram.
likelihood (ML or REML): using the function likfit().
Bayesian inference: using the function krige.bayes().

We now illustrate the first three, leaving Bayesian inference for later.

eyefit() provides an interactive visual model fitting tool which can be used on its own to provide estimates, or to set reasonable initial values for the other methods. Try experimenting with the following.

data(s100)
s100.v <- variog(s100, max.dist=1)
plot(s100)
eyefit(s100)

variofit() allows for several options. In what follows we show variogram fits for the parana data under constant mean and a trend on the coordinates. By default a exponential correlation function (which corresponds to the Martèrn model with kappa=0.5) is used. We also show fits for e spherical and Matèrn (with kappa=1.5) variogram models.

parana.vfit.exp <- variofit(parana.vario)
parana.vfit.mat1.5  <- variofit(parana.vario, kappa=1.5)
parana.vfit.sph  <- variofit(parana.vario, cov.model="sph")
#
parana.vtfit.exp <- variofit(parana.variot)
parana.vtfit.mat1.5  <- variofit(parana.variot, kappa=1.5)
parana.vtfit.sph  <- variofit(parana.variot, cov.model="sph")
par(mfrow=c(1,2))
plot(parana.vario)
lines(parana.vfit.exp);lines(parana.vfit.mat1.5, col=2);lines(parana.vfit.sph, col=4)
plot(parana.variot)
lines(parana.vtfit.exp);lines(parana.vtfit.mat1.5, col=2);lines(parana.vtfit.sph, col=4)

plot of chunk parana-variofit

Maximum likelihood estimates are obtained fitting the model to the data (and not to the empirical variogram). Models can be compared by fitting measures.

(parana.ml0 <- likfit(parana, ini=c(4500,50), nug=500))
(parana.ml1 <- likfit(parana, trend="1st", ini=c(1000,50), nug=100))
(parana.ml2 <- likfit(parana, trend="2nd", ini=c(1000,50), nug=100))
logLik(parana.ml0)
logLik(parana.ml1)
logLik(parana.ml2)

## likfit: estimated model parameters:
##    beta   tausq sigmasq     phi 
##  " 243"  " 359"  "9594"  "1659" 
## Practical Range with cor=0.05 for asymptotic range: 4969
## 
## likfit: maximised log-likelihood = -672

## likfit: estimated model parameters:
##     beta0     beta1     beta2     tausq   sigmasq       phi 
## "416.498" " -0.138" " -0.400" "385.518" "785.690" "184.386" 
## Practical Range with cor=0.05 for asymptotic range: 552.4
## 
## likfit: maximised log-likelihood = -664

## likfit: estimated model parameters:
##      beta0      beta1      beta2      beta3      beta4      beta5      tausq    sigmasq 
## "423.9282" "  0.0620" " -0.6360" " -0.0004" "  0.0000" "  0.0006" "381.2267" "372.5993" 
##        phi 
## " 77.5441" 
## Practical Range with cor=0.05 for asymptotic range: 232.3
## 
## likfit: maximised log-likelihood = -660

## 'log Lik.' -671.6 (df=4)

## 'log Lik.' -663.9 (df=6)

## 'log Lik.' -660.2 (df=9)

Back to table of contents

Model prediction

There are two functions for spatial prediction in geoR.

krige.conv() performing “plug-in” predictions, i.e. treating estimated parameters as if they were the true values
krige.bayes() for Bayesian inference and prediction

For the former, the function takes: the data, coordinates of the locations, kriging options (through krige.control()) and output options (through output.control()). We use the Paraná rainfall data to illustrate more options for this function.

The first step is to define a grid of prediction locations.

parana.gr <- pred_grid(parana$borders, by=15)
points(parana)
points(parana.gr, pch=19, col=2, cex=0.25)
parana.gr0 <- locations.inside(parana.gr, parana$borders)
points(parana.gr0, pch=19, col=4, cex=0.25)

plot of chunk parana-grid

The following kriging call corresponds to the “universal kriging”. The arguments “trend.d” and “trend.l” require the specification (or terms in a formula) on both, data and prediction locations, respectively.

args(krige.control)
args(output.control)
KC <- krige.control(obj.m = parana.ml1, trend.d="1st", trend.l="1st")
OC <- output.control(simulations=TRUE, n.pred=1000,
             quantile=c(0.10, 0.25, 0.5, 0.75, 0.90), threshold = 350) 
parana.kc <- krige.conv(parana, loc=parana.gr, krige = KC, output = OC) 
names(parana.kc)

## function (type.krige = "ok", trend.d = "cte", trend.l = "cte", 
##     obj.model = NULL, beta, cov.model, cov.pars, kappa, nugget, 
##     micro.scale = 0, dist.epsilon = 1e-10, aniso.pars, lambda) 
## NULL

## function (n.posterior, n.predictive, moments, n.back.moments, 
##     simulations.predictive, mean.var, quantile, threshold, sim.means, 
##     sim.vars, signal, messages) 
## NULL

##  [1] "predict"                   "krige.var"                 "beta.est"                 
##  [4] "distribution"              "simulations"               "mean.simulations"         
##  [7] "variance.simulations"      "quantiles.simulations"     "probabilities.simulations"
## [10] "sim.means"                 ".Random.seed"              "message"                  
## [13] "call"

From the output provided we picked the following in the next plot: prediction means, medians, a simulation from the (joint) predictive distribution, and the probability of exceeding the value of 350.

par(mfrow=c(2,2), mar=c(3,3,0.5, 0.5))
image(parana.kc, col = terrain.colors(21), x.leg=c(500, 750), y.leg=c(0, 50))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

image(parana.kc, val = parana.kc$quantile[,3], col=terrain.colors(21), 
    x.leg=c(500, 750), y.leg=c(0, 50))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

image(parana.kc, val = parana.kc$simulation[,1], col=terrain.colors(21), 
    x.leg=c(500, 750), y.leg=c(0, 50))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

image(parana.kc, val = 1-parana.kc$prob, col=terrain.colors(21), 
    x.leg=c(500, 750), y.leg=c(0, 50))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

plot of chunk parana-krige

More generally, simulations can be used to derive maps of quantities of interest. The top panels in next example calls extract minimum and maximum simulated values at each location. The bottom panels shows probabilities of exceeding 300 at each location, and predictive distribution of the proportion of the area exceeding 300.

par(mfrow=c(2,2), mar=c(3,3,0.5, 0.5))
image(parana.kc, val = apply(parana.kc$simulation, 1, min), col=terrain.colors(21), 
    x.leg=c(500, 750), y.leg=c(0, 50))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

image(parana.kc, val = apply(parana.kc$simulation, 1, max), col=terrain.colors(21), 
    x.leg=c(500, 750), y.leg=c(0, 50))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

image(parana.kc, val=apply(parana.kc$simulations, 1, function(x) mean(x > 300, na.rm=T)), 
                 x.leg=c(500, 750), y.leg=c(0,50), col=gray(seq(1, 0.2, length=21)))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

hist(apply(parana.kc$simulations, 2, function(x) mean(x > 300, na.rm=T)), main="")

plot of chunk parana-krige2

We now pick four particular locations for prediction

loc4 <- cbind(c(300, 480, 680, 244), c(460, 260, 170, 270))
parana.kc4 <- krige.conv(parana, loc=loc4, krige = KC, output = OC)

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

points(parana)
points(loc4, col=2, pch=19)

plot of chunk parana-kc4

## NULL

plot of chunk parana-kc4-hist

Back to table of contents

Bayesian Inference

For the Bayesian analysis both, inference on model parameters and spatial predictions (if prediction locations are provided) are performed with a call to the function krige.bayes(). In the next call we run an analysis setting a discrete prior for the parameter in the (exponential) correlation function and fixing the “nugget” parameter to zero. More general call can set a discrete prior also for the relative nugget through the “tausq.rel” parameter.

args(krige.bayes)
parana.bayes <- krige.bayes(parana, loc=parana.gr, 
                           model = model.control(trend.d="1st", trend.l="1st"),
               prior= prior.control(phi.prior="rec", phi.disc=seq(0,150, by=15)),
               output = OC)

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

names(parana.bayes)

## function (geodata, coords = geodata$coords, data = geodata$data, 
##     locations = "no", borders, model, prior, output) 
## NULL

## [1] "posterior"    "predictive"   "prior"        "model"        ".Random.seed"
## [6] "max.dist"     "call"

names(parana.bayes$posterior)
plot(parana.bayes)

## [1] "beta"                "sigmasq"             "phi"                
## [4] "tausq.rel"           "joint.phi.tausq.rel" "sample"

## parameter `tausq.rel` is fixed

plot of chunk parana-bayes-prior-post

For predictions, the output is stored in the obj$predictive are similar to the returned by krige.conv() and can be explored and ploted using similar function calls.

names(parana.bayes$predictive)
par(mfrow=c(1,3))
image(parana.bayes, col=terrain.colors(21))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

image(parana.bayes, val = apply(parana.bayes$pred$simul, 1, quantile, prob=0.90),
      col=terrain.colors(21))

## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").

hist(apply(parana.bayes$pred$simul, 2, median), main="")

## [1] "mean"                      "variance"                  "distribution"             
## [4] "simulations"               "mean.simulations"          "variance.simulations"     
## [7] "quantiles.simulations"     "probabilities.simulations" "sim.means"

How to cite geoR

citation(package="geoR")

## 
## To cite the package geoR in publications use:
## 
##   Diggle, P.J. & Ribeiro Jr, P.J. Model Based Geostatistics Springer, New York,
##   2007
## 
##   Paulo J. Ribeiro Jr & Peter J. Diggle geoR: a package for geostatistical
##   analysis R-NEWS, 1(2):15-18. June, 2001

This page was produced with the knitr using this Rmarkdown source file. The HTML was generated with the following command

require(knitr)
knit2html("geoR-PASI.Rmd")

and a file with only the R code can be extracted (tangle) with:

Back to table of contents