June 16-26, 2014
Búzios, RJ, Brazil
Paulo Justiniano Ribeiro Jr (I),
(I) LEG: Statistics and Geoinformation Lab), DEST: Department of Statistics, UFPR: Federal University of Paraná
This is a brief overview of some functionalities in geoR.
We refer to the
geoR “Tutorials” page
for further examples.
##-----------------------------------------------------------------------------
##
## Pratical session: geostatistical analysis with geoR package
##
## Paulo Justiniano Ribeiro Jr
##
##-----------------------------------------------------------------------------
## Required package.
require(geoR)
## Loading a data-set included in the package
data(s100)
A quick data display with exploratory checks on spatial patterns, trends on the coordinates and distribution of the data.
plot(s100, lowess=TRUE)
Another function allows for mode options for displaying the data.
points(s100)
args(points.geodata)
## function (x, coords = x$coords, data = x$data, data.col = 1,
## borders = x$borders, pt.divide = c("data.proportional", "rank.proportional",
## "quintiles", "quartiles", "deciles", "equal"), lambda = 1,
## trend = "cte", abs.residuals = FALSE, weights.divide = "units.m",
## cex.min, cex.max, cex.var, pch.seq, col.seq, add.to.plot = FALSE,
## x.leg, y.leg = NULL, dig.leg = 2, round.quantiles = FALSE,
## permute = FALSE, ...)
## NULL
The brief summary include information on the distances between data points.
summary(s100)
## $n
## [1] 100
##
## $coords.summary
## Coord.X Coord.Y
## min 0.005638 0.01091
## max 0.983921 0.99125
##
## $distances.summary
## min max
## 0.007641 1.278175
##
## $data.summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.170 0.273 1.100 0.931 1.610 2.870
##
## $others
## [1] "cov.model" "nugget" "cov.pars" "kappa" "lambda"
##
## attr(,"class")
## [1] "summary.geodata"
There are several options for computing the empirical variogram. Here we use default options except for the argument setting the maximum distance to be considered between pairs of points.
s100.v <- variog(s100, max.dist=1)
plot(s100.v)
Turning to the parameter estimation we fit the model with a constant mean, isotropic exponential correlation function and allowing for estimating the variance of the conditionally independent observations (nugget variance)
(s100.ml <- likfit(s100, ini=c(1, 0.15)))
summary(s100.ml)
## likfit: estimated model parameters:
## beta tausq sigmasq phi
## "0.777" "0.000" "0.752" "0.183"
## Practical Range with cor=0.05 for asymptotic range: 0.5474
##
## likfit: maximised log-likelihood = -83.6
## Summary of the parameter estimation
## -----------------------------------
## Estimation method: maximum likelihood
##
## Parameters of the mean component (trend):
## beta
## 0.777
##
## Parameters of the spatial component:
## correlation function: exponential
## (estimated) variance parameter sigmasq (partial sill) = 0.752
## (estimated) cor. fct. parameter phi (range parameter) = 0.183
## anisotropy parameters:
## (fixed) anisotropy angle = 0 ( 0 degrees )
## (fixed) anisotropy ratio = 1
##
## Parameter of the error component:
## (estimated) nugget = 0
##
## Transformation parameter:
## (fixed) Box-Cox parameter = 1 (no transformation)
##
## Practical Range with cor=0.05 for asymptotic range: 0.5474
##
## Maximised Likelihood:
## log.L n.params AIC BIC
## "-83.6" "4" "175" "186"
##
## non spatial model:
## log.L n.params AIC BIC
## "-126" "2" "256" "261"
##
## Call:
## likfit(geodata = s100, ini.cov.pars = c(1, 0.15))
Finally, we start the spatial prediction defining a grid of points. The kriging function by default performs ordinary kriging It minimaly requires the data, prediction locations and estimated model parameters.
s100.gr <- expand.grid((0:100)/100, (0:100)/100)
s100.kc <- krige.conv(s100, locations=s100.gr, krige=krige.control(obj.model=s100.ml))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
names(s100.kc)
## [1] "predict" "krige.var" "beta.est" "distribution" "message"
## [6] "call"
If the locations forms a grid the predictions can be visualised as image, contour or persp plots. Image plots shows the predicted values (left) and the respective std errors.
par(mfrow=c(1,2), mar=c(3.5, 3.5, 0.5, 0.5))
image(s100.kc, col=gray(seq(1, 0.2, l=21)))
contour(s100.kc, nlevels=11, add=TRUE)
image(s100.kc, val = sqrt(krige.var), col=gray(seq(1, 0.2, l=21)))
It is worth notice that under the Gaussian model the prediction errors depends on the coordinates of the data, and not to their values (only through model parameter estimates).
image(s100.kc, val = sqrt(krige.var), col=gray(seq(1, 0.2, l=21)), coords.data=TRUE)
There are some datasets included in geoR which can be listed as follows. We use some of them to illustrate further options in the functions.
##-----------------------------------------------------------------------------
## Datasets included in geoR.
data(package="geoR")
Some of them are of the class “geodata” which is convenient (but not compulsory!) to use with geoR' functions. Others are simply data.frames from which “geodata” can be created. “geodata” (S3) objects are just lists with a class “geodata” and must have elements “coords” and “data”. Optional elements includes “covariates”, “units.m” among others (see help(as.geodata)).
class(s100)
class(parana)
class(Ksat)
class(camg)
head(camg)
ctc020 <- as.geodata(camg, coords.col=c(1,2), data.col=7, covar.col=c(3,4))
## [1] "geodata"
## [1] "geodata"
## [1] "geodata"
## [1] "data.frame"
## east north elevation region ca020 mg020 ctc020 ca2040 mg2040 ctc2040
## 1 5710 4829 6.10 3 52 18 106.0 40 16 86.3
## 2 5727 4875 6.05 3 57 20 131.1 49 21 123.1
## 3 5745 4922 6.30 3 72 22 114.6 63 22 101.7
## 4 5764 4969 6.60 3 74 11 114.4 74 12 95.6
## 5 5781 5015 6.60 3 68 34 124.4 44 36 106.3
## 6 5799 5062 5.75 3 45 27 132.9 27 22 105.3
The function calls shown above mostly use default options and it is worth examine the arguments (and the help files).
args(plot.geodata)
args(points.geodata)
args(variog)
## function (x, coords = x$coords, data = x$data, borders = x$borders,
## trend = "cte", lambda = 1, col.data = 1, weights.divide = "units.m",
## lowess = FALSE, scatter3d = FALSE, density = TRUE, rug = TRUE,
## qt.col, ...)
## NULL
## function (x, coords = x$coords, data = x$data, data.col = 1,
## borders = x$borders, pt.divide = c("data.proportional", "rank.proportional",
## "quintiles", "quartiles", "deciles", "equal"), lambda = 1,
## trend = "cte", abs.residuals = FALSE, weights.divide = "units.m",
## cex.min, cex.max, cex.var, pch.seq, col.seq, add.to.plot = FALSE,
## x.leg, y.leg = NULL, dig.leg = 2, round.quantiles = FALSE,
## permute = FALSE, ...)
## NULL
## function (geodata, coords = geodata$coords, data = geodata$data,
## uvec = "default", breaks = "default", trend = "cte", lambda = 1,
## option = c("bin", "cloud", "smooth"), estimator.type = c("classical",
## "modulus"), nugget.tolerance, max.dist, pairs.min = 2,
## bin.cloud = FALSE, direction = "omnidirectional", tolerance = pi/8,
## unit.angle = c("radians", "degrees"), angles = FALSE, messages,
## ...)
## NULL
We point in particular to three arguments:
Borders are conveniently dealt with if included in the geodata object. As an example lets add borders to the s100 data
s100$borders <- matrix(c(0,0,1,0,1,1,0,1,0,0), ncol=2, byrow=TRUE)
points(s100)
If you don't have the coordinates of the border consider setting them manually by using “locator()” over the “points()” plot.
The dataset Ksat with data on the saturated soil conductivity has a clear asymetric behaviour and the BoxCox transformation suggests to log transform the data.
data(Ksat)
require(MASS)
summary(Ksat)
boxcox(Ksat)
## $n
## [1] 32
##
## $coords.summary
## x y
## min 2.5 0.8
## max 21.0 9.1
##
## $distances.summary
## min max
## 0.5099 18.5529
##
## $data.summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0200 0.0922 0.3050 0.8720 1.1400 4.3000
##
## $borders.summary
## x y
## min 0.0 0
## max 22.5 13
##
## attr(,"class")
## [1] "summary.geodata"
Differences on exploratory plots are clear.
plot(Ksat, lowess=TRUE)
plot(Ksat, lowess=TRUE, lambda=0)
We now use the Paraná rainfall data to illustrate the usage and effect of trend argument. Notice also the data also include the borders of the state
data(parana)
summary(parana)
## $n
## [1] 143
##
## $coords.summary
## east north
## min 150.1 70.36
## max 768.5 461.97
##
## $distances.summary
## min max
## 1.0 619.5
##
## $data.summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 163 234 270 274 318 414
##
## $borders.summary
## east north
## min 138.0 46.77
## max 798.6 507.93
##
## $others
## [1] "loci.paper"
##
## attr(,"class")
## [1] "summary.geodata"
It is clear from the exploratory plot that there is a gradient of the response along the L-W and N-S coordinates.
plot(parana, lowess=TRUE)
If the argument trend is provided a (ordinary) regression is fitted and the residuals are plotted. The options trend=“1st” defines a linear trend on the coordinates and is equivalent to trend=~coords. More generally a formula on a set of covariates can be used.
plot(parana, trend = "1st", lowess=TRUE)
The effect on the trend in the variograms are very much noticeable in this example.
par(mfrow=c(1,2))
parana.vario <- variog(parana, max.dist=400)
plot(parana.vario)
parana.variot <- variog(parana, trend="1st", max.dist=400)
plot(parana.variot)
Next we show a simple Monte Carlo test using the empirical variogram. The envelope reflects the assumption of no spatial correlation and is given by permutations of the data across the sampling locations.
s100.v <- variog(s100, max.dist=1)
s100.v.env <- variog.mc.env(s100, obj.variog=s100.v)
plot(s100.v, env=v.env)
To close this section we show a helper function which facilitate building empirical variograms on different directions which may be used to investigate anisotropy.
s100.v4 <- variog4(s100, max.dist=1)
plot(s100.v4, env=s100.v.env, omni=TRUE)
The are four methods for estimating the (covariance) model parameters implemented in geoR.
We now illustrate the first three, leaving Bayesian inference for later.
eyefit() provides an interactive visual model fitting tool which can be used on its own to provide estimates, or to set reasonable initial values for the other methods. Try experimenting with the following.
data(s100)
s100.v <- variog(s100, max.dist=1)
plot(s100)
eyefit(s100)
variofit() allows for several options. In what follows we show variogram fits for the parana data under constant mean and a trend on the coordinates. By default a exponential correlation function (which corresponds to the Martèrn model with kappa=0.5) is used. We also show fits for e spherical and Matèrn (with kappa=1.5) variogram models.
parana.vfit.exp <- variofit(parana.vario)
parana.vfit.mat1.5 <- variofit(parana.vario, kappa=1.5)
parana.vfit.sph <- variofit(parana.vario, cov.model="sph")
#
parana.vtfit.exp <- variofit(parana.variot)
parana.vtfit.mat1.5 <- variofit(parana.variot, kappa=1.5)
parana.vtfit.sph <- variofit(parana.variot, cov.model="sph")
par(mfrow=c(1,2))
plot(parana.vario)
lines(parana.vfit.exp);lines(parana.vfit.mat1.5, col=2);lines(parana.vfit.sph, col=4)
plot(parana.variot)
lines(parana.vtfit.exp);lines(parana.vtfit.mat1.5, col=2);lines(parana.vtfit.sph, col=4)
Maximum likelihood estimates are obtained fitting the model to the data (and not to the empirical variogram). Models can be compared by fitting measures.
(parana.ml0 <- likfit(parana, ini=c(4500,50), nug=500))
(parana.ml1 <- likfit(parana, trend="1st", ini=c(1000,50), nug=100))
(parana.ml2 <- likfit(parana, trend="2nd", ini=c(1000,50), nug=100))
logLik(parana.ml0)
logLik(parana.ml1)
logLik(parana.ml2)
## likfit: estimated model parameters:
## beta tausq sigmasq phi
## " 243" " 359" "9594" "1659"
## Practical Range with cor=0.05 for asymptotic range: 4969
##
## likfit: maximised log-likelihood = -672
## likfit: estimated model parameters:
## beta0 beta1 beta2 tausq sigmasq phi
## "416.498" " -0.138" " -0.400" "385.518" "785.690" "184.386"
## Practical Range with cor=0.05 for asymptotic range: 552.4
##
## likfit: maximised log-likelihood = -664
## likfit: estimated model parameters:
## beta0 beta1 beta2 beta3 beta4 beta5 tausq sigmasq
## "423.9282" " 0.0620" " -0.6360" " -0.0004" " 0.0000" " 0.0006" "381.2267" "372.5993"
## phi
## " 77.5441"
## Practical Range with cor=0.05 for asymptotic range: 232.3
##
## likfit: maximised log-likelihood = -660
## 'log Lik.' -671.6 (df=4)
## 'log Lik.' -663.9 (df=6)
## 'log Lik.' -660.2 (df=9)
There are two functions for spatial prediction in geoR.
For the former, the function takes: the data, coordinates of the locations, kriging options (through krige.control()) and output options (through output.control()). We use the Paraná rainfall data to illustrate more options for this function.
The first step is to define a grid of prediction locations.
parana.gr <- pred_grid(parana$borders, by=15)
points(parana)
points(parana.gr, pch=19, col=2, cex=0.25)
parana.gr0 <- locations.inside(parana.gr, parana$borders)
points(parana.gr0, pch=19, col=4, cex=0.25)
The following kriging call corresponds to the “universal kriging”. The arguments “trend.d” and “trend.l” require the specification (or terms in a formula) on both, data and prediction locations, respectively.
args(krige.control)
args(output.control)
KC <- krige.control(obj.m = parana.ml1, trend.d="1st", trend.l="1st")
OC <- output.control(simulations=TRUE, n.pred=1000,
quantile=c(0.10, 0.25, 0.5, 0.75, 0.90), threshold = 350)
parana.kc <- krige.conv(parana, loc=parana.gr, krige = KC, output = OC)
names(parana.kc)
## function (type.krige = "ok", trend.d = "cte", trend.l = "cte",
## obj.model = NULL, beta, cov.model, cov.pars, kappa, nugget,
## micro.scale = 0, dist.epsilon = 1e-10, aniso.pars, lambda)
## NULL
## function (n.posterior, n.predictive, moments, n.back.moments,
## simulations.predictive, mean.var, quantile, threshold, sim.means,
## sim.vars, signal, messages)
## NULL
## [1] "predict" "krige.var" "beta.est"
## [4] "distribution" "simulations" "mean.simulations"
## [7] "variance.simulations" "quantiles.simulations" "probabilities.simulations"
## [10] "sim.means" ".Random.seed" "message"
## [13] "call"
From the output provided we picked the following in the next plot: prediction means, medians, a simulation from the (joint) predictive distribution, and the probability of exceeding the value of 350.
par(mfrow=c(2,2), mar=c(3,3,0.5, 0.5))
image(parana.kc, col = terrain.colors(21), x.leg=c(500, 750), y.leg=c(0, 50))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
image(parana.kc, val = parana.kc$quantile[,3], col=terrain.colors(21),
x.leg=c(500, 750), y.leg=c(0, 50))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
image(parana.kc, val = parana.kc$simulation[,1], col=terrain.colors(21),
x.leg=c(500, 750), y.leg=c(0, 50))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
image(parana.kc, val = 1-parana.kc$prob, col=terrain.colors(21),
x.leg=c(500, 750), y.leg=c(0, 50))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
More generally, simulations can be used to derive maps of quantities of interest. The top panels in next example calls extract minimum and maximum simulated values at each location. The bottom panels shows probabilities of exceeding 300 at each location, and predictive distribution of the proportion of the area exceeding 300.
par(mfrow=c(2,2), mar=c(3,3,0.5, 0.5))
image(parana.kc, val = apply(parana.kc$simulation, 1, min), col=terrain.colors(21),
x.leg=c(500, 750), y.leg=c(0, 50))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
image(parana.kc, val = apply(parana.kc$simulation, 1, max), col=terrain.colors(21),
x.leg=c(500, 750), y.leg=c(0, 50))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
image(parana.kc, val=apply(parana.kc$simulations, 1, function(x) mean(x > 300, na.rm=T)),
x.leg=c(500, 750), y.leg=c(0,50), col=gray(seq(1, 0.2, length=21)))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
hist(apply(parana.kc$simulations, 2, function(x) mean(x > 300, na.rm=T)), main="")
We now pick four particular locations for prediction
loc4 <- cbind(c(300, 480, 680, 244), c(460, 260, 170, 270))
parana.kc4 <- krige.conv(parana, loc=loc4, krige = KC, output = OC)
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
points(parana)
points(loc4, col=2, pch=19)
## NULL
For the Bayesian analysis both, inference on model parameters and spatial predictions (if prediction locations are provided) are performed with a call to the function krige.bayes(). In the next call we run an analysis setting a discrete prior for the parameter in the (exponential) correlation function and fixing the “nugget” parameter to zero. More general call can set a discrete prior also for the relative nugget through the “tausq.rel” parameter.
args(krige.bayes)
parana.bayes <- krige.bayes(parana, loc=parana.gr,
model = model.control(trend.d="1st", trend.l="1st"),
prior= prior.control(phi.prior="rec", phi.disc=seq(0,150, by=15)),
output = OC)
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
names(parana.bayes)
## function (geodata, coords = geodata$coords, data = geodata$data,
## locations = "no", borders, model, prior, output)
## NULL
## [1] "posterior" "predictive" "prior" "model" ".Random.seed"
## [6] "max.dist" "call"
names(parana.bayes$posterior)
plot(parana.bayes)
## [1] "beta" "sigmasq" "phi"
## [4] "tausq.rel" "joint.phi.tausq.rel" "sample"
## parameter `tausq.rel` is fixed
For predictions, the output is stored in the obj$predictive are similar to the returned by krige.conv() and can be explored and ploted using similar function calls.
names(parana.bayes$predictive)
par(mfrow=c(1,3))
image(parana.bayes, col=terrain.colors(21))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
image(parana.bayes, val = apply(parana.bayes$pred$simul, 1, quantile, prob=0.90),
col=terrain.colors(21))
## Warning: 'overlayPointsWithPolygons' is deprecated.
## Use 'over' instead.
## See help("Deprecated") and help("sp-deprecated").
hist(apply(parana.bayes$pred$simul, 2, median), main="")
## [1] "mean" "variance" "distribution"
## [4] "simulations" "mean.simulations" "variance.simulations"
## [7] "quantiles.simulations" "probabilities.simulations" "sim.means"
citation(package="geoR")
##
## To cite the package geoR in publications use:
##
## Diggle, P.J. & Ribeiro Jr, P.J. Model Based Geostatistics Springer, New York,
## 2007
##
## Paulo J. Ribeiro Jr & Peter J. Diggle geoR: a package for geostatistical
## analysis R-NEWS, 1(2):15-18. June, 2001
This page was produced with the knitr using this Rmarkdown source file. The HTML was generated with the following command
require(knitr)
knit2html("geoR-PASI.Rmd")
and a file with only the R code can be extracted (tangle) with: