Spatial Sampling Design and Soil Science

Spatial statistics is the science of the analysis of geo-referenced data and loosely speaking may be divided into the three sub-areas analysis of point processes, analysis of areal data and geostatistics. Point processes naturally arise for example in geophysics, when the locations of earthquakes are noticed, in epidemiology, where new illness cases of certain epidemies are mapped geographically or in biology, where cell centers of a certain tissue are mapped under the microscope. Areal data are data that are attached to areas like number of illness cases in certain medical districts, percent of grassland in a certain county or number of votes in certain political districts. The topic of this article is but geostatistics, the science of continuous stochastic processes or so-called random ﬁelds that are deﬁned over some region in 2- or 3-dimensional geographic space X or in space-time. A random ﬁeld { Y ( x ) : x ∈ X } is a set of random variables or so-called regionalized variables Y ( x ) that are attached to every location x ∈ X . The probability law of the random ﬁeld is uniquelly determined by its so-called projective family, the set of all ﬁnite dimensional distributions of any ﬁnite set of Y ( x ) obeying symmetrie and consistency with marginal distributions although we will see next that geostatistics most often deals only with the ﬁrst and second order characteristics of random ﬁelds, the trend or mean function m ( x ) = E ( Y ( x )) and the covariance function C ( x , y ) = Cov ( Y ( x ) , Y ( y )) . Most often a linear trend function m ( x ) = f ( x ) T β , where f ( x ) is a ﬁxed vector-valued function and β is a regression parameter vector to be estimated, is sufﬁcient for modelling purposes. For x ∈ ℜ 2 , f ( x ) could be for example a vector of polynomials in the coordinates x = ( x 1 , x 2 ) . The covariance function C ( .,. ) must be positive semideﬁnite, meaning that it must give any linear combination of Y ( x 1 ) , Y ( x 2 Our dependence on soil, and our curiosity about it, is leading to the investigation of changes within soil processes. Furthermore, the diversity and dynamics of soil are enabling new discoveries and insights, which help us to understand the variations in soil processes. Consequently, this permits us to take the necessary measures for soil protection, thus promoting soil health. This book aims to provide an up-to-date account of the current state of knowledge in recent practices and assessments in soil science. Moreover, it presents a comprehensive evaluation of the effect of residue/waste application on soil properties and, further, on the mechanism of plant adaptation and plant growth. Interesting examples of simulation using various models dealing with carbon sequestration, ecosystem respiration, and soil landscape, etc. are demonstrated. The book also includes chapters on the analysis of areal data and geostatistics using different assessment methods. More recent developments in analytical techniques used to obtain answers to the various physical mechanisms, chemical, and biological processes in soil are also present.


14
www.intechopen.com variable Y(x 0 ) to be predicted. It can be written is the generalized least squares estimate of the regression parameter vector β and F is the design matrix corresponding to f(.) and the locations x 1 , x 2 ,...,x n .T h em e a ns q u a r e de r r o r (MSEP) of this unbiased predictor is given by where Obviously, the mean squared error σ 2 (x 0 ) is dependent on the arrangement of the sampling locations x 1 , x 2 ,...,x n via the covariance matrix K, the covariance vector c 0 and the design matrix F. Like with any statistical prediction method one aim of the statistician in kriging is to make best use of the available data, maybe under certain budget constraints or constraints on the number of locations that can be sampled. Thus, the aim is to get as good predictions as possible for the complete area of investigation X. One possibility to formalize this aim is to try to select the sampling locations x 1 , x 2 ,...,x n in such a way that the sum of all kriging MSEP's X σ 2 (x 0 )dx 0 (5) becomes a minimum over the area of investigation X. Notably, this is a very complicated optimization problem and becomes still more complicated by the fact that the covariance matrix K enters the kriging MSEP in its inverse form K −1 . It has been tried to solve this sampling design problem besides other criteria that also measure the accuracy of kriging predictions in a number of recent research papers. The aim of the next section is to discuss some of these papers. In soil science especially the remediation of contaminated waste soils is of importance. Not only the area of contamination must be best detected but the contamination concentrations must be optimally predicted. Most often then soil with contamination level above a certain threshold must be remediated. Sampling design for contamination detection in soil science most often works in three phases: 1. The area of contaminated soil must be detected: Here most often samples on a coarse regular rectangular grid are taken to detect the contaminated area.
2. The spatial variability or the covariance function must be best estimated: Here the design from stage 1.) is updated by some nearby locations in order to get also some good estimates of the nugget effect and the small scale variability (the appearance of the covariance function near the origin).
3. Contamination concentrations must be best predicted and their uncertainties be documented.
This work especially deals with phase 2.) and phase 3.). We consider as design criterion for 3.) first the integrated kriging variance averaged over the area of contamination. Next as a combined design criterion for 2.) and 3.) the averaged expected lengths of predictive intervals are considered. Finally it is shown how these design criteria can be generalized to spatial variables having skewed distributions. In this context a criterion for spatial sampling design with Box-Cox transformed spatial variables is proposed.
Mathematically speaking we will approximate the investigated spatial random field by means of a large regression model consisting of cosine-sine Bessel surface harmonics with random amplitudes (Sections 3,4). This approximating regression model is a direct result of the polar spectral representation theorem of isotropic random fields (Section 3). We show that kriging prediction in the original random field model is equivalent to trend prediction in this approximating Bayesian linear regression model (Section 4). Thus, standard convex experimental design theory for Bayesian linear regression may be used to find spatial sampling designs. Based on this theory we propose two algorithms for the calculation of exact designs (Section 7.3): • An algorithm for removing redundant design locations.
• An algorithm for adding sampling locations to an existing network.
In Section 5 we generalize our approach to the case that the covariance function is estimated by restricted maximumum likelihood (REML) and thus is uncertain. The result is that the actual predictive intervals are wider than when not taking into account this uncertainty. The Smith and Zhu design criterion adjusts to this uncertainty by considering the average of the expected lengths of predictive intervals as a design criterion. Section 6 again is a generalization, because here we assume the data distribution to be skewed and to be transformable to a Gaussian distribution by means of a Box-Cox transformation. Once again the average of the expected lengths of predictive intervals is considered as a design criterion. The proposed methodologies are illustrated by a true data example from soil science (Section 8). Finally, one from the author freely available MATLAB and Octave toolbox is described which can perform all the tasks of spatial sampling design and spatial interpolation (Section 7).

A review on spatial sampling design
The importance of (optimal) spatial sampling design considerations for environmental applications and soil science has been demonstrated in quite a few papers and monographs (Brus and de Gruijter, 1997;Brus and Heuvelink, 2007;Caeiro et al., 2003;Cox Jr., 1999;Delmelle and Goovaerts, 2009;Diggle and Lophaven, 2006;Dobbie et al., 2008;Groenigen et al., 1999;Hengl et al., 2003;Lark, 2002;US-EPA, 2002). The papers on spatial sampling design may be divided into several categories of which some are overlapping.
First of all we may differentiate between design criteria for spatial prediction and for estimation of the covariance function and between combined criteria for both desires at once. Works falling into the category of criteria for prediction are (Brus and Heuvelink, 2007;Fedorov and Flanagan, 1997;Müller and Pazman, 1998;1999;Müller, 2005;Pazman and Müller, 2001). Criteria for the estimation of the covariance function are considered by (Lark, 2002;Müller and Zimmerman, 1999;Zimmerman and Homer, 1991). Combined criteria we can find in the article (Zhu and Stein, 2006), who condsider the minimization of the average expected length of predictive intervals. Further papers falling into this category of combined criteria are Bayesian articles specifying a priori distributions over covariance functions like (Brown et al., 1994); Diggle and Lophaven, 2006;Fuentes et al., Müller et al., 2004). Actually (Brown et al., 1994)) and (Fuentes et al., 2007) consider the covariance function to be non-stationary and deal with an entropy based design criterion according to which the determinant of the covariance matrix between locations to be added to the design must be maximized. Both make use of simulated annealing algorithms to find optimal designs obeying their criteria. At this stage we are at a second distinguishing feature of optimal design algorithms. We can distinguish beween stochastic search algorithms like simulated annealing (Aarts and Korst, 1989) or evolutionary genetic algorithms and deterministic algorithms for optimizing the investigated design criteria. With the exception of the works of (Fedorov and Flanagan, 1997;Müller and Pazman, 1998;1999;Müller, 2005;Pazman and Müller, 2001;Spöck and Pilz, 2010) almost all algorithms for spatial sampling design optimization use stochastic search algorithms for the finding of optimal configurations of sampling locations x 1 , x 2 ,...,x n . The term spatial simulated annealing (SSA) finds its first manifestation in the work of (Groenigen et al., 1999). (Trujillo-Ventura and Ellis, 1991) consider multiobjective sampling design optimization. Available free software for spatial sampling design optimization is quite rare. Up to the knowledge of the author there are only 6 sampling design toolboxes freely available including an own one: (Gramacy, 2007) has implemented sampling design for treed Gaussian random fields in the R-package tgp. In treed Gaussian random fields the area X of investigation is partitioned by means of classification trees into rectangular sub-areas with sides parallel to the coordinate axes. This software is especially useful for the design of computer simulation experiments, where parameters guiding the computer simulation output are identified as spatial coordinates. Another software especially useful for computer simulation experiments and sequential design is the DAKOTA package (http://dakota.sandia.gov).
Further papers falling into the category of computer simulation experiments are (Chen et al., 2006;Gramacy and Lee, 2010;Kleijnen et van Beers, 2004;Lim et al., 2002;Mitchell and Morris, 1992;Morris et al., 1993;Schonlau, 1997). Software freely available upon request for research purposes and monitoring network optimization is (Le and Zidek, 2009). This software implements the entropy based design criterion mentioned above. (Gebhardt, 2003) implements a branch and bound algorithm for designing with the criterion (5). (Baume et al. , 2011) compare different greedy algorithms for spatial design. Maybe this small list of software for spatial sampling design is not complete and hopefully more software can be obtained from the different authors of research papers upon request. Since for the practioneer there is a strong need for spatial sampling design and almost no software is freely available this was an impetus for the author of this chapter to program a toolbox for spatial sampling design. The spatDesign Matlab toolbox of the author is freely available online at • http://wwwu.uni-klu.ac.at/guspoeck/spatDesignMatlab.zip An Octave version not implementing as many design criteria as the Matlab toolbox is available at • http://wwwu.uni-klu.ac.at/guspoeck/spatDesignOctave.zip

The spatial mixed linear model
This section starts our own approach to spatial sampling design. No stochastic search algorithms like simulated annealing to optimize the design criterion are needed in this approach because we make use of the mathematical structure of the investigated design criteria.
We consider a mean square continuous (m.s.c.) and isotropic random field {Y(x) : where f(x) is a known vector of regression functions, β ∈ R r a vector of unknown regression parameters and Cov(Y(x), Y(y)) = C(||x − y||); x, y ∈ X.( 7 ) Then, according to (Yaglom, 1987), the covariance function can be represented in the form where J 0 (.) is the Bessel function of the first kind and order 0, t = ||x − y|| is the Euclidean distance between x and y,a n dG(.) is the so-called (polar) spectral distribution function associated with C(.).A ss u c hG(.) is positive, monotonically increasing and bounded from above. On the other hand, knowing C(.) its spectral distribution can be obtained from the inversion formula where G(ω + ) andG(ω − ) denote the right-and left-hand side limits at ω and J 1 denotes the Bessel function of first kind and order 1. Approximating G(.) by means of a step function with positive jumps a 2 i = G(ω i+1 ) − G(ω i ) at preselected points ω i , i = 0, 1, , . . . , n − 1, and changing to polar coordinates (t, ϕ)=( radius, angle) the polar spectral representation theorem for m.s.c. isotropic random fields tells us that the error process may be approximated as where all the random variables U m,i and V m,i are uncorrelated, have mean zero, and their variances are var(U m,i )=var(V m,i )=d m a 2 i ;a n dd m = 1f o rm = 0a n dd m = 2f o rm ≥ 1. By truncating the above series at a sufficiently large m = M, we get an approximation of our random field in form of a mixed linear model From above it becomes clear that the components of the additional regression vector g(·) are made up of the following radial basis functions (cosine-sine-Bessel-harmonics)

261
Spatial Sampling Design and Soil Science www.intechopen.com

Classical Bayesian experimental design problem
Starting from our spatial mixed linear model (11) we may gain further flexibility with a Bayesian approach incorporating prior knowledge on the trend. To this we assume that the regression parameter vector β is random with This is exactly in the spirit of (Omre, 1987) who introduced Bayesian kriging this way. He used physical process knowledge to arrive at "qualified guesses" for the first and second order moments, μ and Φ. On the other hand, the state of prior ignorance or non-informativity can be modelled by setting μ = 0 and letting Φ −1 tend to the matrix of zeroes, thus passing the "Bayesian bridge" to universal kriging, see (Omre and Halvorsen, 1989). Now, combining (11) and (13), we arrive at the Bayesian spatial linear model (BSLM) where Here ε 0 (x) is white-noise with variance σ 2 0 and A denotes the covariance matrix of α,resulting after the polar spectral approximation of the random field. (Spöck and Pilz, 2010) demonstrate that Bayesian linear trend estimation in the above BSLM actually approximates Bayesian linear kriging in the original model abitrarily closely. The same is true for the total mean squared error (TMSEP) of the trend prediction and the TMSEP of Bayesian kriging. Thus taking the TMSEP of the trend prediction in the approximating model as a substitute for the Bayes kriging TMSEP we arrive at the following classical experimental design problem for so-called I-optimality: Here d n = {x 1 , x 2 ,...,x n } collects either the design points to be added to the monitoring network or in the case of reducing the network the design points remaining in the monitoring network. H(d n ) expresses the dependence of the design matrix H =( h(x i ) T ) i=1,2,...,n on the design points in the set d n .
At this point we advise the reader not familiar with Bayesian experimental design theory to read the Appendix of (Spöck and Pilz, 2010). The key point in this theory is that the above so-called concrete design problem seemingly showing no mathematical structure may be expanded to a so-called continuous design problem that has the nice feature to be a convex optimization problem. Thus, the whole apparatus of convex optimization theory is available to approximately solve the above design problem for I-optimality. In particular, directional derivatives may be calculated and optimal continuous designs may be found by steepest descent algorithms. Continuous designs are just probability measures ξ on X and may be rounded to exact designs d n . Defining the so-called continuous Bayesian information matrix and it may be shown that the set of all such information matrices is convex and compact and that the extended design functional is convex and continuous in M B (ξ). The above design functional Ψ(.) thus attains its minimum at a design ξ * ∈ Ξ,w h e r eΞ is the set of all probability measures defined on the compact design region X, see (Pilz, 1991). The closeness of exact designs d n to the optimal continuous design ξ * may be judged by means of a well-known efficiency formula, see Appendix of (Spöck and Pilz, 2010).

The Smith and Zhu design criterion
In real world applications the isotropic covariance function C θ (t) is always uncertain and estimated. The kriging predictor used is then based on this estimated covariance function Cθ(t). Thus, the kriging predictor is always a plug-in predictor and the reported (plug-in) kriging variance underestimates the true variance of this plug-in predictor. (Smith and Zhu, 2004) consider spatial sampling design by means of minimizing the average of the expected lengths of 1 − α predictive intervals: Their predictors of the α/2 and 1 − α/2 quantiles of the predictive distributions are selected in such a way that the corresponding predictive intervals have coverage probability bias 0. The predictors of the mentioned quantiles are essentially the plug-in kriging predictors based on restricted maximum likelihood (REML) estimation of the covariance function plus/minus a scaled plug-in kriging standard error term that is corrected to take account of REML estimation. Based on Laplace approximation they show that this design criterion up to order O(n −2 ),wheren is the number of data, is equivalent to: is the Fisher information matrix for REML, z 1−α/2 is the 1 − α/2-quantile of the standard normal distribution, σ 2 θ (x 0 ) is the universal kriging variance at x 0 and λ θ (x 0 ) is the universal kriging weights vector for prediction at x 0 . This design criterion takes both prediction accuracy and covariance uncertainty into account. Sections 3 and 4 have demonstrated that by using the BSLM (14) as approximation to the true isotropic random field the I-optimality design criterion can be completely expressed in terms of the Bayesian information matrix Going from this information matrix to its continuous version where ξ is a probability measure on the design space X, the extended design functional becomes continuous and convex on the compact and convex set of all such information matrices M B (ξ). This was the reason why classical convex experimental design algorithms could be used to find optimal spatial sampling designs minimizing the criterion (18).
In (Spöck et al., 2011) it is shown that also the Smith and Zhu design criterion has some favourable properties, so that classical convex experimental design theory can be applied to this design criterion, too: • Expression (20) can be expressed completely in terms of the Bayesian information matrix M B .
• The design functional is continuous on the convex and compact set of all M B (ξ) and has some advantageous properties according to which classical experimental design algorithms may be used in order to find spatial sampling designs.
Assuming the BSLM (14) the covariance function actually is parametrized in the diagonal matrix A and the nugget variance σ 2 0 . Since the Smith and Zhu design criterion assumes the covariance parameters to be estimated by restricted maximum likelihood we actually estimate this diagonal matrix A and σ 2 0 by this methodology. The a priori covariance matrix Φ = cov(β) must be given almost infinite diagonal values because the (Smith and Zhu, 2004) approach assumes the trend parameter vector β to be estimated by generalized least squares and Φ → ∞ bridges the gap from Bayesian linear to generalized least squares trend estimation. The a priori mean μ = E(β) can be set to 0 then. According to the polar spectral representation (10) several values in the diagonal matrix A are identical: where the definitions of d m and a 2 i and the indexing derive from the polar spectral representation (10). For restricted maximum likelihood estimation of A we have two possibilities: • We can leave the a i 's unspecified: This approach is almost nonparametric because a lot of a i 's and corresponding frequencies w i are needed to get the isotropic random field properly approximated and corresponds to a semiparametric estimation of the spectral distribution function via a step function.
• We can specify a parametric model for the a 2 i 's: The polar spectral density function for an isotropic random field over ℜ 2 possesing for example an exponential covariance function B(h)=Cexp(− 3h α ) is given by The polar spectral density function is defined just as the first derivative of the polar spectral distribution function G(w). A possible parametrization for the a 2 i 's then is where 0 = w 0 < w 1 < ...,w n are fixed frequencies.
For the optimization of the Smith and Zhu design criterion we make use of the same exchange design algorithms as described in Section 7.3 . We must only replace Ψ(M B (ξ)) by the Smith an Zhu design functional given in (Spöck et al., 2011).

Spatial sampling design for trans-Gaussian kriging
In trans-Gaussian kriging the originally positive valued data Z(x i ), i = 1, 2, . . . , n are transformed to Gaussianity by means of the Box-Cox transformation Let Z =(Z(x 1 ), Z(x 2 ),...,Z(x n )) T be the vector of original data and Y =(g λ (Z(x 1 )), g λ (Z(x 2 )),...,g λ (Z(x n ))) T be the vector of transformed data. The predictive density for trans-Gaussian kriging at a location x 0 then may be written: where Φ(.;Ŷ OK (x 0 ), σ 2 OK (x 0 )) is the Gaussian density with mean the ordinary kriging predictorŶ OK (x 0 ) at x o and based on the transformed variables Y, and variance the ordinary kriging variance σ 2 OK (x 0 ). z λ−1 is the Jacobian of the Box-Cox transformation. There is a nice relationship between the α-quantiles z α of the predictive density (31) and the Gaussian density Φ(.;Ŷ OK (x 0 ), σ 2 OK (x 0 )) which makes spatial sampling design an easy task: It can be shown that the transformed α-quantiles g λ (z α ) are actually equivalent to the α-quantiles y α of the Gaussian density Φ(.;Ŷ BK (x 0 ), σ 2 (x 0 )) that is truncated at 0 and where we only take the positive interval [0, ∞) for calculating these y α -quantiles. For spatial sampling design we can consider again the average expected length of 1 − α-predictive intervals. Once we have calculated the Gaussian density Φ(.;Ŷ OK (x 0 ), σ 2 OK (x 0 )), truncated it at 0, and have determined its α/2and 1 − α/2-quantiles y α/2 and y 1−α/2 the corresponding quantiles z α/2 and z 1−α/2 can be calculated easily by means of applying the inverse Box-Cox transformation to y α/2 and y 1−α/2 . In order to make the expected length of predictive intervals also dependent on REML-estimation of the covariance function, we can consider instead of the Gaussian density Φ(.;Ŷ OK (x 0 ), σ 2 OK (x 0 )) that unique Gaussian densitŷ Φ whose 0.025-and 0.975-quantiles are given by the (Smith and Zhu, 2004) Last but not least to get expected predictive intervals we must replace in the statistic t(Y)= Y OK (x 0 ) every variable Y(x i ) for which we do not have data by its ordinary kriging predictor based on the available data. Furthermore, we note that in the above approach we have not taken into account the fact that the transformation parameter λ itself is estimated too, i.e. by maximum likelihood, and then is plugged-into the ordinary kriging predictor. A future paper will take account of also this additional uncertainty.

The spatDesign toolbox
The spatial sampling design and geostatistics toolbox spatDesign has been developed since 2003. It can be run in both MATLAB and Octave and can be downloaded from: • http://wwwu.uni-klu.ac.at/guspoeck/spatDesignMatlab.zip • http://wwwu.uni-klu.ac.at/guspoeck/spatDesignOctave.zip The toolbox underlies the GNU Public Licence Version 3 or higher and thus is freely available.
In MATLAB (www.mathworks.com) the toolbox is fully functional but assumes that also the MATLAB Optimization and Statistics toolboxes are installed. In Octave functions like weighted-least-squares that make in MATLAB use of the Optimization Toolbox and especially of the function FMINCON do not work. But in near future also an Octave workaround for the function FMINCON will become available from the author. The spatial sampling design functions corresponding to the Smith and Zhu design criterion need on a standard PC a lot of computation time. For this reason this part of the toolbox has been parallelized to work with NVIDIA GPU's and the freely available MATLAB parallelization package GPUmat (www.gp-you.org). If you have a CUDA (www.nvidia.com) compatible graphics card installed then this will be automatically detected and the parallelized algorithms for the Smith and Zhu design criterion will be used. Unfortunately, almost no efforts have been undertaken to date to parallelize also Octave, so the Smith and Zhu part of the Octave toolbox will not work unless you have a lot of time to wait for the results in the unparallelized version. Wolfgang Nowak from the Institute of Hydraulic Engineering (IWS), University of Stuttgart, has added his FFT-Kriging Toolbox to the software package. This is really fast MATLAB and Octave code for Bayesian linear kriging with external drift and works in both 2-and 3-dimensional Euclidean space. The original kriging code of the spatDesign toolbox is not so fast as this code and implements only 2D interpolation. The reason is that the original code works with local kriging neighborhoods and as a consequence for every new prediction an often high dimensional covariance matrix K between given local observations must be inverted.
The spatDesign Toolbox provides code for all three areas of geostatistics, • covariance estimation and variography • spatial interpolation and kriging • spatial sampling design and planning of monitoring networks.
But the largest emphasis of the toolbox surely is on spatial sampling design and the planning of monitoring networks.

Covariance estimation and variography software
For covariance estimation and variography the empirical semivariogram estimator of (Matheron, 1962) where is implemented in both forms the isotropic one and the general stationary one. The corresponding MATLAB and Octave functions are EMPVARIOGRAM.m and EMPVARIOGRAMANISO.m. Having calculated the empirical semivariogram a theoretical semivariogram model can be fitted to the empirical semivariogram by means of weighted-least-squares. As theoretical semivariogram model a nested model or convex combination of an exponential and a Gaussian semivariogram model with different ranges and nugget is implemented. The corresponding MATLAB and Octave functions are WEIGHTEDLEASTSQUARES.m for the isotropic case and WEIGHTEDLEASTSQUARESANISO.m for the geometrically anisotropic case.
The toolbox implements also transformed-Gaussian kriging. For this reason a maximum likelihood estimation procedure has been implemented that estimates both the Box-Cox transformation parameter λ and the eventually geometrically anisotropic covariance function at once by means of a profile likelihood approach. Iteratively any kind of parameters are fixed and the likelihood function is maximized in the rest of the parameters until convergence. This function is quite slow since the likelihood function is iteratively maximized either in λ,t h e covariance parameters or the anisotropy parameters. The function performing these tasks is called ESTIMATE_TRANSFO_COV_ML.m.
In the up to here mentioned functions for covariance estimation the trend of the random field is assumed to be constant. For the case that the trend is of the form it must be eleminated before the covariance function or the semivariogram can be estimated. For this case the function CALCULATE_RESIDUALS_TREND_COVARIANCE.m has been implemented. Starting with a least squares estimate of β and the corresponding residuals iteratively empirical semivariograms of residuals, corresponding weighted least squares fits to the empirical semivariogram, generalized least squares estimates of β and again residuals and semivariograms.....are calculated until convergence.

Spatial interpolation and kriging software
As simplest interpolation routine Voronoi interpolation is implemented in the function VORONOIPOLYGONALINTERPOLATIONONGRID.m. Ungauged locations are given the same value as the closest datum location. Besides Voronoi interpolation also Bayesian linear kriging with external drift and transformed-Gaussian kriging based on the Box-Cox transformation are implemented. The corresponding functions are named KRIGELINEARBAYESONGRID.m and TRANSGAUSSIANKRIGINGONGRID.m. The output of the first function are the Bayesian kriging predictions and corresponding TMSEPs on a grid. The output of the second function are the skew predictive distributions from transformed-Gaussian kriging on a grid. Because the complete predictive distributions are calculated several statistics may be derived from them: The function VISUALIZEPOSTQUANTILE.m allows to calculate and visualize several quantiles of the predictive distributions as well as the means, medians and modal values of these distributions. The function VISUALIZEPROBGREATER.m allows to calculate maps that give the probabilities that certain thresholds are exceeded. Finally the functions CROSSVALIDATION.m and VISUALIZECROSSVALIDATION.m calculate crossvalidation statistics like mean absolute errors, percents of actual data below the quantiles of the predictive distribution and percent actual data above threshold vs. expected percent data above threshold.

Spatial sampling design software
spatDesign implements three design criteria for Bayesian linear kriging, where the first two are criteria for prediction only with the covariance function assumed to be certain. The third criterion is the (Smith and Zhu, 2004) criterion taking also account of the fact that the covariance function is estimated. Concerning trans-Gaussian kriging also the Smith and Zhu design criterion is implemented. The actual version of the toolbox is spatDesign V.2.2.0. The implemented criteria for prediction only are: • I-optimality: where the integral in (17) has been replaced by the sum over a fine grid of locations x i,j ∈ X.
• D-optimality: The Smith and Zhu design criterion can be expressed according to Section 6 also as although because of place restrictions we cannot give this expression in detail here. The interested reader is referred to (Spöck et al., 2011). The basic algorithm for calculating spatial sampling designs is an exchange algorithm from experimental design theory for calculating exact designs and going back to (Fedorov, 1972). Contrary to the construction of optimal discrete designs, here we cannot prove convergence of the exact designs to the functional value Ψ(d * ) of an optimal exact design d * ;w ec a n only guarantee stepwise improvement of a given exact starting design, i.e. the sequence of functional values Ψ(d n,s ) decreases monotonically with increasing iteration index s.T h e algorithm is an exchange algorithm improving n-point designs and starting from an initial design.
Step 3. Repeat Step 2 until the point to be deleted is equivalent to the point to be added.
For the design functional (18) Step 2 is determined as follows: For the Smith and Zhu design criterion no such simplification exists and the complete design functional Ψ(.) must be recalculated in every step.

Generation of an initial design
The initial design is a one-point design which minimizes the design functional among all designs of size n = 1. Note that such a design exists since the Bayesian information matrix is positive definite even for designs of size n = 1.
Step 2. Beginning with i = 1, find x i+1 such that x i+1 = arg min x∈X Ψ(M B (d i +(x))) and form d i+1 = d i +(x i+1 ).Con t i n u ew i t hi replaced by i + 1untili + 1 = n.

Combination of the algorithms 7.3.1 and 7.3.2
It is a good idea to combine the initial design algorithm 7.3.2 and the exchange algorithm 7.3.1 in the following way: Step 1. Start with the initial design algorithm and find a design with one first design point.
Step 2. Having found a design with m ≥ 1 design points apply the exchange algorithm to this design to improve it.
Step 3. Add to the design from Step 2 one further design point by means of the initial design algorithm to get m + 1designpoints.
Step 4. Go back to Step 2 and iterate Step 2 and Step 3 until you have found n desired design points.

Reduction of experimental designs
Often it is desired to reduce a given experimental design d = {x 1 , x 2 ,...,x n } to one including only m < n design points from d: Step 1. Delete that design point x j * from d for which Step 2. Iterate Step 1 until the design d contains only m design points.
Also this algorithm may be combined with an improvement step similar to the exchange algorithm 7.3.1. In algorithm 7.3.1 merely the calculation of x n+1,s has to be replaced by where d is the inital design that has to be reduced. This improved algorithm has the advantage that design points once deleted can reenter the design in the exchange step.

Inverse of the information matrix
The calculation of exact designs requires in every step the calculation of the inverses of the information matrices M B (d n,s ) or M B (d n+1,s ). In the next Sections we will see that these information matrices can have a quite high dimension of about 3000 × 3000. So, how can one invert such large matrices in affordable time? There is computationally no need to make explicit use of numerical matrix inversion algorithms, when one considers the update formulas (13.26) and (13.28) in (Pilz, 1991): Obviously only matrix-and vector multiplications are needed in these update formulae.

Computation
The I-optimality and D-optimality criteria have the additional advantage that the optimizations in algorithms 7.3.1 -7.3.4 can be computationally further simplified. What is more, the averaging over the design area X corresponding to eq. (17) has to be done only once before actual sampling design by means of computing the matrix U. The Smith and Zhu design criterion and the design criterion for trans-Gaussian kriging no longer have this advantage and the averaging over the design area X corresponding to eq. (20) and (32) must be done whenever Ψ(M B (d)) is calculated. Both eq. (17) and (20) have been simplified by means of replacing the integrals by discrete sums over a fine grid of locations x i,j ∈ X. Also the calculation of the minima in algorithms 7.3.1 -7.3.4 has been simplified by means of searching for the minimum over a fine grid of locations. When the minimum is found over the discrete grid we further iterate with a line search algorithm with this minimum as a starting value to the actual global minimum. For all that, the averaging for calculating the integral in the Smith and Zhu design criterion and the design criterion for trans-Gaussian kriging is computationally too intensive to be calculated on a standard PC.
Recently CUDA technology (www.nvidia.com) has been developed for NVIDIA graphic cards. This allows to make use of the parallel performance of these graphic cards and to put intensive floating point operations to these GPU's. Thus, we have invested in such a graphics card and now do the averaging operations correponding to expression (20) and (32) in parallel on a multiprocessor NVIDIA GTX 580 GPU. To this, we have installed GPUmat (www.gp-you.org) a free software for MATLAB (www.mathworks.com) that automatically can thread operations to the NVIDIA GPU. The usage of this software is quite easy. We just have to specify MATLAB objects that we want to calculate on the GPU as GPUdouble or GPUsingle. Performance is immense. Processing on the NVIDIA GTX 580 is between 100 to 200 times faster than on a standard Pentium 8 Core 3.06 Ghz CPU.

Basic sampling design functions
The basic spatial sampling design functions are: The names of these functions are self-explanatory: "Pooldelete" is the discrete pool of locations that are allowed to be deleted from the design. "Poolcomplete" is the complete compact area of investigation X of points allowed to be added to the design. "Pooladd" is the discrete pool of locations that are allowed to be added to the design. "Improve" means the exchange algorithm, where locations from "Pooldelete" may either be exchanged to locations from "Poolcomplete" or from "Pooladd" and the total number of sampling locations remains constant.

An example session
The purpose of this section is to demonstrate some capabilities of the spatDesign toolbox. The most important Matlab function calls related to sampling design for trans-Gaussian kriging are given. The data set considered is the so-called Gomel data set. It comprises 591 sampling locations in the region of Gomel, Belarus, where ten years after the Chernobyl accident the concentration of Cs137 in soil has been measured. For demonstration purposes we do not use the full data set but only every fifth location, so that we have to deal with only 119 data. The following Matlab code plots the data locations and calculates a histogram of the Cs137 concentrations (see Fig. 1.). In Fig. 1. we see that there are obviously areas in the design region that look very empty having no sampling locations. To the East the sampling grid is more dense than to the West. From the histogram of Fig. 1. it becomes obvious that the distribution of Cs137 is very skewed to the right. Thus, this data set is very suitable for testing the trans-Gaussian kriging spatial sampling design routines of the spatDesign toolbox. In Fig. 2. a Voronoi polygonal interpolation of the Cs137 concentrations is given. There are obviously 3 regions having very high Cs137 concentration, but the rest of the Gomel region has average to low concentrations. The 3 regions having high concentrations are regions that had heavy rainfall during the Chernobyl accident so that the radioactive cloud was washed out into the soil there.
• plotspectraldist(0:0.01:1.5,delta0); Obviously this spectral distribution function almost attains its maximum of 3.36 at w 0 = 1.5. We now select the frequencies w i , i = 1,2,...,n , calculate an approximation to the spectral distribution function via a step function (the steps are the a 2 i ) and check whether with this selected approximation to the spectral distribution function the original covariance function becomes well approximated (Fig. 3.).
•l o a d w s c a l e d • w=wscaled*1.5; (2)); • approxspectraldist(w,deltastep) • plotcovarianceapprox (w,45,deltastep,delta0,-100:3:200,250,-100,-100,200,-50,250); A look at the approximating covariance function in Fig. 3. shows that at the origin the difference between the true covariance function and the approximating covariance function is 0.17. This is small scale variation that the approximating covariance function does not take into account. Later in spatial sampling design we will add this value of 0.17 to the nugget effect 0.1644 of the true covariance function. Thus, 0.17+0.1644 is the variance of the uncorrelated error process 0 (x.) in our BSLM (14). We now have everything that we need in order to do spatial sampling design for trans-Gaussian kriging. We consider to add 20 additional sampling locations from the complete design region X to the available grid of 119 sampling locations (see Fig. 4.).
• [xadd,yadd,zadd,avglengthpredint]=optimally_add_n_locations_from_poolcomplete( {}, % no external drift [], % no need to specify the matrix U x,y,z, % the Gomel data 0.17+delta0(1), % the variance of the uncorrelated error process 0 (x) [1000000, 0, 0; 0, 0.0000001, 0; 0, 0, 0.0000001], % the a priori variance of the constant trend must be given almost infinite variance, no linear drift w, % the frequencies of the Bessel harmonics 45, % the largest angular frequency deltastep, % the a 2 i delta0, % the parameters of the exponential covariance function lambda0, % the Box-Cox transformation parameter 20, % we want to add 20 samples -100,200,-50,250, % the size of the design region 100, % maximally iterate 100 times in the exchangement step polyBoxGomeliso, % the polygonal design region 17,17, % discretization in x an y when considering new samples 'z' % we apply the (Smith and Zhu, 2004) design criterion to trans-Gaussian kriging ); Fig. 5. shows the expected lenghts of the 95% predictive intervals for the optimal 10 points and 20 points design, respectively. Fig. 6. plots the decrease of the average of the expected lenghts of the 95% predictive intervals when adding up to 20 design locations. The calculation of  the optimal 20 point design takes on an Intel i7 8 Core CPU and a NVIDIA 580 GPU with 1.6 Gb RAM 2 weeks. Similar calculations for the simpler design functional (36) take without NVIDIA Cuda support 2 days. From Fig. 4. it is obvious that the first design locations are added to the margin of areas that look empty. In order to get also the covariance function (especially close to its origin) well estimated design locations are selected close to existing locations or they are selected with multiplicities greater than 1. There is a fundamental difference between designing for Gaussian random fields and our designs for trans-Gaussian kriging. Contrary to the designs for Gaussian random fields designs for trans-Gaussian kriging are dependent also on the data by means of the ordinary kriging predictor in eq. (32).

Conclusions
Our approach to spatial sampling design is motivated by the fact that up to date most sampling design implementations use stochastic search algorithms like simulated annealing. The approach to spatial sampling design proposed in this Chapter makes use of the mathematical structure of the investigated design functionals. Based on the polar spectral approximation of the isotropic random field we are able to use classical experimental design theory for the calculation of spatial sampling designs. One only missing link for the Smith and Zhu design criterion and for the design criterion of trans-Gaussian kriging is the fact that up to date we were not able to show convexity properties of the corresponding design functionals. This is surely a topic for future resarch. The data example giving radioactivity measurements in soil shows that trans-Gaussian kriging and sampling design for it are relevant also to soil science. A topic for future research is to extend the proposed approach also to non-stationary random fields. Some work towards this direction has already been done in (Spöck, 2008). The approach discussed therein locally maintains isotropy but globally is non-stationary and is also based on the approximation of a so-called locally isotropic random field by means of a cosine-sine-Bessel surface-harmonics regression model with random amplitudes.