- (3) load the data sets ''parana'', ''Ksat'' and ''ca20'' available in ''geoR'' using commands such as <code R>data(parana)</code> and read the documentation describing each data set with the ''help()'' function, e.g. <code R>help(parana)</code> Perform exploratory data analysis and build a model you find suitable for each data set (a minimal exploratory sketch follows after this list).
- (3) In the examples above, would you have other //candidate// models for each data-set?
- Inspect [[http://leg.ufpr.br/geoR/tutorials/Rcruciani.R|an example geostatistical analysis]] for the hydraulic conductivity data.
- (4) Consider the following two models for a set of responses, <m>Y_i : i=1,...,n</m>, associated with a sequence of positions <m>x_i : i=1,...,n</m> along a one-dimensional spatial axis <m>x</m>.
- <m>Y_i = alpha + beta x_i + Z_i</m>, where <m>alpha</m> and <m>beta</m> are parameters and the <m>Z_i</m> are mutually independent with mean zero and variance <m>sigma_Z^2</m>.
- <m>Y_i = A + B x_i + Z_i</m>, where the <m>Z_i</m> are as in (a) but //A// and //B// are now random variables, independent of each other and of the <m>Z_i</m>, each with mean zero and respective variances <latex>$\sigma_A^2$</latex> and <latex>$\sigma_B^2$</latex>.\\ For each of these models, find the mean and variance of <m>Y_i</m> and the covariance between <m>Y_i</m> and <m>Y_j</m> for any <m>j != i</m>. Given a single realisation of either model, would it be possible to distinguish between them?
- (5) Suppose that <latex>$Y=(Y_1,\ldots,Y_n)$</latex> follows a multivariate Gaussian distribution with <latex>${\rm E}[Y_i]=\mu$</latex> and <latex>${\rm Var}\{Y_i\}=\sigma^2$</latex> and that the covariance matrix of <m>Y</m> can be expressed as <m>V=sigma^2 R(phi)</m>. Write down the log-likelihood function for <latex>$\theta=(\mu,\sigma^2,\phi)$</latex> based on a single realisation of <m>Y</m> and obtain explicit expressions for the maximum likelihood estimators of <m>mu</m> and <m>sigma^2</m> when <m>phi</m> is known. Discuss how you would use these expressions to find maximum likelihood estimators numerically when <m>phi</m> is unknown.
- (6) Is the following a legitimate correlation function for a one-dimensional spatial process <latex>$S(x) : x \in \mathbb{R}$</latex>? Give either a proof or a counter-example.\\ <m>rho(u) = delim{lbrace}{matrix{2}{1}{{1-u : 0<=u<=1}{0 : u>1}}}{}</m>
- (7) Consider the following method of simulating a realisation of a one-dimensional spatial process <latex>$S(x) : x \in \mathbb{R}$</latex>, with mean zero, variance 1 and correlation function <m>rho(u)</m>. Choose a set of points <latex>$x_i \in \mathbb{R} : i=1,\ldots,n$</latex>. Let <m>R</m> denote the correlation matrix of <latex>$S=\{S(x_1),\ldots,S(x_n)\}$</latex>. Obtain the singular value decomposition of <m>R</m> as <latex>$R = D \Lambda D^\prime$</latex>, where <m>Lambda</m> is a diagonal matrix whose non-zero entries are the eigenvalues of <m>R</m>, in order from largest to smallest. Let <latex>$Y=\{Y_1,\ldots,Y_n\}$</latex> be an independent random sample from the standard Gaussian distribution, <latex>${\rm N}(0,1)$</latex>. Then the simulated realisation is <latex>$S = D \Lambda^{\frac{1}{2}} Y$</latex>.
- (7) Write an ''R'' function to simulate realisations using the above method for any specified set of points <m>x_i</m> and a range of correlation functions of your choice. Use your function to simulate a realisation of <m>S</m> on (a discrete approximation to) the unit interval <m>(0,1)</m> (a sketch follows after this list).
- (7) Now investigate how the appearance of your realisation <m>S</m> changes if in the equation above you replace the diagonal matrix <m>Lambda</m> by a truncated form in which you replace the last <m>k</m> eigenvalues by zeros.
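For exercise (3), a minimal exploratory sketch assuming only that ''geoR'' is installed; the ''max.dist'' value and the choice of plots below are arbitrary suggestions, not a prescribed analysis. The same steps can be repeated for ''Ksat'' and ''ca20''.
<code R>
## exercise (3): load a data set and explore it; max.dist is an arbitrary choice
library(geoR)
data(parana)          # rainfall data for Paraná State
help(parana)          # documentation for the data set
summary(parana)       # summaries of coordinates, borders and data
plot(parana)          # default exploratory display: locations, data vs. coordinates, histogram
plot(variog(parana, max.dist = 400))   # empirical variogram as a starting point for modelling
</code>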
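For exercise (7), a sketch of the simulation method described above; the function name ''simGP'', the exponential correlation function and the values of ''phi'' and ''k'' are arbitrary choices. For a symmetric correlation matrix the required decomposition is obtained here with ''eigen()'', which returns the eigenvalues in decreasing order.
<code R>
## exercise (7): simulate S = D Lambda^{1/2} Y from the decomposition R = D Lambda D'
simGP <- function(x, cor.fun = function(u, phi) exp(-u/phi), phi = 0.25, k = 0) {
  u <- as.matrix(dist(x))            # distances between the points x_i
  R <- cor.fun(u, phi)               # correlation matrix
  dec <- eigen(R, symmetric = TRUE)  # eigenvalues in decreasing order
  lambda <- dec$values
  if (k > 0) lambda[seq(length(lambda) - k + 1, length(lambda))] <- 0  # zero the last k eigenvalues
  lambda[lambda < 0] <- 0            # guard against small negative rounding errors
  drop(dec$vectors %*% (sqrt(lambda) * rnorm(length(lambda))))
}

x <- seq(0, 1, length = 201)         # discrete approximation to the unit interval
set.seed(1); plot(x, simGP(x, phi = 0.1), type = "l")
set.seed(1); lines(x, simGP(x, phi = 0.1, k = 150), col = 2)  # same Y, last 150 eigenvalues zeroed
</code>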
==== Semana 3 ====
- (8) Fit a model to the surface elevation data assuming a linear trend model on the coordinates and a Matérn correlation function with parameter <m>kappa=2.5</m>. Use the fitted model as the true model and perform a simulation study (i.e. simulate from this model) to compare parameter estimation based on maximum likelihood, restricted maximum likelihood and variograms (a possible skeleton is sketched after this list).
- (9) Simulate 200 points in the unit square from the Gaussian model without measurement error, constant mean equal to zero, unit variance and exponential correlation function with <m>phi=0.25</m> and anisotropy parameters <m>(psi_A=pi/3, psi_R=2)</m>. Obtain parameter estimates (using maximum likelihood):
* assuming an isotropic model
* try to estimate the anisotropy parameters \\ Compare the results and repeat the exercise for <m>psi_R=4</m> (see the sketch after this list).
- (10) Consider a stationary trans-Gaussian model with known transformation function <latex>$h(\cdot)$</latex>. Let <m>x</m> be an arbitrary location within the study region and define <m>T=h^{-1}(S(x))</m>. Find explicit expressions for <latex>${\rm P}(T>c|Y)$</latex>, where <m>Y=(Y_1,...,Y_n)</m> denotes the observed measurements on the untransformed scale, and:
* <m>h(u)=u</m>
* <m>h(u) = log(u)</m>
* <m>h(u) = sqrt{u}</m>.
- (11) Analyse the Paraná data-set or any other data set of your choice, assuming priors, and obtain (a kriging sketch follows after this list):
* a map of the predicted values over the area
* a map of the prediction standard errors over the area
* a map of the probabilities of being above a certain (arbitrarily chosen) threshold over the area
* a map of the 10th, 25th, 50th, 75th and 90th percentiles over the area
* the predictive distribution of the proportion of the area with the value of the study variable below a certain threshold (as a suggestion, you can use the 30th percentile of the data as the value of such a threshold).
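A possible skeleton for the simulation study of exercise (8). It assumes the ''elevation'' data set shipped with ''geoR''; the initial covariance parameters, the number of simulations and ''max.dist'' are rough, arbitrary choices, and only the estimates of <m>phi</m> are collected for comparison.
<code R>
## exercise (8): simulate from a fitted "true" model and compare ML, REML and variogram fits
library(geoR)
data(elevation)
fit0 <- likfit(elevation, trend = "1st", cov.model = "matern", kappa = 2.5,
               ini.cov.pars = c(3000, 2))        # "true" model; initial values are rough guesses

nsim <- 100
res <- matrix(NA, nsim, 3, dimnames = list(NULL, c("ML", "REML", "variogram")))
for (i in 1:nsim) {
  sim <- grf(grid = elevation$coords, cov.model = "matern", kappa = 2.5,
             cov.pars = fit0$cov.pars, nugget = fit0$nugget)
  sim$data <- sim$data + drop(cbind(1, sim$coords) %*% fit0$beta)   # add the linear trend back
  ml   <- likfit(sim, trend = "1st", kappa = 2.5, ini.cov.pars = fit0$cov.pars)
  reml <- likfit(sim, trend = "1st", kappa = 2.5, ini.cov.pars = fit0$cov.pars,
                 lik.method = "REML")
  vf   <- variofit(variog(sim, trend = "1st", max.dist = 5), cov.model = "matern",
                   kappa = 2.5, ini.cov.pars = fit0$cov.pars)
  res[i, ] <- c(ml$cov.pars[2], reml$cov.pars[2], vf$cov.pars[2])   # estimates of phi
}
boxplot(as.data.frame(res), main = "estimates of phi under the three methods")
</code>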
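For exercise (9), a sketch using ''grf()'' and ''likfit()''; in ''geoR'' the anisotropy parameters are passed as ''aniso.pars = c(psiA, psiR)'' (angle, ratio), and the starting values used below are arbitrary guesses.
<code R>
## exercise (9): simulate an anisotropic field and fit isotropic and anisotropic models
library(geoR)
set.seed(123)
sim <- grf(200, cov.model = "exponential", cov.pars = c(1, 0.25),
           aniso.pars = c(pi/3, 2))        # 200 points in the unit square, no nugget

fit.iso   <- likfit(sim, ini.cov.pars = c(1, 0.25))          # ignoring the anisotropy
fit.aniso <- likfit(sim, ini.cov.pars = c(1, 0.25),          # estimating psi_A and psi_R as well
                    fix.psiA = FALSE, fix.psiR = FALSE, psiA = pi/4, psiR = 1.5)
fit.iso
fit.aniso
## repeat with aniso.pars = c(pi/3, 4) to see the effect of stronger anisotropy
</code>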
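A possible route for exercise (11) with the ''parana'' data, sketched here with plug-in kriging via ''krige.conv()'' (swap in ''krige.bayes()'' if a fully Bayesian analysis is intended; see exercise (14)). The grid spacing, initial covariance parameters and the 30th-percentile threshold are arbitrary choices, and the maps and summaries are computed from predictive simulations rather than from closed-form results.
<code R>
## exercise (11): maps of predictions, std. errors, probabilities and percentiles
library(geoR)
data(parana)
gr  <- pred_grid(parana$borders, by = 10)       # grid over the bounding box of the border
fit <- likfit(parana, trend = "1st", ini.cov.pars = c(1000, 100))
thr <- quantile(parana$data, 0.30)              # suggested threshold: 30th percentile of the data

kc <- krige.conv(parana, locations = gr,
                 krige  = krige.control(trend.d = "1st", trend.l = "1st", obj.model = fit),
                 output = output.control(n.predictive = 1000))

image(kc, values = kc$predict, borders = parana$borders)           # predicted values
image(kc, values = sqrt(kc$krige.var), borders = parana$borders)   # prediction std. errors
image(kc, values = rowMeans(kc$simulations > thr),                 # P(value above threshold)
      borders = parana$borders)
image(kc, values = apply(kc$simulations, 1, quantile, probs = 0.90),  # e.g. the 90th percentile map
      borders = parana$borders)
## predictive distribution of the proportion of the (bounding-box) area below the threshold;
## restrict gr with locations.inside(gr, parana$borders) for the area within the state border
hist(colMeans(kc$simulations <= thr), main = "proportion of area below the threshold")
</code>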
==== Semana 4 ====
- (12) Consider the stationary Gaussian model in which <m>Y_i = beta + S(x_i) + Z_i : i=1,...,n</m>, where <m>S(x)</m> is a stationary Gaussian process with mean zero, variance <m>sigma^2</m> and correlation function <m>rho(u)</m>, whilst the <m>Z_i</m> are mutually independent <latex>${\rm N}(0,\tau^2)$</latex> random variables. Assume that all parameters except <m>beta</m> are known. Derive the Bayesian predictive distribution of <m>S(x)</m> for an arbitrary location <m>x</m> when <m>beta</m> is assigned an improper uniform prior, <m>pi(beta)</m> constant for all real <m>beta</m>. Compare the result with the ordinary kriging formulae.
- (13) For the model assumed in the previous exercise, assuming a correlation function parametrised by a scalar parameter <m>phi</m>, obtain the posterior distribution for:
* a normal prior for <m>beta</m>, assuming the remaining parameters are known
* a normal-scaled-inverse-<latex>$\chi^2$</latex> prior for <latex>$(\beta, \sigma^2)$</latex>, assuming the correlation parameter is known
* a normal-scaled-inverse-<m>chi^2</m> prior for <m>(beta, sigma^2|phi)</m>, assuming a generic prior <m>p(phi)</m> for the correlation parameter.
- (14) Analyse the Paraná data-set or any other data set of your choice assuming priors for the model parameters and obtaining (a ''krige.bayes()'' sketch follows after this list):
* the posterior distribution for the model parameters
* a map of the predictive mean over the area
- (15) Obtain simulations from the Poisson model as shown in Figure 4.1 of the textbook for the course (a generic simulation sketch follows after this list).
- (15) Try to reproduce or mimic the results shown in Figure 4.2 of the textbook for the course, simulating a data set and obtaining a similar data analysis. **Note:** for the example in the book we have used //set.seed(34)//.
- (16) Reproduce the simulated binomial data shown in Figure 4.6. Use the package //geoRglm// in conjunction with priors of your choice to obtain predictive distributions for the signal <m>S(x)</m> at locations <latex>$x=(0.6, 0.6)$</latex> and <latex>$x=(0.9, 0.5)$</latex>. Compare the predictive inferences which you obtained in the previous exercise with those obtained by fitting a linear Gaussian model to the empirical logit transformed data, <m>log{(y+0.5)/(n-y+0.5)}</m>. Compare the results of the two analyses and comment generally (a partial sketch of the empirical-logit route follows after this list).
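For exercise (14), a ''krige.bayes()'' sketch with the ''parana'' data; the discrete prior support for <m>phi</m>, the grid spacing and the numbers of posterior/predictive samples are arbitrary choices, and argument defaults should be checked against the ''geoR'' help pages.
<code R>
## exercise (14): Bayesian analysis with krige.bayes()
library(geoR)
data(parana)
gr <- pred_grid(parana$borders, by = 15)

pb <- krige.bayes(parana, locations = gr,
                  model  = model.control(trend.d = "1st", trend.l = "1st"),
                  prior  = prior.control(phi.discrete = seq(0, 150, by = 15)),
                  output = output.control(n.posterior = 1000, n.predictive = 1000))

## posterior distributions of the model parameters (one histogram per sampled parameter)
par(mfrow = c(2, 3))
invisible(lapply(names(pb$posterior$sample),
                 function(p) hist(pb$posterior$sample[[p]], main = p, xlab = p)))

## map of the predictive mean over the area (image.krige.bayes plots the predictive mean by default)
par(mfrow = c(1, 1))
image(pb, borders = parana$borders)
</code>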
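For exercise (15), a generic sketch of simulating from the Poisson log-linear model, in which the counts have mean exp{beta + S(x_i)}; the locations, covariance parameters and intercept below are arbitrary and do not reproduce the exact settings of Figure 4.1.
<code R>
## exercise (15): counts Y_i ~ Poisson(exp(beta + S(x_i))) at random locations in the unit square
library(geoR)
set.seed(34)
sim <- grf(100, cov.model = "exponential", cov.pars = c(2, 0.2))  # S at 100 random locations
y   <- rpois(length(sim$data), lambda = exp(0.5 + sim$data))      # counts, with intercept beta = 0.5
points(sim, pt.divide = "quintiles")                              # locations shaded by S
plot(sim$coords, cex = 0.3 + y / max(y))                          # circle size proportional to the count
</code>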
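For exercise (16), a partial sketch covering only the empirical-logit comparison; the grid, covariance parameters and binomial denominators are arbitrary and do not reproduce Figure 4.6 exactly, and the //geoRglm// analysis (e.g. with ''binom.krige.bayes()'') is left as stated in the exercise.
<code R>
## exercise (16): simulate binomial data and fit a linear Gaussian model to the empirical logits
library(geoR)
set.seed(34)
sim <- grf(64, grid = "reg", cov.model = "exponential", cov.pars = c(0.5, 0.2))  # S on an 8 x 8 grid
n.trials <- 4                                        # binomial denominators (arbitrary choice)
y <- rbinom(length(sim$data), size = n.trials, prob = plogis(sim$data))

elogit <- log((y + 0.5) / (n.trials - y + 0.5))      # empirical logit transform
gd  <- as.geodata(cbind(sim$coords, elogit))
fit <- likfit(gd, ini.cov.pars = c(0.5, 0.2))
kc  <- krige.conv(gd, locations = rbind(c(0.6, 0.6), c(0.9, 0.5)),
                  krige = krige.control(obj.model = fit))
cbind(mean = kc$predict, std.error = sqrt(kc$krige.var))   # predictions at the two target locations
</code>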
==== Semana 5 ==== |