Universidade Federal do Paraná
Curso de Estatística
CE 089 - Estatística Computacional II - 2014/2
Prof. Dr. Walmes Marques Zeviani

Aula 19 e 20

Tabela de conteúdo

Estimação de parâmetros de uma população Weibull
Estimação em modelos de regressão linear - MMQ

Weibull

Função densidade de probabilidade: \[ f(y) = \dfrac{\theta_a}{\theta_b}\left(\dfrac{y}{\theta_b}\right)^{\theta_a-1} \exp\left\{-\left(\dfrac{y}{\theta_b}\right)^{\theta_a}\right\}, \quad y>0, \theta_a>0, \theta_b>0. \]

Função de verossimilhança: \[ L(\theta; y) = \prod_{i=1}^n \,\, \dfrac{\theta_a}{\theta_b}\left(\dfrac{y_i}{\theta_b}\right)^{\theta_a-1} \exp\left\{-\left(\dfrac{y_i}{\theta_b}\right)^{\theta_a}\right\}. \]

Função de log-verossimilhança: \[ \begin{align*} \ell(\theta; y) &= \sum_{i=1}^n \,\, \log(\theta_a)-\log(\theta_b)+ (\theta_a-1)\log(y_i)-(\theta_a-1)\log(\theta_b)- \left(\frac{y_i}{\theta_b}\right )^{\theta_a} \newline &= \sum_{i=1}^{n} \log(\theta_a)-\theta_a \log(\theta_b)+(\theta_a-1) \log(y_i)-\left(\frac{y_i}{\theta_b} \right)^{\theta_a}. \end{align*} \]

Colar as expressões abaixo no campo de busca do Wolfram|Alpha.

Para o gradiente.
- \(\frac{\partial \ell}{\partial a}\): d/da log(a)+(a-1)*log(y)-a*log(b)-(y/b)^a.
- \(\frac{\partial \ell}{\partial b}\): d/db log(a)+(a-1)*log(y)-a*log(b)-(y/b)^a.
Para o hessiano.
- \(\frac{\partial^2 \ell}{\partial a^2}\): d^2/da^2 log(a)+(a-1)*log(y)-a*log(b)-(y/b)^a.
- \(\frac{\partial^2 \ell}{\partial b^2}\): d^2/db^2 log(a)+(a-1)*log(y)-a*log(b)-(y/b)^a.
- \(\frac{\partial^2 \ell}{\partial a \partial b}\): d^2/dadb log(a)+(a-1)*log(y)-a*log(b)-(y/b)^a.

##-----------------------------------------------------------------------------
## Amostra da Weibull.

## help(dweibull, help_type="html")

y <- c(13.39, 10.601, 12.523, 10.568, 10.247, 13.049, 12.853, 10.643,
       12.142, 13.349, 12.714, 10.248, 12.115, 13.438, 8.248, 9.8,
       11.097, 8.691, 11.789, 12.153, 11.646, 12.048, 11.084, 11.662,
       11.799, 9.404, 11.321, 11.38, 11.534, 12.232)

## Expressão da log-verossimilhança (sem o somatório).
ll <- expression(log(a)+(a-1)*log(y)-a*log(b)-(y/b)^a)

## Derivadas parciais.
D(ll, "a")

## 1/a + log(y) - log(b) - (y/b)^a * log((y/b))

D(ll, "b")

## -(a * (1/b) - (y/b)^(a - 1) * (a * (y/b^2)))

## Função gradiente.
G <- function(theta, y){
    a <- theta[1]; b <- theta[2]
    ## Do Wolfram.
    ## part.a <- sum((y/b)^a*(-log(y/b))+1/a-log(b)+log(y))
    ## part.b <- sum(a*b^(-a-1)*(y^a-b^a))
    part.a <- 1/a + log(y) - log(b) - (y/b)^a * log((y/b))
    part.b <- -(a * (1/b) - (y/b)^(a - 1) * (a * (y/b^2)))
    return(rbind(sum(part.a), sum(part.b)))
}

G(c(2,3), y)

##        [,1]
## [1,] -547.7
## [2,]  275.6

## Derivadas parciais duplas.
D(D(ll, "a"), "a")

## -(1/a^2 + (y/b)^a * log((y/b)) * log((y/b)))

D(D(ll, "b"), "b")

## a * (1/b^2) - ((y/b)^(a - 1) * (a * (y * (2 * b)/(b^2)^2)) + 
##     (y/b)^((a - 1) - 1) * ((a - 1) * (y/b^2)) * (a * (y/b^2)))

D(D(ll, "a"), "b")

## -(1/b - ((y/b)^a * (y/b^2/(y/b)) + (y/b)^(a - 1) * (a * (y/b^2)) * 
##     log((y/b))))

D(D(ll, "b"), "a")

## -((1/b) - ((y/b)^(a - 1) * log((y/b)) * (a * (y/b^2)) + (y/b)^(a - 
##     1) * (y/b^2)))

H <- function(theta, y){
    a <- theta[1]; b <- theta[2]
    ## Obtidas com o Wolfram.
    ## part.aa <- sum((b/y)^(-a)*(-(log(b)-log(y))^2)-1/(a^2))
    ## part.bb <- sum(-a*b^(-a-2)*((a+1)*y^a-b^a))
    ## part.ab <- sum(-(y/b)^a-a*log(b)+(a-1)*log(y)+log(a))
    ## part.ba <- sum(((y/b)^a*(a*log(y/b)+1)-1)/b)
    part.aa <- -(1/a^2 + (y/b)^a * log((y/b)) * log((y/b)))
    part.bb <- a * (1/b^2) - ((y/b)^(a - 1) * (a * (y * (2 * b)/(b^2)^2)) +
                                  (y/b)^((a - 1) - 1) *
                                      ((a - 1) * (y/b^2)) * (a * (y/b^2)))
    part.ab <- -(1/b - ((y/b)^a * (y/b^2/(y/b)) + (y/b)^(a - 1) *
                            (a * (y/b^2)) * log((y/b))))
    part.ba <- -((1/b) - ((y/b)^(a - 1) * log((y/b)) *
                              (a * (y/b^2)) + (y/b)^(a - 1) * (y/b^2)))
    m <- matrix(c(sum(part.aa), sum(part.ab),
                  sum(part.ba), sum(part.bb)),
                2, 2, byrow=TRUE)
    return(m)
}

H(c(2,3), y)

##        [,1]   [,2]
## [1,] -831.7  539.6
## [2,]  539.6 -288.9

##-----------------------------------------------------------------------------

maxiter <- 50; i <- 1          ## Número máximo de iterações e contador.
tol <- 1e-5; error <- 100*tol  ## Tolerância e erro inicial.

theta <- matrix(NA, nrow=1, ncol=2)
theta[1,] <- c(2, 2)
theta[1,] <- c(10, 2)
theta[1,] <- c(2, 12)
theta[1,] <- c(5, 5)
theta[1,] <- c(10, 10)

## Newton-Raphson.
while(i <= maxiter & error>tol){
    theta <- rbind(theta, rep(NA,2))
    theta[i+1,] <- theta[i,]-
        solve(H(theta[i,], y))%*%G(theta[i,], y)
    error <- sum(abs((theta[i+1,]-theta[i,])/theta[i+1,]))
    i <- i+1
    print(c(theta[i,], G(theta[i,], y)))
}

## [1]   8.232  10.279 -11.470  56.433
## [1]  5.594 10.279 -1.948 18.421
## [1] -2.904  7.502 -1.351  7.912

## Warning: NaNs produced
## Warning: NaNs produced

## [1]  5.626 -3.333    NaN    NaN

## Warning: NaNs produced
## Warning: NaNs produced
## Warning: NaNs produced
## Warning: NaNs produced
## Warning: NaNs produced
## Warning: NaNs produced

## [1] NaN NaN NaN NaN

## Error: missing value where TRUE/FALSE needed

theta[i,]

## [1] NaN NaN

require(MASS)

## Loading required package: MASS

mle <- fitdistr(y, dweibull, list(shape=10, scale=12))
str(mle)

## List of 5
##  $ estimate: Named num [1:2] 10.7 12
##   ..- attr(*, "names")= chr [1:2] "shape" "scale"
##  $ sd      : Named num [1:2] 1.56 0.215
##   ..- attr(*, "names")= chr [1:2] "shape" "scale"
##  $ vcov    : num [1:2, 1:2] 2.4329 0.1045 0.1045 0.0463
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "shape" "scale"
##   .. ..$ : chr [1:2] "shape" "scale"
##  $ loglik  : num -49.4
##  $ n       : int 30
##  - attr(*, "class")= chr "fitdistr"

curve(dweibull(x, shape=mle$estimate["shape"],
               scale=mle$estimate["scale"]),
      0, 1.3*max(y))
rug(y)

plot of chunk unnamed-chunk-2

## help(dweibull, help_type="html")
## help(package="MASS", help_type="html")

##-----------------------------------------------------------------------------
## Gráfico da função de log-verossimilhança.

ll <- Vectorize(
    FUN=function(a, b, y){
        sum(log(a)+(a-1)*log(y)-a*log(b)-(y/b)^a)
    },
    c("a","b"))

a <- seq(2, 25, length.out=50)
b <- seq(9, 16, length.out=50)
llab <- outer(a, b, ll, y=y)
m <- max(llab)

## Mas que formato hein?
contour(a, b, llab, levels=seq(m-30, m-1, by=1))

plot of chunk unnamed-chunk-2

contour(log(a), log(b), llab, levels=seq(m-30, m-1, by=1))

plot of chunk unnamed-chunk-2

Regressão não linear

Seja a função objetivo a soma de quadrados entre os desvios (ordinários) de regressão, \(y_i-f(x_i, \theta)\), em função dos parâmetros \(\theta\) da função média \(f(x,\theta)\),

\[ S(\theta) = \sum_{i=1}^{n} (y_i-f(x_i, \theta))^2, \] em que \(y\) representa a variável dependente, \(x\) a variável independente, \(\theta\) o vetor de parâmetros e \(n\) o número de pares \((y_i, x_i)\). O objetivo é determinar \(\hat{\theta}\) que minimize \(S(\theta)\),

\[ \hat{\theta} = \underset{\theta \in \Theta}{\arg \min}\,\, S(\theta). \]

Os valores \(\hat{\theta}\) que correspondem ao mínimo de \(S\) também correspondem à valores de derivadas parciais da função \(S\) iguais a zero, ou seja, \[ \begin{align*} \frac{\partial S}{\partial \theta_1}\bigg|_{\theta=\hat{\theta}} &= 0 \newline \frac{\partial S}{\partial \theta_2}\bigg|_{\theta=\hat{\theta}} &= 0 \newline &\cdots \newline \frac{\partial S}{\partial \theta_p}\bigg|_{\theta=\hat{\theta}} &= 0 \newline \end{align*} \]

Para solucionar (encontrar a raíz) o sistema de \(p\) equações não lineares acima pode-se considerar o método de Newton-Raphson. Para aplicação do método é necessário especificar as expressões para o vetor gradiente de \(p\) dimensões e a matriz hessiana de \(p \times p\) dimensões. Eles são definidos por

\[ \begin{align*} \mathbf{G}(\theta)_{p\times 1} &= \frac{\partial S(\theta)}{\partial \theta^\top} = \left( \frac{\partial S(\theta)}{\partial \theta_1}, \frac{\partial S(\theta)}{\partial \theta_2}, \cdots, \frac{\partial S(\theta)}{\partial \theta_p} \right )^\top, \newline \mathbf{H}(\theta)_{p\times p} &= \frac{\partial^2 S(\theta)}{\partial \theta\partial \theta^\top} = \begin{bmatrix} \frac{\partial^2 S(\theta)}{\partial \theta_1\partial \theta_1} & \cdots & \frac{\partial^2 S(\theta)}{\partial \theta_1 \partial \theta_p} \newline \vdots & \ddots & \vdots \newline \frac{\partial^2 S(\theta)}{\partial \theta_p \partial \theta_1} & \cdots & \frac{\partial^2 S(\theta)}{\partial \theta_p \partial \theta_p} \end{bmatrix}. \end{align*} \]

O método de Newton-Raphson, dado conhecimento de \(\mathbf{G}\) e \(\mathbf{H}\) a partir de um conjunto de valores iniciais \(\theta_i, i=0\), é representado por

\[ \theta_{(i+1)} = \theta_{(i)}-\mathbf{H}^{-1}(\theta_{(i)})\mathbf{G}(\theta_{(i)}), \]

em que o subscrito entre parênteses indica a interação baseada na solução numérica do sistema de equações não lineares.