The mglm4twin
package implements multivariate
generalized linear models for twin and family data as proposed by Bonat
and Hjelmborg (2020). The core fit function mglm4twin
is
employed for fitting a set of models. In this introductory vignette we
restrict ourselves to model multivariate bounded outcomes. We present a
set of special specifications of the standard ACE for multivariate
bounded traits in the context of twin data. This vignette reproduces the
data analysis presented in the Section 5.6 Dataset 6: Multivariate
bounded data of Bonat and Hjelmborg (2020).
To install the stable version of mglm4twin
,
use devtools::install_git()
. For more information, visit mglm4twin/README.
In this example, we simulate a dataset as similar as possible to a
real dataset concernings three bounded outcomes reflecting mental
conditions. The data set consists of 828 (524 DZ and 274 MZ) twin pairs.
The outcomes are obtained based on questionaries and includes the Mini
Mental State Examination (MMSE) taking values on the interval \([0\ldots, 30]\). The second outcome
corresponds to depression symptomatology which was evaluated based on an
adaptation of the depression section of the Cambridge Mental Disorders
of the Elderly Examination (CAMDEX). The depression scale used here is a
composite of responses to 17 depression items and taking integer values
in the interval \([0, \ldots, 30]\) and
is highly left-skewed in distribution in any population-based sample.
The third outcome is a Social Activity scale based on six items that
assess the frequency with which the individual is engaged with others
and mental pursuits. We opted to simulate such a data set because we can
show how to simulate the data and make it available for the users. For
simulating the data set, we are going to use the
SimCorMultRes
package.
## Loading extra packages
require(Matrix)
#> Carregando pacotes exigidos: Matrix
require(SimCorMultRes)
#> Carregando pacotes exigidos: SimCorMultRes
require(mglm4twin)
## Setting model parameters
E <- c(0.75, 0.7, 0.65, -0.3, 0.25, -0.4)
A <- c(0.25,0.3, 0.35, -0.15, 0.20, -0.2)
C <- rep(0, 6)
tau = c(E, A, C)
## Twin structure
DZ = mt_twin(N_DZ = 1, N_MZ = 0, n_resp = 3, model = "ACE")
MZ = mt_twin(N_DZ = 0, N_MZ = 1, n_resp = 3, model = "ACE")
Omega_DZ <- as.matrix(mt_matrix_linear_predictor(tau = tau, Z = DZ))
Omega_MZ <- as.matrix(mt_matrix_linear_predictor(tau = tau, Z = MZ))
## Regression structures
sex <- c(rep("Male", 220/2), rep("Female", 334/2),
rep("Male", 116/2), rep("Female", 158/2))
zyg <- c(rep("DZ", 554/2), rep("MZ", 274/2))
set.seed(123)
Age <- rbeta(414, shape1 = 0.3*2, shape2 = 0.7*2)*20 + 70
age_std <- (Age - mean(Age))/var(Age)
X <- model.matrix(~ sex + age_std)
beta1 <- c(1.6956, 0.0584, -0.2576)
mu1 <- exp(X%*%beta1)/(1+exp(X%*%beta1))
beta2 <- c(-1.7930, 0.0875, 0.2382)
mu2 <- exp(X%*%beta2)/(1+exp(X%*%beta2))
beta3 <- c(-0.5363, -0.05138, -0.1528)
mu3 <- exp(X%*%beta3)/(1+exp(X%*%beta3))
## Marginal distributions
dist <- c("qbeta","qbeta","qbeta")
invcdfnames <- rep(dist, each = 2)
## Simulating data set
Y1_MZ <- list()
Y2_MZ <- list()
Y3_MZ <- list()
Y1_DZ <- list()
Y2_DZ <- list()
Y3_DZ <- list()
set.seed(181185)
phi = 5
for(i in 1:137) {
paramslists <- list(
m1 = list(shape1 = mu1[i]*phi, shape2 = (1-mu1[i])*phi),
m1 = list(shape1 = mu1[i]*phi, shape2 = (1-mu1[i])*phi),
m2 = list(shape1 = mu2[i]*phi, shape2 = (1-mu2[i])*phi),
m2 = list(shape1 = mu2[i]*phi, shape2 = (1-mu2[i])*phi),
m3 = list(shape1 = mu3[i]*phi, shape2 = (1-mu3[i])*phi),
m3 = list(shape1 = mu3[i]*phi, shape2 = (1-mu3[i])*phi))
Y_MZ <- rnorta(R = 1, cor.matrix = Omega_MZ,
distr = invcdfnames, qparameters = paramslists)
Y1_MZ[[i]] <- Y_MZ[1:2]
Y2_MZ[[i]] <- Y_MZ[3:4]
Y3_MZ[[i]] <- Y_MZ[5:6]
}
set.seed(181185)
for(i in 138:414) {
paramslists <- list(
m1 = list(shape1 = mu1[i]*phi, shape2 = (1-mu1[i])*phi),
m1 = list(shape1 = mu1[i]*phi, shape2 = (1-mu1[i])*phi),
m2 = list(shape1 = mu2[i]*phi, shape2 = (1-mu2[i])*phi),
m2 = list(shape1 = mu2[i]*phi, shape2 = (1-mu2[i])*phi),
m3 = list(shape1 = mu3[i]*phi, shape2 = (1-mu3[i])*phi),
m3 = list(shape1 = mu3[i]*phi, shape2 = (1-mu3[i])*phi))
Y_DZ <- rnorta(R = 1, cor.matrix = Omega_DZ,
distr = invcdfnames, qparameters = paramslists)
Y1_DZ[[i]] <- Y_DZ[1:2]
Y2_DZ[[i]] <- Y_DZ[3:4]
Y3_DZ[[i]] <- Y_DZ[5:6]
}
Y1 <- c(do.call(c, Y1_DZ), do.call(c, Y1_MZ))
Y2 <- c(do.call(c, Y2_DZ), do.call(c, Y2_MZ))
Y3 <- c(do.call(c, Y3_DZ), do.call(c, Y3_MZ))
data <- data.frame("Y1" = Y1, "Y2" = Y2, "Y3" = Y3, "twin_id" = rep(1:2, 414),
"zyg" = rep(zyg, each = 2), "sex" = rep(sex, each = 2),
"age_std" = rep(age_std, each = 2))
The model specification is done in three steps:
In the example we have the covariates sex and age for composing the linear predictor. Thus, the linear predictor is given by
The matrix linear predictor describes the twin dependence structure.
The mglm4twin
package provides a set of pre-specified
covariance structures. However, the users can easily specify thein own
covariance structure through a linear combination of known matrices. In
this example, we are going to specify a set of special cases of the
standard ACE models for twin data.
ACE = mt_twin(N_DZ = 554/2, N_MZ = 274/2, n_resp = 3, model = "ACE")
AE = mt_twin(N_DZ = 554/2, N_MZ = 274/2, n_resp = 3, model = "AE")
CE = mt_twin(N_DZ = 554/2, N_MZ = 274/2, n_resp = 3, model = "CE")
E = mt_twin(N_DZ = 554/2, N_MZ = 274/2, n_resp = 3, model = "E")
The function mt_twin()
has four main arguments:
N_DZ
and N_MZ
the number of dizygotic and
monozygotic twin, respectively. n_resp
number of response
variables, in this case just \(1\) and
the model
a string given the model’s name. It is
extreme important to note that the data set should be
ordered in a way that all DZ twin appear first and then all MZ twin and
the twin pair should appear together, i.e. twin 2 after twin 1.
The specificantion of the link and variance functions depends on the
response variable type. In the case of the bounded data the
mglm4twin
package offers the logit
,
probit
, cauchit
, cloglog
and
loglog
link functions. For the variance function the
binomialP
represents the standard binomial variance
function, i.e. \(\mu(1-\mu)\) which is
suitable for bounded outcomes. In this example, we follow Bonat and
Hjelmborg (2020) and use the logit
link function and the
binomialP
variance function.
The main fitting function in the mglm4twin
package is
the mglm4twin()
function. it was designed to be similar to
standard model fitting functions in R
such as
lm()
and glm()
.
fit_ACE <- mglm4twin(linear_pred = c(form_Y1, form_Y2, form_Y3),
matrix_pred = ACE, link = link,
variance = variance,
data = data)
#> Automatic initial values selected.
fit_AE <- mglm4twin(linear_pred = c(form_Y1, form_Y2, form_Y3),
matrix_pred = AE, link = link,
variance = variance,
data = data)
#> Automatic initial values selected.
fit_CE <- mglm4twin(linear_pred = c(form_Y1, form_Y2, form_Y3),
matrix_pred = CE, link = link,
variance = variance,
data = data)
#> Automatic initial values selected.
fit_E <- mglm4twin(linear_pred = c(form_Y1, form_Y2, form_Y3),
matrix_pred = E, link = link,
variance = variance,
data = data)
#> Automatic initial values selected.
Table 9 is done by collecting the results from the model’s summary.
summary(fit_AE, model = "AE", biometric = TRUE)
#> Call: Y1 ~ sex + age_std
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) 1.74420183 0.05291741 32.9608287 2.959761e-238
#> sexMale -0.06859125 0.08198530 -0.8366286 4.028013e-01
#> age_std -0.24957571 0.21050178 -1.1856228 2.357713e-01
#>
#> Call: Y2 ~ sex + age_std
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) -1.81642618 0.05590683 -32.4902391 1.464717e-231
#> sexMale 0.06108591 0.08667232 0.7047914 4.809401e-01
#> age_std 0.45342969 0.22054316 2.0559681 3.978559e-02
#>
#> Call: Y3 ~ sex + age_std
#>
#> Link function: logit
#> Variance function: binomialP
#> Regression:
#> Estimates std.error z value Pr(>|z|)
#> (Intercept) -0.54371923 0.03970218 -13.694945 1.088421e-42
#> sexMale -0.06694236 0.06272402 -1.067252 2.858579e-01
#> age_std -0.24058640 0.16243509 -1.481123 1.385737e-01
#>
#> Dispersion:
#> Genetic component:
#> Estimates std.error z value Pr(>|z|)
#> A1 0.027929924 0.011012562 2.5361876 0.011206667
#> A2 0.041948185 0.011088036 3.7831934 0.000154829
#> A3 0.029001292 0.011134358 2.6046668 0.009196365
#> A12 -0.015019172 0.008190355 -1.8337632 0.066689151
#> A13 0.007991468 0.008219853 0.9722154 0.330943424
#> A23 -0.013718879 0.008734612 -1.5706340 0.116267687
#>
#> Environment component:
#> Estimates std.error z value Pr(>|z|)
#> E1 0.12812751 0.014343170 8.932998 4.146175e-19
#> E2 0.11544285 0.012164565 9.490093 2.308273e-21
#> E3 0.13215173 0.011462188 11.529363 9.383491e-31
#> E12 -0.04475663 0.008457948 -5.291666 1.212074e-07
#> E13 0.04644685 0.008850462 5.247958 1.537946e-07
#> E23 -0.06735482 0.009130621 -7.376806 1.621321e-13
#>
#> Measures associated with genetic structure:
#> Genetic correlation:
#> Estimates std.error z value Pr(>|z|)
#> rho12 -0.4387878 0.1877430 -2.337172 0.01943023
#> rho13 0.2807909 0.2475721 1.134178 0.25671972
#> rho23 -0.3933266 0.1795943 -2.190084 0.02851811
#>
#> Heritability:
#> Estimates std.error z value Pr(>|z|)
#> h1 0.1789721 0.06973874 2.566322 1.027833e-02
#> h2 0.2665221 0.06729049 3.960769 7.470886e-05
#> h3 0.1799612 0.06729608 2.674171 7.491434e-03
#>
#> Bivariate heritability:
#> Estimates std.error z value Pr(>|z|)
#> h12 0.2512584 0.1313564 1.9127994 0.05577373
#> h13 0.1467986 0.1479034 0.9925303 0.32093892
#> h23 0.1692149 0.1046257 1.6173365 0.10580566
#>
#> Measures associated with environment structure:
#> Environment correlation:
#> Estimates std.error z value Pr(>|z|)
#> rho12 -0.3680041 0.05815352 -6.328149 2.481196e-10
#> rho13 0.3569428 0.05709776 6.251433 4.067047e-10
#> rho23 -0.5453166 0.04741133 -11.501822 1.291602e-30
#>
#> Environmentality:
#> Estimates std.error z value Pr(>|z|)
#> e1 0.8210279 0.06973874 11.77291 5.383495e-32
#> e2 0.7334779 0.06729049 10.90017 1.150392e-27
#> e3 0.8200388 0.06729608 12.18553 3.712093e-34
#>
#> Bivariate environmentality:
#> Estimates std.error z value Pr(>|z|)
#> e12 0.7487416 0.1313564 5.700079 1.197522e-08
#> e13 0.8532014 0.1479034 5.768640 7.991358e-09
#> e23 0.8307851 0.1046257 7.940548 2.012905e-15
#>
#> Algorithm: chaser
#> Correction: FALSE
#> Number iterations: 6