This vignette describes how to use the R
package to estimate treatment effects for a new single-arm study with an external control arm. See vignette("methodology")
for details on the methodology.
Note that you will need to install JAGS as well as the rjags
package (see the home page). We will also use bayesplot
and ggplot2
to visualize our Bayesian models.
We start by simulating an example set of 15 reference studies with no bias (\(\mu = 0\)) and between study variability based on hazard ratios (HRs) uniformly distributed between 0.7 and 1.3. Standard errors of the log HRs are set based on the number of events in each arm (internal control and external control) ranging uniformly between 100 and 800.
# Simulated example data
N <- 15
mu <- 0 # True bias
sigma <- .16 # True between study variability
n_events_ec <- runif(N, 100, 800)
n_events_ic <- runif(N, 100, 800)
loghr_ic_ec_var <- 1/n_events_ic + 1/n_events_ec
loghr_ic_ec_est <- rnorm(n = N, mean = mu, sd = sqrt(sigma^2 + loghr_ic_ec_var))
paste0("Observed bias: ", round(mean(loghr_ic_ec_est), 2))
## [1] "Observed bias: 0.02"
## [1] "Observed variability: 0.21"
We estimate \(\mu\) and \(\sigma\) with ecmeta()
, using JAGS to perform Markov Chain Monte Carlo (MCMC) sampling. We run 5 chains to help assess convergence. The thinning intervals and number of burn-in iterations can be controlled, but the model converges very quickly so we use every draw (i.e., thin = 1
and n_burnin = 0
). A uniform, Student t, or inverse gamma distribution can be used for the dispersion parameter (\(\sigma\) with the uniform and Student t priors; \(\sigma^2\) with an inverse gamma prior); we use a half-Cauchy distribution here, which is a Student t distribution with 1 degree of freedom.
loghr_ic_ec <- loghr_data(loghr_ic_ec_est, sqrt(loghr_ic_ec_var))
loghr_ecmeta <- ecmeta(
loghr_ic_ec,
n_iter = 1000,
n_chains = 5, n_burnin = 0, thin = 1,
quiet = TRUE,
prior_mu = normal(0, 100),
prior_scale = student_t(0, 10, 1) # half-Cauchy
)
loghr_ecmeta
## Iterations = 1001:2000
## Thinning interval = 1
## Number of chains = 5
## Sample size per chain = 1000
##
## param mean se_naive se_ts sd 2.5% 25%
## 1 mu 0.01247164 0.0008369513 0.0009339481 0.05918140 -0.1076152 -0.02560784
## 2 sigma 0.20665742 0.0007392423 0.0013680606 0.05227232 0.1278520 0.16990830
## 50% 75% 97.5% rhat
## 1 0.01349039 0.04996254 0.1283144 1.001139
## 2 0.19867955 0.23504764 0.3322990 1.011384
The default print method summarizes the posterior distributions of the parameters.
loghr_ecmeta
## Iterations = 1001:2000
## Thinning interval = 1
## Number of chains = 5
## Sample size per chain = 1000
##
## param mean se_naive se_ts sd 2.5% 25%
## 1 mu 0.01247164 0.0008369513 0.0009339481 0.05918140 -0.1076152 -0.02560784
## 2 sigma 0.20665742 0.0007392423 0.0013680606 0.05227232 0.1278520 0.16990830
## 50% 75% 97.5% rhat
## 1 0.01349039 0.04996254 0.1283144 1.001139
## 2 0.19867955 0.23504764 0.3322990 1.011384
Its a good idea to assess the model to ensure it converged. We could see from the summaries above that the R-hats were very close to 1, which suggests that the chains were converging.
We also recommend using the bayesplot
package to produce diagnostic plots (note, however, that in order to keep the package light we don’t import bayesplot
, so you will need to install it yourself). Objects returned by ecmeta()
have as.array()
methods so that they can be passed to any bayesplot
function that accepts an object with an as.array()
method (e.g., bayesplot::mcmc_trace()
). The trace plot suggests that the chain mixed well and that there were no issues with convergence.
mcmc_trace(loghr_ecmeta)
We can also plot the draws from the posterior distribution of \(\mu\) and \(\sigma\).
mcmc_hist(loghr_ecmeta, binwidth = .0005)
We now consider a new study in which the estimated HR was 0.8 and there were 250 events in the treatment arm and 300 events in the control arm. We can therefore approximate the variance of the log HR as 1/250 + 1/300 (although in a real application with propensity score methods the standard error would likely be estimated via bootstrapping or from a closed form approximation).
new_loghr_trt_ec <- loghr_data(
estimate = log(.8),
standard_error = sqrt(1/250 + 1/300)
)
Predictions of the true log HR for a hypothetical scenario in which an internal control arm existed are predicted using (i) the estimated log HR above, \(\hat{\lambda}_{TRTvEC} = 0.8\) and (ii) the model for \(\hat{\lambda}_{ICvEC}\) fit above with ecmeta()
using the reference studies.
Specifically, we first use MCMC to draw samples from \(\lambda_{TRTvEC}\) where \(\hat{\lambda}\) is an estimate of a log HR and \(\lambda\) is the true underlying parameter (see vignette("methodology")
for an explanation of the distinction between true and estimated HRs). These draws are then combined with the draws of \(mu\) and \(\sigma\) to (i) draw samples from the posterior predictive distribution of \(\lambda_{ICvEC}\) and (ii) then, to compute, \(\lambda_{TRTvIC} = \lambda_{TRTvEC} - \lambda_{ICvEC}\). The quantity \(\lambda_{TRTvIC}\) effectively adjusts the distribution of \(\lambda_{TRTvEC}\) for the bias and between study variability estimated from the reference studies.
loghr_new <- predict(
loghr_ecmeta,
newdata = new_loghr_trt_ec,
quiet = TRUE
)
loghr_new
## Posterior sampling was performed for the true log HR of the treatment
## relative to the external control ('loghr_trt_ec') with the following settings:
##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 5
## Sample size per chain = 1000
##
## The estimates of 'mu' and 'sigma' were used to draw the true log HR of the
## internal control relative to the external control ('loghr_ic_ec'), which was
## in turn, used to adjust 'log_trt_ec' to obtain the predicted log HR
## for the treatment relative to a hypothetical internal control ('loghr_trt_ic').
##
## Summaries of the posterior distributions are displayed below:
## param mean se_naive sd 2.5% 25%
## 1 loghr_trt_ic -0.23564840 0.003394812 0.24004948 -0.7087490 -0.3853649
## 2 loghr_trt_ec -0.22384135 0.001210969 0.08562847 -0.3934591 -0.2797593
## 3 loghr_ic_ec 0.01180705 0.003175226 0.22452241 -0.4456103 -0.1303848
## 50% 75% 97.5% rhat
## 1 -0.23916606 -0.08321095 0.24634629 NA
## 2 -0.22281606 -0.16577454 -0.05581878 1.002426
## 3 0.01219141 0.14879948 0.45655704 NA
Similar to above, we inspect the trace of \(\lambda_{TRTvEC}\) and detect no issues.
mcmc_trace(loghr_new)
Thesummary()
method for the prediction object can exponentiate the log HRs to produce posterior summaries of the HRs. The observed bias in our reference studies was small (\(\mu\) close to 0), so the means of \(\lambda_{TRTvIC}\) and \(\lambda_{TRTvEC}\) are very similar as well. However, \(\lambda_{TRTvIC}\) has a considerably wider distribution due to the assumed between study variability.
summary(loghr_new, exponentiate = TRUE)
## param mean se_naive sd 2.5% 25% 50%
## 1 hr_trt_ic 0.8134237 0.0028816307 0.20376206 0.4922596 0.6802024 0.7872841
## 2 hr_trt_ec 0.8023740 0.0009718095 0.06871731 0.6747189 0.7559656 0.8002620
## 3 hr_ic_ec 1.0375320 0.0033325990 0.23565033 0.6404333 0.8777576 1.0122660
## 75% 97.5% rhat
## 1 0.9201570 1.2793426 NA
## 2 0.8472372 0.9457105 1.002426
## 3 1.1604403 1.5786295 NA
In some cases (e.g., when running simulations) it might be useful to simulate predictions for the new study very quickly. In this case, we can use a maximum likelihood based approach instead of the Bayesian approach described above.
loghr_ecmeta_ml <- ecmeta(loghr_ic_ec, method = "ml")
loghr_ecmeta_ml
## Parameters:
## param estimate se 2.5% 97.5%
## 1 mu 0.01207288 0.05049343 -0.08689243 0.1110382
## 2 sigma 0.17843211 0.03999656 0.11499300 0.2768692
The results are similar to those from the Bayesian approach, although the estimates of \(\sigma\) are smaller using maximum likelihood, a discrepancy that becomes increasingly small as the number of reference studies increases.
loghr_new_ml <- predict(loghr_ecmeta_ml, newdata = new_loghr_trt_ec)
summary(loghr_new, exponentiate = TRUE)
## param mean se_naive sd 2.5% 25% 50%
## 1 hr_trt_ic 0.8134237 0.0028816307 0.20376206 0.4922596 0.6802024 0.7872841
## 2 hr_trt_ec 0.8023740 0.0009718095 0.06871731 0.6747189 0.7559656 0.8002620
## 3 hr_ic_ec 1.0375320 0.0033325990 0.23565033 0.6404333 0.8777576 1.0122660
## 75% 97.5% rhat
## 1 0.9201570 1.2793426 NA
## 2 0.8472372 0.9457105 1.002426
## 3 1.1604403 1.5786295 NA
summary(loghr_new_ml, exponentiate = TRUE)
## param mean se_naive sd 2.5% 25% 50%
## 1 hr_trt_ic 0.8130503 0.005437369 0.17194471 0.5320176 0.6942279 0.7943682
## 2 hr_trt_ec 0.8026759 0.002162067 0.06837055 0.6820539 0.7522157 0.7994061
## 3 hr_ic_ec 1.0234666 0.006258237 0.19790285 0.7029402 0.8850287 1.0048856
## 75% 97.5%
## 1 0.9087986 1.1781100
## 2 0.8463504 0.9479453
## 3 1.1408093 1.4759664