This function calibrates a trial specification using a Gaussian process-based Bayesian optimisation algorithm. The function calibrates an input trial specification object (using repeated calls to run_trials() while adjusting the trial specification) to a target value within a search_range in a single input dimension (x) in order to find an optimal value (y).
The default (and expectedly most common) use case is to calibrate a trial specification by adjusting the superiority and inferiority thresholds to obtain a certain probability of superiority; if used with a trial specification with identical underlying outcomes (no between-arm differences), this probability is an estimate of the Bayesian analogue of the total type-1 error rate for the outcome driving the adaptations, and if between-arm differences are present, it corresponds to an estimate of the Bayesian analogue of the power.
The default is to perform the calibration while varying single, constant, symmetric thresholds for superiority/inferiority throughout a trial design, as described in Details, and the default values have been chosen to function well in this case.
Advanced users may use the function to calibrate trial specifications according to other metrics - see Details for how to specify a custom function used to modify (or recreate) a trial specification object during the calibration process.
The underlying Gaussian process model and its control hyperparameters are described under Details, and the model is partially based on code from Gramacy 2020 (with permission; see References).
Usage
calibrate_trial(
  trial_spec,
  n_rep = 1000,
  cores = NULL,
  base_seed = NULL,
  fun = NULL,
  target = 0.05,
  search_range = c(0.9, 1),
  tol = target/10,
  dir = 0,
  init_n = 2,
  iter_max = 25,
  resolution = 5000,
  kappa = 0.5,
  pow = 1.95,
  lengthscale = 1,
  scale_x = TRUE,
  noisy = is.null(base_seed),
  narrow = !noisy & !is.null(base_seed),
  prev_x = NULL,
  prev_y = NULL,
  path = NULL,
  overwrite = FALSE,
  version = NULL,
  compress = TRUE,
  sparse = TRUE,
  progress = NULL,
  export = NULL,
  export_envir = parent.frame(),
  verbose = FALSE,
  plot = FALSE
)
Arguments
- trial_spec
  trial_spec object, generated and validated by the setup_trial(), setup_trial_binom() or setup_trial_norm() function.
- n_rep
  single integer, the number of simulations to run at each evaluation. Values < 100 are not permitted; values < 1000 are permitted but recommended against.
- cores
  NULL or single integer. If NULL, a default value/cluster set by setup_cluster() will be used to control whether simulations are run in parallel on a default cluster or sequentially in the main process; if a cluster/value has not been specified by setup_cluster(), cores will then be set to the value stored in the global "mc.cores" option (if previously set by options(mc.cores = <number of cores>)), and 1 if that option has not been specified.
  If the resulting number of cores = 1, computations will be run sequentially in the primary process, and if cores > 1, a new parallel cluster will be set up using the parallel library and removed once the function completes. See setup_cluster() for details.
- base_seed
  single integer or NULL (default); the random seed used as the basis for all simulation runs (see run_trials()) and random number generation within the rest of the calibration process; if used, the global random seed will be restored after the function has been run.
  Note: providing a base_seed is highly recommended, as this will generally lead to faster and more stable calibration.
- fun
  NULL (the default), in which case the trial specification will be calibrated using the default process described above and further in Details; otherwise a user-supplied function used during the calibration process, which should have a structure as described in Details.
- target
  single finite numeric value (defaults to 0.05); the target value for y to calibrate the trial_spec object to.
- search_range
  finite numeric vector of length 2; the lower and upper boundaries in which to search for the best x. Defaults to c(0.9, 1.0).
- tol
  single finite numeric value (defaults to target / 10); the accepted tolerance (in the direction(s) specified by dir); when a y-value within the accepted tolerance of the target is obtained, the calibration stops.
  Note: tol should be specified to be sensible considering n_rep; e.g., if the probability of superiority is targeted with n_rep == 1000, a tol of 0.01 will correspond to 10 simulated trials. A too low tol relative to n_rep may lead to very slow calibration or calibration that cannot succeed regardless of the number of iterations.
  Important: even when a large number of simulations are conducted, using a very low tol may lead to calibration not succeeding, as it may also be affected by other factors, e.g., the total number of simulated patients, the possible maximum differences in simulated outcomes, and the number of posterior draws (n_draws in the setup_trial() family of functions), which affects the minimum differences in posterior probabilities when simulating trials and thus can affect calibration, including when using the default calibration function. Increasing the number of posterior draws or the number of repetitions should be attempted if the desired tolerance cannot be achieved with lower numbers.
- dir
  single numeric value; specifies the direction(s) of the tolerance range. If 0 (the default), the tolerance range will be target - tol to target + tol. If < 0, the range will be target - tol to target, and if > 0, the range will be target to target + tol.
- init_n
  single integer >= 2. The number of initial evaluations evenly spread over the search_range, with one evaluation at each boundary (thus, the default value of 2 is the minimum permitted value; if calibrating according to a different target than the default, a higher value may be sensible).
- iter_max
  single integer > 0 (default 25). The maximum number of new evaluations after the initial grid (with size specified by init_n) has been set up. If calibration is unsuccessful after the maximum number of iterations, the prev_x and prev_y arguments (described below) may be used to start a new calibration process re-using previous evaluations.
- resolution
  single integer (defaults to 5000), size of the grid at which the predictions used to select the next value to evaluate at are made.
  Note: memory use will substantially increase with higher values. See also the narrow argument below.
- kappa
  single numeric value > 0 (default 0.5), corresponding to the width of the uncertainty bounds used to find the next target to evaluate. See Details.
- pow
  single numerical value in the [1, 2] range (default 1.95), controlling the smoothness of the Gaussian process. See Details.
- lengthscale
  single numerical value (defaults to 1) or numerical vector of length 2; values must be finite and non-negative. If a single value is provided, this will be used as the lengthscale hyperparameter; if a numerical vector of length 2 is provided, the second value must be higher than the first and the optimal lengthscale in this range will be found using an optimisation algorithm. If any value is 0, a small amount of noise will be added as lengthscales must be > 0. Controls smoothness in combination with pow. See Details.
- scale_x
  single logical value; if TRUE (the default), the x-values will be scaled to the [0, 1] range according to the minimum/maximum values provided. If FALSE, the model will use the original scale. If distances on the original scale are small, scaling may be preferred. The returned values will always be on the original scale. See Details.
- noisy
  single logical value; if FALSE, a noiseless process is assumed, and interpolation between values is performed (i.e., with no uncertainty at the evaluated x-values assumed). If TRUE, the y-values are assumed to come from a noisy process, and regression is performed (i.e., some uncertainty at the evaluated x-values will be assumed and included in the predictions). Specifying FALSE requires a base_seed to be supplied, and is generally recommended, as this will usually lead to faster and more stable calibration. If a low n_rep is used (or if trials are calibrated to metrics other than the default), specifying TRUE may be necessary even when using a valid base_seed. Defaults to FALSE if a base_seed is supplied and TRUE if not.
- narrow
  single logical value. If FALSE, predictions are evenly spread over the full x-range. If TRUE, the prediction grid will be spread evenly over an interval consisting of the two x-values with corresponding y-values closest to the target in opposite directions. Can only be TRUE when a base_seed is provided and noisy is FALSE (the default value is TRUE in that case, otherwise it is FALSE), and only if the function can safely be assumed to be monotonically increasing or decreasing (which is generally reasonable if the default is used for fun), in which case this will lead to a faster search and a smoother prediction grid in the relevant region without increasing memory use.
- prev_x, prev_y
  numeric vectors of equal lengths, corresponding to previous evaluations. If provided, these will be used in the calibration process (added before the initial grid is set up, with values in the grid matching values in prev_x leading to those evaluations being skipped).
- path
  single character string or NULL (the default); if a valid file path is provided, the calibration results will either be saved to this path (if the file does not exist or if overwrite is TRUE, see below) or the previous results will be loaded and returned (if the file exists, overwrite is FALSE, and the input trial_spec and central control settings are identical to the previous run; otherwise an error is produced). Results are saved/loaded using the saveRDS()/readRDS() functions.
- overwrite
  single logical, defaults to FALSE, in which case previous results are loaded if a valid file path is provided in path and the object in path contains the same input trial_spec and the previous calibration used the same central control settings (otherwise, the function errors). If TRUE and a valid file path is provided in path, the complete calibration function will be run with results saved using saveRDS(), regardless of whether or not a previous result was saved in path.
- version
  passed to saveRDS() when saving calibration results, defaults to NULL (as in saveRDS()), which means that the current default version is used. Ignored if calibration results are not saved.
- compress
  passed to saveRDS() when saving calibration results, defaults to TRUE (as in saveRDS()); see saveRDS() for other options. Ignored if calibration results are not saved.
- sparse, progress, export, export_envir
  passed to run_trials(); see description there.
- verbose
  single logical, defaults to FALSE. If TRUE, the function will print details on calibration progress.
- plot
  single logical, defaults to FALSE. If TRUE, the function will print plots of the Gaussian process model predictions and return them as part of the final object; requires the ggplot2 package installed.
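As a side note on the target, tol, and dir arguments above, the accepted tolerance interval can be sketched with a small hypothetical helper (tol_range is not part of the package; it only restates the rules described above):

```r
# Hypothetical helper (not part of adaptr): the accepted tolerance
# interval implied by target, tol, and dir, as described above
tol_range <- function(target, tol, dir = 0) {
  if (dir == 0) {
    c(target - tol, target + tol) # two-sided tolerance
  } else if (dir < 0) {
    c(target - tol, target)       # only values at or below target accepted
  } else {
    c(target, target + tol)       # only values at or above target accepted
  }
}
tol_range(0.05, 0.005, dir = 0)  # 0.045 0.055
tol_range(0.05, 0.005, dir = -1) # 0.045 0.050
```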
Value
A list of special class "trial_calibration", which contains the following elements that can be extracted using $ or [[:
- success: single logical, TRUE if the calibration succeeded with the best result being within the tolerance range, FALSE if the calibration process ended after all allowed iterations without obtaining a result within the tolerance range.
- best_x: single numerical value, the x-value (on the original, input scale) at which the best y-value was found, regardless of success.
- best_y: single numerical value, the best y-value obtained, regardless of success.
- best_trial_spec: the best calibrated version of the original trial_spec object supplied, regardless of success (i.e., the returned trial specification object is only adequately calibrated if success is TRUE).
- best_sims: the trial simulation results (from run_trials()) leading to the best y-value, regardless of success. If no new simulations have been conducted (e.g., if the best y-value is from one of the prev_y-values), this will be NULL.
- evaluations: a two-column data.frame containing the variables x and y, corresponding to all x-values and y-values (including values supplied through prev_x/prev_y).
- input_trial_spec: the unaltered, uncalibrated, original trial_spec object provided to the function.
- elapsed_time: the total run time of the calibration process.
- control: list of the most central settings provided to the function.
- fun: the function used for calibration; if NULL was supplied when starting the calibration, the default function (described in Details) is returned after being used in the function.
- adaptr_version: the version of the adaptr package used to run the calibration process.
- plots: list containing ggplot2 plot objects of each Gaussian process suggestion step; only included if plot is TRUE.
Details
Default calibration
If fun is NULL (the default), the default calibration strategy will be employed. Here, the target y is the probability of superiority (as described in check_performance() and summary()), and the function will calibrate constant stopping thresholds for superiority and inferiority (as described in setup_trial(), setup_trial_binom(), and setup_trial_norm()). This corresponds to the Bayesian analogue of the type 1 error rate if there are no differences between arms in the trial specification (which we expect to be the most common use case), or of the power if there are differences between arms in the trial specification.
The calibration process will, in the default case, use the input x as the stopping threshold for superiority and 1 - x as the stopping threshold for inferiority, i.e., stopping thresholds will be constant and symmetric.
The underlying default function calibrated is typically essentially noiseless if a high enough number of simulations is used with an appropriate random base_seed, and is generally monotonically decreasing. The default values for the control hyperparameters (including init_n, kappa, pow, lengthscale, narrow, scale_x, etc.) have been set to normally work well in this case. Thus, few initial grid evaluations are used in this case, and if a base_seed is provided, a noiseless process is assumed, the search range is narrowed with each iteration, and the uncertainty bounds used in the acquisition function (corresponding to quantiles from the posterior predictive distribution) are relatively narrow.
Specifying calibration functions
A user-specified calibration function should have the following structure:
# The function must take the arguments x and trial_spec
# trial_spec is the original trial_spec object, which should be modified
# (alternatively, it may be re-specified, but the argument should still
# be included, even if ignored)
function(x, trial_spec) {
  # Calibrate trial_spec, here as in the default function
  trial_spec$superiority <- x
  trial_spec$inferiority <- 1 - x

  # If relevant, known y values corresponding to specific x values may be
  # returned without running simulations (here done as in the default
  # function). In that case, a code block like the one below can be included,
  # with changed x/y values - of note, the other return values should not be
  # changed
  if (x == 1) {
    return(list(sims = NULL, trial_spec = trial_spec, y = 0))
  }

  # Run simulations - this block should be included unchanged
  sims <- run_trials(trial_spec, n_rep = n_rep, cores = cores,
                     base_seed = base_seed, sparse = sparse,
                     progress = progress, export = export,
                     export_envir = export_envir)

  # Return results - only the y value here should be changed
  # summary() or check_performance() will often be used here
  list(sims = sims, trial_spec = trial_spec,
       y = summary(sims)$prob_superior)
}
Note: changes to the trial specification are not validated; users who define their own calibration functions need to ensure that changes to calibrated trial specifications do not lead to invalid values; otherwise, the procedure is prone to error when simulations are run. In particular, users should be aware that changing true_ys in a trial specification generated using the simplified setup_trial_binom() and setup_trial_norm() functions requires changes in multiple places in the object, including in the functions used to generate random outcomes; in these cases (and otherwise if in doubt), re-generating the trial_spec object instead of modifying it should be preferred, as this is safer and leads to proper validation.
Note: if the y values corresponding to certain x values are known, the user may directly return these values without running simulations (e.g., in the default case an x of 1 will require >100% or <0% probabilities for the stopping rules, which is impossible, and hence the y value in this case is by definition 0).
Gaussian process optimisation function and control hyperparameters
The calibration function uses a relatively simple Gaussian process optimisation function with settings that should work well for the default calibration function, but these can be changed as required, which should be considered if calibrating according to other targets (the effects of using other settings may be evaluated in greater detail by setting verbose and plot to TRUE).
The function may perform either interpolation (i.e., assuming a noiseless, deterministic process with no uncertainty at the values already evaluated) or regression (i.e., assuming a noisy, stochastic process), controlled by the noisy argument.
The covariance matrix (or kernel) is defined as:
exp(-||x - x'||^pow / lengthscale)
with ||x - x'|| corresponding to a matrix containing the absolute Euclidean distances between values of x (and values on the prediction grid), scaled to the [0, 1] range if scale_x is TRUE and on their original scale if FALSE. Scaling is generally recommended (as this leads to more comparable and predictable effects of pow and lengthscale, regardless of the true scale), and particularly so if the original range of the values is smaller than this range. The absolute distances are raised to the power pow, which must be a value in the [1, 2] range. Together with lengthscale, pow controls the smoothness of the Gaussian process model, with 1 corresponding to less smoothing (i.e., piecewise straight lines between all evaluations if lengthscale is 1) and values > 1 corresponding to more smoothing. After raising the absolute distances to the chosen power pow, the resulting matrix is divided by lengthscale. The default is 1 (no change); values < 1 lead to faster decay in correlations and thus less smoothing (more wiggly fits), and values > 1 lead to slower decay and more smoothing (less wiggly fits). If a single specific value is supplied for lengthscale, this is used; if a range of values is provided, a secondary optimisation process determines the value to use within that range.
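The kernel described above can be sketched in a few lines of base R (an illustration only, not the package's internal implementation; the function name gp_kernel is made up for this example):

```r
# Sketch of the covariance kernel described above (not adaptr's internal
# code): exp(-||x - x'||^pow / lengthscale), with optional scaling of x
# to the [0, 1] range
gp_kernel <- function(x, pow = 1.95, lengthscale = 1, scale_x = TRUE) {
  if (scale_x) x <- (x - min(x)) / (max(x) - min(x)) # scale to [0, 1]
  d <- abs(outer(x, x, "-")) # absolute pairwise distances
  exp(-d^pow / lengthscale)
}
round(gp_kernel(c(0.9, 0.95, 1.0)), 3) # correlations decay with distance
```

With the default pow = 1.95 and lengthscale = 1, nearby points remain highly correlated while the correlation at the maximum scaled distance of 1 drops to exp(-1).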
Some minimal noise ("jitter") is always added to the diagonals of the matrices where relevant to ensure numerical stability; if noisy is TRUE, a "nugget" value will be determined using a secondary optimisation process.
Predictions will be made over an equally spaced grid of x values of size resolution; if narrow is TRUE, this grid will only be spread out between the x values with corresponding y values closest to and below, and closest to and above, the target, respectively, leading to a finer grid in the range of relevance (as described above, this should only be used for processes that are assumed to be noiseless and that can safely be assumed to be monotonically increasing or decreasing within the search_range). To suggest the next x value for evaluation, the function uses an acquisition function based on bi-directional uncertainty bounds (posterior predictive distributions) with widths controlled by the kappa hyperparameter. Higher kappa values/wider uncertainty bounds lead to increased exploration (i.e., the algorithm is more prone to select values with high uncertainty, relatively far from existing evaluations), while lower kappa values/narrower uncertainty bounds lead to increased exploitation (i.e., the algorithm is more prone to select values with less uncertainty, closer to the best predicted mean values). The value in the x grid where one of the uncertainty bounds has the smallest absolute distance to the target is chosen (within the narrowed range, if narrow is TRUE). See Greenhill et al., 2020 under References for a general description of acquisition functions.
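The acquisition step described above can be sketched as follows (a simplified illustration in base R with made-up names and example values, not the package's internal code):

```r
# Simplified sketch of the acquisition rule described above (not adaptr's
# internal code): given posterior predictive means (mu) and standard
# deviations (sd) over a grid, suggest the x whose uncertainty bound
# mu +/- kappa * sd comes closest to the target
suggest_next_x <- function(x_grid, mu, sd, target, kappa = 0.5) {
  lower <- mu - kappa * sd
  upper <- mu + kappa * sd
  dist <- pmin(abs(lower - target), abs(upper - target))
  x_grid[which.min(dist)]
}

# Example values (made up): predictions over four candidate thresholds
suggest_next_x(x_grid = c(0.92, 0.94, 0.96, 0.98),
               mu = c(0.100, 0.070, 0.048, 0.020),
               sd = c(0.020, 0.010, 0.002, 0.010),
               target = 0.05) # -> 0.96
```

Note how a larger kappa widens the bounds and can shift the suggestion towards points with high sd (exploration), while a smaller kappa keeps it near the predicted mean closest to the target (exploitation).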
IMPORTANT:
we recommend that control hyperparameters are explicitly specified, even for the default calibration function. Although the default values should be sensible for the default calibration function, these may change in the future. Further, we generally recommend that users perform small-scale comparisons (i.e., with fewer simulations than in the final calibration) of the calibration process with different hyperparameters for specific use cases beyond the default (possibly guided by setting the verbose and plot options to TRUE) before running a substantial number of calibrations or simulations, as the exact choices may have an important influence on the speed and likelihood of success of the calibration process.
It is the responsibility of the user to specify sensible values for the
settings and hyperparameters.
References
Gramacy RB (2020). Chapter 5: Gaussian Process Regression. In: Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Chapman Hall/CRC, Boca Raton, Florida, USA. Available online.
Greenhill S, Rana S, Gupta S, Vellanki P, Venkatesh S (2020). Bayesian Optimization for Adaptive Experimental Design: A Review. IEEE Access, 8, 13937-13948. doi:10.1109/ACCESS.2020.2966228
Examples
if (FALSE) {
  # Set up a trial specification to calibrate
  # This trial specification has similar event rates in all arms,
  # and as the default calibration settings are used, this corresponds to
  # assessing the Bayesian type 1 error rate for this design and scenario
  binom_trial <- setup_trial_binom(arms = c("A", "B"),
                                   true_ys = c(0.25, 0.25),
                                   data_looks = 1:5 * 200)

  # Run calibration using default settings for most parameters
  res <- calibrate_trial(binom_trial, n_rep = 1000, base_seed = 23)

  # Print calibration summary result
  res
}