Setup default cluster for use in parallelised adaptr functions
Source: R/setup_cluster.R
This function sets up (or removes) a default cluster for use in all
parallelised functions in adaptr using the parallel package. The function
also exports objects that should be available on the cluster and sets the
random number generator appropriately. See Details for further info on
how adaptr handles sequential/parallel computation.
Usage
setup_cluster(cores, export = NULL, export_envir = parent.frame())
Arguments
- cores
can be either unspecified, NULL, or a single integer > 0. If NULL or 1, an existing default cluster is removed (if any), and the default will subsequently be to run functions sequentially in the main process if cores = 1, and according to getOption("mc.cores") if NULL (unless otherwise specified in individual function calls). The parallel::detectCores() function may be used to see the number of available cores, although this comes with some caveats (as described in the function documentation), including that the number of cores may not always be returned and may not match the number of cores that are available for use. In general, using fewer cores than available may be preferable if other processes are run on the machine at the same time.
- export
character vector of names of objects to export to each parallel core when running in parallel; passed as the varlist argument to parallel::clusterExport(). Defaults to NULL (no objects exported), ignored if cores == 1. See Details below.
- export_envir
environment where to look for the objects defined in export when running in parallel and export is not NULL. Defaults to the environment from where the function is called.
Value
Invisibly returns the default parallel cluster or NULL, as
appropriate. This may be used with other functions from the parallel
package by advanced users, for example to load certain libraries on the
cluster prior to calling run_trials().
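As a minimal sketch of what "loading libraries on the cluster" can look like, the code below uses parallel::clusterEvalQ() on a socket cluster. To keep the sketch runnable without adaptr, `cl` is created directly with parallel::makePSOCKcluster() as a stand-in for the object returned by setup_cluster(cores = 2); the stats package is only a placeholder for whatever package the workers need.

```r
library(parallel)

# Stand-in (assumption): in practice `cl` would be the cluster returned
# invisibly by setup_cluster(cores = 2)
cl <- makePSOCKcluster(2)

# Evaluate an expression on every worker; one result is returned per worker
loaded <- clusterEvalQ(cl, {
  library(stats)  # placeholder package
  "stats" %in% loadedNamespaces()
})

# Only needed for the stand-in cluster created above
stopCluster(cl)
```

A default cluster created with setup_cluster() would instead be removed with setup_cluster(cores = NULL), as described under Arguments.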
Details
Using sequential or parallel computing in adaptr
All parallelised adaptr functions have a cores argument that defaults to
NULL. If a non-NULL integer > 0 is provided to the cores argument in
any of those (except setup_cluster()), the package will run calculations
sequentially in the main process if cores = 1, and otherwise initiate a new
cluster of size cores that will be removed once the function completes,
regardless of whether or not a default cluster or the global "mc.cores"
option have been specified.
If cores is NULL in any adaptr function (except setup_cluster()), the
package will use a default cluster if one exists or run computations
sequentially if setup_cluster() has last been called with cores = 1.
If setup_cluster() has not been called or was last called with cores = NULL,
then the package will check if the global "mc.cores" option has been
specified (using options(mc.cores = <number of cores>)). If this option has
been set with a value > 1, then a new, temporary cluster of that size is
set up, used, and removed once the function completes. If this option has not
been set or has been set to 1, then computations will be run sequentially
in the main process.
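The decision rules described above can be summarised in a small sketch. This is not adaptr's internal code: resolve_execution() and its `default` argument are hypothetical names introduced here only to restate the documented order of precedence (explicit cores, then the default cluster preference, then the "mc.cores" option).

```r
# Hypothetical helper restating the documented precedence rules.
# `default` describes the last setup_cluster() call:
#   "none"       = never called, or last called with cores = NULL
#   "cluster"    = a default cluster exists
#   "sequential" = last called with cores = 1
resolve_execution <- function(cores = NULL,
                              default = c("none", "cluster", "sequential"),
                              mc_cores = getOption("mc.cores")) {
  default <- match.arg(default)

  # An explicit cores argument overrides everything else
  if (!is.null(cores)) {
    if (cores == 1) return("sequential")
    return(paste0("temporary cluster of size ", cores))
  }

  # Otherwise, use the default cluster preference if one has been set
  if (default == "cluster") return("default cluster")
  if (default == "sequential") return("sequential")

  # Finally, fall back to the global "mc.cores" option
  if (!is.null(mc_cores) && mc_cores > 1) {
    return(paste0("temporary cluster of size ", mc_cores))
  }
  "sequential"
}

resolve_execution(cores = 4, default = "cluster")  # explicit cores wins
resolve_execution(default = "none", mc_cores = 2)  # falls back to mc.cores
```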
Generally, we recommend using the setup_cluster() function as this avoids
the overhead of re-initiating new clusters with every call to one of the
parallelised adaptr functions. This is especially important when exporting
many or large objects to a parallel cluster, as this can then be done only
once (with the option to export further objects to the same cluster when
calling run_trials()).
Type of clusters used and random number generation
The adaptr package solely uses parallel socket clusters (using
parallel::makePSOCKcluster()) and thus does not use forking (as this is not
available on all operating systems and may cause crashes in some situations).
As such, user-defined objects that should be used by the adaptr functions
when run in parallel need to be exported using either setup_cluster() or
run_trials(), if not included in the generated trial_spec object.
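To illustrate why exporting is needed with socket clusters, the sketch below uses the parallel package directly: worker processes start as fresh R sessions and do not see objects from the main session until these are exported. The object `scale_factor` is a made-up example, and the cluster is created with parallel::makePSOCKcluster() rather than through adaptr so that the sketch runs on its own.

```r
library(parallel)

cl <- makePSOCKcluster(2)

# A user-defined object that only exists in the main R session
scale_factor <- 10

# Workers are separate R processes, so the object is not yet visible there
before <- clusterEvalQ(cl, exists("scale_factor"))

# Export by name, as done with the varlist argument of clusterExport()
clusterExport(cl, varlist = "scale_factor", envir = environment())
after <- clusterEvalQ(cl, exists("scale_factor"))

# The exported object can now be used in code run on the workers
res <- parLapply(cl, 1:3, function(x) x * scale_factor)

stopCluster(cl)
```

With adaptr, the same effect would be obtained by passing the object names to the export argument, for example setup_cluster(cores = 2, export = "scale_factor").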
The adaptr package uses the "L'Ecuyer-CMRG" kind (see RNGkind()) for
safe random number generation for all parallelised functions. This is also
the case when running adaptr functions sequentially with a seed provided,
to ensure that the same results are obtained regardless of whether sequential
or parallel computation is used. All functions restore both the random number
generator kind and the global random seed after use if called with a seed.
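The save-and-restore behaviour described above can be sketched in base R as follows. This is an illustration of the general pattern and not adaptr's actual implementation; with_lecuyer_seed() is a hypothetical helper name.

```r
# Hypothetical helper: run `fun` with the "L'Ecuyer-CMRG" generator and a
# fixed seed, then restore the previous generator kind and global seed
with_lecuyer_seed <- function(seed, fun) {
  old_kind <- RNGkind()  # kind, normal.kind, sample.kind
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (has_seed) get(".Random.seed", envir = globalenv()) else NULL

  on.exit({
    # Restore the generator kind first, then the saved seed state
    do.call(RNGkind, as.list(old_kind))
    if (is.null(old_seed)) {
      if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
        rm(".Random.seed", envir = globalenv())
      }
    } else {
      assign(".Random.seed", old_seed, envir = globalenv())
    }
  })

  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  fun()
}

# The same seed gives the same draws, and the global stream is left untouched
set.seed(1)
draws_a <- with_lecuyer_seed(42, function() runif(2))
draws_b <- with_lecuyer_seed(42, function() runif(2))
identical(draws_a, draws_b)
```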
Examples
# Set up a cluster using 2 cores
setup_cluster(cores = 2)
# Get existing default cluster (printed here as invisibly returned)
print(setup_cluster())
#> socket cluster with 2 nodes on host ‘localhost’
# Remove existing default cluster
setup_cluster(cores = NULL)
# Specify preference for running computations sequentially
setup_cluster(cores = 1)
# Remove default cluster preference
setup_cluster(cores = NULL)
# Set global option to default to using a new, temporary 2-core cluster each time
# (only used if no default cluster preference is specified)
options(mc.cores = 2)