The main script is ./dp_playground.py. For command line arguments, see the corresponding parse_args function.
Modify the build_model
function according to the desired
architecture. Adjusting the learning rate schedule in the build_opt
function may also prove worthwhile.
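As a rough illustration of the "learning rate waves" mentioned below, a schedule could decay the learning rate and restart it periodically; this is only a sketch with made-up names and values, not the actual build_opt implementation:

```python
import numpy as np

def cyclic_lr(step, base_lr=1e-3, min_lr=1e-5, wave_length=10_000):
    """One cosine 'wave' per wave_length steps: decay, then restart."""
    phase = (step % wave_length) / wave_length
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + np.cos(np.pi * phase))

# The learning rate decays within each wave and jumps back to base_lr at every restart.
for step in (0, 5_000, 9_999, 10_000):
    print(step, cyclic_lr(step))
```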
While the best models are obtained after multiple learning rate waves, training may also be stopped after 30000 steps. An example training command looks like this:
python dp_playground.py --M 5 --steps 200000 --batch_size 32 \
--lambda_real_interval -100 0 --lambda_imag_interval -10 0
Training can be continued using the --model_path argument: simply load a model checkpoint and continue as desired. Note that the optimizer state is not included in saved checkpoints, so when using an adaptive optimizer it is recommended to schedule the learning rate such that training only ramps up after the optimizer has adapted again.
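A minimal sketch of such a warm-up ramp, assuming a simple linear schedule (names and values are illustrative, not the script's actual schedule):

```python
def warmup_lr(step, target_lr=1e-3, warmup_steps=1_000):
    # Ramp up linearly so that, after resuming without optimizer state, an
    # adaptive optimizer can rebuild its statistics before large updates occur.
    return target_lr * min(1.0, (step + 1) / warmup_steps)
```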
Example usage:
python dp_playground.py --M 5 --steps 200000 --batch_size 32 \
--lambda_real_interval -100 0 --lambda_imag_interval -10 0 \
--model_path best_dp_model_diag_M_5_re_-100.0_0.0_im_-10.0_0.0_loss_[...].npy
The structure of the preconditioner, the model's inputs, and the loss function used for training can be changed via their corresponding arguments. We describe each of them in detail below.
The --prec_type argument specifies which preconditioner to use. All nonzero values of the resulting matrix will be optimized. The default is diag.
prec_type | Description
---|---
diag | Diagonal matrix
lower_diag | Diagonal matrix with the diagonal lowered by an offset of 1
lower_tri | Lower triangular matrix
strictly_lower_tri | Strictly lower triangular matrix
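For illustration, here are the four structures as NumPy masks for M = 3, where a 1 marks an entry that would be optimized; the script constructs these matrices itself, so this is only a sketch:

```python
import numpy as np

M = 3  # number of nodes; M = 5 in the example commands above

diag               = np.diag(np.ones(M))             # main diagonal
lower_diag         = np.diag(np.ones(M - 1), k=-1)   # diagonal shifted down by one
lower_tri          = np.tril(np.ones((M, M)))        # main diagonal and below
strictly_lower_tri = np.tril(np.ones((M, M)), k=-1)  # strictly below the main diagonal

print(strictly_lower_tri)
# [[0. 0. 0.]
#  [1. 0. 0.]
#  [1. 1. 0.]]
```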
The --input_type argument specifies which inputs to give the model. The default is lambda.
input_type | Description
---|---
lambda | Only λ
residual | Initial residual (of the initial guess in relation to u0)
lambda_u | λ and the initial guess
f | f(u) = λu
num_iters | Number of iteration steps already taken
The --loss_type argument specifies which loss function to use for training the model. The default is spectral_radius.
loss_type | Description
---|---
spectral_radius | Minimize the spectral radius of the iteration matrix
residual | Minimize the residual after a fixed number of iteration steps
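As a rough orientation (not necessarily the script's exact formulation), both losses can be read off a preconditioned SDC/Richardson-type iteration for the Dahlquist problem u' = λu, with collocation matrix Q and the optimized preconditioner Q_delta:

```python
import numpy as np

def iteration_matrix(lam, Q, Q_delta):
    """K = (I - lam*Q_delta)^{-1} * lam * (Q - Q_delta); the sweep converges if rho(K) < 1."""
    I = np.eye(Q.shape[0])
    return np.linalg.solve(I - lam * Q_delta, lam * (Q - Q_delta))

def spectral_radius_loss(lam, Q, Q_delta):
    # loss_type=spectral_radius: largest absolute eigenvalue of the iteration matrix
    return np.max(np.abs(np.linalg.eigvals(iteration_matrix(lam, Q, Q_delta))))

def residual_loss(lam, Q, Q_delta, u0, u_init, num_iters=10):
    # loss_type=residual: collocation residual norm after num_iters sweeps;
    # u0 is the initial value spread over the nodes, u_init the initial guess.
    I = np.eye(Q.shape[0])
    u = u_init.copy()
    for _ in range(num_iters):
        r = u0 - (I - lam * Q) @ u                      # residual of the collocation problem
        u = u + np.linalg.solve(I - lam * Q_delta, r)   # preconditioned correction
    return np.linalg.norm(u0 - (I - lam * Q) @ u)
```

The choice mainly determines what counts as "good": asymptotic contraction (spectral_radius) versus the error actually reached after a fixed number of sweeps (residual).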
To optimize the residual after 10 steps, handling multiple u_init:
python dp_playground.py --steps 200000 --batch_size 32 \
--lambda_real_interval -100 -100 --u_real_interval -1 1 \
--input_type f --loss_type residual --num_iters 10
To obtain a model that optimizes the preconditioner's parameters directly, give the argument --optimize_directly True.
An example training run that optimizes for a single λ value with M = 5 would be started like this. Note that we also set the batch size to 1 to avoid redundant work:
python dp_playground.py --M 5 --steps 200000 --optimize_directly True \
--batch_size 1 --lambda_real_interval -1 -1 --lambda_imag_interval 0 0
To optimize a strictly lower triangular preconditioner while also testing against some additional preconditioners:
python dp_playground.py --steps 200000 --optimize_directly True \
--batch_size 1 --prec_type strictly_lower_tri --extensive_tests True \
--lambda_real_interval -1 -1 --lambda_imag_interval 0 0
We can also optimize one diagonal for each iteration step, handling multiple u_init:
python dp_playground.py --steps 200000 --optimize_directly True \
--batch_size 32 --lambda_real_interval -100 -100 \
--u_real_interval -1 1 --num_iters 10 --input_type num_iters
To only evaluate a model without further training, simply set the number of training steps to 0 (--steps 0); the (possibly loaded) model will be used as is.
The files to share when passing a model on for continued training are those ending in .npy, .structure, and .steps. When only interested in inference, the file ending in .steps does not need to be shared.
The main script is ./rl_playground.py. For command line arguments, see ./utils/arguments.py. Some recommended defaults are given below.
The given command line arguments are automatically saved when the script starts. Most saved files carry the script's starting time as a timestamp, so all files belonging to one experiment should be immediately recognizable by sharing the same timestamp (except, for now, the TensorBoard logs).
python rl_playground.py --envname sdc-v1 --num_envs 8 \
--model_class PPG --activation_fn ReLU \
--collect_states True --reward_iteration_only False --norm_obs True
To accelerate learning, increase the batch size if possible. For example, PPG has a default batch size of 64 (given as the keyword argument batch_size), so we could use a batch size of 512 like this:
python rl_playground.py --model_class PPG \
--model_kwargs '{"batch_size": 512}'
This will, however, most likely harm your training success by executing fewer training steps, as much more data (and thus more environment timesteps) is processed in each training step. A good heuristic against this problem is to scale the learning rate proportionally to the batch size. The default learning rate we give is 25e-5, so scaled to the increased batch size it becomes 25e-5 * 512 / 64 = 0.002. Our new command for starting the script becomes the following:
python rl_playground.py --model_class PPG --learning_rate 0.002 \
--model_kwargs '{"batch_size": 512}'
For PPG, keep in mind that it also uses an auxiliary batch size (aux_batch_size)! Half of the normal batch size is a good starting value for this. The final command is:
python rl_playground.py --model_class PPG --learning_rate 0.002 \
--model_kwargs '{"batch_size": 512, "aux_batch_size": 256}'
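In short, the batch-size and learning-rate bookkeeping above boils down to the following (values as in the example):

```python
default_lr, default_batch_size = 25e-5, 64
batch_size = 512
learning_rate = default_lr * batch_size / default_batch_size  # 25e-5 * 512 / 64 = 0.002
aux_batch_size = batch_size // 2                               # 256, PPG's auxiliary batch size
```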