Showing 8 changed files with 61 additions and 44 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@@ -51,13 +51,14 @@ Ready to contribute? Here's how to set up `arlbench` for local development.
 2. Clone your fork locally:
    ```
    $ git clone git@github.com:your_name_here/arlbench.git
    $ cd arlbench
    ```

-3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
+3. Install your local copy into a conda env:
    ```
-   $ mkvirtualenv arlbench
-   $ cd arlbench/
-   $ python setup.py develop
+   $ conda create -n arlbench python=3.10
+   $ conda activate arlbench
+   $ make install-dev
    ```

 4. Create a branch for local development:
@@ -67,15 +68,11 @@ Ready to contribute? Here's how to set up `arlbench` for local development.

    Now you can make your changes locally.

-5. When you're done making changes, check that your changes pass ruff, including testing other Python versions with tox:
+5. When you're done making changes, check that your changes pass ruff:
    ```
-   $ ruff format arlbench tests
-   $ python setup.py test or pytest
-   $ tox
+   $ make format
    ```

-   To get flake8 and tox, just pip install them into your virtualenv.

 6. Commit your changes and push your branch to GitHub:
    ```
    $ git add .
@@ -93,16 +90,14 @@ Before you submit a pull request, check that it meets these guidelines:
 2. If the pull request adds functionality, the docs should be updated. Put
    your new functionality into a function with a docstring, and add the
    feature to the list in README.rst.
-3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check
-   https://travis-ci.com/automl/arlbench/pull_requests
-   and make sure that the tests pass for all supported Python versions.
+3. The pull request should work for Python 3.10 and above. This should be tested in the GitHub workflows.

 ## Tips

 To run a subset of tests:

 ```
-$ pytest tests.test_arlbench
+$ make test
 ```
@@ -1,7 +1,9 @@
 Using the ARLBench States
 ==========================

-In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states. This is done using so called `StateFeatures`.
-As of now, we implement the `GradInfo` state feature which returns the norm the gradients observed during training.
+In addition to providing different objectives, ARLBench also provides insights into the target algorithms' **internal states**. This is done using so-called `StateFeatures`.
-The used state features can be defined using the `state_features` key in the config passed to the AutoRL Environment. Please include `grad_info` in this list if you want to use this state feature for your approach.
+As of now, we implement the `GradInfo` state feature which returns the **norm and variance of the gradients observed during training**.
+The state features to use can be defined via the `state_features` key in the config passed to the AutoRL Environment.
+Please include `grad_info` in this list if you want to use this state feature for your approach.
+We are currently working on extending this part of ARLBench to other state features as well.
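The gradient statistics that `GradInfo` reports can be illustrated in plain Python. This is a minimal sketch of the idea only, not ARLBench's actual implementation, and the function name `grad_info` is ours:

```python
import math

def grad_info(gradients):
    """Summarize a batch of flattened gradient vectors by the kind of
    statistics a GradInfo-style state feature reports: the mean L2 norm
    of the gradients and the variance of those norms."""
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in gradients]
    mean = sum(norms) / len(norms)
    var = sum((n - mean) ** 2 for n in norms) / len(norms)
    return {"grad_norm": mean, "grad_var": var}

# Four toy gradient vectors with L2 norms 1, 2, 2 and 3
grads = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
    [3.0, 0.0, 0.0],
]
info = grad_info(grads)
print(info)  # {'grad_norm': 2.0, 'grad_var': 0.5}
```

A hyperparameter controller can then condition on these two scalars instead of the raw gradients, which keeps the state space small.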
@@ -1,22 +1,28 @@
 ARLBench and Different AutoRL Paradigms
 =======================================

-In this chapter, we elaborate on the relationship between ARLBench in various AutoRL Paradigms.
+Since there are various AutoRL paradigms in the literature, we mention how ARLBench relates to each one.

 Hyperparameter Optimization (HPO)
 ---------------------------------
-(Static) Hyperparameter optimization is one of the core use cases of ARLBench. As stated in our examples, ARLBench supports all kinds of black-box optimizers to perform hyperparameter optimization for RL.
+Hyperparameter optimization is one of the core use cases of ARLBench. As stated in our examples, ARLBench supports all kinds of black-box optimizers to perform hyperparameter optimization for RL.
+This can also be done in a dynamic fashion in ARLBench.

 Dynamic Algorithm Configuration (DAC)
 -------------------------------------
 When it comes to dynamic approaches, ARLBench supports different kinds of optimization techniques that adapt the current hyperparameter configuration during training. As stated in the examples,
-this can be done using the CLI or the AutoRL Environment. Using checkpointing, trainings can be continued seamlessly which allows for flexible dynamic approaches.
+this can be done using the CLI or the AutoRL Environment. In DAC specifically, however, the hyperparameter controller learns to adapt hyperparameters based on the algorithm state.
+This is supported in ARLBench, but not implemented extensively just yet. At the moment, we only offer a limited number of gradient features, which might not be enough to learn a reliable hyperparameter controller.
+Since DAC has not been applied to RL in this manner yet, however, we are not yet sure which other features are necessary to make DAC work in the context of RL.

 Neural Architecture Search (NAS)
 --------------------------------
 In addition to HPO, ARLBench supports NAS approaches that set the size of hidden layers and activation functions. However, as of now this is limited to these two architecture hyperparameters.
-In the future, ARLBench could be extended by more powerful search space interfaces for NAS.
+Most NAS approaches actually focus on more elaborate search spaces to find architectures tailored to a use case. This line of research is not very prominent in the context of RL yet, unfortunately.
+We hope ARLBench can support such research in the future by extending to standard NAS search spaces like DARTS or novel RL-specific ones.

 Meta-Gradients
 --------------
-As of now, ARLBench does not include meta-gradient based approaches for AutoRL. However, we allow for reactive dynamic approaches that use the gradient informatio during training to select the next hyperparameter configuration as stated in our examples.
+As of now, ARLBench does not include meta-gradient or second-order optimization based approaches for AutoRL.
+However, we allow for reactive dynamic approaches that use the gradient information during training to select the next hyperparameter configuration, as stated in our examples.
+Through this interface, we hope to be able to provide an option for second-order gradient computation in the future.
@@ -1,11 +1,14 @@
 Dynamic Configuration in ARLBench
 ==================================

-In addition to static approaches, which run the whole training given a fixed configuration, ARLBench supports dynamic configuration approaches.
-These methods, in contrast, can adapt the current hyperparameter configuration during training.
+In addition to static approaches, which run the whole training given a fixed configuration, ARLBench supports **dynamic configuration approaches**.
+These methods, in contrast, can adapt the current hyperparameter configuration **during training**.
 To do this, you can use the CLI or the AutoRL Environment as shown in our examples.

-When using the CLI, you have to pass a checkpoint path for the current training state. Then, the training is proceeded using the given configuration.
+When using the CLI, you have to **pass a checkpoint path** for the current training state.
+Then, training is resumed from this training state with a new configuration.
+This is especially useful for highly parallelizable dynamic tuning methods, e.g. population-based methods.

-For the AutoRL Environment, you can set `n_steps` to the number of configuration updates you want to perform during training.
-By adjusting the number of training steps (`n_total_timesteps`) accordingly and calling the `step()` function multiple times to perform dynamic configuration.
+For the AutoRL Environment, you can set `n_steps` to the **number of configuration updates** you want to perform during training.
+You should also reduce `n_total_timesteps` accordingly, to 1/`n_steps` of the full training budget.
+Then, calling the `step()` function multiple times until termination performs the same dynamic configuration as with the CLI.
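The bookkeeping behind `n_steps` and `n_total_timesteps` amounts to splitting the training budget into equal slices, one per `step()` call. A minimal sketch (the helper `split_budget` is ours, not part of the ARLBench API):

```python
def split_budget(n_total_timesteps: int, n_steps: int) -> list[int]:
    """Split the total training budget into n_steps equal slices,
    so each configuration update trains for 1/n_steps of the budget."""
    per_step = n_total_timesteps // n_steps
    return [per_step] * n_steps

# 10 configuration updates over a budget of 1,000,000 timesteps
slices = split_budget(n_total_timesteps=1_000_000, n_steps=10)
print(slices[0], sum(slices))  # 100000 1000000
```

Each slice is what you would pass as the per-segment timestep budget before the next hyperparameter update.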
@@ -1,14 +1,17 @@
 The ARLBench Subsets
 ====================

-We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments to select a subset which allows for efficient benchmarking of AutoRL algorithms. These are the resulting subsets:
+We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments to select a subset which allows for efficient benchmarking of AutoRL algorithms.
+This subset of 4-5 environments per algorithm matches the overall reward distribution across 128 hyperparameter configurations and 10 seeds:

 .. image:: ../images/subsets.png
    :width: 800
    :alt: Environment subsets for PPO, DQN and SAC

-We strongly recommend you focus your benchmarking on these exact environments to ensure you cover the space total landscape of RL behaviors well.
+In our experiments on GPU, all subsets together should take about **1.5h to evaluate once**.
+Multiply this by the number of RL seeds you want to evaluate on, the number of optimizer runs you consider, as well as the optimization budget, to estimate the total runtime of your experiments.
+If this full runtime is too long for your setup, you can also consider evaluating only a subset of algorithms - we strongly recommend you focus your benchmarking **on these exact environments**, however, to ensure you cover the total landscape of RL behaviors well.

 The data generated for selecting these environments is available on `HuggingFace <https://huggingface.co/datasets/autorl-org/arlbench>`_ for you to use in your experiments.
 For more information on how the subset selection was done, please refer to our paper.

-For more information on how to evaluate your method on these subsets, please refer to the examples in our GitHub repository.
+The examples in our GitHub repository show how you can evaluate your own method using these subsets.
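The runtime estimate above is a straightforward multiplication. A small sketch; all numbers besides the 1.5h per pass are illustrative placeholders, not recommendations:

```python
def total_runtime_hours(hours_per_pass, n_rl_seeds, n_optimizer_runs, budget):
    """Rough wall-clock estimate for a benchmarking study: one pass over
    the subsets, scaled by RL seeds, optimizer runs, and the optimization
    budget (evaluations per optimizer run)."""
    return hours_per_pass * n_rl_seeds * n_optimizer_runs * budget

# 1.5h per full-subset pass, 10 RL seeds, 3 optimizer runs, 32 evaluations each
print(total_runtime_hours(1.5, 10, 3, 32))  # 1440.0
```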
@@ -1,8 +1,13 @@
 Considerations for Seeding
 ============================

-Seeding is important both on the level of RL algorithms as well as the AutoRL level. In general, we propose to use three different random seeds for training, validation, and testing.
-For training and validation, ARLBench takes care of the seeding. When you pass a seed to the AutoRL Environment, it uses this seed for training but `seed + 1` for the validation during training.
+Seeding is important both on the level of RL algorithms as well as on the AutoRL level.
+In general, we propose to use **three different sets of random seeds** for training, validation, and testing.

+For **training and validation**, ARLBench takes care of the seeding. When you pass a seed to the AutoRL Environment, it uses this seed for training but `seed + 1` for the validation during training.
+We recommend using seeds `0` - `9` for training and validation, i.e., by passing them to the AutoRL Environment for the tuning process.
+You are of course free to increase this range, but we recommend using **at least 10 different seeds** for reliable results.

-When it comes to testing HPO methods, we provide a evaluation script in our examples. We propose to use seeds `100, 101, ...` here to make sure the method is tested on a different set of random seeds.
+When it comes to testing HPO methods, we provide an evaluation script in our examples.
+We propose to use seeds `100, 101, ...` here to make sure the method is tested on a different set of random seeds.
+Here we suggest **three HPO runs as a minimum** even for stable optimizers - for consistent results with small confidence intervals, you should likely aim for more runs.
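The seed scheme described above can be made concrete with a small helper. This is a hypothetical illustration; in practice ARLBench derives the validation seed (`seed + 1`) internally, so you only pass the training seeds:

```python
def seed_sets(n_train: int = 10, n_test: int = 3):
    """Seed sets following the scheme described above: training seeds
    0..n_train-1, validation seeds derived as seed + 1 (handled internally
    by ARLBench), and a separate test range starting at 100."""
    train = list(range(n_train))              # 0, 1, ..., 9
    val = [seed + 1 for seed in train]        # seed + 1, used during training
    test = list(range(100, 100 + n_test))     # 100, 101, 102
    return train, val, test

train, val, test = seed_sets()
print(train[:3], val[:3], test)  # [0, 1, 2] [1, 2, 3] [100, 101, 102]
```

Keeping the test range far from the tuning seeds guarantees that reported results are not inflated by seed reuse.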