Using the ARLBench States
==========================

In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states.

ARLBench and Different AutoRL Paradigms
=======================================

TODO: relationship to other AutoRL paradigms

Dynamic Configuration in ARLBench
==================================

TODO: explain how to configure hyperparameters dynamically in ARLBench.
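A dynamic configuration can be thought of as a schedule of hyperparameter settings. ARLBench's options describe n_steps as the number of steps in the configuration schedule (1 giving a static configuration) and n_total_timesteps as the number of timesteps trained in each schedule interval. The following is only a minimal sketch of that loop under stated assumptions: `train_interval` is a hypothetical stand-in for one ARLBench training interval, `learning_rate` is a hypothetical hyperparameter key, and the linearly decaying schedule with its start and end values is purely illustrative, not an ARLBench default.

.. code-block:: python

   from typing import Callable, Dict, List


   def linear_schedule(start: float, end: float, n_steps: int) -> List[float]:
       """Linearly interpolate a hyperparameter value over n_steps schedule intervals."""
       if n_steps < 1:
           raise ValueError("n_steps must be at least 1")
       if n_steps == 1:
           # A single schedule step corresponds to a static configuration.
           return [start]
       return [start + (end - start) * i / (n_steps - 1) for i in range(n_steps)]


   def run_dynamic_configuration(
       train_interval: Callable[[Dict[str, float], int], float],
       n_steps: int,
       n_total_timesteps: int,
   ) -> List[float]:
       """Train for n_steps intervals, changing the configuration between intervals.

       train_interval is a hypothetical callback that continues training the agent
       for n_total_timesteps with the given configuration and returns an objective.
       """
       learning_rates = linear_schedule(3e-4, 1e-5, n_steps)  # illustrative values
       objectives = []
       for lr in learning_rates:
           config = {"learning_rate": lr}  # hypothetical hyperparameter key
           objectives.append(train_interval(config, n_total_timesteps))
       return objectives

With n_steps set to 1 the loop runs once with a single configuration, which is the static case.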

The ARLBench Subsets
====================

We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments to select a subset that allows for efficient benchmarking of AutoRL algorithms. These are the resulting subsets:

.. image:: path/subsets.png
   :width: 800
   :alt: ARLBench environment subsets for PPO, DQN and SAC

We strongly recommend focusing your benchmarking on exactly these environments to ensure good coverage of the overall landscape of RL behaviors.
The data generated for selecting these environments is available on `HuggingFace <https://huggingface.co/datasets/autorl-org/arlbench>`_ for you to use in your experiments.
For more information on how the subset selection was done, please refer to our paper.

Objectives in ARLBench
======================

ARLBench allows you to configure the objectives you'd like to use for your AutoRL methods.
They are selected as a list of keywords in the configuration of the AutoRL Environment, e.g.:

.. code-block:: bash

   python arlbench.py 'autorl.objectives=["reward_mean"]'

The following objectives are currently available:

- reward_mean: the mean evaluation reward across a number of evaluation episodes
- reward_std: the standard deviation of the evaluation rewards across a number of evaluation episodes
- runtime: the runtime of the training process
- emissions: the CO2 emissions of the training process
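To make the first three objectives concrete, here is a minimal plain-Python sketch of how their values could be derived from a run's evaluation returns and measured training time. It is not ARLBench's implementation: `compute_objectives` is a hypothetical helper, the use of the population (rather than sample) standard deviation for reward_std is an assumption, and emissions is left out because it requires an external emissions tracker.

.. code-block:: python

   import statistics
   import time
   from typing import Dict, List, Sequence


   def compute_objectives(
       eval_returns: Sequence[float],
       runtime_seconds: float,
       objectives: List[str],
   ) -> Dict[str, float]:
       """Return the requested objective values for one training run (sketch)."""
       available = {
           # Mean evaluation reward across the evaluation episodes.
           "reward_mean": statistics.mean(eval_returns),
           # Population standard deviation of the evaluation rewards; whether
           # ARLBench uses the population or the sample estimate is an assumption.
           "reward_std": statistics.pstdev(eval_returns),
           # Wall-clock runtime of the training process, measured by the caller.
           "runtime": runtime_seconds,
       }
       unknown = [name for name in objectives if name not in available]
       if unknown:
           raise ValueError(f"objectives not covered by this sketch: {unknown}")
       return {name: available[name] for name in objectives}


   if __name__ == "__main__":
       start = time.perf_counter()
       # ... train and evaluate the agent here ...
       episode_returns = [120.0, 135.0, 128.0]  # placeholder evaluation returns
       runtime = time.perf_counter() - start
       print(compute_objectives(episode_returns, runtime, ["reward_mean", "runtime"]))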

ARLBench Options
================

A training run in ARLBench is configured on two levels: the low level is the configuration set by the AutoRL method being benchmarked, while the high level defines the setting in which that method is tested.
The high-level configuration is set via the 'autorl' keys in the configuration file. The available options are:

- **seed**: The seed for the random number generator
- **env_framework**: Environment framework to use. Currently supported: gymnax, envpool, brax, xland
- **env_name**: The name of the environment to use
- **env_kwargs**: Additional keyword arguments for the environment
- **eval_env_kwargs**: Additional keyword arguments for the evaluation environment
- **n_envs**: Number of environments to use in parallel
- **algorithm**: The algorithm to use. Currently supported: dqn, ppo, sac
- **cnn_policy**: Whether to use a CNN policy
- **deterministic_eval**: Whether to use deterministic evaluation. This disables exploration behavior during evaluation.
- **nas_config**: Configuration of the network architecture
- **checkpoint**: A list of elements the checkpoint should contain
- **checkpoint_name**: The name of the checkpoint
- **checkpoint_dir**: The directory to save the checkpoint in
- **objectives**: The objectives to optimize for. Currently supported: reward_mean, reward_std, runtime, emissions
- **optimize_objectives**: Whether to maximize or minimize the objectives
- **state_features**: The features of the RL algorithm's state to return
- **n_steps**: The number of steps in the configuration schedule. Using 1 results in a static configuration
- **n_total_timesteps**: The total number of timesteps to train in each schedule interval
- **n_eval_steps**: The number of steps to evaluate the agent for
- **n_eval_episodes**: The number of episodes to evaluate the agent for

The low-level configuration options are found under the 'hp_config' keys, which contain the configurable hyperparameters and architecture of each algorithm. Please refer to the search space overview for more information.
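ARLBench reads these keys from its configuration file. The following plain-Python sketch only mirrors the two-level structure as nested dictionaries and checks a few high-level values against the supported options listed above. `validate_config` and `EXAMPLE_CONFIG` are hypothetical illustrations, not part of ARLBench's API; the environment name, the numeric values and the `learning_rate` key under 'hp_config' are assumptions chosen for the example.

.. code-block:: python

   from typing import Any, Dict

   # Supported values, taken from the option descriptions above.
   SUPPORTED_FRAMEWORKS = {"gymnax", "envpool", "brax", "xland"}
   SUPPORTED_ALGORITHMS = {"dqn", "ppo", "sac"}
   SUPPORTED_OBJECTIVES = {"reward_mean", "reward_std", "runtime", "emissions"}

   # Illustrative two-level configuration: "autorl" holds the high-level setting,
   # "hp_config" the low-level values chosen by the AutoRL method.
   EXAMPLE_CONFIG: Dict[str, Any] = {
       "autorl": {
           "seed": 42,
           "env_framework": "gymnax",
           "env_name": "CartPole-v1",  # illustrative environment name
           "algorithm": "ppo",
           "objectives": ["reward_mean"],
           "n_steps": 1,  # 1 = static configuration
           "n_total_timesteps": 100_000,
       },
       "hp_config": {"learning_rate": 3e-4},  # hypothetical hyperparameter key
   }


   def validate_config(config: Dict[str, Any]) -> None:
       """Raise ValueError if the high-level settings use unsupported values."""
       autorl = config["autorl"]
       if autorl["env_framework"] not in SUPPORTED_FRAMEWORKS:
           raise ValueError(f"unsupported env_framework: {autorl['env_framework']}")
       if autorl["algorithm"] not in SUPPORTED_ALGORITHMS:
           raise ValueError(f"unsupported algorithm: {autorl['algorithm']}")
       unknown = set(autorl["objectives"]) - SUPPORTED_OBJECTIVES
       if unknown:
           raise ValueError(f"unsupported objectives: {sorted(unknown)}")
       if autorl["n_steps"] < 1:
           raise ValueError("n_steps must be at least 1")
       if "hp_config" not in config:
           raise ValueError("missing low-level 'hp_config' section")


   validate_config(EXAMPLE_CONFIG)  # passes silently for a valid configuration

Keeping the two levels in separate sections reflects the split described above: the benchmark setting stays fixed, while only the 'hp_config' part is what the AutoRL method is expected to change.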

Considerations for Seeding
============================

Seeding matters both at the level of the RL algorithm and at the AutoRL level.
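One common way to handle both levels, shown here as a minimal sketch rather than ARLBench's own mechanism, is to derive a fixed set of RL-level seeds from a single AutoRL-level seed and reuse that set for every configuration being compared. `derive_rl_seeds` is a hypothetical helper; each resulting value could be passed to a run through ARLBench's seed option, while the AutoRL method's own randomness would be seeded separately.

.. code-block:: python

   import random
   from typing import List


   def derive_rl_seeds(autorl_seed: int, n_rl_seeds: int) -> List[int]:
       """Derive distinct, reproducible RL-level seeds from one AutoRL-level seed."""
       rng = random.Random(autorl_seed)  # dedicated RNG, isolated from global state
       seeds: List[int] = []
       while len(seeds) < n_rl_seeds:
           candidate = rng.randrange(2**31)
           if candidate not in seeds:  # skip the (unlikely) duplicate draws
               seeds.append(candidate)
       return seeds


   # Every configuration proposed by the AutoRL method is trained on the same seeds,
   # so differences in the objectives are not caused by different RL seeds.
   rl_seeds = derive_rl_seeds(autorl_seed=0, n_rl_seeds=3)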