diff --git a/docs/advanced_usage/algorithm_states.rst b/docs/advanced_usage/algorithm_states.rst
index b0f2a40f9..92591589d 100644
--- a/docs/advanced_usage/algorithm_states.rst
+++ b/docs/advanced_usage/algorithm_states.rst
@@ -1,2 +1,4 @@
 Using the ARLBench States
-==========================
\ No newline at end of file
+==========================
+
+In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states.
\ No newline at end of file
diff --git a/docs/advanced_usage/autorl_paradigms.rst b/docs/advanced_usage/autorl_paradigms.rst
index 1bc5a5827..9c8a29ef1 100644
--- a/docs/advanced_usage/autorl_paradigms.rst
+++ b/docs/advanced_usage/autorl_paradigms.rst
@@ -1,2 +1,4 @@
 ARLBench and Different AutoRL Paradigms
-=======================================
\ No newline at end of file
+=======================================
+
+TODO: relationship to other AutoRL paradigms
\ No newline at end of file
diff --git a/docs/advanced_usage/dynamic_configuration.rst b/docs/advanced_usage/dynamic_configuration.rst
index ae8b3b26d..5d6cde095 100644
--- a/docs/advanced_usage/dynamic_configuration.rst
+++ b/docs/advanced_usage/dynamic_configuration.rst
@@ -1,2 +1,4 @@
 Dynamic Configuration in ARLBench
-==================================
\ No newline at end of file
+==================================
+
+TODO: how to use dynamic configuration
\ No newline at end of file
diff --git a/docs/basic_usage/env_subsets.rst b/docs/basic_usage/env_subsets.rst
index 7734e9d75..69aeafa24 100644
--- a/docs/basic_usage/env_subsets.rst
+++ b/docs/basic_usage/env_subsets.rst
@@ -1,2 +1,12 @@
 The ARLBench Subsets
-====================
\ No newline at end of file
+====================
+
+We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments to select subsets that allow for efficient benchmarking of AutoRL algorithms. These are the resulting subsets:
+
+.. image:: ../images/subsets.png
+   :width: 800
+   :alt: The ARLBench environment subsets
+
+We strongly recommend you focus your benchmarking on these exact environments to ensure you cover the total landscape of RL behaviors well.
+The data generated for selecting these environments is available on `HuggingFace `_ for you to use in your experiments.
+For more information on how the subset selection was done, please refer to our paper.
\ No newline at end of file
diff --git a/docs/basic_usage/index.rst b/docs/basic_usage/index.rst
index 621ec6ad4..c1a1cc4bf 100644
--- a/docs/basic_usage/index.rst
+++ b/docs/basic_usage/index.rst
@@ -9,6 +9,7 @@ Benchmarking AutoRL Methods
    seeding
 
+ARLBench provides a basis for benchmarking different AutoRL methods. This section of the documentation focuses on the most prominent aspect, black-box hyperparameter optimization, since it is the simplest use case of ARLBench. We discuss the structure of ARLBench, the currently supported objectives, the environment subsets and search spaces we provide, and the seeding of the experiments in their own subpages. The most important question, however, is how to actually use ARLBench in your experiments. This is the workflow we propose:
diff --git a/docs/basic_usage/objectives.rst b/docs/basic_usage/objectives.rst
index 59f2be739..1a59039a6 100644
--- a/docs/basic_usage/objectives.rst
+++ b/docs/basic_usage/objectives.rst
@@ -1,2 +1,16 @@
 Objectives in ARLBench
-======================
\ No newline at end of file
+======================
+
+ARLBench allows you to configure the objectives you'd like to use for your AutoRL methods.
+These are selected as a list of keywords in the configuration of the AutoRL Environment, like this:
+
+.. code-block:: bash
+
+   python arlbench.py autorl.objectives=["reward_mean"]
+
+The following objectives are available at the moment:
+
+- **reward_mean**: the mean evaluation reward across a number of evaluation episodes
+- **reward_std**: the standard deviation of the evaluation rewards across a number of evaluation episodes
+- **runtime**: the runtime of the training process
+- **emissions**: the CO2 emissions of the training process
\ No newline at end of file
diff --git a/docs/basic_usage/options.rst b/docs/basic_usage/options.rst
index bf2b2f3f1..3c499bc81 100644
--- a/docs/basic_usage/options.rst
+++ b/docs/basic_usage/options.rst
@@ -1,2 +1,42 @@
 ARLBench Options
-================
\ No newline at end of file
+================
+
+A given training run in ARLBench can be configured on two levels: the lower level is the configuration of the RL algorithm itself, which is set by the AutoRL tool we benchmark, while the upper level decides the setting we test the AutoRL tool in.
+The high-level configuration takes place via the ``autorl`` keys in the configuration file. These are the available options:
+
+- **seed**: The seed for the random number generator
+- **env_framework**: Environment framework to use. Currently supported: gymnax, envpool, brax, xland
+- **env_name**: The name of the environment to use
+- **env_kwargs**: Additional keyword arguments for the environment
+- **eval_env_kwargs**: Additional keyword arguments for the evaluation environment
+- **n_envs**: Number of environments to use in parallel
+- **algorithm**: The algorithm to use. Currently supported: dqn, ppo, sac
+- **cnn_policy**: Whether to use a CNN policy
+- **deterministic_eval**: Whether to use deterministic evaluation. This disables exploration behaviors during evaluation.
+- **nas_config**: Configuration for the network architecture
+- **checkpoint**: A list of elements the checkpoint should contain
+- **checkpoint_name**: The name of the checkpoint
+- **checkpoint_dir**: The directory to save the checkpoint in
+- **objectives**: The objectives to optimize for. Currently supported: reward_mean, reward_std, runtime, emissions
+- **optimize_objectives**: Whether to maximize or minimize the objectives
+- **state_features**: The features of the RL algorithm's state to return
+- **n_steps**: The number of steps in the configuration schedule. Using 1 will result in a static configuration
+- **n_total_timesteps**: The total number of timesteps to train in each schedule interval
+- **n_eval_steps**: The number of steps to evaluate the agent for
+- **n_eval_episodes**: The number of episodes to evaluate the agent for
+
+The low-level configuration options can be found in the ``hp_config`` key set, containing the configurable hyperparameters and architecture of each algorithm. Please refer to the search space overview for more information.
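+
+For illustration, here is how the two levels can be combined in a single run, using the same command line style as the example on the objectives page. The environment, the timestep budget, and the ``hp_config.learning_rate`` hyperparameter name are illustrative assumptions; refer to the search space overview for the actual hyperparameters of each algorithm.
+
+.. code-block:: bash
+
+   # Upper level: choose the algorithm, environment and objectives
+   # (several objectives can be listed at once).
+   # Lower level: set one of the algorithm's hyperparameters via hp_config.
+   python arlbench.py autorl.algorithm=dqn \
+       autorl.env_framework=gymnax \
+       autorl.env_name=CartPole-v1 \
+       autorl.n_total_timesteps=100000 \
+       autorl.objectives=["reward_mean","runtime"] \
+       hp_config.learning_rate=0.001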
diff --git a/docs/basic_usage/seeding.rst b/docs/basic_usage/seeding.rst
index 89fa99ad1..983eafe0c 100644
--- a/docs/basic_usage/seeding.rst
+++ b/docs/basic_usage/seeding.rst
@@ -1,2 +1,4 @@
 Considerations for Seeding
-============================
\ No newline at end of file
+============================
+
+Seeding is important both on the level of the RL algorithms and on the AutoRL level.
\ No newline at end of file
diff --git a/docs/images/subsets.png b/docs/images/subsets.png
new file mode 100644
index 000000000..e85f29186
Binary files /dev/null and b/docs/images/subsets.png differ