more doc updates

automl · May 30, 2024 · 3cc13a9 · 3cc13a9
1 parent f580646
commit 3cc13a9

Showing 9 changed files with 65 additions and 7 deletions.
diff --git a/docs/advanced_usage/algorithm_states.rst b/docs/advanced_usage/algorithm_states.rst
@@ -1,2 +1,4 @@
 Using the ARLBench States
-==========================
+==========================
+
+In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states.
diff --git a/docs/advanced_usage/autorl_paradigms.rst b/docs/advanced_usage/autorl_paradigms.rst
@@ -1,2 +1,4 @@
 ARLBench and Different AutoRL Paradigms
-=======================================
+=======================================
+
+TODO: relationship to other AutoRL paradigms
diff --git a/docs/advanced_usage/dynamic_configuration.rst b/docs/advanced_usage/dynamic_configuration.rst
@@ -1,2 +1,4 @@
 Dynamic Configuration in ARLBench
-==================================
+==================================
+
+TODO: explain how to use dynamic configuration
diff --git a/docs/basic_usage/env_subsets.rst b/docs/basic_usage/env_subsets.rst
@@ -1,2 +1,12 @@
 The ARLBench Subsets
-====================
+====================
+
+We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments to select a subset which allows for efficient benchmarking of AutoRL algorithms. These are the resulting subsets:
+
+.. image:: ../images/subsets.png
+  :width: 800
+  :alt: Overview of the ARLBench environment subsets
+
+We strongly recommend you focus your benchmarking on these exact environments to ensure you cover the total landscape of RL behaviors well.
+The data generated for selecting these environments is available on `HuggingFace <https://huggingface.co/datasets/autorl-org/arlbench>`_ for you to use in your experiments.
+For more information on how the subsets were selected, please refer to our paper.
diff --git a/docs/basic_usage/index.rst b/docs/basic_usage/index.rst
@@ -9,6 +9,7 @@ Benchmarking AutoRL Methods
    seeding
 
 
+
 ARLBench provides a basis for benchmarking different AutoRL methods. This section of the documentation focuses on the prominent aspect of black-box hyperparameter optimization, since it's the simplest use case of ARLBench.
 We discuss the structure of ARLBench, the currently supported objectives, the environment subsets and search spaces we provide, and the seeding of the experiments in their own subpages.
 The most important question, however, is how to actually use ARLBench in your experiments. This is the workflow we propose:

diff --git a/docs/basic_usage/objectives.rst b/docs/basic_usage/objectives.rst
@@ -1,2 +1,16 @@
 Objectives in ARLBench
-======================
+======================
+
+ARLBench allows you to configure the objectives you'd like to use for your AutoRL methods.
+These are selected as a list of keywords in the configuration of the AutoRL Environment, for example:
+
+.. code-block:: bash
+
+    python arlbench.py 'autorl.objectives=["reward_mean"]'
+
+The following objectives are available at the moment:
+
+- reward_mean: the mean evaluation reward across a number of evaluation episodes
+- reward_std: the standard deviation of the evaluation rewards across a number of evaluation episodes
+- runtime: the runtime of the training process
+- emissions: the CO2 emissions of the training process
diff --git a/docs/basic_usage/options.rst b/docs/basic_usage/options.rst
@@ -1,2 +1,28 @@
 ARLBench Options
-================
+================
+
+A given training run in ARLBench can be configured on two levels: the lower level is the configuration chosen by the AutoRL tool we benchmark, while the upper level defines the setting in which we test that AutoRL tool.
+The upper-level configuration is set via the 'autorl' keys in the configuration file. These are the available options:
+
+- **seed**: The seed for the random number generator 
+- **env_framework**: Environment framework to use. Currently supported: gymnax, envpool, brax, xland
+- **env_name**: The name of the environment to use
+- **env_kwargs**: Additional keyword arguments for the environment
+- **eval_env_kwargs**: Additional keyword arguments for the evaluation environment
+- **n_envs**: Number of environments to use in parallel
+- **algorithm**: The algorithm to use. Currently supported: dqn, ppo, sac
+- **cnn_policy**: Whether to use a CNN policy
+- **deterministic_eval**: Whether to use deterministic evaluation. This disables exploration behaviors in evaluation.
+- **nas_config**: Configuration for the network architecture
+- **checkpoint**: A list of elements the checkpoint should contain 
+- **checkpoint_name**: The name of the checkpoint
+- **checkpoint_dir**: The directory to save the checkpoint in
+- **objectives**: The objectives to optimize for. Currently supported: reward_mean, reward_std, runtime, emissions
+- **optimize_objectives**: Whether to maximize or minimize the objectives
+- **state_features**: The features of the RL algorithm's state to return
+- **n_steps**: The number of steps in the configuration schedule. Using 1 will result in a static configuration
+- **n_total_timesteps**: The total number of timesteps to train in each schedule interval
+- **n_eval_steps**: The number of steps to evaluate the agent for
+- **n_eval_episodes**: The number of episodes to evaluate the agent for
+
+The lower-level configuration options are found under the 'hp_config' keys, which contain the configurable hyperparameters and architecture of each algorithm. Please refer to the search space overview for more information.
diff --git a/docs/basic_usage/seeding.rst b/docs/basic_usage/seeding.rst
@@ -1,2 +1,4 @@
 Considerations for Seeding
-============================
+============================
+
+Seeding is important both at the level of the RL algorithms and at the AutoRL level.
diff --git a/docs/images/subsets.png b/docs/images/subsets.png