Commit
Theresa Eimer: more doc updates
GitHub Actions committed May 30, 2024
1 parent 2c919c4 commit 0efd5e3
Showing 25 changed files with 111 additions and 8 deletions.
Binary file modified main/.doctrees/advanced_usage/algorithm_states.doctree
Binary file modified main/.doctrees/advanced_usage/autorl_paradigms.doctree
Binary file modified main/.doctrees/advanced_usage/dynamic_configuration.doctree
Binary file modified main/.doctrees/basic_usage/env_subsets.doctree
Binary file modified main/.doctrees/basic_usage/index.doctree
Binary file modified main/.doctrees/basic_usage/objectives.doctree
Binary file modified main/.doctrees/basic_usage/options.doctree
Binary file modified main/.doctrees/basic_usage/seeding.doctree
Binary file modified main/.doctrees/environment.pickle
4 changes: 3 additions & 1 deletion main/_sources/advanced_usage/algorithm_states.rst.txt
@@ -1,2 +1,4 @@
Using the ARLBench States
==========================
==========================

In addition to providing different objectives, ARLBench also provides insights into the target algorithms' internal states.
4 changes: 3 additions & 1 deletion main/_sources/advanced_usage/autorl_paradigms.rst.txt
@@ -1,2 +1,4 @@
ARLBench and Different AutoRL Paradigms
=======================================
=======================================

TODO: relationship to other AutoRL paradigms
4 changes: 3 additions & 1 deletion main/_sources/advanced_usage/dynamic_configuration.rst.txt
@@ -1,2 +1,4 @@
Dynamic Configuration in ARLBench
==================================
==================================

TODO: how to use dynamic configuration
12 changes: 11 additions & 1 deletion main/_sources/basic_usage/env_subsets.rst.txt
@@ -1,2 +1,12 @@
The ARLBench Subsets
====================
====================

We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments to select subsets which allow for efficient benchmarking of AutoRL algorithms. These are the resulting subsets:

.. image:: path/subsets.png
:width: 800
:alt: Alternative text

We strongly recommend you focus your benchmarking on these exact environments to ensure you cover the full landscape of RL behaviors well.
The data generated for selecting these environments is available on `HuggingFace <https://huggingface.co/datasets/autorl-org/arlbench>`_ for you to use in your experiments.
For more information on how the subset selection was done, please refer to our paper.
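
If you want to download this landscape data locally, one way is the Hugging Face CLI. This is only a sketch and assumes ``huggingface_hub`` (which ships the ``huggingface-cli`` tool) is installed; the internal file layout of the dataset repository may differ:

.. code-block:: bash

    # Fetch the ARLBench landscape dataset into a local directory (sketch, not part of ARLBench itself)
    huggingface-cli download autorl-org/arlbench --repo-type dataset --local-dir ./arlbench_data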
1 change: 1 addition & 0 deletions main/_sources/basic_usage/index.rst.txt
@@ -9,6 +9,7 @@ Benchmarking AutoRL Methods
seeding



ARLBench provides a basis for benchmarking different AutoRL methods. This section of the documentation focuses on the prominent aspect of black-box hyperparameter optimization, since it's the simplest use case of ARLBench.
We discuss the structure of ARLBench, the currently supported objectives, the environment subsets and search spaces we provide, and the seeding of the experiments, each in its own subpage.
The most important question, however, is how to actually use ARLBench in your experiments. This is the workflow we propose:
15 changes: 14 additions & 1 deletion main/_sources/basic_usage/objectives.rst.txt
@@ -1,2 +1,15 @@
Objectives in ARLBench
======================
======================

ARLBench allows you to configure the objectives you'd like to use for your AutoRL methods.
These are selected as a list of keywords in the configuration of the AutoRL Environment, for example:

.. code-block:: bash

    python arlbench.py autorl.objectives=["reward_mean"]

The following objectives are available at the moment:

- reward_mean: the mean evaluation reward across a number of evaluation episodes
- reward_std: the standard deviation of the evaluation rewards across a number of evaluation episodes
- runtime: the runtime of the training process
- emissions: the CO2 emissions of the training process
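
Multiple objectives can be requested at once by extending the list. A minimal sketch in the same command style (how the objectives are traded off is up to your AutoRL method):

.. code-block:: bash

    # Optimize for mean reward while also tracking runtime (sketch)
    python arlbench.py autorl.objectives=["reward_mean","runtime"]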
28 changes: 27 additions & 1 deletion main/_sources/basic_usage/options.rst.txt
@@ -1,2 +1,28 @@
ARLBench Options
================
================

A given training run in ARLBench can be configured on two levels: the lower level is the configuration done via the AutoRL tool we benchmark, while the upper level decides the setting in which we test the AutoRL tool.
The high-level configuration takes place via the 'autorl' keys in the configuration file. These are the available options:

- **seed**: The seed for the random number generator
- **env_framework**: Environment framework to use. Currently supported: gymnax, envpool, brax, xland
- **env_name**: The name of the environment to use
- **env_kwargs**: Additional keyword arguments for the environment
- **eval_env_kwargs**: Additional keyword arguments for the evaluation environment
- **n_envs**: Number of environments to use in parallel
- **algorithm**: The algorithm to use. Currently supported: dqn, ppo, sac
- **cnn_policy**: Whether to use a CNN policy
- **deterministic_eval**: Whether to use deterministic evaluation. This disables exploration behaviors in evaluation.
- **nas_config**: Configuration for architecture
- **checkpoint**: A list of elements the checkpoint should contain
- **checkpoint_name**: The name of the checkpoint
- **checkpoint_dir**: The directory to save the checkpoint in
- **objectives**: The objectives to optimize for. Currently supported: reward_mean, reward_std, runtime, emissions
- **optimize_objectives**: Whether to maximize or minimize the objectives
- **state_features**: The features of the RL algorithm's state to return
- **n_steps**: The number of steps in the configuration schedule. Using 1 will result in a static configuration
- **n_total_timesteps**: The total number of timesteps to train in each schedule interval
- **n_eval_steps**: The number of steps to evaluate the agent for
- **n_eval_episodes**: The number of episodes to evaluate the agent for

The low-level configuration options can be found in the 'hp_config' key set, which contains the configurable hyperparameters and architecture of each algorithm. Please refer to the search space overview for more information.
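
As an illustration, these high-level keys can be set from the command line in the same style as the objectives example. This is only a sketch; the environment name is a placeholder and should be replaced by one of the ARLBench subset environments:

.. code-block:: bash

    # Run DQN on a gymnax environment with a fixed seed and 16 parallel environments (sketch)
    python arlbench.py autorl.algorithm=dqn autorl.env_framework=gymnax autorl.env_name=CartPole-v1 autorl.seed=42 autorl.n_envs=16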
4 changes: 3 additions & 1 deletion main/_sources/basic_usage/seeding.rst.txt
@@ -1,2 +1,4 @@
Considerations for Seeding
============================
============================

Seeding is important both on the level of the RL algorithms and on the AutoRL level.
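
For example, to check how robust a found configuration is, you can repeat the same run under several RL-level seeds via the ``autorl.seed`` option. A minimal sketch in the same command style as the other pages:

.. code-block:: bash

    # Re-run the same setup under three different RL seeds (sketch)
    for seed in 0 1 2; do
        python arlbench.py autorl.seed=$seed
    done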
1 change: 1 addition & 0 deletions main/advanced_usage/algorithm_states.html
@@ -301,6 +301,7 @@

<section id="using-the-arlbench-states">
<h1>Using the ARLBench States<a class="headerlink" href="#using-the-arlbench-states" title="Link to this heading"></a></h1>
<p>In addition to providing different objectives, ARLBench also provides insights into the target algorithms’ internal states.</p>
</section>


1 change: 1 addition & 0 deletions main/advanced_usage/autorl_paradigms.html
@@ -301,6 +301,7 @@

<section id="arlbench-and-different-autorl-paradigms">
<h1>ARLBench and Different AutoRL Paradigms<a class="headerlink" href="#arlbench-and-different-autorl-paradigms" title="Link to this heading"></a></h1>
<p>TODO: relationship to other AutoRL paradigms</p>
</section>


1 change: 1 addition & 0 deletions main/advanced_usage/dynamic_configuration.html
@@ -301,6 +301,7 @@

<section id="dynamic-configuration-in-arlbench">
<h1>Dynamic Configuration in ARLBench<a class="headerlink" href="#dynamic-configuration-in-arlbench" title="Link to this heading"></a></h1>
<p>TODO: how to use dynamic configuration</p>
</section>


6 changes: 6 additions & 0 deletions main/basic_usage/env_subsets.html
@@ -301,6 +301,12 @@

<section id="the-arlbench-subsets">
<h1>The ARLBench Subsets<a class="headerlink" href="#the-arlbench-subsets" title="Link to this heading"></a></h1>
<p>We analyzed the hyperparameter landscapes of PPO, DQN and SAC on 20 environments to select subsets which allow for efficient benchmarking of AutoRL algorithms. These are the resulting subsets:</p>
<a class="reference internal image-reference" href="basic_usage/path/subsets.png"><img alt="Alternative text" src="basic_usage/path/subsets.png" style="width: 800px;" />
</a>
<p>We strongly recommend you focus your benchmarking on these exact environments to ensure you cover the full landscape of RL behaviors well.
The data generated for selecting these environments is available on <a class="reference external" href="https://huggingface.co/datasets/autorl-org/arlbench">HuggingFace</a> for you to use in your experiments.
For more information on how the subset selection was done, please refer to our paper.</p>
</section>


10 changes: 10 additions & 0 deletions main/basic_usage/objectives.html
@@ -301,6 +301,16 @@

<section id="objectives-in-arlbench">
<h1>Objectives in ARLBench<a class="headerlink" href="#objectives-in-arlbench" title="Link to this heading"></a></h1>
<p>ARLBench allows you to configure the objectives you’d like to use for your AutoRL methods.
These are selected as a list of keywords in the configuration of the AutoRL Environment, for example:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>arlbench.py<span class="w"> </span>autorl.objectives<span class="o">=[</span><span class="s2">&quot;reward_mean&quot;</span><span class="o">]</span>
</pre></div>
</div>
<p>The following objectives are available at the moment:</p>
<ul class="simple">
<li><p>reward_mean: the mean evaluation reward across a number of evaluation episodes</p></li>
<li><p>reward_std: the standard deviation of the evaluation rewards across a number of evaluation episodes</p></li>
<li><p>runtime: the runtime of the training process</p></li>
<li><p>emissions: the CO2 emissions of the training process</p></li>
</ul>
</section>


25 changes: 25 additions & 0 deletions main/basic_usage/options.html
@@ -301,6 +301,31 @@

<section id="arlbench-options">
<h1>ARLBench Options<a class="headerlink" href="#arlbench-options" title="Link to this heading"></a></h1>
<p>A given training run in ARLBench can be configured on two levels: the lower level is the configuration done via the AutoRL tool we benchmark, while the upper level decides the setting in which we test the AutoRL tool.
The high-level configuration takes place via the ‘autorl’ keys in the configuration file. These are the available options:</p>
<ul class="simple">
<li><p><strong>seed</strong>: The seed for the random number generator</p></li>
<li><p><strong>env_framework</strong>: Environment framework to use. Currently supported: gymnax, envpool, brax, xland</p></li>
<li><p><strong>env_name</strong>: The name of the environment to use</p></li>
<li><p><strong>env_kwargs</strong>: Additional keyword arguments for the environment</p></li>
<li><p><strong>eval_env_kwargs</strong>: Additional keyword arguments for the evaluation environment</p></li>
<li><p><strong>n_envs</strong>: Number of environments to use in parallel</p></li>
<li><p><strong>algorithm</strong>: The algorithm to use. Currently supported: dqn, ppo, sac</p></li>
<li><p><strong>cnn_policy</strong>: Whether to use a CNN policy</p></li>
<li><p><strong>deterministic_eval</strong>: Whether to use deterministic evaluation. This disables exploration behaviors in evaluation.</p></li>
<li><p><strong>nas_config</strong>: Configuration for architecture</p></li>
<li><p><strong>checkpoint</strong>: A list of elements the checkpoint should contain</p></li>
<li><p><strong>checkpoint_name</strong>: The name of the checkpoint</p></li>
<li><p><strong>checkpoint_dir</strong>: The directory to save the checkpoint in</p></li>
<li><p><strong>objectives</strong>: The objectives to optimize for. Currently supported: reward_mean, reward_std, runtime, emissions</p></li>
<li><p><strong>optimize_objectives</strong>: Whether to maximize or minimize the objectives</p></li>
<li><p><strong>state_features</strong>: The features of the RL algorithm’s state to return</p></li>
<li><p><strong>n_steps</strong>: The number of steps in the configuration schedule. Using 1 will result in a static configuration</p></li>
<li><p><strong>n_total_timesteps</strong>: The total number of timesteps to train in each schedule interval</p></li>
<li><p><strong>n_eval_steps</strong>: The number of steps to evaluate the agent for</p></li>
<li><p><strong>n_eval_episodes</strong>: The number of episodes to evaluate the agent for</p></li>
</ul>
<p>The low-level configuration options can be found in the ‘hp_config’ key set, which contains the configurable hyperparameters and architecture of each algorithm. Please refer to the search space overview for more information.</p>
</section>


1 change: 1 addition & 0 deletions main/basic_usage/seeding.html
@@ -301,6 +301,7 @@

<section id="considerations-for-seeding">
<h1>Considerations for Seeding<a class="headerlink" href="#considerations-for-seeding" title="Link to this heading"></a></h1>
<p>Seeding is important both on the level of the RL algorithms and on the AutoRL level.</p>
</section>


2 changes: 1 addition & 1 deletion main/searchindex.js

