[docs] Callbacks edits (#49599)
I mistakenly approved #49513
before making copy edits. This PR is a mulligan.

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a method in Tune, I've added it in `doc/source/tune/api/` under the
corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: angelinalg <[email protected]>
Signed-off-by: Sven Mika <[email protected]>
Signed-off-by: angelinalg <[email protected]>
Co-authored-by: Sven Mika <[email protected]>
angelinalg and sven1977 authored Jan 9, 2025
1 parent 2ac04cc commit a3b1265
Showing 7 changed files with 182 additions and 193 deletions.
2 changes: 1 addition & 1 deletion .vale/styles/config/vocabularies/RLlib/accept.txt
@@ -27,7 +27,7 @@ pragmas?
(RL|rl)lib
RNN
RLModules?
[Rr]ollout
(SAC|sac)
SGD
[Ss]ubcomponents?
26 changes: 12 additions & 14 deletions doc/source/rllib/package_ref/callback.rst
@@ -1,20 +1,18 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-callback-reference-docs:

Callback APIs
=============

Callback APIs enable you to inject code into an experiment, an Algorithm,
and the subcomponents of an Algorithm.

You can either subclass :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback` and implement
one or more of its methods, like :py:meth:`~ray.rllib.callbacks.callbacks.RLlibCallback.on_algorithm_init`,
or pass respective arguments to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.callbacks`
method of an Algorithm's config, like
``config.callbacks(on_algorithm_init=lambda algorithm, **kw: print('algo initialized!'))``.

.. tab-set::

@@ -60,15 +58,15 @@ method of your algorithm's config, for example
config.validate()


See :ref:`Callbacks <rllib-callback-docs>` for more details on how to write and configure callbacks.


Methods to implement for custom behavior
----------------------------------------

.. note::

    RLlib only invokes callbacks in :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
and :py:class:`~ray.rllib.env.env_runner.EnvRunner` actors.
The Ray team is considering expanding callbacks onto :py:class:`~ray.rllib.core.learner.learner.Learner`
actors and possibly :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` instances as well.
@@ -91,7 +89,7 @@ RLlibCallback
Callbacks invoked in Algorithm
------------------------------

The main Algorithm process always executes the following callback methods:

.. autosummary::
:nosignatures:
@@ -111,7 +109,7 @@ The following callback methods are always executed on the main Algorithm process
Callbacks invoked in EnvRunner
------------------------------

The EnvRunner actors always execute the following callback methods:

.. autosummary::
:nosignatures:
209 changes: 104 additions & 105 deletions doc/source/rllib/rllib-advanced-api.rst

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-algorithms.rst
@@ -7,7 +7,7 @@
Algorithms
==========

The following table is an overview of all available algorithms in RLlib. Note that all algorithms support
multi-GPU training on a single (GPU) node in `Ray (open-source) <https://docs.ray.io/en/latest/index.html>`__ (|multi_gpu|)
as well as multi-GPU training on multi-node (GPU) clusters when using the `Anyscale platform <https://www.anyscale.com/platform>`__
(|multi_node_multi_gpu|).
106 changes: 49 additions & 57 deletions doc/source/rllib/rllib-callback.rst
@@ -1,19 +1,15 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-callback-docs:

Callbacks
=========

Callbacks are the most straightforward way to inject code into experiments. You can define the code to execute at certain events and pass it to your
:py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`.

The following is an example of defining a simple lambda that prints out an episode's return after
the episode terminates:

.. testcode::

@@ -37,30 +33,30 @@ the episode has terminated:
ppo.stop()


Callback lambdas versus stateful RLlibCallback
----------------------------------------------

There are two ways to define custom code for various callback events to execute.

Callback lambdas
~~~~~~~~~~~~~~~~
If the injected code is rather simple and doesn't need to store temporary information
for reuse in succeeding event calls, you can use a lambda
and pass it to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.callbacks`
method as previously shown.

See :ref:`Callback events <rllib-callback-event-overview>` for a complete list.
The names of the events always match the argument names for the
:py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.callbacks` method.
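To make the name-matching concrete, the following is a small, self-contained sketch of how event-named keyword arguments can map to stored callables. This is a toy stand-in for illustration only, not RLlib's actual implementation; the class ``ToyCallbackConfig`` is made up for this example.

```python
# Toy sketch of event-named callback arguments; not RLlib's real code.
class ToyCallbackConfig:
    """Stores one callable per event name, keyed by keyword-argument name."""

    def __init__(self):
        self._handlers = {}

    def callbacks(self, **event_handlers):
        # Each keyword argument name plays the role of an event name,
        # for example `on_algorithm_init` or `on_episode_end`.
        self._handlers.update(event_handlers)
        return self

    def fire(self, event_name, **kwargs):
        handler = self._handlers.get(event_name)
        return handler(**kwargs) if handler else None


config = ToyCallbackConfig().callbacks(
    on_episode_end=lambda episode_return, **kw: f"return={episode_return}"
)
print(config.fire("on_episode_end", episode_return=42.0))  # return=42.0
```

The point of the sketch is only that the keyword-argument name and the event name are the same string, which is why the supported event names always match the argument names of the real ``callbacks()`` method.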


Stateful RLlibCallback
~~~~~~~~~~~~~~~~~~~~~~
If the injected code is stateful and temporarily stores results for reuse
in succeeding calls triggered by the same or a different event, you
need to subclass the :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback` API and then implement
one or more methods, for example :py:meth:`~ray.rllib.callbacks.callbacks.RLlibCallback.on_algorithm_init`:

The following is the same example that prints out a terminated episode's return, but uses
a subclass of :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback`.

.. testcode::
@@ -94,24 +90,11 @@ a subclass of :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback`.

.. _rllib-callback-event-overview:

Callback events
---------------

During a training iteration, the Algorithm normally walks through the following event tree,
which gives a high-level overview of all supported events in RLlib's callbacks system:

.. code-block:: text
@@ -136,10 +119,21 @@ Here is a high-level overview of all supported events in RLlib's callbacks system
`on_episode_end` - After an episode object has terminated (or truncated).
`on_sample_end` - At the end of the ``EnvRunner.sample()`` call.
Note that some of the events in the tree happen simultaneously, on different processes through Ray actors.
For example, an EnvRunner actor may trigger its ``on_episode_start`` event while at the same time another
EnvRunner actor may trigger its ``on_sample_end`` event and the main Algorithm process triggers
``on_train_result``.

.. note::

RLlib only invokes callbacks in :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
and :py:class:`~ray.rllib.env.env_runner.EnvRunner` actors.
The Ray team is considering expanding callbacks onto :py:class:`~ray.rllib.core.learner.learner.Learner`
actors and possibly :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` instances as well.

.. currentmodule:: ray.rllib.callbacks.callbacks

.. dropdown:: Algorithm-bound methods of ``RLlibCallback``

.. autosummary::

@@ -149,7 +143,7 @@ Here is a high-level overview of all supported events in RLlib's callbacks system
RLlibCallback.on_env_runners_recreated
RLlibCallback.on_checkpoint_loaded

.. dropdown:: EnvRunner-bound methods of ``RLlibCallback``

.. autosummary::

@@ -169,9 +163,9 @@ You can define more than one :py:class:`~ray.rllib.callbacks.callbacks.RLlibCall
You can also send lists of callables, instead of a single callable, to the different
arguments of that method.

For example, you may have already written a subclass of :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback`
and want to reuse it in different experiments. Because one of your experiments
requires some debug callback code, you want to inject it only temporarily for a couple of runs.
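Chaining can be pictured with plain Python. The following is an illustrative sketch only, not RLlib internals, and the handler names (``reusable_handler``, ``debug_handler``) are made up for the example: for a given event, each configured callable is invoked in order.

```python
# Illustrative sketch: firing a chained list of callables for one event.
calls = []

def reusable_handler(**kwargs):
    # Stands in for the callback logic you reuse across experiments.
    calls.append("reusable")

def debug_handler(**kwargs):
    # Stands in for a temporary debug hook added for a couple of runs only.
    calls.append("debug")

def fire_event(handlers, **kwargs):
    # Invoke every handler registered for this event, in the given order.
    for handler in handlers:
        handler(**kwargs)

fire_event([reusable_handler, debug_handler])
print(calls)  # ['reusable', 'debug']
```

Dropping the temporary hook is then just a matter of removing it from the list, which mirrors why passing lists of callables is convenient for short-lived debug code.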

Resolution order of chained callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -180,16 +174,16 @@ RLlib resolves all available callback methods and callables for a given event
as follows:

Subclasses of :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback` take precedence
over individual or lists of callables that you provide through the various arguments of
the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.callbacks` method.

For example, assume the callback event is ``on_train_result``, which fires at the end of
a training iteration and inside the algorithm's process:

- RLlib loops through the list of all given :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback`
  subclasses and calls their ``on_train_result`` method, keeping the exact order the user
  provided in the list.
- RLlib then loops through the list of all defined ``on_train_result`` callables. You configured these
by calling the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.callbacks` method
and defining the ``on_train_result`` argument in this call.
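The two-phase resolution order above can be sketched in a few lines of plain Python. This is a toy illustration under the assumptions just described, not RLlib source code; the class and function names are invented for the example.

```python
# Toy sketch of the resolution order: subclass instances first, then callables.
class MyCallbackA:
    def on_train_result(self, log):
        log.append("A")

class MyCallbackB:
    def on_train_result(self, log):
        log.append("B")

def resolve_on_train_result(callback_objects, event_callables, log):
    # 1) All RLlibCallback-style subclass instances, in user-provided order.
    for cb in callback_objects:
        cb.on_train_result(log)
    # 2) Then the plain callables configured for the `on_train_result` event.
    for fn in event_callables:
        fn(log)

order = []
resolve_on_train_result(
    [MyCallbackA(), MyCallbackB()],
    [lambda log: log.append("lambda")],
    order,
)
print(order)  # ['A', 'B', 'lambda']
```

The sketch makes the precedence concrete: subclass methods always run before any callables configured through keyword arguments, and each group preserves its own list order.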

@@ -224,7 +218,7 @@ a training iteration and inside the algorithm's process.
Examples
--------

The following are two examples showing you how to set up custom callbacks on the
:ref:`Algorithm <rllib-key-concepts-algorithms>` process as well as on the
:ref:`EnvRunner <rllib-key-concepts-env-runners>` processes.

@@ -236,7 +230,7 @@ The following example demonstrates how to implement a simple custom function wri
contents to disk from time to time.

You normally don't want to write the contents of buffers along with your
:ref:`Algorithm checkpoints <rllib-checkpoints-docs>`, so writing less often, in a more
controlled fashion through a custom callback could be a good compromise.

.. testcode::
@@ -269,41 +263,39 @@ controlled fashion through a custom callback could be a good compromise.
)
dqn = config.build()

# Train n times. Expect RLlib to write buffer every ith iteration.
for _ in range(4):
print(dqn.train())

.. tip::

    See :ref:`Callbacks invoked in Algorithm <rllib-callback-reference-algorithm-bound>` for the exact call signatures of all available callbacks and the argument types that they expect.

Example 2: ``on_episode_step`` and ``on_episode_end``
-----------------------------------------------------

The following example demonstrates how to implement a custom :py:class:`~ray.rllib.callbacks.callbacks.RLlibCallback` class
computing the average "first-joint angle" of the
`Acrobot-v1 RL environment <https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/classic_control/acrobot.py>`__:

.. figure:: images/acrobot-v1.png
:width: 150
:align: left

**The Acrobot-v1 environment**: The env code describes the angle you are about to
compute and log through your custom callback as:

.. code-block:: text

    `theta1` is the angle of the first joint, where an angle of 0.0 indicates that the first
    link is pointing directly downwards.
This example utilizes RLlib's :py:class:`~ray.rllib.utils.metrics.metrics_logger.MetricsLogger`
API to log the custom computations of the injected code to the Algorithm's main results system.
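As a rough sketch of the idea behind windowed metric logging, consider the following toy stand-in. The class ``ToyMetricsLogger`` and its methods are invented for illustration and differ from the real MetricsLogger API; the sketch only shows why logging a value under a key with a window yields a smoothed mean like ``theta1_mean``.

```python
from collections import deque

# Toy windowed-mean logger; a stand-in for the idea, not the MetricsLogger API.
class ToyMetricsLogger:
    def __init__(self):
        self._buffers = {}

    def log_value(self, key, value, window=100):
        # Keep only the most recent `window` values per key.
        buf = self._buffers.setdefault(key, deque(maxlen=window))
        buf.append(value)

    def peek(self, key):
        # Report the mean over the values currently in the window.
        buf = self._buffers[key]
        return sum(buf) / len(buf)


logger = ToyMetricsLogger()
for deg in (10.0, 20.0, 30.0):
    logger.log_value("theta1_mean", deg, window=2)
print(logger.peek("theta1_mean"))  # 25.0: the window keeps only 20.0 and 30.0
```

The windowed mean is what ultimately shows up in the training results, which is why the custom metric in the example below appears under a stable key across iterations.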

.. todo: uncomment this once metrics-logger.rst page is online.
Read :ref:`more about the MetricsLogger API here <rllib-metric-logger-doc>`__ or also ...
See this more complex example that `generates and logs a PacMan heatmap (image) to WandB <https://github.com/ray-project/ray/blob/master/rllib/examples/metrics/custom_metrics_in_env_runners.py>`__.

.. testcode::

@@ -316,7 +308,7 @@ Also take a look at this more complex example on `how to generate and log a PacM
def on_episode_step(self, *, episode, env, **kwargs):
# First get the angle from the env (note that `env` is a VectorEnv).
# See https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/classic_control/acrobot.py
# for the env source code.
cos_theta1, sin_theta1 = env.envs[0].unwrapped.state[0], env.envs[0].unwrapped.state[1]
# Convert cos/sin/tan into degree.
deg_theta1 = math.degrees(math.atan2(sin_theta1, cos_theta1))
@@ -342,7 +334,7 @@ Also take a look at this more complex example on `how to generate and log a PacM
)
ppo = config.build()

# Train n times. Expect to find `theta1_mean` in the results under:
# `env_runners/theta1_mean`
for i in range(10):
results = ppo.train()
@@ -360,6 +352,6 @@ Also take a look at this more complex example on `how to generate and log a PacM
evaluation ``EnvRunner`` actors and False on ``EnvRunner`` actors used to collect
training data.

.. tip::

    See :ref:`Callbacks invoked in Algorithm <rllib-callback-reference-algorithm-bound>`
    for the exact call signatures of all available callbacks and the argument types they expect.
