Transfer Exploration for Reinforcement Learning (TransEx) is a repository of exploration approach families outlined by our taxonomy of exploration methods. The purpose of this library is to provide a means of comparing the performance of groups of exploration methods on transfer learning problems in reinforcement learning.
TransEx is based on the RLeXplore library, a set of PyTorch implementations of intrinsic-reward-driven exploration approaches for reinforcement learning, designed to be fully compatible with Stable-Baselines3 and to provide more stable exploration benchmarks.
- See Changelog and Implemented Algorithms;
- Code testing is in progress! Contributions to this project are welcome!
- Get the repository with git:
git clone https://github.com/balloch/rl-exploration-transfer.git
- Run the following command to install the dependencies:
pip install -r requirements.txt
Due to the large differences in how different intrinsic reward methods are computed, RLeXplore follows these rules:
- In RLeXplore, the environments are assumed to be vectorized;
- The compute_irs function of each intrinsic reward module has a mandatory argument rollouts, which is a dict of the form below (an illustrative sketch of building such a dict from a vectorized environment follows the list):
- observations (n_steps, n_envs, *obs_shape) <class 'numpy.ndarray'>
- actions (n_steps, n_envs, action_shape) <class 'numpy.ndarray'>
- rewards (n_steps, n_envs, 1) <class 'numpy.ndarray'>
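The snippet below is a minimal, illustrative sketch of assembling such a rollouts dict from a Gymnasium-style vectorized environment with a random policy; the environment ID (CartPole-v1), the buffer sizes, and the array handling are placeholders rather than anything prescribed by the library:

```python
import gymnasium as gym
import numpy as np

n_envs, n_steps = 4, 128
envs = gym.vector.SyncVectorEnv([lambda: gym.make('CartPole-v1') for _ in range(n_envs)])

obs, _ = envs.reset(seed=0)
observations, actions, rewards = [], [], []
for _ in range(n_steps):
    acts = envs.action_space.sample()                 # random actions, shape (n_envs,)
    next_obs, rews, terminated, truncated, infos = envs.step(acts)
    observations.append(obs)
    actions.append(acts.reshape(n_envs, -1))          # (n_envs, action_shape)
    rewards.append(rews.reshape(n_envs, 1))           # (n_envs, 1)
    obs = next_obs
envs.close()

rollouts = {
    'observations': np.stack(observations).astype('float32'),  # (n_steps, n_envs, *obs_shape)
    'actions': np.stack(actions).astype('float32'),            # (n_steps, n_envs, action_shape)
    'rewards': np.stack(rewards).astype('float32'),            # (n_steps, n_envs, 1)
}
```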
Take RE3 as an example: it computes the intrinsic reward for each state based on the Euclidean distance between the state's embedding and its k-nearest neighbors in a latent space produced by a fixed, randomly initialized encoder (a toy sketch of this k-nearest-neighbor bonus follows the usage example below):
import torch
import numpy as np
from rlexplore.re3 import RE3

if __name__ == '__main__':
    ''' env setup '''
    device = torch.device('cuda:0')
    obs_shape = (4, 84, 84)
    action_shape = 1  # for discrete action space
    n_envs = 16
    n_steps = 256
    observations = np.random.randn(
        n_steps, n_envs, *obs_shape).astype('float32')  # collected experiences
    ''' create RE3 instance '''
    re3 = RE3(obs_shape=obs_shape, action_shape=action_shape, device=device,
              latent_dim=128, beta=0.05, kappa=0.00001)
    ''' compute intrinsic rewards '''
    intrinsic_rewards = re3.compute_irs(rollouts={'observations': observations},
                                        time_steps=25600, k=3, average_entropy=False)
    print(intrinsic_rewards.shape, type(intrinsic_rewards))
    print(intrinsic_rewards)
    # Output: (256, 16, 1) <class 'numpy.ndarray'>
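As a side note, the k-nearest-neighbor idea behind RE3 can be illustrated in a few lines. This toy sketch mirrors the bonus described in the paper (the log of one plus the distance to the k-th nearest neighbor in embedding space); it is not the library's internal implementation, and it operates directly on pre-computed embeddings:

```python
import torch

def knn_intrinsic_bonus(embeddings: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Toy RE3-style bonus. embeddings: (n_states, latent_dim); returns (n_states,)."""
    dists = torch.cdist(embeddings, embeddings)      # pairwise Euclidean distances
    knn_dists, _ = dists.topk(k + 1, largest=False)  # k+1 smallest; index 0 is the state itself
    return torch.log(knn_dists[:, -1] + 1.0)         # log(1 + distance to k-th neighbor)

# Example: 32 random 128-dimensional embeddings
print(knn_intrinsic_bonus(torch.randn(32, 128), k=3).shape)  # torch.Size([32])
```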
Train with Stable-Baselines3 on PyBullet environments:
python examples/ppo_re3_bullet.py --action-space cont --env-id AntBulletEnv-v0 --algo ppo --n-envs 10 --exploration re3 --total-time-steps 2000000 --n-steps 128
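The example scripts wire the exploration module into training internally. As a rough, hypothetical sketch (not the repository's exact mechanism), an intrinsic reward module could also be attached to a Stable-Baselines3 on-policy algorithm through a custom callback; the `IntrinsicRewardCallback` name and the shortcut of folding the bonus into the already-computed returns and advantages are assumptions made purely for illustration:

```python
from stable_baselines3.common.callbacks import BaseCallback

class IntrinsicRewardCallback(BaseCallback):
    """Hypothetical callback: add intrinsic bonuses after each rollout collection."""

    def __init__(self, irs, verbose: int = 0):
        super().__init__(verbose)
        self.irs = irs  # any module exposing compute_irs(rollouts=..., time_steps=..., ...)

    def _on_step(self) -> bool:
        return True  # required by BaseCallback; nothing to do per environment step

    def _on_rollout_end(self) -> None:
        buffer = self.model.rollout_buffer
        # buffer.observations has shape (n_steps, n_envs, *obs_shape)
        intrinsic = self.irs.compute_irs(
            rollouts={'observations': buffer.observations},
            time_steps=self.num_timesteps, k=3,
        )  # -> (n_steps, n_envs, 1) numpy array
        # Crude shortcut: fold the bonus into the precomputed returns and
        # advantages rather than recomputing GAE with the mixed reward.
        buffer.returns += intrinsic[..., 0]
        buffer.advantages += intrinsic[..., 0]
```

An intrinsic reward module instance (e.g. RE3, constructed as in the snippet above with an observation shape matching the environment) would then be passed as `callback=IntrinsicRewardCallback(irs_module)` to `model.learn(...)`.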
Algorithm | Remark | Year | Paper | Code |
---|---|---|---|---|
ICM | Curiosity-driven exploration | 2017 | Curiosity-Driven Exploration by Self-Supervised Prediction | Link |
RND | Count-based exploration | 2019 | Exploration by Random Network Distillation | Link |
GIRM | Curiosity-driven exploration | 2020 | Intrinsic Reward Driven Imitation Learning via Generative Model | Link |
NGU | Memory-based exploration | 2020 | Never Give Up: Learning Directed Exploration Strategies | Link |
RIDE | Procedurally-generated environment | 2020 | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments | Link |
RE3 | Shannon Entropy Maximization | 2021 | State Entropy Maximization with Random Encoders for Efficient Exploration | Link |
RISE | Rényi Entropy Maximization | 2022 | Rényi State Entropy Maximization for Exploration Acceleration in Reinforcement Learning | Link |
REVD | Rényi Divergence Maximization | 2022 | Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning | Link |
28/12/2022
- Update RE3, RISE, RND, RIDE.
- Add a new method entitled REVD.
04/12/2022
- Update RND and RIDE.
03/12/2022
- Start restructuring the project to make it compatible with arbitrary tasks;
- Update RE3 and RISE.
27/09/2022
- Update RISE;
- Introduce JAX in RISE; see the experimental folder.
26/09/2022
- Update RE3;
- Try to introduce JAX to accelerate computation; see the experimental folder.
Some of the source code of RLeXplore is based on the following repositories: