Reinforcement learning framework and algorithms implemented in PyTorch.
In addition, this repo includes the official implementation of Probabilistic Mixture-of-Experts (PMOE), built on RLkit. Package dependencies are the same as the original RLkit, and the interface stays consistent with RLkit so you can keep enjoying its tooling. Thanks to all the contributors of RLkit.
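The core idea of PMOE is to replace the usual single-Gaussian policy head with a mixture of K Gaussian experts. As a rough illustration of the distribution involved (a pure-Python sketch of ours, not the repo's batched PyTorch implementation; all names are illustrative):

```python
import math
import random

def pmoe_sample(weights, means, stds):
    """Draw one action from a K-expert Gaussian mixture policy head."""
    # First pick an expert according to the mixture weights,
    # then sample from that expert's Gaussian.
    k = random.choices(range(len(weights)), weights=weights)[0]
    return random.gauss(means[k], stds[k])

def pmoe_log_prob(a, weights, means, stds):
    """Mixture log-density, computed with log-sum-exp for numerical stability."""
    logs = [
        math.log(w)
        - 0.5 * ((a - m) / s) ** 2
        - math.log(s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))
```

Unlike a single Gaussian, the mixture can represent multi-modal action distributions, which is the property PMOE exploits.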
Implemented algorithms:
- Probabilistic Mixture-of-Experts (PMOE)
- Skew-Fit
  - example script
  - paper
  - Documentation
  - Requires multiworld to be installed
- Reinforcement Learning with Imagined Goals (RIG)
  - Special case of Skew-Fit: set power = 0
  - paper
- Temporal Difference Models (TDMs)
  - Only implemented in v0.1.2 and earlier. See Legacy Documentation section below.
  - paper
  - Documentation
- Hindsight Experience Replay (HER)
- (Double) Deep Q-Network (DQN)
- Soft Actor-Critic (SAC)
  - example script
  - original paper and updated version
  - TensorFlow implementation from author
  - Includes the "min of Q" method, the entropy-constrained implementation, the reparameterization trick, and a numerically stable tanh-Normal Jacobian calculation.
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
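The tanh-Normal Jacobian mentioned under SAC is the change-of-variables term that appears when a Gaussian sample u is squashed through tanh to produce a bounded action a = tanh(u). A minimal scalar sketch of that correction (ours, for illustration; rlkit implements it with batched PyTorch tensors):

```python
import math

def tanh_gaussian_log_prob(u, mean, std, eps=1e-6):
    """log p(a) for a = tanh(u), where u ~ N(mean, std).

    The Gaussian log-density is corrected by -log|da/du| = -log(1 - tanh(u)^2),
    because squashing through tanh changes the density of the transformed
    variable. The small eps keeps the log finite when tanh(u) saturates.
    """
    gaussian = -0.5 * ((u - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))
    return gaussian - math.log(1.0 - math.tanh(u) ** 2 + eps)
```

Without this correction, the policy's entropy estimate would be wrong wherever tanh saturates, which is exactly where bounded-action policies spend much of their time.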
To get started, check out the example scripts linked above.
- Install and use the included Anaconda environment:
  ```
  $ conda env create -f environment/[linux-cpu|linux-gpu|mac]-env.yml
  $ source activate rlkit
  (rlkit) $ python examples/PMOEsac.py
  ```
  Choose the appropriate `.yml` file for your system.
  These Anaconda environments use MuJoCo 1.5 and gym 0.10.5.
  You'll need to get your own MuJoCo key if you want to use MuJoCo.
- (Not Recommended) Add this repo directory to your `PYTHONPATH` environment variable, or simply run:
  ```
  pip install -e .
  ```
  and replace all `rlkit` imports in your code with `rlkit_pmoe`.
  Note that we instead recommend copying the `./rlkit` directory into your working directory, since that requires minimal modification to your code.
- (Optional) Copy `conf.py` to `conf_private.py` and edit it to override defaults:
  ```
  cp rlkit/launchers/conf.py rlkit/launchers/conf_private.py
  ```
- (Optional) If you plan on running the Skew-Fit experiments or the HER example with the Sawyer environment, then you need to install multiworld.
DISCLAIMER: the mac environment has only been tested without a GPU.
For an even more portable solution, try using the docker image provided in `environment/docker`.
The Anaconda env should be enough, but this docker image addresses some of the rendering issues that may arise when using MuJoCo 1.5 and GPUs.
The docker image supports GPU, but it also works without one.
To use a GPU with the image, you need to have nvidia-docker installed.
You can use a GPU by calling
```python
import rlkit.torch.pytorch_util as ptu
ptu.set_gpu_mode(True)
```
before launching the scripts.
If you are using doodad (see below), simply use the `use_gpu` flag:
```python
run_experiment(..., use_gpu=True)
```
During training, the results will be saved under `LOCAL_LOG_DIR/<exp_prefix>/<foldername>`:
- `LOCAL_LOG_DIR` is the directory set by `rlkit.launchers.config.LOCAL_LOG_DIR`. The default name is `'output'`.
- `<exp_prefix>` is given to `setup_logger`.
- `<foldername>` is auto-generated based on `exp_prefix`.
- Inside this folder, you should see a file called `params.pkl`. To visualize a policy, run
```
(rlkit) $ python scripts/run_policy.py LOCAL_LOG_DIR/<exp_prefix>/<foldername>/params.pkl
```
or
```
(rlkit) $ python scripts/run_goal_conditioned_policy.py LOCAL_LOG_DIR/<exp_prefix>/<foldername>/params.pkl
```
depending on whether or not the policy is goal-conditioned.
If you have rllab installed, you can also visualize the results using rllab's viskit, described at the bottom of this page. Run
```
python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/
```
to visualize all experiments with a prefix of `exp_prefix`. To only visualize a single run, you can do
```
python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/<folder name>
```
Alternatively, if you don't want to clone all of rllab, a repository containing only viskit can be found here. You can similarly visualize results with
```
python viskit/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/
```
This viskit repo also has a few extra nice features, like plotting multiple Y-axis values at once, figure-splitting on multiple keys, and being able to filter hyperparameters out.
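Each run folder also contains a `progress.csv` written by the logger. For a quick look at a single metric without viskit, something like this works (a stdlib sketch of ours; the column names below are assumptions, so check `reader.fieldnames` against your own file):

```python
import csv

def read_metric(progress_csv, column):
    """Return one column of a run's progress.csv as floats, skipping blanks."""
    with open(progress_csv, newline="") as f:
        reader = csv.DictReader(f)
        return [float(row[column]) for row in reader if row.get(column)]
```

From there you can plot the values with matplotlib, or just print them to sanity-check that training is making progress.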
To visualize a goal-conditioned policy, run
```
(rlkit) $ python scripts/run_goal_conditioned_policy.py LOCAL_LOG_DIR/<exp_prefix>/<foldername>/params.pkl
```
The `run_experiment` function makes it easy to run Python code on Amazon Web Services (AWS) or Google Cloud Platform (GCP) by using this fork of doodad.
It's as easy as:
```python
from rlkit.launchers.launcher_util import run_experiment

def function_to_run(variant):
    learning_rate = variant['learning_rate']
    ...

run_experiment(
    function_to_run,
    exp_prefix="my-experiment-name",
    mode='ec2',  # or 'gcp'
    variant={'learning_rate': 1e-3},
)
```
You will need to set up parameters in `conf.py` (see the Installation section above).
This requires some knowledge of AWS and/or GCP, which is beyond the scope of
this README.
To learn more about doodad, go to the repository, which is based on this original repository.
The algorithms are based on the following papers:
Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning.
Jie Ren*, Yewen Li*, Zihan Ding, Wei Pan, Hao Dong. arXiv preprint, 2021.
Skew-Fit: State-Covering Self-Supervised Reinforcement Learning.
Vitchyr H. Pong*, Murtaza Dalal*, Steven Lin*, Ashvin Nair, Shikhar Bahl, Sergey Levine. arXiv preprint, 2019.
Visual Reinforcement Learning with Imagined Goals.
Ashvin Nair*, Vitchyr Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine. NeurIPS 2018.
Temporal Difference Models: Model-Free Deep RL for Model-Based Control.
Vitchyr Pong*, Shixiang Gu*, Murtaza Dalal, Sergey Levine. ICLR 2018.
Hindsight Experience Replay.
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. NeurIPS 2017.
Deep Reinforcement Learning with Double Q-learning.
Hado van Hasselt, Arthur Guez, David Silver. AAAI 2016.
Human-level control through deep reinforcement learning.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Nature 2015.
Soft Actor-Critic Algorithms and Applications.
Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. arXiv preprint, 2018.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. ICML, 2018.
Addressing Function Approximation Error in Actor-Critic Methods.
Scott Fujimoto, Herke van Hoof, David Meger. ICML, 2018.
A lot of the coding infrastructure is based on rllab. The serialization and logger code are basically a carbon copy of the rllab versions.
The Dockerfile is based on the OpenAI mujoco-py Dockerfile.
Other major collaborators and contributions: