Imitation learning algorithms (with PPO [1]):
- AIRL [2]
- BC [3]
- DRIL [4] (without BC)
- FAIRL [5]
- GAIL [6]
- GMMIL [7] (including an optional self-similarity term [8])
- nn-PUGAIL [9]
- RED [10]
Options include:
- State-only imitation learning: state-only: true/false
- R1 gradient regularisation [11]: r1-reg-coeff: 0.5
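Assuming these options are top-level keys in the Hydra config, they should be overridable directly from the command line, for example:
python main.py algorithm=GAIL/hopper state-only=true r1-reg-coeff=0.5  # hypothetical override; check conf/config.yaml for the exact key names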
Requirements can be installed with:
pip install -r requirements.txt
Notable required packages are PyTorch, OpenAI Gym, D4RL-PyBullet and Hydra. Ax and the Hydra Ax sweeper plugin are required only for hyperparameter optimisation; if unneeded, they can be removed from requirements.txt.
The training of each imitation learning algorithm can be started with:
python main.py algorithm=ALG/ENV
where ALG is one of [AIRL|BC|DRIL|FAIRL|GAIL|GMMIL|PUGAIL|RED] and ENV is one of [ant|halfcheetah|hopper|walker2d]. For example:
python main.py algorithm=AIRL/hopper
Hyperparameters can be found in conf/config.yaml and conf/algorithm/ALG/ENV.yaml, with the latter containing algorithm- and environment-specific hyperparameters that were tuned with Ax.
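With standard Hydra behaviour, any hyperparameter defined in these files can likewise be overridden from the command line without editing the YAML, for example (using a hypothetical key name):
python main.py algorithm=AIRL/hopper discount=0.99  # hypothetical key; substitute one that actually appears in the config files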
Results will be saved in outputs/ENV_ALGO/m-d_H-M-S, with the final subfolder named after the current datetime.
Hyperparameter optimisation can be run by adding -m hydra/sweeper=ax hyperparam_opt=ALG, for example:
python main.py -m algorithm=AIRL/hopper hydra/sweeper=ax hyperparam_opt=AIRL
hyperparam_opt specifies the hyperparameter search space.
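The number of trials and other sweeper settings come from the hydra-ax-sweeper plugin; assuming its standard config keys, they can be overridden like any other Hydra value, for example:
python main.py -m algorithm=AIRL/hopper hydra/sweeper=ax hyperparam_opt=AIRL hydra.sweeper.ax_config.max_trials=5  # assumed plugin key; see the hydra-ax-sweeper documentation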
A seed sweep can be performed as follows:
python main.py -m algorithm=AIRL/hopper seed=1,2,3,4,5
or via the existing bash script:
./scripts/run_seed_experiments.sh ALG ENV
The results will be available in the ./output/seed_sweeper_ENV_ALG folder (note that running the sweep again will overwrite the previous results).
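For reference, a minimal sketch of what scripts/run_seed_experiments.sh might amount to, assuming it simply wraps the multirun command above (the actual script in the repository may differ):

```sh
#!/bin/bash
# Hypothetical sketch only; see scripts/run_seed_experiments.sh for the real implementation.
ALG=$1
ENV=$2
python main.py -m algorithm=${ALG}/${ENV} seed=1,2,3,4,5
```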
If you find this work useful and would like to cite it, the following would be appropriate:
@article{arulkumaran2021pragmatic,
  author = {Arulkumaran, Kai and Ogawa Lillrank, Dan},
  title = {A Pragmatic Look at Deep Imitation Learning},
  journal = {arXiv preprint arXiv:2108.01867},
  year = {2021}
}
[1] Proximal Policy Optimization Algorithms
[2] Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
[3] Efficient Training of Artificial Neural Networks for Autonomous Navigation
[4] Disagreement-Regularized Imitation Learning
[5] A Divergence Minimization Perspective on Imitation Learning Methods
[6] Generative Adversarial Imitation Learning
[7] Imitation Learning via Kernel Mean Embedding
[8] A Pragmatic Look at Deep Imitation Learning
[9] Positive-Unlabeled Reward Learning
[10] Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
[11] Which Training Methods for GANs do actually Converge?