Imitation learning algorithms (with PPO [1]):
- AIRL [2]
- BC [3]
- DRIL [4] (without BC)
- FAIRL [5]
- GAIL [6]
- GMMIL [7] (including an optional self-similarity term [8])
- nn-PUGAIL [9]
- RED [10]
Options include:
- State-only imitation learning: state-only: true/false
- R1 gradient regularisation [11]: r1-reg-coeff: 0.5
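Assuming these options are top-level keys in the Hydra config, they should be overridable directly from the command line, for example:
python main.py algorithm=GAIL/hopper state-only=true r1-reg-coeff=0.5  # hypothetical override; check conf/config.yaml for the exact key names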
Requirements can be installed with:
pip install -r requirements.txt
Notable required packages are PyTorch, OpenAI Gym, D4RL-PyBullet and Hydra. Ax and the Hydra Ax sweeper plugin are required only for hyperparameter optimisation; if unneeded, they can be removed from requirements.txt.
The training of each imitation learning algorithm can be started with:
python main.py algorithm=ALG/ENV
where ALG is one of [AIRL|BC|DRIL|FAIRL|GAIL|GMMIL|PUGAIL|RED] and ENV is one of [ant|halfcheetah|hopper|walker2d]. For example:
python main.py algorithm=AIRL/hopper
Hyperparameters can be found in conf/config.yaml and conf/algorithm/ALG/ENV.yaml, with the latter containing algorithm- and environment-specific hyperparameters that were tuned with Ax.
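With standard Hydra behaviour, any hyperparameter defined in these files can likewise be overridden from the command line without editing the YAML, for example (using a hypothetical key name):
python main.py algorithm=AIRL/hopper discount=0.99  # hypothetical key; substitute one that actually appears in the config files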
Results will be saved in outputs/ENV_ALGO/m-d_H-M-S, with the final subfolder named after the current datetime.
Hyperparameter optimisation can be run by adding -m hydra/sweeper=ax hyperparam_opt=ALG, for example:
python main.py -m algorithm=AIRL/hopper hydra/sweeper=ax hyperparam_opt=AIRL
hyperparam_opt specifies the hyperparameter search space.
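The number of trials and other sweeper settings come from the hydra-ax-sweeper plugin; assuming its standard config keys, they can be overridden like any other Hydra value, for example:
python main.py -m algorithm=AIRL/hopper hydra/sweeper=ax hyperparam_opt=AIRL hydra.sweeper.ax_config.max_trials=5  # assumed plugin key; see the hydra-ax-sweeper documentation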
A seed sweep can be performed as follows:
python main.py -m algorithm=AIRL/hopper seed=1,2,3,4,5
or via the existing bash script:
./scripts/run_seed_experiments.sh ALG ENV
The results will be available in the ./output/seed_sweeper_ENV_ALG folder (note that running the sweep again will overwrite the previous results).
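For reference, a minimal sketch of what scripts/run_seed_experiments.sh might amount to, assuming it simply wraps the multirun command above (the actual script in the repository may differ):

```sh
#!/bin/bash
# Hypothetical sketch only; see scripts/run_seed_experiments.sh for the real implementation.
ALG=$1
ENV=$2
python main.py -m algorithm=${ALG}/${ENV} seed=1,2,3,4,5
```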
If you find this work useful and would like to cite it, the following would be appropriate:
@article{arulkumaran2021pragmatic,
  author = {Arulkumaran, Kai and Ogawa Lillrank, Dan},
  title = {A Pragmatic Look at Deep Imitation Learning},
  journal = {arXiv preprint arXiv:2108.01867},
  year = {2021}
}
[1] Proximal Policy Optimization Algorithms
[2] Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
[3] Efficient Training of Artificial Neural Networks for Autonomous Navigation
[4] Disagreement-Regularized Imitation Learning
[5] A Divergence Minimization Perspective on Imitation Learning Methods
[6] Generative Adversarial Imitation Learning
[7] Imitation Learning via Kernel Mean Embedding
[8] A Pragmatic Look at Deep Imitation Learning
[9] Positive-Unlabeled Reward Learning
[10] Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
[11] Which Training Methods for GANs do actually Converge?