ViZDoom: https://github.com/Marqt/ViZDoom.
I was not able to get ppaquette_gym_doom working on my machine. In the interest of time, I implemented a lightweight Gym-like interface for ViZDoom instead (it differs slightly from OpenAI Gym's interface).
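For orientation, here is a minimal sketch of what a Gym-like wrapper around ViZDoom could look like. It is not the actual class from this repository: the name `DoomEnvSketch`, the `actions` list, and the `frame_skip` parameter are assumptions made for illustration. It only relies on `new_episode`, `get_state`, `make_action`, `is_episode_finished`, and `close` on the game object, and it assumes the state exposes `screen_buffer` (the attribute name may differ between ViZDoom versions).

```python
class DoomEnvSketch:
    """Hypothetical Gym-like wrapper; an illustration, not the repository's class."""

    def __init__(self, game, actions, frame_skip=4):
        # `game` is assumed to behave like an already initialised vizdoom.DoomGame.
        self.game = game
        # `actions` is a list of button-press lists, e.g. [[1, 0, 0], [0, 1, 0]].
        self.actions = actions
        # Number of tics each chosen action is repeated for (assumed default).
        self.frame_skip = frame_skip
        self._last_obs = None

    def reset(self):
        """Start a new episode and return the first observation."""
        self.game.new_episode()
        self._last_obs = self.game.get_state().screen_buffer
        return self._last_obs

    def step(self, action_index):
        """Apply one action and return (observation, reward, done, info)."""
        reward = self.game.make_action(self.actions[action_index], self.frame_skip)
        done = self.game.is_episode_finished()
        if not done:
            # No state is available once the episode has finished,
            # so the last observation is repeated on the terminal step.
            self._last_obs = self.game.get_state().screen_buffer
        return self._last_obs, reward, done, {}

    def close(self):
        self.game.close()
```

With a real game this would be used roughly as `env = DoomEnvSketch(game, actions)` after `game.load_config(...)` and `game.init()`, followed by the usual `reset()` / `step()` loop.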
A2C (Advantage Actor-Critic) + LSTM.
Main point of reference: https://arxiv.org/pdf/1609.05521v1.pdf.
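As a rough illustration of the A2C objective, the sketch below computes n-step bootstrapped returns for one rollout and combines the usual three terms: policy-gradient loss, value loss, and an entropy bonus. It is written in plain Python and is not taken from this repository. The function name, the coefficient defaults, and the assumption that the rollout contains no episode boundary are all mine, and the LSTM is left out entirely. In an autodiff framework the advantage would normally be treated as a constant inside the policy term.

```python
def a2c_loss(log_probs, entropies, values, rewards, bootstrap_value,
             gamma=0.99, value_coef=0.5, entropy_coef=0.01):
    """Scalar A2C loss for one rollout (hypothetical helper, illustrative defaults).

    log_probs[t]    : log pi(a_t | s_t) of the action actually taken
    entropies[t]    : entropy of the policy at s_t
    values[t]       : critic estimate V(s_t)
    bootstrap_value : critic estimate for the state after the last step
    """
    # n-step discounted returns, accumulated backwards from the bootstrap value.
    returns = []
    ret = bootstrap_value
    for r in reversed(rewards):
        ret = r + gamma * ret
        returns.append(ret)
    returns.reverse()

    n = len(rewards)
    advantages = [R - v for R, v in zip(returns, values)]

    # Actor: raise the log-probability of actions with positive advantage.
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic: mean squared error between returns and value estimates.
    value_loss = sum(adv ** 2 for adv in advantages) / n
    # Entropy bonus discourages premature convergence to a deterministic policy.
    entropy = sum(entropies) / n

    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```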
- Implement unsupervised auxiliary tasks to mitigate reward sparsity in the environment (https://arxiv.org/abs/1611.05397).
- Apply generalized advantage estimation.
- Try out TRPO and Natural Gradient techniques.
- Learning from demonstration for pre-training.
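
For the generalized advantage estimation item in the list above, a minimal plain-Python sketch might look like the following. It uses the standard recursion `A_t = delta_t + gamma * lam * A_{t+1}` with `delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)`. The function name, argument layout, and default `gamma` / `lam` values are assumptions for illustration, not something this repository already provides.

```python
def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one rollout (hypothetical helper).

    rewards[t], values[t], dones[t] describe step t;
    last_value is the critic estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Do not bootstrap or propagate advantage across an episode boundary.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

Setting `lam=0` reduces this to the one-step TD error, while `lam=1` recovers the discounted return minus the value baseline.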