Skip to content

Work in progress (unfinished) implementation of A3C

Notifications You must be signed in to change notification settings

bnurbekov/DeepRL-A3C-LSTM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dependencies

Vizdoom: https://github.com/Marqt/ViZDoom.

Environment

I was not able to make ppaquette_gym_doom work on my machine. Thus, in the interests of time I implemented lightweight gym-like interface for Vizdoom (which is slightly different from OpenAI Gym's interface).

Model

A2C (Advantage Actor - Critic) + LSTM.

Main point of reference: https://arxiv.org/pdf/1609.05521v1.pdf.

Further improvements

  1. Implement unsupervised auxilary tasks to mitigate the reward sparseness in the environment (https://arxiv.org/abs/1611.05397).
  2. Apply generalized advantage estimation.
  3. Try out TRPO and Natural Gradient techniques.
  4. Learning from demonstration for pre-training.

About

Work in progress (unfinished) implementation of A3C

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages