Deep Reinforcement Learning (DRL) university course: lecture notes & exercises

Chapter | Sections recap |
---|---|
Hello world | Basic terminology and definitions (based on Spinning Up in Deep RL by OpenAI) |
RL Basics | MDPs, Policy/Value Iteration, Monte Carlo (MC), SARSA & Q-Learning |
DQN & its variants | Deep Q-Network (DQN), Double DQN, Dueling DQN |
Policy Gradients | REINFORCE, REINFORCE with Baseline, Actor-Critic methods |
Imitation Learning | Apprenticeship learning, supervised and forward learning, DAgger, DAgger with coaching |
Multi-Armed Bandit | Bandit algorithms, gradient-based bandit algorithm, contextual bandits, Thompson sampling |
RL use-case: AlphaGo | Monte Carlo Tree Search, AlphaGo, AlphaZero |
Meta and Transfer Learning | Concepts in meta-learning and transfer learning in the context of RL |
Large action spaces | Review of selected papers on handling large action spaces |
Advanced model learning & exploration | Learning in latent space, next-state prediction, exploration schemes |

Exercise | Description |
---|---|
ex1 | Q-Learning and Deep Q-Learning (DQN) implementations from scratch |
ex2 | REINFORCE (with and without baseline) and Monte Carlo Actor-Critic implementations from scratch |
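
The ex1 row mentions implementing Q-learning from scratch. As a hedged illustration only (not the course's reference solution), the sketch below shows the tabular Q-learning update `Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))` with epsilon-greedy exploration. The 5-cell chain environment and the names `step`, `N_STATES`, `ACTIONS` and `q_learning` are hypothetical, chosen so the example is self-contained and uses only the Python standard library.

```python
import random

# Hypothetical toy environment (an assumption, not part of the course exercises):
# a 1-D chain of N_STATES cells; the agent starts in cell 0 and receives
# reward 1 for reaching the rightmost cell, which ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-breaking)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0, max_steps=100):
    """Tabular Q-learning; returns the Q-table as a list of per-state action values."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the greedy value of the next state,
            # except at the terminal state, where there is nothing to bootstrap from.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q = q_learning()
    greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
    print(greedy)  # [1, 1, 1, 1]: move right in every non-terminal cell
```

The target uses `max` over next-state values regardless of which action the behaviour policy takes next; that is what makes Q-learning off-policy, in contrast to SARSA, which bootstraps from the action actually selected.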