
# Lab 3

In this lab we will continue our work on the FrozenLake environment by introducing MC control and Q-learning.

## Task 1

In the last lab we calculated Q-values using MC prediction. We will now extend that code to implement MC control as we have seen it in the lecture:

- Take the code from the file 3_FrozenLake_Control.py as a starting point. This code uses a random policy and plots the collected rewards over time.
- Integrate the code for calculating Q-values after every episode from the last lab.
- Change the play_episode method so that it uses an epsilon-greedy policy based on the current Q-values (see the sketch after this list).
- Try out the following epsilons: [0.01, 0.1, 0.5, 1.0] and show the results for all epsilons together in one plot (i.e. one curve per epsilon).
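As a rough orientation, the sketch below shows one way the epsilon-greedy play_episode and the per-episode Q-value update could fit together. It assumes the Gymnasium `FrozenLake-v1` environment and an every-visit MC update; apart from the play_episode name taken from the task description, all function bodies, helper names, and hyperparameters (episodes, gamma) are illustrative assumptions, not the provided starter code.

```python
import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict

def play_episode(env, Q, epsilon):
    """Run one episode with an epsilon-greedy policy over the current Q-values."""
    state, _ = env.reset()
    trajectory, done = [], False
    while not done:
        if np.random.rand() < epsilon:
            action = env.action_space.sample()        # explore
        else:
            action = int(np.argmax(Q[state]))         # exploit
        next_state, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((state, action, reward))
        done = terminated or truncated
        state = next_state
    return trajectory

def mc_control(env, episodes=10_000, epsilon=0.1, gamma=0.99):
    """Every-visit MC control: update Q from the full return after each episode."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    visit_count = defaultdict(int)
    episode_rewards = []
    for _ in range(episodes):
        trajectory = play_episode(env, Q, epsilon)
        episode_rewards.append(sum(r for _, _, r in trajectory))
        G = 0.0
        for state, action, reward in reversed(trajectory):
            G = gamma * G + reward
            visit_count[(state, action)] += 1
            # incremental average of the observed returns per (state, action) pair
            Q[state, action] += (G - Q[state, action]) / visit_count[(state, action)]
    return Q, episode_rewards

# One curve per epsilon, all in the same plot.
env = gym.make("FrozenLake-v1")
for eps in [0.01, 0.1, 0.5, 1.0]:
    _, rewards = mc_control(env, epsilon=eps)
    plt.plot(np.cumsum(rewards), label=f"epsilon={eps}")
plt.xlabel("episode")
plt.ylabel("cumulative reward")
plt.legend()
plt.show()
```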

## Task 2

Now implement Q-learning as a control strategy for comparison:

- Redo Task 1 using Q-learning instead of MC control (use alpha=0.5); a minimal sketch follows below.
- As above, try out the different epsilons and compare them in one plot.
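A minimal tabular Q-learning sketch, applying the update Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') − Q(s,a)) after every step. It reuses the Gymnasium assumptions from the Task 1 sketch; apart from alpha=0.5 from the task, the remaining hyperparameters and names are illustrative assumptions.

```python
def q_learning(env, episodes=10_000, epsilon=0.1, alpha=0.5, gamma=0.99):
    """Tabular Q-learning: update Q after every step from the TD target."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    episode_rewards = []
    for _ in range(episodes):
        state, _ = env.reset()
        total, done = 0.0, False
        while not done:
            if np.random.rand() < epsilon:
                action = env.action_space.sample()    # explore
            else:
                action = int(np.argmax(Q[state]))     # exploit
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # do not bootstrap past a terminal state
            bootstrap = 0.0 if terminated else np.max(Q[next_state])
            Q[state, action] += alpha * (reward + gamma * bootstrap - Q[state, action])
            total += reward
            state = next_state
        episode_rewards.append(total)
    return Q, episode_rewards
```

The epsilon comparison plot can then be produced exactly as in the Task 1 sketch, swapping mc_control for q_learning.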