This repository contains implementations of two hierarchical reinforcement learning algorithms, single-step Semi-Markov Decision Process (SMDP) Q-learning and intra-option Q-learning, applied to the Taxi-v3 environment from the Gymnasium library (the maintained fork of OpenAI Gym).
Make sure you have the following libraries installed:
numpy
matplotlib
gymnasium
You can install them using pip:
pip install numpy matplotlib gymnasium
The Taxi-v3 environment is a grid-based simulation where a taxi must navigate to pick up and drop off a passenger at designated locations. The environment consists of a 5x5 grid with 500 discrete states, where:
- The taxi can be in any of the 25 positions.
- The passenger can be at one of 5 locations (including in the taxi).
- There are 4 possible drop-off destinations.
- Passenger Locations:
- 0: Red
- 1: Green
- 2: Yellow
- 3: Blue
- 4: In taxi
- Destinations:
- 0: Red
- 1: Green
- 2: Yellow
- 3: Blue
- Rewards:
- -1 per step unless another reward is triggered.
- +20 for successfully delivering the passenger.
- -10 for illegal "pickup" and "drop-off" actions.
The discount factor (γ) is set to 0.9.
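Taxi-v3 packs the four state variables above into a single integer as `((taxi_row * 5 + taxi_col) * 5 + passenger_loc) * 4 + destination`. A small helper can invert this encoding for debugging or for defining options; the sketch below mirrors what `env.unwrapped.decode` returns, and the function name is just illustrative:

```python
def decode_taxi_state(state: int):
    """Invert Taxi-v3's state encoding:
    state = ((taxi_row * 5 + taxi_col) * 5 + passenger_loc) * 4 + destination
    """
    destination = state % 4
    state //= 4
    passenger_loc = state % 5
    state //= 5
    taxi_col = state % 5
    taxi_row = state // 5
    return taxi_row, taxi_col, passenger_loc, destination
```

For example, state 499 decodes to taxi at (4, 4), passenger in the taxi, destination Blue.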
Primitive Actions:
- 0: Move South
- 1: Move North
- 2: Move East
- 3: Move West
- 4: Pick passenger up
- 5: Drop passenger off
Options: one option per designated location, each driving the taxi to that location; an option is available only when the taxi is not already there.
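A "go to landmark" option can be sketched as a policy plus a termination condition over the taxi's grid position. The landmark coordinates below are the ones Taxi-v3 uses; the greedy policy here ignores the grid's interior walls for brevity, so a full implementation would precompute shortest paths instead:

```python
# Landmark coordinates in the 5x5 grid (row, col), as used by Taxi-v3.
LANDMARKS = {0: (0, 0), 1: (0, 4), 2: (4, 0), 3: (4, 3)}  # R, G, Y, B

def goto_option_policy(taxi_row, taxi_col, target):
    """Greedy intra-option policy: step toward the target landmark.

    Sketch only: it ignores walls, which a real implementation must handle.
    Primitive actions: 0 = South, 1 = North, 2 = East, 3 = West.
    """
    goal_row, goal_col = LANDMARKS[target]
    if taxi_row < goal_row:
        return 0  # South
    if taxi_row > goal_row:
        return 1  # North
    if taxi_col < goal_col:
        return 2  # East
    return 3  # West

def option_terminates(taxi_row, taxi_col, target):
    """The option terminates exactly when the taxi reaches the landmark."""
    return (taxi_row, taxi_col) == LANDMARKS[target]
```

The initiation set is the complement of the termination set: the option is offered in every state where `option_terminates` is false.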
- Implement Single-step SMDP Q-learning for the taxi problem.
- Implement Intra-option Q-learning for the same environment.
- For both algorithms, plot reward curves and visualize the learned Q-values.
- Provide a written description of the learned policies and reasoning.
- Explore alternate sets of mutually exclusive options and compare their performance against the original option set.
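The SMDP Q-learning update treats an option as a single temporally extended action: after an option o runs for k steps from state s to state s', Q(s, o) is updated toward the discounted return accumulated during the option plus the γ^k-discounted value of s'. A minimal sketch, assuming the discount γ = 0.9 from above and an illustrative learning rate α:

```python
import numpy as np

def smdp_q_update(Q, s, o, cumulative_reward, k, s_next, alpha=0.1, gamma=0.9):
    """One SMDP Q-learning update after option o ran k steps from s to s_next.

    cumulative_reward is the discounted return collected while the option ran:
    r_1 + gamma * r_2 + ... + gamma**(k - 1) * r_k.
    """
    target = cumulative_reward + (gamma ** k) * np.max(Q[s_next])
    Q[s, o] += alpha * (target - Q[s, o])
    return Q
```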
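Intra-option Q-learning instead updates from every primitive transition (s, a, r, s'): all options whose policy would have chosen a in s are updated, using the option's termination probability β in s' to mix "continue the option" and "pick a new best option" values. A sketch under the same γ = 0.9; the argument names are placeholders, not a fixed API:

```python
import numpy as np

def intra_option_q_update(Q, s, a, r, s_next, consistent_options, beta,
                          alpha=0.1, gamma=0.9):
    """One intra-option update from a single primitive transition (s, a, r, s').

    consistent_options lists every option whose policy selects a in s;
    beta[o] is option o's termination probability in s_next.
    """
    for o in consistent_options:
        # Value of s_next under o: continue with prob. 1 - beta, else restart.
        u = (1 - beta[o]) * Q[s_next, o] + beta[o] * np.max(Q[s_next])
        Q[s, o] += alpha * (r + gamma * u - Q[s, o])
    return Q
```

Because one transition updates several options at once, this variant is typically more sample-efficient than the SMDP update.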
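For the reward curves, per-episode returns in Taxi are noisy, so a moving average makes the two algorithms easier to compare. A minimal plotting sketch with matplotlib; the function name, window size, and output path are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def plot_reward_curve(episode_rewards, window=100, path="rewards.png"):
    """Smooth per-episode returns with a moving average and save the curve."""
    rewards = np.asarray(episode_rewards, dtype=float)
    kernel = np.ones(window) / window
    smoothed = np.convolve(rewards, kernel, mode="valid")
    plt.figure()
    plt.plot(smoothed)
    plt.xlabel("Episode")
    plt.ylabel(f"Return ({window}-episode moving average)")
    plt.savefig(path)
    plt.close()
```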