- Reinforcement Learning - Developing Intelligent Agents
- Deep Learning Course 5 of 6 - Level: Advanced
- Ref: DeepLizard – Reinforcement Learning - Developing Intelligent Agents
We will build and train a deep Q-network (DQN) that learns to balance a pole on a moving cart, a task widely known as the cart-pole problem.
We'll use OpenAI's Gym toolkit to set up the cart-pole environment.
Image Snapped from Deeplizard
Image Snapped from Packt – Hands-On Q-Learning with Python
- Initialize replay memory capacity.
- Initialize the policy network with random weights.
- Clone the policy network, and call it the target network.
- For each episode:
  - Initialize the starting state.
  - For each time step:
    - Select an action.
      - Via exploration or exploitation
    - Execute selected action in an emulator.
    - Observe reward and next state.
    - Store experience in replay memory.
    - Sample random batch from replay memory.
    - Preprocess states from batch.
    - Pass batch of preprocessed states to policy network.
    - Calculate loss between output Q-values and target Q-values.
      - Requires a pass to the target network for the next state
    - Gradient descent updates weights in the policy network to minimize loss.
      - After x time steps, weights in the target network are updated to the weights in the policy network.
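
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: `ToyEnv` is a hypothetical five-cell corridor standing in for the cart-pole emulator, `LinearQ` is a linear Q-function standing in for the deep policy and target networks, and every hyperparameter value (including `sync_every`, which plays the role of x) is illustrative. The target Q-value for a sampled experience is computed as reward + gamma * max over actions of the target network's Q-values for the next state, or just the reward when the episode has ended. Preprocessing is skipped because the toy states are already feature vectors.

```python
import copy
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", "state action reward next_state done")

class ToyEnv:
    """Hypothetical emulator: positions 0..4, reward 1.0 on reaching 4."""
    def reset(self):
        self.pos = 0
        return self._obs()

    def _obs(self):
        # One-hot encoding of the current position.
        return [1.0 if i == self.pos else 0.0 for i in range(5)]

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the move.
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 4
        return self._obs(), (1.0 if done else 0.0), done

class LinearQ:
    """Linear Q-function used here in place of a deep network (assumption)."""
    def __init__(self, n_features, n_actions, rng):
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                  for _ in range(n_actions)]

    def q_values(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

def train(episodes=200, capacity=1000, batch_size=16, gamma=0.9, lr=0.1,
          epsilon=0.5, sync_every=20, max_steps=50, seed=0):
    rng = random.Random(seed)
    env = ToyEnv()
    memory = deque(maxlen=capacity)      # replay memory with fixed capacity
    policy = LinearQ(5, 2, rng)          # policy "network", random weights
    target = copy.deepcopy(policy)       # clone it as the target "network"
    steps = 0
    for _ in range(episodes):
        state = env.reset()              # initialize the starting state
        for _ in range(max_steps):
            # Select an action via exploration or exploitation.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                q = policy.q_values(state)
                action = q.index(max(q))
            # Execute the action, observe reward and next state, store it.
            next_state, reward, done = env.step(action)
            memory.append(Experience(state, action, reward, next_state, done))
            state = next_state
            if len(memory) >= batch_size:
                # Sample a random batch; update one experience at a time.
                for exp in rng.sample(list(memory), batch_size):
                    # The target Q-value needs a pass through the target network.
                    best_next = 0.0 if exp.done else max(target.q_values(exp.next_state))
                    td_error = (exp.reward + gamma * best_next
                                - policy.q_values(exp.state)[exp.action])
                    # Gradient descent step on the squared error.
                    row = policy.w[exp.action]
                    for i, s in enumerate(exp.state):
                        row[i] += lr * td_error * s
            steps += 1
            if steps % sync_every == 0:
                target = copy.deepcopy(policy)   # copy weights to the target
            if done:
                break
    return policy
```

Calling `policy = train()` returns the trained Q-function, and `policy.q_values(state)` gives one Q-value per action for a one-hot state. Swapping `LinearQ` for a neural network and `ToyEnv` for a Gym environment keeps the same loop structure; the fixed `epsilon` here would typically be replaced by a decaying exploration rate.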