This project implements a Reinforcement Learning (RL) agent that plays Checkers using Generative Adversarial Networks (GANs). The agent learns strategies through iterative training, leveraging deep learning to approximate state-action values.
Reinforcement Learning (RL) is a machine learning paradigm that enables autonomous agents to make decisions, learn optimal policies, and maximize rewards through interaction with an environment. This project focuses on:
- Developing an RL agent to play 8×8 American Checkers.
- Using Generative Adversarial Networks (GANs) to approximate Q-values.
- Training the agent via iterative self-play to maximize its win rate.
- Agent – The checkers-playing AI.
- Actions – Moves made on the board.
- Environment – The game of Checkers.
- Rewards – Feedback from winning, losing, or achieving favorable board positions.
- The game is modeled as an 8×8 matrix where each cell represents a board square and the piece occupying it (if any):
- Men (regular pieces): Can move diagonally forward.
- Kings: Can move diagonally in both directions.
- Captured pieces: Are removed from the board.
- A compressed 32×1 array stores only the 32 playable (dark) squares of the board.
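As an illustration of this compressed representation, the sketch below maps an 8×8 NumPy board onto a 32×1 vector of the playable squares; the dark-square convention and piece encoding here are assumptions for illustration, not the project's actual code.

```python
import numpy as np

# Illustrative piece encoding (assumed, not necessarily the project's own):
#  0 = empty, 1 = own man, 2 = own king, -1 = opponent man, -2 = opponent king

def compress_board(board_8x8: np.ndarray) -> np.ndarray:
    """Flatten an 8x8 board into a 32x1 vector of the playable (dark) squares."""
    playable = [(r, c) for r in range(8) for c in range(8) if (r + c) % 2 == 1]
    return board_8x8[tuple(zip(*playable))].reshape(32, 1)

def expand_board(compressed: np.ndarray) -> np.ndarray:
    """Inverse operation: place the 32 values back onto an 8x8 grid."""
    board = np.zeros((8, 8), dtype=compressed.dtype)
    playable = [(r, c) for r in range(8) for c in range(8) if (r + c) % 2 == 1]
    for value, (r, c) in zip(compressed.ravel(), playable):
        board[r, c] = value
    return board
```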
- States: Represent the board position at any given time.
- Actions: Possible moves available for a player.
- Rewards: Positive for capturing pieces and winning, negative for poor moves.
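A toy version of this reward scheme is sketched below; the specific magnitudes are illustrative placeholders rather than the values actually used in training.

```python
# Illustrative reward shaping; the actual magnitudes used in training may differ.
def compute_reward(captured_pieces: int, lost_pieces: int,
                   game_won: bool, game_lost: bool) -> float:
    reward = 0.0
    reward += 1.0 * captured_pieces   # positive feedback for captures
    reward -= 1.0 * lost_pieces       # negative feedback for losing material
    if game_won:
        reward += 10.0                # large terminal reward for a win
    if game_lost:
        reward -= 10.0
    return reward
```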
Traditional Q-learning requires a Q-table, which is impractical due to Checkers' vast state space. Instead, a neural network approximates Q-values to predict the best action for a given board state.
The agent updates Q-values iteratively using:
$$
Q_{new}(s_t, a_t) = Q_{old}(s_t, a_t) + \alpha \left[ R(s_t) + \gamma \cdot \max_a Q(s_{t+1}, a) - Q_{old}(s_t, a_t) \right]
$$

Where:
- $\alpha$ = Learning rate
- $\gamma$ = Discount factor
- $Q(s,a)$ = Action-value function estimating future rewards.
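In code, this update nudges the stored estimate toward the bootstrapped target. The helper below is a minimal sketch of the scalar update, with illustrative values for $\alpha$ and $\gamma$.

```python
ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.95  # discount factor (illustrative value)

def q_update(q_old: float, reward: float, q_next_max: float,
             alpha: float = ALPHA, gamma: float = GAMMA) -> float:
    """One Bellman update: Q_new = Q_old + alpha * (R + gamma * max_a Q(s', a) - Q_old)."""
    td_target = reward + gamma * q_next_max
    return q_old + alpha * (td_target - q_old)

# Example: q_update(0.2, 1.0, 0.5) -> 0.2 + 0.1 * (1.0 + 0.475 - 0.2) = 0.3275
```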
- A Generative Model (`gen_model`) learns to generate board states.
- A Discriminative Model (`board_model`) evaluates whether a board position is real or generated.
- Adversarial training refines the RL agent's ability to distinguish optimal moves.
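The exact architectures are not detailed above, so the sketch below pairs a small Keras generator and discriminator over the 32-element board encoding. The latent dimension, layer sizes, and activations are assumptions chosen for illustration; only the names `gen_model` and `board_model` come from the project.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

NOISE_DIM = 16   # illustrative latent size (assumed)
BOARD_DIM = 32   # compressed board representation

# Generative model (gen_model): maps random noise to a candidate board state.
gen_model = keras.Sequential([
    keras.Input(shape=(NOISE_DIM,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(BOARD_DIM, activation="tanh"),  # piece values scaled to [-1, 1]
])

# Discriminative model (board_model): scores how plausible a board position is.
board_model = keras.Sequential([
    keras.Input(shape=(BOARD_DIM,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
board_model.compile(optimizer="adam", loss="binary_crossentropy")
```

In a full adversarial loop, `board_model` would be trained to separate real game positions from `gen_model`'s outputs, while `gen_model` is trained to fool it.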
A custom heuristic function evaluates board positions using the following factors (a simplified sketch follows the list):
- Number of captured pieces.
- Number of potential moves.
- Number of men vs. kings.
- Number of safe and vulnerable pieces.
- Positional advantage on the board.
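A simplified linear version of such a heuristic is sketched below; the weights and exact feature definitions are illustrative and not the project's tuned values.

```python
# Illustrative linear heuristic over the factors listed above;
# the real weights and feature definitions may differ.
HEURISTIC_WEIGHTS = {
    "captured_pieces": 2.0,
    "potential_moves": 0.1,
    "men": 1.0,
    "kings": 3.0,
    "safe_pieces": 0.5,
    "vulnerable_pieces": -0.5,
    "positional_advantage": 0.25,
}

def evaluate_board(features: dict) -> float:
    """Weighted sum of board features, e.g. {'captured_pieces': 2, 'kings': 1, ...}."""
    return sum(HEURISTIC_WEIGHTS[name] * features.get(name, 0.0)
               for name in HEURISTIC_WEIGHTS)
```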
- Initialize Agent with a neural network-based Q-function.
- Self-play to explore different board states.
- Update Q-values using the Bellman Equation.
- Adversarial Training refines the agent’s decision-making.
- Evaluate performance using win rate metrics.
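Putting these steps together, one possible outline of the training loop is sketched below; the `agent` interface and the `play_game` callable are assumed placeholders rather than the project's actual API.

```python
# Illustrative outline of self-play training; the agent methods and play_game
# callable are hypothetical interfaces, not the project's real code.
def train(agent, play_game, num_generations=100, games_per_generation=100):
    win_rates = []
    for _ in range(num_generations):
        wins = 0
        for _ in range(games_per_generation):
            transitions, won = play_game(agent)   # one self-play episode
            for state, action, reward, next_state in transitions:
                agent.update_q(state, action, reward, next_state)  # Bellman update
            wins += int(won)
        agent.adversarial_refine()   # GAN-style refinement of the Q-approximator
        win_rates.append(wins / games_per_generation)
    return win_rates
```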
- The RL agent achieved ~80% win rate after sufficient training.
- The agent learned strategic positioning and piece sacrifices to gain advantage.
- Initial training phases showed high variability, but performance stabilized after multiple generations.
- High training time (~1 hour for 100 generations with 100 games each).
- Hyperparameter tuning for learning rate, discount factor, and network depth.
- Exploring other RL techniques like AlphaZero-style self-play.
- Deep RL can efficiently train an AI to play Checkers.
- GAN-based adversarial training enhances learning beyond traditional Q-learning.
- Reward shaping and state representation significantly impact training efficiency.
- Hyperparameter tuning is crucial to achieving stable performance.
- Python 3.8+
- TensorFlow / PyTorch
- NumPy, Matplotlib
- Jupyter Notebook (optional for visualization)
- Mnih et al., Deep Q-Networks
- Bellman Equation in RL
- Henning, RL-Checkers
- Goodfellow et al., Generative Adversarial Networks
This project is part of my Machine Learning Portfolio, showcasing expertise in Reinforcement Learning, Deep Learning, and Game AI.