
Reinforcement Learning Agent for Checkers Using Adversarial Neural Networks 🏆🧠

This project implements a Reinforcement Learning (RL) agent that plays Checkers using Generative Adversarial Networks (GANs). The agent learns its strategy through iterative training, leveraging deep learning to approximate state-action values.


Project Overview 🎯

Reinforcement Learning (RL) is a machine learning paradigm that enables autonomous agents to make decisions, learn optimal policies, and maximize rewards through interaction with an environment. This project focuses on:

  • Developing an RL agent to play 8×8 American Checkers.
  • Using Generative Adversarial Networks (GANs) to approximate Q-values.
  • Training the agent via iterative self-play to maximize its win rate.

Core RL Components:

  1. Agent – The checkers-playing AI.
  2. Actions – Moves made on the board.
  3. Environment – The game of Checkers.
  4. Rewards – Feedback from winning, losing, or achieving favorable board positions.

Game Representation & Modeling 🎲

Checkers Board Representation

  • The game is modeled using an 8×8 matrix where each cell represents a piece:
    • Men (regular pieces): Can move diagonally forward.
    • Kings: Can move diagonally in both directions.
    • Captured pieces: Are removed from the board.
  • Because pieces occupy only the 32 dark squares, a compressed 32×1 array stores just the playable positions (see the sketch below).
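
A minimal sketch of this compression, assuming pieces are encoded as signed integers and that dark squares are those where row + column is odd (both assumptions, not the repository's exact encoding):

```python
import numpy as np

# Hypothetical piece codes: 0 empty, 1 man, 2 king (negative values for the opponent).
def compress_board(board_8x8: np.ndarray) -> np.ndarray:
    """Keep only the 32 playable (dark) squares of an 8x8 checkers board."""
    assert board_8x8.shape == (8, 8)
    compressed = []
    for row in range(8):
        for col in range(8):
            if (row + col) % 2 == 1:      # dark squares only (assumed convention)
                compressed.append(board_8x8[row, col])
    return np.array(compressed)           # shape (32,)

board = np.zeros((8, 8), dtype=int)
board[0, 1] = -1                           # an opponent man on a dark square
state = compress_board(board)              # 32-element state fed to the network
```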

State, Actions & Rewards in RL

  • States: Represent the board position at any given time.
  • Actions: Possible moves available for a player.
  • Rewards: Positive for capturing pieces and winning; negative for poor moves (a hypothetical mapping is sketched below).
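
For illustration only, a reward mapping could look like the following; the event names and constants are placeholders, not the values used in training:

```python
# Hypothetical reward shaping; the actual constants used by the project may differ.
def reward(event: str) -> float:
    return {
        "win":        1.0,    # terminal reward for winning the game
        "loss":      -1.0,    # terminal penalty for losing
        "capture":    0.2,    # small bonus for capturing an opponent piece
        "piece_lost": -0.2,   # small penalty for losing a piece
        "draw":       0.0,
    }.get(event, 0.0)
```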

Deep Reinforcement Learning Approach 🤖

1. Neural Network-Based Q-Learning

Traditional Q-learning stores values in a Q-table, which is impractical for Checkers' vast state space (on the order of 10^20 legal positions). Instead, a neural network approximates Q-values, predicting the value of candidate moves for a given board state.
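
A minimal sketch of such an approximator, assuming PyTorch and the 32-element compressed state described above; the architecture and layer sizes are assumptions, not the repository's actual network:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),      # scalar value for a candidate board position
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # The "action" is baked into the state by evaluating the board
        # position that would result from each legal move.
        return self.net(state)
```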

2. Bellman Equation for Q-Function Update

The agent updates Q-values iteratively using:

$$ Q_{\text{new}}(s_t, a_t) = Q_{\text{old}}(s_t, a_t) + \alpha \left[ R(s_t) + \gamma \cdot \max_{a} Q(s_{t+1}, a) - Q_{\text{old}}(s_t, a_t) \right] $$

Where:

  • $\alpha$ = learning rate
  • $\gamma$ = discount factor
  • $Q(s, a)$ = action-value function estimating future rewards
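
A small sketch of this update in code (tabular form; in the deep-RL setting the same bracketed target becomes the regression label for the Q-network):

```python
# One Q-value update following the Bellman equation above.
def q_update(q_old: float, reward: float, q_next_max: float,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    td_target = reward + gamma * q_next_max      # R(s_t) + gamma * max_a Q(s_{t+1}, a)
    return q_old + alpha * (td_target - q_old)   # move Q toward the TD target
```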

3. Generative Adversarial Networks (GANs) for Training

  • A Generative Model (gen_model) learns to generate board states.
  • A Discriminative Model (board_model) evaluates whether a board position is real or generated.
  • Adversarial training between the two refines the RL agent's ability to distinguish strong board positions from weak ones (a minimal sketch follows this list).
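
The sketch below shows one plausible adversarial update under the gen_model / board_model names used above. It assumes PyTorch, and the shapes and losses are guesses rather than the repository's actual code:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def adversarial_step(gen_model, board_model, real_states, noise, gen_opt, disc_opt):
    # --- Discriminator: learn to tell real board states from generated ones ---
    fake_states = gen_model(noise).detach()
    disc_loss = (bce(board_model(real_states), torch.ones(len(real_states), 1)) +
                 bce(board_model(fake_states), torch.zeros(len(fake_states), 1)))
    disc_opt.zero_grad(); disc_loss.backward(); disc_opt.step()

    # --- Generator: produce states the discriminator judges as real ---
    fake_states = gen_model(noise)
    gen_loss = bce(board_model(fake_states), torch.ones(len(fake_states), 1))
    gen_opt.zero_grad(); gen_loss.backward(); gen_opt.step()
    return disc_loss.item(), gen_loss.item()
```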

Implementation Details ⚙️

Metrics for Board Evaluation

A custom heuristic function scores board positions using the following features (a placeholder implementation is sketched after the list):

  • Number of captured pieces.
  • Number of potential moves.
  • Number of men vs. kings.
  • Number of safe and vulnerable pieces.
  • Positional advantage on the board.
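
A placeholder implementation of such a heuristic; the feature names and weights are illustrative assumptions, not the project's values:

```python
# Weighted sum over the board features listed above (weights are placeholders).
def evaluate_board(features: dict) -> float:
    weights = {
        "captured_opponent_pieces":  1.0,
        "potential_moves":           0.1,
        "kings":                     1.5,   # kings are worth more than men
        "men":                       1.0,
        "safe_pieces":               0.3,   # pieces that cannot currently be jumped
        "vulnerable_pieces":        -0.3,
        "positional_advantage":      0.2,   # e.g. control of the centre and back row
    }
    return sum(weights[k] * features.get(k, 0) for k in weights)
```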

Training Pipeline 🏋️

  1. Initialize Agent with a neural network-based Q-function.
  2. Self-play to explore different board states.
  3. Update Q-values using the Bellman Equation.
  4. Adversarial Training refines the agent’s decision-making.
  5. Evaluate performance using win-rate metrics (see the loop sketch below).
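
A high-level sketch of this loop; every helper name (play_game, evaluate, update_q, adversarial_refine) is an illustrative placeholder rather than the repository's API:

```python
# Illustrative self-play loop mirroring the five steps above.
# Step 1 (initializing the agent and its Q-network) happens before calling train().
def train(agent, play_game, evaluate,
          generations: int = 100, games_per_generation: int = 100):
    for gen in range(generations):
        for _ in range(games_per_generation):
            trajectory = play_game(agent, opponent=agent)             # 2. self-play
            for state, action, reward, next_state in trajectory:
                agent.update_q(state, action, reward, next_state)     # 3. Bellman update
        agent.adversarial_refine()                                    # 4. GAN refinement
        win_rate = evaluate(agent, n_games=20)                        # 5. evaluation
        print(f"Generation {gen}: win rate {win_rate:.2f}")
```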

Results & Observations 📊

Performance Metrics

  • The RL agent achieved ~80% win rate after sufficient training.
  • The agent learned strategic positioning and piece sacrifices to gain advantage.
  • Initial training phases showed high variability, but performance stabilized after multiple generations.

Challenges & Future Work 🔍

  • High training time (~1 hour for 100 generations with 100 games each).
  • Hyperparameter tuning for learning rate, discount factor, and network depth.
  • Exploring other RL techniques like AlphaZero-style self-play.

Key Takeaways 🌟

  • Deep RL can effectively train an agent to play Checkers at a competent level.
  • GAN-based adversarial training enhances learning beyond traditional Q-learning.
  • Reward shaping and state representation significantly impact training efficiency.
  • Hyperparameter tuning is crucial to achieving stable performance.

Requirements 📋

  • Python 3.8+
  • TensorFlow / PyTorch
  • NumPy, Matplotlib
  • Jupyter Notebook (optional for visualization)

Author ✍️

This project is part of my Machine Learning Portfolio, showcasing expertise in Reinforcement Learning, Deep Learning, and Game AI.

