This repository contains implementations of various AI techniques for playing Tic-Tac-Toe (or Noughts and Crosses).
Recursive brute-force search of the solution space, using a pruning method (alpha-beta) that ignores parts of the tree that cannot affect the outcome of the search.
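Below is a minimal sketch of such a search, assuming a board encoded as a 9-tuple with 1 for X, -1 for O, and 0 for empty; the function names and encoding are illustrative, not this repository's actual API.

```python
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    """Return 1 if X has three in a row, -1 if O has, else 0."""
    for i, j, k in LINES:
        if b[i] != 0 and b[i] == b[j] == b[k]:
            return b[i]
    return 0

def alphabeta(b, player, alpha=-2, beta=2):
    """Exact value of position `b` (from X's view) with `player` to move."""
    w = winner(b)
    if w:
        return w
    if 0 not in b:
        return 0                                  # draw
    best = -2 * player                            # worst case for the side to move
    for i, c in enumerate(b):
        if c:
            continue
        v = alphabeta(b[:i] + (player,) + b[i+1:], -player, alpha, beta)
        if player == 1:
            best, alpha = max(best, v), max(alpha, v)
        else:
            best, beta = min(best, v), min(beta, v)
        if alpha >= beta:                         # this subtree cannot change the result
            break
    return best

print(alphabeta((0,) * 9, 1))  # perfect play from the empty board: 0 (a draw)
```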
Using a training set of (state, value) pairs provided by an oracle (currently covering 10% of the state space), we train a neural network (either an MLP or a ConvNet) that generalises to the rest of the state space.
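A minimal sketch of this setup in PyTorch is shown below; the state encoding, network sizes, 200-position sample, and the inline minimax oracle are illustrative stand-ins for the repository's actual oracle and models.

```python
import random
import torch
import torch.nn as nn

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != 0 and b[i] == b[j] == b[k]:
            return b[i]
    return 0

def oracle(b, player):
    """Exact game value (from X's view) of `b` with `player` to move."""
    w = winner(b)
    if w:
        return w
    if 0 not in b:
        return 0
    vals = [oracle(b[:i] + (player,) + b[i+1:], -player)
            for i, c in enumerate(b) if c == 0]
    return max(vals) if player == 1 else min(vals)

def random_state():
    """A reachable position drawn from a random playout."""
    b, player = (0,) * 9, 1
    for _ in range(random.randrange(9)):
        if winner(b):
            break
        i = random.choice([i for i, c in enumerate(b) if c == 0])
        b = b[:i] + (player,) + b[i+1:]
        player = -player
    return b, player

# Oracle-labelled training set covering a fraction of the state space.
data = [random_state() for _ in range(200)]
X = torch.tensor([b for b, _ in data], dtype=torch.float32)
y = torch.tensor([[float(oracle(b, p))] for b, p in data])

mlp = nn.Sequential(nn.Linear(9, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)
for _ in range(300):                              # fit the labelled subset
    opt.zero_grad()
    loss = nn.functional.mse_loss(mlp(X), y)
    loss.backward()
    opt.step()
```

Held-out states (the unlabelled remainder of the space) can then be scored by the trained network to check generalisation.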
AlphaZero / Lc0-style self-play reinforcement learning. Starting from a randomly initialised network, an agent plays itself over and over, updating both its move predictions and its state-value estimates.
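The sketch below shows a heavily simplified version of this loop, assuming a two-headed network (policy logits plus a scalar value) trained from the final game outcome. The real AlphaZero / Lc0 recipe also runs MCTS and trains the policy head toward the search's visit counts; that part is omitted here, making this closer to plain policy-gradient self-play.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(9, 64), nn.ReLU())
        self.policy_head = nn.Linear(64, 9)       # logits over the 9 squares
        self.value_head = nn.Linear(64, 1)        # scalar value estimate

    def forward(self, x):
        h = self.body(x)
        return self.policy_head(h), torch.tanh(self.value_head(h))

def winner_of(board):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for i, j, k in lines:
        if board[i] != 0 and board[i] == board[j] == board[k]:
            return board[i]
    return None

def self_play_game(net):
    """Play one game against itself; return per-move records and the outcome."""
    board, player, records = [0.0] * 9, 1.0, []
    while True:
        x = torch.tensor(board) * player          # always from the side-to-move view
        with torch.no_grad():
            logits, _ = net(x)
        logits = logits.masked_fill(torch.tensor(board) != 0, float('-inf'))
        move = torch.distributions.Categorical(logits=logits).sample().item()
        records.append((x, move, player))
        board[move] = player
        w = winner_of(board)
        if w is not None or all(c != 0 for c in board):
            return records, (0.0 if w is None else w)
        player = -player

net = PolicyValueNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for game in range(1000):
    records, z = self_play_game(net)
    opt.zero_grad()
    loss = torch.tensor(0.0)
    for x, move, player in records:
        logits, v = net(x)
        target = z * player                       # outcome from this player's view
        # Policy: raise the probability of moves that led to a win, lower losers.
        loss = loss + target * F.cross_entropy(logits.unsqueeze(0), torch.tensor([move]))
        # Value: regress toward the actual game outcome.
        loss = loss + F.mse_loss(v.squeeze(), torch.tensor(target))
    loss.backward()
    opt.step()
```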
- Iterated Distillation and Amplification
- AlphaZero paper on arXiv
- Dominik Klein's *Neural Networks for Chess* (from which our implementation was essentially cloned, with some stylistic changes)