Deep Learning and AlphaZero
AlphaZero is a deep learning system developed by DeepMind that learns to play two-player board games purely through self-play (reinforcement learning). Remarkably, after less than a day of training it was able to achieve superhuman results in Chess, Go and Shogi. Further details are available in this technical report and this Nature paper. For a layperson’s view of AlphaZero’s success in Chess, see these blogs: AlphaZero Playing Chess and AlphaZero’s Brain.
The goal of this project is to recreate AlphaZero and experiment further with it on a subset of Chess. This blog is an excellent starting point for what you will need to do:
https://web.stanford.edu/~surag/posts/alphazero.html
Essentially, AlphaZero uses a neural network to assess the board position and suggest moves that are likely to be good, combined with a more traditional tree search (Monte Carlo Tree Search, MCTS) to analyse concrete sequences of moves.
Of course, you won’t have access to the hardware AlphaZero used, and will have to make do with a single GPU for training. However, AlphaZero running on a single GPU should be able to learn a simpler form of chess, e.g. positions with only Rooks and Pawns (“Rook+Pawn endings”) on the board.
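To make the interplay between the network and the tree search more concrete, here is a minimal sketch of the move-selection rule commonly used in AlphaZero-style MCTS (often called PUCT). It is an illustration under stated assumptions, not DeepMind's implementation: the `Edge` class, the function names, the move labels and the exploration constant `c_puct = 1.5` are all hypothetical choices made for this example. The network's suggested move probabilities enter as `prior`, and the results of the search so far enter as the mean value `q`.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one move out of a tree node (hypothetical helper)."""
    prior: float            # P(s, a): move probability suggested by the network
    visits: int = 0         # N(s, a): how often the search has tried this move
    value_sum: float = 0.0  # sum of the evaluations backed up through this move

    @property
    def q(self) -> float:
        """Mean backed-up value Q(s, a); defined as 0 for an unvisited move."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5) -> float:
    """Q(s, a) plus an exploration bonus that favours high-prior, rarely tried moves."""
    exploration = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.q + exploration


def select_move(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move with the highest PUCT score at this node."""
    parent_visits = sum(e.visits for e in edges.values())
    return max(edges, key=lambda m: puct_score(edges[m], parent_visits, c_puct))


# Example node: three candidate moves with made-up statistics.
edges = {
    "Ra1": Edge(prior=0.6, visits=10, value_sum=2.0),
    "Kb2": Edge(prior=0.3, visits=0),
    "h4": Edge(prior=0.1, visits=1, value_sum=0.9),
}
chosen = select_move(edges)
```

With these made-up numbers the search picks "Kb2": it has a reasonable prior and has not been tried yet, so its exploration bonus outweighs the better average results of the other moves. As visit counts grow, the bonus shrinks and the choice is increasingly driven by `q`.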
The exact goals of the project will emerge as it proceeds, but suggested ideas are:
* Implement AlphaZero and train it on Rook+Pawn endings.
* Compare the strength of the complete AlphaZero approach with (1) MCTS on its own and (2) the neural network on its own.
* A human learning Rook+Pawn endings typically does so by learning a series of concepts. Analyse the games AlphaZero plays to detect at what point in its training it learns these concepts, and how its learning compares to that of a human player.
* Train it from scratch on another type of ending, e.g. Queen+Pawn endings. Then train it on Queen+Pawn endings again, this time trying to reuse relevant parts of the Rook+Pawn network, and compare the results.
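For the strength comparisons suggested above, one common approach is to play a match between two configurations and convert the result into an approximate Elo rating difference. The sketch below assumes the standard logistic Elo model, which is a choice made for this example rather than something the project prescribes; the function names are hypothetical.

```python
import math


def match_score(wins: int, draws: int, losses: int) -> float:
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("at least one game is required")
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model.

    A positive result means the first player is estimated to be stronger.
    A score of exactly 0 or 1 implies an unbounded difference, so it is rejected.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Example: full AlphaZero vs. MCTS alone over 100 games (made-up result).
diff = elo_difference(match_score(wins=60, draws=20, losses=20))
```

A 70% score corresponds to a difference of roughly 147 Elo points. With only a small number of games the estimate is noisy, so it is worth playing enough games, from varied starting positions, before drawing conclusions.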
This project would suit a student who wants to learn more about convolutional neural networks and deep learning, and is very motivated to do so. A lot of time will be spent studying and understanding the field. The software development involved is actually quite straightforward, once the concepts have been well understood.