21 Dec 2018
I found the alpha-zero-general project very helpful to learn the ideas in the AlphaGo Zero paper. (Use the link at the bottom of the page.) I have some different ideas for the project structure and API, so I’ve decided to write a new library using the same ideas.
26 Dec 2018
I’ve got Tic Tac Toe implemented, because it’s nice and small for writing unit tests. So far, it’s only the game logic and a human player interface. I also added Connect 4 as a slightly deeper game that can have more interesting board positions than Tic Tac Toe.
29 Dec 2018
Basic Monte Carlo Tree Search is now working! There’s no neural network yet. I ended up just using a tree structure for the search nodes, so I don’t need all the hash tables that alpha-zero-general used.
I plotted the win rates of two MCTS players with different numbers of search iterations, and the pattern was as expected. The more searching, the higher the win rate. You can see that there’s a slight advantage for the first player, and player 1’s win rate is 64% when both players search with 128 simulations.
The gradient isn’t as smooth when both players use fewer than 10 simulations, but they’re basically making random moves at that point.