Explore how MCTS builds its search tree and makes decisions in real-time
This is a simple two-player game where players take turns adding either 1 or 2 to a running sum.
Starting from the root, select child nodes using the UCT formula until reaching a leaf node or a node with unexpanded children. The UCT formula balances:
If the selected node has unvisited children, create one new child node for a random unvisited move. This gradually builds out the tree in promising directions.
From the new node, play out a random game to completion (called a rollout). This gives a quick estimate of the position's value without needing to build the full tree.
Update statistics for all nodes in the path from the simulated node back to the root:
The exploitation term \(\frac{w_i}{n_i}\) favors moves that have worked well in the past. Higher win rates lead to higher values.
The exploration term \(c\sqrt{\frac{\ln N}{n_i}}\) ensures all moves are tried occasionally. Increases for rarely-visited nodes and decreases with more visits.