Step through how search trees are expanded, simulated, and scored.
- Starting sum is 0.
- Players alternate turns, with Player A starting.
- Each turn, a player must add either 1 or 2 to the sum.
- The goal is to make the sum exactly 10.
- If a player makes the sum exceed 10, they lose immediately.
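The rules above can be written down as a few small functions. This is a minimal sketch, not the page's own implementation: the names `TARGET`, `legal_moves`, and `outcome` and the player labels "A" and "B" are assumptions made for illustration.

```python
TARGET = 10  # the sum both players are trying to hit exactly


def legal_moves(total):
    """Moves available at a given sum: 1 or 2 while the game is still running."""
    return [] if total >= TARGET else [1, 2]


def other(player):
    """Return the opponent of `player` ("A" or "B")."""
    return "B" if player == "A" else "A"


def outcome(total, mover):
    """Winner after `mover` has just produced `total`, or None if play continues."""
    if total == TARGET:
        return mover          # hit 10 exactly: the mover wins
    if total > TARGET:
        return other(mover)   # overshot 10: the mover loses immediately
    return None
```

Note that at a sum of 9 both moves are still legal, but adding 2 overshoots and loses.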
Click a node to see details.
Node Colors
Green indicates high win rate, blue is neutral, and red is low win rate.
Node Information
The number inside a node is the current sum. Text below shows win rate and total visits. Highlighted links show the current selection path.
Exploitation Term
The exploitation term favors moves that have worked well in the past: the higher a node's win rate, the higher its value.
Exploration Term
The exploration term ensures all moves are tried occasionally. It is larger for rarely visited nodes and shrinks as a node accumulates visits.
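The two terms are commonly combined in the standard UCT (UCB1) score, sketched below. The page does not show its exact formula, so the exploration constant `c = sqrt(2)` and the choice to treat unvisited nodes as infinitely attractive are assumptions, as is the function name `uct_score`.

```python
import math


def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCT score: exploitation plus exploration.

    `c` is an assumed exploration constant; sqrt(2) is a common default.
    """
    if visits == 0:
        return math.inf  # assumption: unvisited children are always tried first
    exploitation = wins / visits                                    # win rate so far
    exploration = c * math.sqrt(math.log(parent_visits) / visits)   # shrinks with visits
    return exploitation + exploration
```

With the same win rate, a node with 2 visits scores higher than one with 100 visits under the same parent, which is the exploration term at work.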
1. Selection
Starting from the root, MCTS selects child nodes using UCT until reaching a leaf node or a node with unexpanded moves.
- Exploitation prefers nodes with high win rates.
- Exploration gives less-visited nodes a chance.
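A selection pass might look like the following sketch. The `Node` fields, the `uct` helper, and the constant `c = sqrt(2)` are illustrative assumptions rather than the page's actual data structures.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    total: int                                     # current sum at this node
    wins: float = 0.0
    visits: int = 0
    children: dict = field(default_factory=dict)   # move (1 or 2) -> Node

    def untried_moves(self):
        moves = [] if self.total >= 10 else [1, 2]
        return [m for m in moves if m not in self.children]


def uct(child, parent_visits, c=math.sqrt(2)):
    """Standard UCT score; unvisited children get infinite priority (assumption)."""
    if child.visits == 0:
        return math.inf
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def select(root):
    """Descend by highest UCT until a leaf or a node with untried moves is reached."""
    path = [root]
    node = root
    while node.children and not node.untried_moves():
        node = max(node.children.values(), key=lambda ch: uct(ch, node.visits))
        path.append(node)
    return path
```

The returned path corresponds to the highlighted links in the visualization: the last element is the node handed to the expansion step.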
2. Expansion
If the selected node still has untried moves, the tree adds one new child for one of them. This gradually builds the tree in promising directions.
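Expansion can be sketched as adding a single child for one untried move. Picking that move at random is an assumption here (the page does not say how the move is chosen), and the `Node` class and `expand` function are hypothetical names.

```python
import random


class Node:
    def __init__(self, total, parent=None):
        self.total = total      # current sum at this node
        self.parent = parent
        self.children = {}      # move (1 or 2) -> Node
        self.wins = 0
        self.visits = 0

    def untried_moves(self):
        moves = [] if self.total >= 10 else [1, 2]
        return [m for m in moves if m not in self.children]


def expand(node, rng=random):
    """Create one child for an untried move; return the node itself if none remain."""
    untried = node.untried_moves()
    if not untried:
        return node  # fully expanded or terminal: nothing to add
    move = rng.choice(untried)  # assumption: untried moves are picked at random
    child = Node(node.total + move, parent=node)
    node.children[move] = child
    return child
```

Only one child is added per iteration, so branches that selection keeps returning to grow deeper than the rest.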
3. Simulation
From the new node, a rollout plays the game to completion. This gives a fast estimate of the position value.
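A rollout for this game could be as simple as the sketch below. It assumes both players choose uniformly at random, which is a common default but not something the page states, and that the starting sum is below 10. The function name `rollout` and the player labels are illustrative.

```python
import random


def rollout(total, to_move, rng=random):
    """Play random moves from `total` (< 10) with `to_move` about to act.

    Returns the winner, "A" or "B".
    """
    player = to_move
    while True:
        total += rng.choice([1, 2])   # assumption: uniformly random playout policy
        if total == 10:
            return player             # hit 10 exactly: the mover wins
        if total > 10:
            return "B" if player == "A" else "A"  # overshot: the mover loses
        player = "B" if player == "A" else "A"    # hand the turn over
```

A single rollout is a noisy estimate; the value becomes useful only after many rollouts are averaged through the visit and win counts.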
4. Backpropagation and UCT
Visits and wins are updated along the path from the simulated node back to the root. The next selection pass computes UCT from these updated counts.
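The update can be sketched as a walk up the parent links. One convention, assumed here, is that each node counts wins for the player whose move created it, so a parent choosing among its children sees win rates from its own point of view. The `mover` field and the function name `backpropagate` are hypothetical.

```python
class Node:
    def __init__(self, total, mover=None, parent=None):
        self.total = total
        self.mover = mover    # player whose move produced this node (None at the root)
        self.parent = parent
        self.wins = 0
        self.visits = 0


def backpropagate(node, winner):
    """Walk from the simulated node up to the root, updating statistics."""
    while node is not None:
        node.visits += 1
        if node.mover == winner:  # assumed convention: credit the player who moved here
            node.wins += 1
        node = node.parent
```

For example, if A moves to sum 1, B moves to sum 3, and the rollout is won by A, every node on the path gains a visit but only the node created by A's move gains a win.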