Monte Carlo Tree Search

Explore how MCTS builds its search tree and makes decisions in real-time

Game Rules

This is a simple two-player game where players take turns adding either 1 or 2 to a running sum.

Starting sum is 0
Players alternate turns (Player A starts)
Each turn, a player must add either 1 or 2 to the sum
The goal is to make the sum exactly 10
If a player makes the sum exceed 10 (e.g., 11), they lose immediately
The player who makes the sum exactly 10 wins

Game State

Sum: 0

Turn: Player A

Status: In Progress

Current Phase

Selection

Traversing the tree...

Statistics

Node Details

Click a node to see details

Controls

Simulation Speed

50ms

Exploration Constant (c)

1.414

Exploitation Exploration

Visualization Guide

Node Colors

High Win Rate

Neutral

Low Win Rate

Node Information

Number inside node: Current sum
Text below node: Win rate % (Total visits)
Highlighted path: Current selection
Dashed green line: Simulation path

How MCTS Works

1. Selection

Starting from the root, select child nodes using the UCT formula until reaching a leaf node or a node with unexpanded children. The UCT formula balances:

Exploitation: Prefer nodes with high win rates
Exploration: Give chance to less-visited nodes

2. Expansion

If the selected node has unvisited children, create one new child node for a random unvisited move. This gradually builds out the tree in promising directions.

3. Simulation

From the new node, play out a random game to completion (called a rollout). This gives a quick estimate of the position's value without needing to build the full tree.

4. Backpropagation

Update statistics for all nodes in the path from the simulated node back to the root:

Increment visit count
Update win count based on simulation result

The UCT Formula

\[UCT = \underbrace{\frac{w_i}{n_i}}_{\text{exploitation}} + \underbrace{c\sqrt{\frac{\ln N}{n_i}}}_{\text{exploration}}\]

\(w_i\): wins for node i
\(n_i\): visits to node i
\(N\): parent visits
\(c\): exploration constant

Exploitation Term

The exploitation term \(\frac{w_i}{n_i}\) favors moves that have worked well in the past. Higher win rates lead to higher values.

Exploration Term

The exploration term \(c\sqrt{\frac{\ln N}{n_i}}\) ensures all moves are tried occasionally. Increases for rarely-visited nodes and decreases with more visits.