Monte Carlo Tree Search

Explore how MCTS builds its search tree and makes decisions in real-time

Game Rules

This is a simple two-player game where players take turns adding either 1 or 2 to a running sum.

  • Starting sum is 0
  • Players alternate turns (Player A starts)
  • Each turn, a player must add either 1 or 2 to the sum
  • The goal is to make the sum exactly 10
  • If a player makes the sum exceed 10 (e.g., 11), they lose immediately
  • The player who makes the sum exactly 10 wins

Game State

Sum: 0
Turn: Player A
Status: In Progress

Current Phase

Selection
Traversing the tree...

Statistics

Node Details

Click a node to see details

Controls

50ms
1.414
Exploitation Exploration

Visualization Guide

Node Colors

High Win Rate
Neutral
Low Win Rate

Node Information

  • Number inside node: Current sum
  • Text below node: Win rate % (Total visits)
  • Highlighted path: Current selection
  • Dashed green line: Simulation path

How MCTS Works

1. Selection

Starting from the root, select child nodes using the UCT formula until reaching a leaf node or a node with unexpanded children. The UCT formula balances:

  • Exploitation: Prefer nodes with high win rates
  • Exploration: Give chance to less-visited nodes

2. Expansion

If the selected node has unvisited children, create one new child node for a random unvisited move. This gradually builds out the tree in promising directions.

3. Simulation

From the new node, play out a random game to completion (called a rollout). This gives a quick estimate of the position's value without needing to build the full tree.

4. Backpropagation

Update statistics for all nodes in the path from the simulated node back to the root:

  • Increment visit count
  • Update win count based on simulation result

The UCT Formula

\[UCT = \underbrace{\frac{w_i}{n_i}}_{\text{exploitation}} + \underbrace{c\sqrt{\frac{\ln N}{n_i}}}_{\text{exploration}}\]
  • \(w_i\): wins for node i
  • \(n_i\): visits to node i
  • \(N\): parent visits
  • \(c\): exploration constant

Exploitation Term

The exploitation term \(\frac{w_i}{n_i}\) favors moves that have worked well in the past. Higher win rates lead to higher values.

Exploration Term

The exploration term \(c\sqrt{\frac{\ln N}{n_i}}\) ensures all moves are tried occasionally. Increases for rarely-visited nodes and decreases with more visits.