## Bug

In `c_step()` in `g2048.h`, `place_tile_at_random_cell()` is called before `update_stats()`, so it uses a stale `empty_count` from the previous step. When a merge creates new empty cells, `rand() % old_empty_count` can never select them — cells that appear later in row-major scan order get 0% placement probability.
```c
// c_step(), lines 370-376
if (did_move) {
    game->moves_made++;
    place_tile_at_random_cell(game, get_new_tile()); // ← stale empty_count
    game->score += score_add;
    update_stats(game);                              // ← updated too late
}
```
## Why it matters
In standard 2048, new tiles are placed uniformly across all empty cells. This bug breaks that uniformity. The effect is worst in the endgame (1–3 empty cells), where:

- 2 empty → merge → 4 empty: two cells get 0% probability
- 1 empty: `rand() % 1 = 0`, so placement becomes 100% deterministic

The agent learns to exploit this predictable placement, not to play 2048.
## Fix
```c
if (did_move) {
    game->moves_made++;
    update_stats(game);                              // ← move here
    place_tile_at_random_cell(game, get_new_tile());
    game->score += score_add;
    update_stats(game);                              // ← update again after placement
}
```
## Reproduction (100 episodes each, same checkpoint)

|           | Bug version | Fixed version |
|-----------|-------------|---------------|
| ≥ 2048    | 100%        | 0%            |
| ≥ 32768   | 84%         | 0%            |
| ≥ 65536   | 33%         | 0%            |
| Max tile  | 65536       | 1024          |
| Avg moves | 32,702      | 502           |
Both columns are from my own runs with 100 random seeds. The model doesn't degrade gracefully — it completely collapses, meaning the learned policy depends entirely on the biased tile distribution.
## Impact

Present on the `4.0` branch as of Jan 24, 2026. cc @jsuarez5341