
Explore It Till You Solve It


An exploration-only solution for ARC-AGI-3. In the official evaluation of the ARC-AGI-3 Preview Challenge, it solved 12 out of 25 private levels, finishing 3rd.

After correcting the graph-resetting bug post-evaluation (commit aad8145), it solves a median of 17 private levels (one level below the 1st-place solution), with results ranging from 14 to 19 levels across five test runs.

A more detailed description is available in the workshop article, presented at the AAAI 2026 Workshop on AI for Scientific Research.

Quickstart

This repository was originally forked from the challenge repo. The setup mostly mirrors the original one. Alternatively, you can run the code in Google Colab.

Install uv if not already installed.

  1. Clone this repository and enter the directory.
     git clone https://github.com/dolphin-in-a-coma/arc-agi-3-just-explore.git
     cd arc-agi-3-just-explore
  2. Copy .env.example to .env.
     cp .env.example .env
  3. Get an API key from the ARC-AGI-3 Website and set it in your .env file.
     export ARC_API_KEY="your_api_key_here"
  4. Run the agent. The command below runs the swarm across all games unless a specific game is provided with --game.
     uv run main.py --agent=heuristicagent

For more information, see the original documentation or the tutorial video.

The method

Motivation

The initial idea was to make LLMs interact with the environment more effectively by:

  • Providing a textual description of the environment.
  • Introducing meaningful click actions (e.g., click an object instead of raw coordinates).
  • Building a replay buffer for in-context reinforcement learning.

After experiments on simple levels (passing a winning path from a previous level and providing a list of clickable objects), this direction proved less promising. In parallel, a brute-force exploration method emerged and performed better on the public tasks.

Description

The method has two parts:

  • Frame Processor
  • Level Graph Explorer

Frame Processor

Basic image processing aims to reduce irrelevant visual variability and focus exploration on actionable regions. It's done by:

  • Segmenting the frame into single-color connected components.
  • Detecting and masking likely status bars (e.g., remaining steps).
  • For click-controlled games, grouping segments into five priority tiers based on button likelihood (average size, salient color; lowest tier includes segments likely to be status bars).
  • Hashing the masked image for use by the graph explorer.
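The segment-mask-hash pipeline above can be sketched roughly as follows. This is a simplified, hypothetical illustration rather than the repository's actual code: the function names are invented, frames are plain 2D lists of color ints, and masking is shown per-row (a stand-in for the real status-bar detection).

```python
# Hypothetical sketch of the Frame Processor: segment a frame into
# single-color 4-connected components, blank out suspected status-bar
# rows, and hash the result so visually identical frames collapse to one
# graph node. Not the repository's actual implementation.
from collections import deque
import hashlib

def segment(frame):
    """Label 4-connected single-color components; returns a label grid."""
    h, w = len(frame), len(frame[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            color = frame[sy][sx]
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:  # flood fill over same-color neighbors
                y, x = queue.popleft()
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if 0 <= ny < h and 0 <= nx < w \
                            and labels[ny][nx] == -1 and frame[ny][nx] == color:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels

def frame_hash(frame, masked_rows=()):
    """Hash the frame with suspected status-bar rows blanked out."""
    rows = [("M",) * len(frame[0]) if y in masked_rows else tuple(row)
            for y, row in enumerate(frame)]
    return hashlib.sha256(repr(rows).encode()).hexdigest()[:16]
```

With the status-bar row masked, two frames that differ only in the remaining-steps display hash to the same value, so the explorer treats them as the same state.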

Level Graph Explorer

From each known frame (graph node), the explorer maintains paths to frontier frames, i.e., frames that still have untested actions (graph edges). For each frame, it tracks:

  • The list of possible actions (clicks for ft09/cv33, arrows for ls20).
  • For each action: priority level, tested flag, transition result, destination frame, and distance to the nearest frontier.

Actions are taken from the highest-priority group with remaining untested actions; only when all such actions are exhausted across the graph does the explorer proceed to lower-priority groups. Some utility functions are duplicated, and distances are recomputed more often than necessary; this could be cleaned up.
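The frontier bookkeeping can be sketched roughly as follows. This is a hypothetical simplification: the class and method names are invented, and the real explorer additionally tracks transition results and cached distances to the nearest frontier.

```python
# Hedged sketch of the Level Graph Explorer idea: each known frame hash is
# a node whose actions are either untested (dest is None) or tested (dest
# is the resulting frame hash). The agent finds the globally highest
# remaining priority, then walks the shortest known path to the nearest
# node offering an untested action at that priority. Illustrative only.
from collections import deque

class ExplorerSketch:
    def __init__(self):
        self.edges = {}     # node -> {action: dest frame hash or None}
        self.priority = {}  # node -> {action: priority}, lower = higher

    def add_node(self, node, actions_with_priority):
        if node not in self.edges:
            self.edges[node] = {a: None for a in actions_with_priority}
            self.priority[node] = dict(actions_with_priority)

    def record(self, node, action, dest):
        self.edges[node][action] = dest  # mark the action as tested

    def next_plan(self, start):
        """Return (path_of_actions, action_to_try) toward the best frontier."""
        best_prio = min((p for n in self.edges
                         for a, p in self.priority[n].items()
                         if self.edges[n][a] is None), default=None)
        if best_prio is None:
            return None  # every action in the graph has been tested
        seen = {start}
        queue = deque([(start, [])])
        while queue:  # BFS over tested edges only
            node, path = queue.popleft()
            for a, dest in self.edges[node].items():
                if dest is None and self.priority[node][a] == best_prio:
                    return path, a
                if dest is not None and dest not in seen:
                    seen.add(dest)
                    queue.append((dest, path + [a]))
        return None  # remaining frontiers unreachable from start
```

For example, after recording that clicking "click1" in frame A leads to frame B, and registering an untested top-priority action in B, `next_plan("A")` returns the path `["click1"]` plus that action.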

Thoughts

This is a limited but effective method that comes close to the limits of brute-force solving for these games. The goal is simply to be more intelligent than a purely random agent.

It can be tricked if the status bar differs significantly from the public games (e.g., integrated into the scene rather than at the edge). In such cases, the method degrades toward more random exploration because the state space implicitly includes many status bar variants. Additionally, large state spaces (e.g., ft09 levels 3–4) can make the method intractable. Non-determinism or partial observations can also cause issues.

A natural extension would be to learn simple world models that predict the next frame from the current frame and action. This could improve sample efficiency by roughly the average number of actions per frame. However, it’s unclear whether such models would help prioritize exploration of “interesting” states in these games by favoring higher uncertainty or surprise for the agent. For example, why should the correct pattern in ft09 be more surprising than an incorrect one?
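As one hedged illustration of what "favoring surprise" could mean here, a count-based novelty score over observed frame hashes could rank untested actions by how rare their (predicted) outcomes are. This is purely illustrative, not part of the method, and it inherits exactly the concern raised above: nothing guarantees that rare frames are the useful ones.

```python
# Illustrative count-based "surprise" scorer: outcomes seen less often get
# higher exploration scores. All names here are invented for this sketch.
from collections import Counter

class SurpriseScorer:
    def __init__(self):
        self.outcome_counts = Counter()

    def observe(self, next_frame_hash):
        self.outcome_counts[next_frame_hash] += 1

    def score(self, predicted_next_hash):
        # Unseen outcomes score 1.0; frequent outcomes approach 0.
        return 1.0 / (1 + self.outcome_counts[predicted_next_hash])
```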

Citation

If you find the work helpful for your research, please cite:

E. Rudakov, J. Shock, and B. U. Cowley, Graph-Based Exploration for ARC-AGI-3 Interactive Reasoning Tasks, arXiv:2512.24156 (2025). DOI: 10.48550/arXiv.2512.24156

@misc{rudakov2025graphbased,
  author = {Rudakov, Evgenii and Shock, Jonathan and Cowley, Benjamin Ultan},
  title  = {Graph-Based Exploration for ARC-AGI-3 Interactive Reasoning Tasks},
  year   = {2025},
  eprint = {2512.24156},
  doi    = {10.48550/arXiv.2512.24156},
  url    = {https://arxiv.org/abs/2512.24156}
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
