Commit 90beb89

Adding chess environment (#324)
* Add chess environment powered by moonfish engine

  A chess reinforcement learning environment for OpenEnv using the moonfish chess engine for opponent play and position evaluation.

  Features:
  - Full chess rules via python-chess library
  - Configurable opponent: moonfish engine, random moves, or self-play (None)
  - Position evaluation using moonfish's PSQT-based evaluation
  - Configurable agent color (white/black/alternate each episode)
  - Custom starting positions via FEN notation
  - Terminal state detection on reset for custom positions

  Rewards: +1.0 win, -1.0 loss, 0.0 draw, -0.1 illegal move

* Add temporal discounting for credit assignment in chess env

  - Add gamma parameter (default 0.99) for configurable discounting
  - Compute discounted rewards at episode end: r_t = γ^(T-1-t) × R_final
  - Return discounted_rewards in terminal observation metadata
  - Add tests for discounting formula and behavior
  - Document the feature in README

* Clarify self-play mode in discounting test
1 parent ae45c2e commit 90beb89

14 files changed (+1029, -1 lines)

.github/workflows/docker-build.yml

Lines changed: 2 additions & 0 deletions
@@ -81,6 +81,8 @@ jobs:
8181
dockerfile: envs/git_env/server/Dockerfile
8282
- name: connect4_env
8383
dockerfile: envs/connect4_env/server/Dockerfile
84+
- name: chess-env
85+
dockerfile: envs/chess_env/server/Dockerfile
8486
- name: tbench2-env
8587
dockerfile: envs/tbench2_env/server/Dockerfile
8688
- name: textarena-env

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ jobs:
3333
- name: Install dependencies
3434
run: |
3535
uv sync --all-extras
36-
uv pip install pytest numpy nltk smolagents
36+
uv pip install pytest numpy nltk smolagents python-chess moonfish
3737
3838
- name: Run tests
3939
run: |

docs/environments.md

Lines changed: 19 additions & 0 deletions
@@ -205,6 +205,25 @@ The OpenEnv community has built a catalog of ready-to-run environments that cove
205205
</div>
206206
</div>
207207

208+
<div class="environment-card">
209+
<div class="environment-card__body">
210+
<span class="environment-card__tag">Chess</span>
211+
<p class="environment-card__description">
212+
Chess RL environment powered by the moonfish engine with configurable opponents, PSQT evaluation, and full rules support.
213+
</p>
214+
</div>
215+
<div class="environment-card__links">
216+
<a class="environment-card__icon" href="/OpenEnv/environments/chess/" aria-label="Chess docs">
217+
<svg viewBox="0 0 24 24" aria-hidden="true" focusable="false">
218+
<path d="M6 3c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h12c1.1 0 2-.9 2-2V9l-6-6H6zm8 1.5L18.5 9H14V4.5z" fill="currentColor"/>
219+
</svg>
220+
</a>
221+
<a class="environment-card__icon environment-card__icon--hf" href="https://huggingface.co/spaces/luccabb/moonfish_chess" target="_blank" rel="noreferrer noopener" aria-label="Chess on Hugging Face">
222+
<img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="" aria-hidden="true" />
223+
</a>
224+
</div>
225+
</div>
226+
208227
<div class="environment-card">
209228
<div class="environment-card__body">
210229
<span class="environment-card__tag">Unity</span>

docs/environments/chess.md

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
--8<-- "../../envs/chess_env/README.md"
2+

envs/chess_env/README.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
# Chess Environment
2+
3+
A chess reinforcement learning environment for OpenEnv, powered by the [moonfish](https://github.com/luccabb/moonfish) chess engine.
4+
5+
## Features
6+
7+
- **Full chess rules** via python-chess library
8+
- **Configurable opponent**: moonfish engine, random moves, or self-play
9+
- **Position evaluation**: Uses moonfish's PSQT-based evaluation
10+
- **Standard OpenEnv interface**: reset(), step(), state
11+
12+
## Quick Start
13+
14+
### Using Docker
15+
16+
```bash
17+
# Build the image
18+
docker build -t chess-env:latest -f envs/chess_env/server/Dockerfile .
19+
20+
# Run the server
21+
docker run -p 8000:8000 chess-env:latest
22+
```
23+
24+
### Using the Client
25+
26+
```python
27+
from envs.chess_env import ChessEnv, ChessAction
28+
29+
# Connect to server
30+
with ChessEnv(base_url="http://localhost:8000") as env:
31+
# Reset for a new game
32+
result = env.reset()
33+
print(f"Starting position: {result.observation.fen}")
34+
print(f"Legal moves: {result.observation.legal_moves}")
35+
36+
# Make a move
37+
result = env.step(ChessAction(move="e2e4"))
38+
print(f"Reward: {result.reward}, Done: {result.done}")
39+
40+
# Play until game ends
41+
while not result.done:
42+
# Your policy here
43+
move = result.observation.legal_moves[0]
44+
result = env.step(ChessAction(move=move))
45+
46+
print(f"Game result: {result.observation.result}")
47+
```
48+
49+
## Observation Space
50+
51+
| Field | Type | Description |
52+
|-------|------|-------------|
53+
| `fen` | str | Board position in FEN notation |
54+
| `legal_moves` | List[str] | Legal moves in UCI format |
55+
| `is_check` | bool | Whether current player is in check |
56+
| `done` | bool | Whether game has ended |
57+
| `reward` | float | Reward for last action |
58+
| `result` | str | Game result ("1-0", "0-1", "1/2-1/2") |
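
As a small illustration of working with the `fen` field, the sketch below reads the side to move from a FEN string using plain string handling. The helper name `side_to_move` is hypothetical and not part of this environment; it relies only on the standard six-field FEN layout, where the second field is `w` or `b`.

```python
def side_to_move(fen: str) -> str:
    """Return "white" or "black" from the active-color field of a FEN string.

    Hypothetical helper for illustration; not part of the chess_env package.
    """
    fields = fen.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 FEN fields, got {len(fields)}")
    active = fields[1]
    if active not in ("w", "b"):
        raise ValueError(f"unexpected active-color field: {active!r}")
    return "white" if active == "w" else "black"


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(side_to_move(START_FEN))
```

In client code the same check could be applied to `result.observation.fen` before choosing a move.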
59+
60+
## Action Space
61+
62+
| Field | Type | Description |
63+
|-------|------|-------------|
64+
| `move` | str | UCI format move (e.g., "e2e4", "e7e8q") |
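
Since illegal moves are penalized, a policy may want to check a candidate move before sending it. The sketch below is one possible approach using only the standard library: a syntactic check of the UCI shape (source square, target square, optional promotion piece) followed by a membership test against the `legal_moves` list from the observation. The names `is_well_formed_uci` and `choose_move` are hypothetical, and the fallback to the first legal move is an assumption made for illustration.

```python
import re
from typing import List

# Source square, target square, optional promotion piece (q, r, b, n).
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def is_well_formed_uci(move: str) -> bool:
    """Syntactic check only; says nothing about legality in a position."""
    return UCI_MOVE.fullmatch(move) is not None


def choose_move(candidate: str, legal_moves: List[str]) -> str:
    """Return the candidate if it is legal, else fall back to the first legal move."""
    if not legal_moves:
        raise ValueError("no legal moves available")
    if is_well_formed_uci(candidate) and candidate in legal_moves:
        return candidate
    return legal_moves[0]


print(choose_move("e2e4", ["a2a3", "e2e4"]))
print(choose_move("e2e5", ["a2a3", "e2e4"]))
```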
65+
66+
## Rewards
67+
68+
| Outcome | Reward |
69+
|---------|--------|
70+
| Win | +1.0 |
71+
| Loss | -1.0 |
72+
| Draw | 0.0 |
73+
| Illegal move | -0.1 |
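
The table can be read as a mapping from the game result string to a reward. The sketch below assumes the reward is expressed from the agent's point of view, so the same result string yields opposite rewards depending on the agent's color. The function `terminal_reward` and the constant name are hypothetical and only illustrate the table; they are not the environment's implementation.

```python
ILLEGAL_MOVE_REWARD = -0.1  # value taken from the rewards table


def terminal_reward(result: str, agent_color: str) -> float:
    """Map a result string to +1.0 / -1.0 / 0.0 from the agent's perspective.

    Assumption: "win" and "loss" in the table refer to the agent's side.
    """
    if agent_color not in ("white", "black"):
        raise ValueError(f"unknown agent color: {agent_color!r}")
    if result == "1/2-1/2":
        return 0.0
    if result not in ("1-0", "0-1"):
        raise ValueError(f"unknown result: {result!r}")
    white_won = result == "1-0"
    agent_is_white = agent_color == "white"
    return 1.0 if white_won == agent_is_white else -1.0


print(terminal_reward("1-0", "white"))
print(terminal_reward("1-0", "black"))
```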
74+
75+
## Configuration
76+
77+
The environment supports these configuration options:
78+
79+
| Parameter | Default | Description |
80+
|-----------|---------|-------------|
81+
| `opponent` | "moonfish" | Opponent type: "moonfish", "random", or None |
82+
| `opponent_depth` | 2 | Search depth for moonfish opponent |
83+
| `max_moves` | 500 | Maximum half-moves before draw |
84+
| `agent_color` | None | Agent color: "white", "black", or None (alternate each episode) |
85+
| `gamma` | 0.99 | Discount factor for temporal credit assignment |
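
To make the option names and defaults concrete, here is a hypothetical dataclass that mirrors the table. `ChessConfigSketch` is not a class in this package, and how the real environment accepts these options is not shown in this README; the validation rules are an assumption derived from the allowed values listed above.

```python
from dataclasses import dataclass
from typing import Optional

VALID_OPPONENTS = {"moonfish", "random", None}
VALID_COLORS = {"white", "black", None}


@dataclass(frozen=True)
class ChessConfigSketch:
    """Hypothetical container mirroring the configuration table."""

    opponent: Optional[str] = "moonfish"  # None means self-play
    opponent_depth: int = 2               # search depth for the moonfish opponent
    max_moves: int = 500                  # half-moves before the game is drawn
    agent_color: Optional[str] = None     # None alternates color each episode
    gamma: float = 0.99                   # discount factor

    def __post_init__(self) -> None:
        if self.opponent not in VALID_OPPONENTS:
            raise ValueError(f"unknown opponent: {self.opponent!r}")
        if self.agent_color not in VALID_COLORS:
            raise ValueError(f"unknown agent color: {self.agent_color!r}")


defaults = ChessConfigSketch()
self_play = ChessConfigSketch(opponent=None, agent_color="white")
print(defaults)
print(self_play)
```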
86+
87+
## Temporal Discounting
88+
89+
For RL training, the environment computes temporally discounted rewards at episode end. This helps with credit assignment in long games where only the final outcome is known.
90+
91+
When an episode ends, the terminal observation's `metadata` includes:
92+
- `discounted_rewards`: List of discounted rewards for each agent move
93+
- `gamma`: The discount factor used
94+
95+
The formula is `r_t = γ^(T-1-t) × R_final` where:
96+
- `T` = total agent moves
97+
- `t` = move index (0-indexed)
98+
- `R_final` = terminal reward (+1, -1, or 0)
99+
100+
Example for a 5-move win with γ=0.99:
101+
```
102+
Move 0: 0.99^4 × 1.0 = 0.961
103+
Move 1: 0.99^3 × 1.0 = 0.970
104+
Move 2: 0.99^2 × 1.0 = 0.980
105+
Move 3: 0.99^1 × 1.0 = 0.990
106+
Move 4: 0.99^0 × 1.0 = 1.000
107+
```
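
The worked example above can be reproduced with a few lines of Python. This is a minimal sketch of the stated formula, not the environment's own code; the function name `discounted_rewards` and its signature are chosen for illustration.

```python
from typing import List


def discounted_rewards(final_reward: float, num_agent_moves: int, gamma: float = 0.99) -> List[float]:
    """Return r_t = gamma ** (T - 1 - t) * R_final for t = 0 .. T - 1."""
    T = num_agent_moves
    return [gamma ** (T - 1 - t) * final_reward for t in range(T)]


rewards = discounted_rewards(final_reward=1.0, num_agent_moves=5, gamma=0.99)
for t, r in enumerate(rewards):
    print(f"Move {t}: {r:.3f}")
```

The last agent move receives the full terminal reward and earlier moves receive progressively smaller magnitudes, so a loss (`R_final = -1`) produces the same values with the sign flipped and a draw produces all zeros.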
108+
109+
## Links
110+
111+
- [moonfish GitHub](https://github.com/luccabb/moonfish)
112+
- [Play online](https://huggingface.co/spaces/luccabb/moonfish_chess)

envs/chess_env/__init__.py

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
# Copyright (c) Meta Platforms, Inc. and affiliates.
2+
# All rights reserved.
3+
#
4+
# This source code is licensed under the BSD-style license found in the
5+
# LICENSE file in the root directory of this source tree.
6+
7+
"""
8+
Chess Environment for OpenEnv.
9+
10+
This module provides OpenEnv integration for chess, using the moonfish
11+
chess engine for position evaluation and opponent play.
12+
13+
Example:
14+
>>> from envs.chess_env import ChessEnv, ChessAction
15+
>>>
16+
>>> # Connect to a running server or start via Docker
17+
>>> env = ChessEnv.from_docker_image("chess-env:latest")
18+
>>>
19+
>>> # Reset and interact
20+
>>> result = env.reset()
21+
>>> print(result.observation.fen)
22+
>>> print(result.observation.legal_moves)
23+
>>>
24+
>>> result = env.step(ChessAction(move="e2e4"))
25+
>>> print(result.reward, result.done)
26+
>>>
27+
>>> # Cleanup
28+
>>> env.close()
29+
"""
30+
31+
from .client import ChessEnv
32+
from .models import ChessAction, ChessObservation, ChessState
33+
34+
__all__ = ["ChessEnv", "ChessAction", "ChessObservation", "ChessState"]

envs/chess_env/client.py

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
1+
# Copyright (c) Meta Platforms, Inc. and affiliates.
2+
# All rights reserved.
3+
#
4+
# This source code is licensed under the BSD-style license found in the
5+
# LICENSE file in the root directory of this source tree.
6+
7+
"""
8+
Chess Environment Client.
9+
10+
This module provides the client for connecting to a Chess Environment server
11+
via WebSocket for persistent sessions.
12+
"""
13+
14+
from __future__ import annotations
15+
16+
from typing import Any, Dict
17+
18+
from openenv.core.client_types import StepResult
19+
from openenv.core.env_client import EnvClient
20+
21+
from .models import ChessAction, ChessObservation, ChessState
22+
23+
24+
class ChessEnv(EnvClient[ChessAction, ChessObservation, ChessState]):
25+
"""
26+
Client for Chess Environment.
27+
28+
This client maintains a persistent WebSocket connection to the environment
29+
server, enabling efficient multi-step interactions with lower latency.
30+
31+
Uses the moonfish chess engine for opponent moves and position evaluation.
32+
33+
Example:
34+
>>> with ChessEnv(base_url="http://localhost:8000") as client:
35+
... result = client.reset()
36+
... print(result.observation.fen)
37+
... print(result.observation.legal_moves)
38+
...
39+
... result = client.step(ChessAction(move="e2e4"))
40+
... print(result.reward, result.done)
41+
"""
42+
43+
def _step_payload(self, action: ChessAction) -> Dict[str, Any]:
44+
"""
45+
Convert ChessAction to JSON payload for step request.
46+
47+
Args:
48+
action: ChessAction instance with UCI move string.
49+
50+
Returns:
51+
Dictionary representation suitable for JSON encoding.
52+
"""
53+
return {
54+
"move": action.move,
55+
}
56+
57+
def _parse_result(self, payload: Dict[str, Any]) -> StepResult[ChessObservation]:
58+
"""
59+
Parse server response into StepResult[ChessObservation].
60+
61+
Args:
62+
payload: JSON response from server.
63+
64+
Returns:
65+
StepResult with ChessObservation.
66+
"""
67+
obs_data = payload.get("observation", {})
68+
69+
observation = ChessObservation(
70+
fen=obs_data.get("fen", ""),
71+
legal_moves=obs_data.get("legal_moves", []),
72+
is_check=obs_data.get("is_check", False),
73+
done=obs_data.get("done", False),
74+
reward=obs_data.get("reward", 0.0),
75+
result=obs_data.get("result"),
76+
metadata=obs_data.get("metadata", {}),
77+
)
78+
79+
return StepResult(
80+
observation=observation,
81+
reward=observation.reward,
82+
done=observation.done,
83+
)
84+
85+
def _parse_state(self, payload: Dict[str, Any]) -> ChessState:
86+
"""
87+
Parse server response into ChessState object.
88+
89+
Args:
90+
payload: JSON response from /state endpoint.
91+
92+
Returns:
93+
ChessState object with environment state information.
94+
"""
95+
return ChessState(
96+
episode_id=payload.get("episode_id", ""),
97+
fen=payload.get("fen", ""),
98+
current_player=payload.get("current_player", "white"),
99+
move_history=payload.get("move_history", []),
100+
step_count=payload.get("step_count", 0),
101+
)

envs/chess_env/models.py

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
# Copyright (c) Meta Platforms, Inc. and affiliates.
2+
# All rights reserved.
3+
#
4+
# This source code is licensed under the BSD-style license found in the
5+
# LICENSE file in the root directory of this source tree.
6+
7+
"""
8+
Data models for Chess Environment.
9+
10+
This module defines the Action, Observation, and State types for chess games
11+
via the OpenEnv interface. Uses the moonfish chess engine for move search
12+
and position evaluation.
13+
"""
14+
15+
from __future__ import annotations
16+
17+
from typing import List, Optional
18+
19+
from pydantic import Field
20+
21+
from openenv.core.env_server import Action, Observation, State
22+
23+
24+
class ChessAction(Action):
25+
"""
26+
Action for Chess environment.
27+
28+
Attributes:
29+
move: UCI format move string (e.g., "e2e4", "e7e8q" for promotion).
30+
"""
31+
32+
move: str
33+
34+
35+
class ChessObservation(Observation):
36+
"""
37+
Observation for Chess environment.
38+
39+
Attributes:
40+
fen: Board position in FEN notation.
41+
legal_moves: List of legal moves in UCI format.
42+
is_check: Whether the current player is in check.
43+
done: Whether the game is over.
44+
reward: Reward for the last action.
45+
result: Game result string if game is over (e.g., "1-0", "0-1", "1/2-1/2").
46+
"""
47+
48+
fen: str = ""
49+
legal_moves: List[str] = Field(default_factory=list)
50+
is_check: bool = False
51+
result: Optional[str] = None
52+
53+
54+
class ChessState(State):
55+
"""
56+
State for Chess environment.
57+
58+
Attributes:
59+
episode_id: Unique ID for the current game.
60+
fen: Current board position in FEN notation.
61+
current_player: "white" or "black".
62+
move_history: List of moves played in UCI format.
63+
step_count: Number of half-moves played.
64+
"""
65+
66+
fen: str = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
67+
current_player: str = "white"
68+
move_history: List[str] = Field(default_factory=list)
69+
step_count: int = 0
