Merged

77 commits
b55b3f6
Integrate SDK tracing with Google ADK
jeffreysijuntan Feb 25, 2026
1461878
add support for openai agents
jeffreysijuntan Feb 25, 2026
87963f6
add openai agents
jeffreysijuntan Feb 25, 2026
1fc5356
Integrate SDK tracing with Strands Agents
jeffreysijuntan Feb 26, 2026
5270f62
support rLLM SDK in unified trainer
jeffreysijuntan Feb 26, 2026
95c1c41
Merge origin/main into dev-sdk
jeffreysijuntan Feb 26, 2026
b1d6bc0
Optimize SDK trainer performance
jeffreysijuntan Feb 26, 2026
7eee3cd
unify data schema between SDK and rLLM
jeffreysijuntan Feb 26, 2026
86f0e85
Remove StepView/TrajectoryView aliases, use Step/Trajectory everywhere
jeffreysijuntan Feb 26, 2026
14f0571
deprecate some old examples
jeffreysijuntan Feb 26, 2026
62043c3
Add TinkerProxyManager to unify SDK proxy pipeline for Tinker backend
jeffreysijuntan Feb 26, 2026
e0a0e5b
Fix Strands math training returning null output by switching to LiteL…
jeffreysijuntan Feb 27, 2026
dc51fe4
Remove fake_stream from TinkerProxyManager config
jeffreysijuntan Feb 27, 2026
28a7b97
Add OpenAI Agents SDK math training example
jeffreysijuntan Feb 27, 2026
061dc72
Add math agent example with ADK
jeffreysijuntan Feb 27, 2026
09cc58f
Add sandboxed agent execution for rLLM SDK
jeffreysijuntan Feb 28, 2026
1fd74fc
Add AWS Bedrock AgentCore sandbox backend
jeffreysijuntan Feb 28, 2026
a239aa4
Improve sandbox reliability with retry logic, async waits, and connec…
jeffreysijuntan Feb 28, 2026
bdc7c33
moving some examples to archive
jeffreysijuntan Mar 1, 2026
1315e2f
Add rllm CLI with dataset management, eval runner, and setup commands
jeffreysijuntan Mar 1, 2026
11be31c
Add AgentFlow and Evaluator abstractions for eval framework
jeffreysijuntan Mar 1, 2026
1154d9f
Add 15 new benchmarks with agents, evaluators, and transform infrastr…
jeffreysijuntan Mar 1, 2026
045d62e
Consolidate CLI into `rllm model` command group with per-provider API…
jeffreysijuntan Mar 1, 2026
0bf7dcb
Add 11 new benchmarks, multilingual aggregation, and fix BFCL/HMMT/Lo…
jeffreysijuntan Mar 1, 2026
23b0746
Add 10 VLM benchmarks with image handling, agents, and transforms
jeffreysijuntan Mar 2, 2026
10b850e
Add `rllm train` CLI command with session-aware proxy tracing
jeffreysijuntan Mar 2, 2026
7e4ba19
Replace dual HTTP hop with direct TinkerProxy for tinker training
jeffreysijuntan Mar 2, 2026
d59d264
Lazy-load heavy imports for faster CLI startup
jeffreysijuntan Mar 2, 2026
32496f8
Add agent/evaluator plugin system with persistent registration, entry…
jeffreysijuntan Mar 2, 2026
8ab7d54
Inline image bytes via Arrow IPC for zero-loss VLM dataset pipeline
jeffreysijuntan Mar 3, 2026
195556b
Fix 7 broken dataset catalog entries and improve gated dataset errors
jeffreysijuntan Mar 3, 2026
6ea04fd
Add 6 VLM benchmarks: AI2D, OCRBench, CharXiv, CC-OCR, CountBenchQA, …
jeffreysijuntan Mar 3, 2026
08238b0
Add AIME 2025 and AIME 2026 math competition benchmarks
jeffreysijuntan Mar 3, 2026
97823f7
Add Geometry3K (geo3k) VLM benchmark for geometry problem solving wit…
jeffreysijuntan Mar 3, 2026
e552e73
Load image_processor for VLM models in Tinker backend to fix training
jeffreysijuntan Mar 3, 2026
3c70bf0
Add search agent and 4 web search benchmarks (browsecomp, seal0, wide…
jeffreysijuntan Mar 3, 2026
57a9b90
Fix EvalRunner thread pool bottleneck and add async AgentFlow support
jeffreysijuntan Mar 3, 2026
8c1cce3
Merge branch 'main' of https://github.com/rllm-org/rllm into dev-cli
jeffreysijuntan Mar 4, 2026
b71a093
Add --ui and --ui-url flags to `rllm train` CLI
jeffreysijuntan Mar 4, 2026
be2de68
Add sandboxed benchmark support with SandboxedAgentFlow, tool system,…
jeffreysijuntan Mar 4, 2026
599168e
Add SWE-bench agent plugin with mini-swe-agent-inspired scaffolding
jeffreysijuntan Mar 4, 2026
ddf717e
Fix advantage estimator return types and UILogger error handling
jeffreysijuntan Mar 4, 2026
042c59b
Add FrozenLake eval framework plugin with procedural dataset generation
jeffreysijuntan Mar 5, 2026
b271bab
Add VLM benchmarks for text recognition, document understanding, and …
jeffreysijuntan Mar 5, 2026
cee82e9
Add general-purpose ReAct agent plugin with TaskSpec-driven eval pipe…
jeffreysijuntan Mar 5, 2026
3eabe3f
Add `rllm init` command to scaffold new agent projects
jeffreysijuntan Mar 5, 2026
16e3a46
Replace 15 single-purpose agents with built-in multi-turn ReAct agent…
jeffreysijuntan Mar 6, 2026
b98d43e
Slim core dependencies for rllm 0.3.0 — move training/reward/tool dep…
jeffreysijuntan Mar 6, 2026
00dda85
feat(eval): add --ui and --ui-url flags to `rllm eval` CLI
Chanbinski Mar 6, 2026
d47c3d3
refactor(cli): simplify --ui/--ui-url flags to just --ui
Chanbinski Mar 6, 2026
725af2d
docs(ui): update rLLM UI descriptions and nav title
Chanbinski Mar 6, 2026
503f44b
feat(tracking): add non-blocking background worker to UILogger
Chanbinski Mar 6, 2026
9d636b3
Merge pull request #424 from Chanbinski/support-eval
jeffreysijuntan Mar 6, 2026
c555e7d
Merge branch 'main' of https://github.com/rllm-org/rllm into dev-cli
jeffreysijuntan Mar 6, 2026
2273bd4
feat(training): unify AgentFlow + Workflow via model gateway
jeffreysijuntan Mar 6, 2026
78458fa
feat(cli): add `rllm login` command for UI authentication
Chanbinski Mar 6, 2026
560fa0e
fix: avoid event loop blocking and support VLM multimodal content in …
jeffreysijuntan Mar 6, 2026
0f42892
Merge pull request #425 from Chanbinski/feat/login
jeffreysijuntan Mar 6, 2026
096511e
feat(eval): support async arun in AgentFlow contract
jeffreysijuntan Mar 6, 2026
5320824
feat: add SmolAgents, Strands, and LangGraph agent plugins with SDK i…
jeffreysijuntan Mar 6, 2026
3f80469
refactor: remove SDK/agent_run_func path from UnifiedTrainer
jeffreysijuntan Mar 7, 2026
136280b
fix: remove unused mock_at_cls variables in train command tests
jeffreysijuntan Mar 7, 2026
96ec517
fix: resolve ruff linting errors across codebase
jeffreysijuntan Mar 7, 2026
0a27d90
deprecate prior integrations
jeffreysijuntan Mar 7, 2026
e19bf4b
Merge branch 'main' of https://github.com/rllm-org/rllm into dev-cli
jeffreysijuntan Mar 11, 2026
790b055
style(cli): redesign banner and dataset list with Rich styling
jeffreysijuntan Mar 11, 2026
3612399
feat(eval): expand provider support with unified registry
jeffreysijuntan Mar 11, 2026
94754db
feat(cli): auto-enable UI logging when user is logged in
jeffreysijuntan Mar 11, 2026
fc06ea6
chore(init): keep only multi-turn ReAct agent template
jeffreysijuntan Mar 11, 2026
8bfa069
Merge branch 'main' of https://github.com/rllm-org/rllm into dev-cli
jeffreysijuntan Mar 12, 2026
df29654
refactor: rename plugins/ to agenthub/ and clean up examples
jeffreysijuntan Mar 12, 2026
e83d33b
feat(ui): progressive batched uploads, session URL, and registration …
Chanbinski Mar 12, 2026
ec80f2f
feat(ui): batch trajectory group uploads
Chanbinski Mar 12, 2026
b10d9b0
clean up dependencies
jeffreysijuntan Mar 12, 2026
9ac5f71
fix: use keyword arguments for Pydantic BaseModel constructors
jeffreysijuntan Mar 13, 2026
9797170
Merge pull request #440 from Chanbinski/feat/ui-logging-improvements
jeffreysijuntan Mar 13, 2026
ad921f1
fix precommit errors
jeffreysijuntan Mar 13, 2026
5 changes: 5 additions & 0 deletions agenthub/frozenlake_agent/agent/__init__.py
@@ -0,0 +1,5 @@
"""FrozenLake agent plugin for rLLM."""

from .agent import FrozenLakeAgentFlow, frozenlake_agent

__all__ = ["FrozenLakeAgentFlow", "frozenlake_agent"]
141 changes: 141 additions & 0 deletions agenthub/frozenlake_agent/agent/agent.py
@@ -0,0 +1,141 @@
"""FrozenLake AgentFlow — multi-turn grid navigation agent."""

from __future__ import annotations

import logging
import re

import openai

from rllm.experimental.eval.types import AgentConfig
from rllm.types import Episode, Step, Trajectory

from .env import ACTION_INVALID, FrozenLakeEnv

logger = logging.getLogger(__name__)

DIRECTION_MAP = {"left": 1, "down": 2, "right": 3, "up": 4}

SYSTEM_PROMPT = """\
You are a helpful assistant. You are walking on a frozen lake.

FrozenLake Quick Guide
Goal: Reach the goal (G). Player (P) and Goal (G) must overlap.

Symbols:
_ Frozen | O Hole | G Goal | P Player

Rules:
1. Avoid falling into holes (O).
2. Frozen tiles are slippery, so you may move perpendicular to your intended direction.

Valid Actions (separated by | ):
Up | Down | Left | Right

Rewards:
Fall into hole: 0
Reach goal: +1.0

You will be provided the current observation; please decide on the next Action.
You should show your thought process and then put the final action in ``` ```.
You should only output the NEXT ACTION at each iteration in the ``` ```. For example, if you want to move up, you should output ```Up```.
You should plan ahead and aim to reach the goal in the minimum number of steps.
You should be aware that frozen tiles can be slippery, but the chance is small and you should not overthink it.

Please show your thinking process and put the final action in ``` ```. In every turn, the final action MUST be one of Up, Down, Left, Right.
"""

DEFAULT_MAX_STEPS = 10


def _parse_action(response: str) -> int:
    """Extract a direction action from the model response.

    Looks for the last ```...``` block and maps its content to an action int.
    Returns ACTION_INVALID (0) if parsing fails.
    """
    matches = re.findall(r"```(.*?)```", response, re.DOTALL)
    if not matches:
        return ACTION_INVALID

    text = matches[-1].strip().lower()
    if text in DIRECTION_MAP:
        return DIRECTION_MAP[text]
    if text.isdigit() and int(text) in DIRECTION_MAP.values():
        return int(text)
    return ACTION_INVALID


class FrozenLakeAgentFlow:
    """AgentFlow implementation for the FrozenLake grid navigation task."""

    def run(self, task: dict, config: AgentConfig) -> Episode:
        seed = task.get("seed", 42)
        size = task.get("size", 4)
        p = task.get("p", 0.8)
        max_steps = task.get("max_steps", DEFAULT_MAX_STEPS)

        env = FrozenLakeEnv(size=size, p=p, seed=seed, max_steps=max_steps)
        obs = env.reset()

        client = openai.OpenAI(base_url=config.base_url, api_key="not-needed")

        messages: list[dict[str, str]] = [{"role": "system", "content": SYSTEM_PROMPT}]
        steps: list[Step] = []
        num_steps = 0

        for turn in range(max_steps):
            user_content = f"Current Observation ({turn}):\n{obs}\nYou have not achieved the goal, P has not reached G yet. Please give the next action."
            if turn > 0 and steps and not steps[-1].metadata.get("action_is_effective", True):
                user_content += "\nYour last response was invalid: your position did not change at all. Recheck your thinking process, the action you output, and the format of your response. Remember, you should only output the NEXT ACTION at each iteration in the ``` ```. For example, if you want to move up, you should output ```Up```."
            remaining = max_steps - turn
            user_content += f"\nThe maximum number of steps remaining is {remaining}."

            messages.append({"role": "user", "content": user_content})

            response = client.chat.completions.create(
                model=config.model,
                messages=messages,
                temperature=0.0,
            )
            assistant_text = response.choices[0].message.content or ""
            messages.append({"role": "assistant", "content": assistant_text})

            action = _parse_action(assistant_text)
            obs, reward, done, info = env.step(action)
            num_steps += 1

            steps.append(
                Step(
                    input=user_content,
                    output=assistant_text,
                    action=action,
                    reward=reward,
                    done=done,
                    metadata=info,
                )
            )

            if done:
                break

        success = env.success()
        task_id = task.get("task_id", f"frozenlake_s{seed}")

        trajectory = Trajectory(
            name="navigator",
            task=task,
            steps=steps,
            reward=1.0 if success else 0.0,
        )

        return Episode(
            id=f"{task_id}:0",
            task=task,
            trajectories=[trajectory],
            artifacts={"success": success, "num_steps": num_steps},
        )


# Module-level singleton for plugin entry point
frozenlake_agent = FrozenLakeAgentFlow()
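The action-extraction convention above ("last fenced block wins, case-insensitive, numeric ids also accepted") is worth seeing in isolation. The sketch below is a hypothetical standalone copy of the diff's `_parse_action` helper (renamed `parse_action`, with the same `DIRECTION_MAP` and `ACTION_INVALID` constants inlined) so its behavior can be checked without importing the plugin:

```python
import re

# Standalone mirror of agenthub/frozenlake_agent/agent/agent.py::_parse_action:
# take the LAST ```...``` block in a model reply and map it to an action id.
DIRECTION_MAP = {"left": 1, "down": 2, "right": 3, "up": 4}
ACTION_INVALID = 0


def parse_action(response: str) -> int:
    matches = re.findall(r"```(.*?)```", response, re.DOTALL)
    if not matches:
        return ACTION_INVALID
    text = matches[-1].strip().lower()
    if text in DIRECTION_MAP:
        return DIRECTION_MAP[text]
    # Numeric action ids like "2" are also accepted
    if text.isdigit() and int(text) in DIRECTION_MAP.values():
        return int(text)
    return ACTION_INVALID


print(parse_action("I will go ```Up```"))                 # → 4
print(parse_action("First ```Left```, then ```Down```"))  # last block wins → 2
print(parse_action("no fenced action here"))              # → 0
```

Taking the last fenced block rather than the first matters for chain-of-thought responses, where earlier blocks may quote candidate moves before the final decision.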
203 changes: 203 additions & 0 deletions agenthub/frozenlake_agent/agent/env.py
@@ -0,0 +1,203 @@
"""Lightweight FrozenLake environment — no gymnasium dependency.

Reimplements the core grid-world logic from the legacy
rllm/environments/frozenlake/frozenlake.py using only numpy.
"""

from __future__ import annotations

import numpy as np

# ---------------------------------------------------------------------------
# Map generation
# ---------------------------------------------------------------------------


def _is_valid(board: list[list[str]], max_steps: int) -> bool:
    """DFS check that a path from S to G exists within max_steps."""
    arr = np.array(board)
    start_r, start_c = np.where(arr == "S")
    frontier: list[tuple[int, int, int]] = [(int(start_r[0]), int(start_c[0]), 0)]
    discovered: set[tuple[int, int]] = set()
    size = len(board)

    while frontier:
        r, c, steps = frontier.pop()
        if steps > max_steps:
            continue
        if (r, c) in discovered:
            continue
        discovered.add((r, c))
        for dr, dc in [(1, 0), (0, 1), (-1, 0), (0, -1)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == "G":
                    return True
                if board[nr][nc] != "H":
                    frontier.append((nr, nc, steps + 1))
    return False


def generate_random_map(size: int = 8, p: float = 0.8, seed: int = 0, max_steps: int = 5) -> tuple[list[str], tuple[int, int]]:
    """Generate a random valid FrozenLake map.

    Args:
        size: Grid side length.
        p: Probability a tile is frozen (vs. a hole).
        seed: RNG seed for reproducibility.
        max_steps: Maximum steps for the path-validity check.

    Returns:
        (map_rows, goal_position) where map_rows is a list of strings
        like ``["SFFF", "FHFH", "FFFH", "HFFG"]`` and goal_position
        is ``(row, col)`` of G.
    """
    rng = np.random.RandomState(seed)
    p = min(1.0, p)

    while True:
        board = rng.choice(["F", "H"], (size, size), p=[p, 1 - p]).tolist()

        # Pick distinct start and goal positions
        while True:
            sr, sc = int(rng.randint(0, size)), int(rng.randint(0, size))
            gr, gc = int(rng.randint(0, size)), int(rng.randint(0, size))
            if (sr, sc) != (gr, gc):
                break

        board[sr][sc] = "S"
        board[gr][gc] = "G"

        if _is_valid(board, max_steps):
            return ["".join(row) for row in board], (gr, gc)


# ---------------------------------------------------------------------------
# Environment
# ---------------------------------------------------------------------------

# Action constants
ACTION_INVALID = 0
ACTION_LEFT = 1
ACTION_DOWN = 2
ACTION_RIGHT = 3
ACTION_UP = 4

ACTION_LOOKUP = {0: "None", 1: "Left", 2: "Down", 3: "Right", 4: "Up"}

# Deltas: (row_delta, col_delta) for each action
_DELTAS = {
    ACTION_LEFT: (0, -1),
    ACTION_DOWN: (1, 0),
    ACTION_RIGHT: (0, 1),
    ACTION_UP: (-1, 0),
}

# Render symbols
_GRID_LOOKUP = {
    "P": " P \t",
    "F": " _ \t",
    "H": " O \t",
    "G": " G \t",
    "X": " X \t",  # player fell into a hole
    "V": " √ \t",  # player reached the goal
}


class FrozenLakeEnv:
    """Pure-Python FrozenLake grid-world environment."""

    def __init__(
        self,
        size: int = 4,
        p: float = 0.8,
        seed: int = 42,
        max_steps: int = 5,
        is_slippery: bool = False,
    ):
        self.size = size
        self.p = p
        self.seed = seed
        self.max_steps = max_steps
        self.is_slippery = is_slippery

        self._map: list[str] = []
        self._goal: tuple[int, int] = (0, 0)
        self._player: tuple[int, int] = (0, 0)
        self._done = False
        self.reset()

    # ------------------------------------------------------------------
    # Public API
    # ------------------------------------------------------------------

    def reset(self) -> str:
        """Reset the environment and return the initial observation."""
        self._map, self._goal = generate_random_map(size=self.size, p=self.p, seed=self.seed, max_steps=self.max_steps)
        # Find the start position
        for r, row in enumerate(self._map):
            for c, ch in enumerate(row):
                if ch == "S":
                    self._player = (r, c)
                    break
        self._done = False
        return self.render()

    def step(self, action: int) -> tuple[str, float, bool, dict]:
        """Take an action and return (observation, reward, done, info).

        Actions: 1=Left, 2=Down, 3=Right, 4=Up. 0 is invalid (no-op).
        """
        if self._done:
            return self.render(), 0.0, True, {"action_is_effective": False}

        if action == ACTION_INVALID or action not in _DELTAS:
            return self.render(), 0.0, False, {"action_is_effective": False}

        prev = self._player
        dr, dc = _DELTAS[action]
        nr, nc = prev[0] + dr, prev[1] + dc

        # Boundary check
        if 0 <= nr < self.size and 0 <= nc < self.size:
            self._player = (nr, nc)

        tile = self._map[self._player[0]][self._player[1]]
        effective = self._player != prev

        if tile == "G":
            self._done = True
            return self.render(), 1.0, True, {"action_is_effective": effective}
        if tile == "H":
            self._done = True
            return self.render(), 0.0, True, {"action_is_effective": effective}

        return self.render(), 0.0, False, {"action_is_effective": effective}

    def render(self) -> str:
        """Render the grid as a text string (P=player, _=frozen, O=hole, G=goal)."""
        rows = []
        for r in range(self.size):
            cells = []
            for c in range(self.size):
                if (r, c) == self._player:
                    tile = self._map[r][c]
                    if tile == "H":
                        cells.append(_GRID_LOOKUP["X"])
                    elif tile == "G":
                        cells.append(_GRID_LOOKUP["V"])
                    else:
                        cells.append(_GRID_LOOKUP["P"])
                else:
                    ch = self._map[r][c]
                    # Replace the start marker with frozen
                    sym = "F" if ch == "S" else ch
                    cells.append(_GRID_LOOKUP[sym])
            rows.append("".join(cells))
        return "\n".join(rows)

    def finished(self) -> bool:
        return self._done

    def success(self) -> bool:
        return self._done and self._map[self._player[0]][self._player[1]] == "G"
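The map generator above rejects any random board that fails the step-bounded DFS reachability check, which is the part most worth sanity-checking. The sketch below is a hypothetical standalone mirror of `_is_valid` (renamed `is_valid`, taking the map as a list of row strings and dropping the numpy dependency), exercised on a known-solvable 4x4 board and on one where the goal is walled off by holes:

```python
# Standalone mirror of env.py::_is_valid: depth-first search from S, expanding
# only non-hole tiles, returning True as soon as a neighbor of a visited tile is G.
def is_valid(board: list[str], max_steps: int) -> bool:
    size = len(board)
    start = next((r, c) for r in range(size) for c in range(size) if board[r][c] == "S")
    frontier = [(start[0], start[1], 0)]  # (row, col, steps taken so far)
    seen: set[tuple[int, int]] = set()
    while frontier:
        r, c, steps = frontier.pop()
        if steps > max_steps or (r, c) in seen:
            continue
        seen.add((r, c))
        for dr, dc in [(1, 0), (0, 1), (-1, 0), (0, -1)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == "G":
                    return True
                if board[nr][nc] != "H":
                    frontier.append((nr, nc, steps + 1))
    return False


# The classic 4x4 map is solvable; walling G off with holes makes it unsolvable.
print(is_valid(["SFFF", "FHFH", "FFFH", "HFFG"], max_steps=20))  # True
print(is_valid(["SFFF", "FHFH", "FFHH", "HFHG"], max_steps=20))  # False
```

Note that because `seen` is keyed on position only, a tile first reached on a long DFS path is never revisited with a shorter step count, so the `max_steps` bound is conservative rather than exact; the generator compensates by simply drawing a fresh board on rejection.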
5 changes: 5 additions & 0 deletions agenthub/frozenlake_agent/eval/__init__.py
@@ -0,0 +1,5 @@
"""FrozenLake evaluator plugin for rLLM."""

from .evaluator import FrozenLakeEvaluator

__all__ = ["FrozenLakeEvaluator"]
25 changes: 25 additions & 0 deletions agenthub/frozenlake_agent/eval/evaluator.py
@@ -0,0 +1,25 @@
"""FrozenLake evaluator: scores episodes based on goal-reaching success."""

from __future__ import annotations

from rllm.experimental.eval.types import EvalOutput, Signal
from rllm.types import Episode


class FrozenLakeEvaluator:
    """Evaluator that checks whether the agent reached the goal."""

    def evaluate(self, task: dict, episode: Episode) -> EvalOutput:
        success = episode.artifacts.get("success", False)
        num_steps = episode.artifacts.get("num_steps", 0)

        reward = 1.0 if success else 0.0
        return EvalOutput(
            reward=reward,
            is_correct=bool(success),
            signals=[
                Signal("success", float(success)),
                Signal("num_steps", float(num_steps)),
            ],
            metadata={"num_steps": num_steps},
        )