[Feature]: rLLM CLI, AgentFlow Framework, Model Gateway & Plugin System #438
Merged
jeffreysijuntan merged 77 commits into main on Mar 13, 2026
Conversation
Add RLLMTrajectoryHookProvider, a Strands HookProvider that captures LLM calls during agent execution and builds TrajectoryView objects for RL training. Converts Bedrock-style messages to OpenAI Chat Completions format. Includes examples for simple, tool-using, and multi-agent setups. Also harden integration __init__.py imports with broad exception handling to prevent broken optional deps from blocking unrelated integrations. Made-with: Cursor
Resolve conflict in unified_trainer.py:
- Adopt main's asyncio.run() pattern (drop the background event loop thread)
- Keep SDK integration (agent_run_func, SdkWorkflowFactory, post_execute_hook)
- Adopt main's TrainerState enhancements (total_steps, reset_batch, etc.)
- Keep SDK factory cleanup in shutdown()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The SDK previously maintained separate StepView/TrajectoryView aliases for the canonical Step/Trajectory types from rllm.types. This removes the indirection and uses Step/Trajectory directly across the entire codebase, including integrations, engines, examples, tests, and docs. Also renames trace_to_step_view -> trace_to_step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the InferenceAPIServer-based Tinker path with a TinkerProxyManager that follows the same LiteLLM proxy pattern as VerlProxyManager. This ensures Tinker traces flow through TracingCallback (metadata routing, session context) instead of being stored directly by the inference server.
Key changes:
- New TinkerBackendServer: a lightweight FastAPI app wrapping TinkerEngine as an OpenAI-compatible /v1/chat/completions endpoint, with token IDs and logprobs embedded as top-level choice fields (LiteLLM auto-collects these into provider_specific_fields)
- New TinkerProxyManager: starts the backend server, generates a LiteLLM config with the hosted_vllm/ prefix, and manages the lifecycle of both proxy and backend
- Clean LiteLLM message dicts in build_llm_output to match Step's dict[str, str] schema (strip None values, promote reasoning from provider_specific_fields)
- Remove InferenceAPIServer usage and the inference_server parameter from SdkWorkflowFactory/SdkWorkflow
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LMModel

The root cause was that Strands' OpenAIModel always uses stream=True with the raw OpenAI SDK, but the LiteLLM proxy (hosted_vllm/ + fake_stream) sent SSE chunks lacking a proper finish_reason. Without finish_reason, Strands' process_stream never emitted contentBlockStop, so accumulated text was never finalized, resulting in empty message content.
Key changes:
- examples/sdk/strands_math: switch from OpenAIModel to LiteLLMModel
- strands.py: remove the `not self._traces` guard from _build_trajectories() so trajectories are always built even when no traces were recorded
- strands.py: add support for plain-string content (OpenAI format) and reasoningContent blocks in the message converters
- tinker_backend_server.py: add streaming SSE support and content-block array flattening for multi-format message content
- sdk_workflow.py: add tracer flush retry and a positional step-matching fallback for Strands hook provider ID mismatches
- proxy_manager.py: enable fake_stream for TinkerProxyManager
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
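For context on why the missing finish_reason broke things: OpenAI-style streaming consumers only finalize accumulated deltas once a chunk carries a terminal finish_reason. A minimal sketch of a well-formed chunk stream follows; the function name and chunk shape are illustrative, not code from this PR.

```python
import json

def sse_chunks(text: str):
    """Yield OpenAI-style chat.completion.chunk SSE lines for `text`.

    Each word becomes a content delta; the final chunk carries
    finish_reason='stop' so a stream consumer knows to finalize the
    accumulated content, followed by the [DONE] sentinel.
    """
    for piece in text.split():
        chunk = {
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": piece + " "}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
    final = {
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"

lines = list(sse_chunks("hello world"))
```

A stream that omits the finish_reason="stop" chunk is exactly the failure mode described above: the consumer keeps waiting and never emits its stop event.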
TinkerBackendServer has native SSE streaming support, so the LiteLLM proxy no longer needs to convert non-streaming responses to SSE. Strands' LiteLLMModel sends stream=True by default, which the backend now handles directly via _stream_sse(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New example at examples/sdk/openai_agents_math/ following the same agent_run_func pattern as the ADK and Strands examples. Uses OpenAIProvider with use_responses=False to route Chat Completions through the trainer's LiteLLM proxy. Also fix the same _ensure_trajectories guard bug in openai_agents.py that was fixed in strands.py — always build a trajectory even when _traces is empty so output/input are preserved. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enables running agent code inside isolated environments (local subprocess, Docker, with stubs for Modal/AgentCore) instead of inside the trainer process. Agents expose a `rollout(task, config) -> list[Trajectory]` contract and communicate results back via a SQLite-backed result store through the proxy.
Key components:
- SandboxOrchestrator with a persistent worker pool and per-task modes
- ExecutionResultStore (SQLite + WAL) for cross-process result delivery
- worker_server.py runner for inside sandboxes (fire-and-forget execution)
- Local and Docker sandbox backends
- Proxy routes for result submission/retrieval
- SdkWorkflow integration with a sandboxed execution path
- Lazy __getattr__ imports in rllm/__init__.py to avoid a torch dependency
- Example sandbox_math agent with a smoke test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
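The sandbox contract can be sketched as below. The Trajectory dataclass here is a simplified stand-in for rllm's real type, and the agent body is a trivial placeholder; only the `rollout(task, config) -> list[Trajectory]` shape comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Trajectory:
    """Simplified stand-in for rllm's Trajectory type (illustration only)."""
    steps: list[dict] = field(default_factory=list)
    reward: float = 0.0

def rollout(task: dict[str, Any], config: dict[str, Any]) -> list[Trajectory]:
    """The contract a sandboxed agent exposes: the worker server calls this
    with a task, and the returned trajectories are posted back to the
    proxy's result store."""
    # A trivial agent: record the task as a single user step.
    step = {"role": "user", "content": str(task.get("question", ""))}
    return [Trajectory(steps=[step])]

trajs = rollout({"question": "What is 2 + 2?"}, {"mode": "local"})
```

Keeping the contract this small is what lets the same agent code run unchanged under local subprocess, Docker, or remote backends.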
Implement the agentcore sandbox backend, which invokes pre-deployed AgentCore Runtimes via boto3 instead of managing local sandboxes. The ACR container runs agentcore_worker.py and POSTs results directly to the proxy's result store, following the same pattern as the local/docker backends.
- AgentCoreOrchestrator: bypasses the SandboxOrchestrator protocol, invokes ACR via invoke_agent_runtime with rate limiting and adaptive retry
- agentcore_worker.py: a self-contained BedrockAgentCoreApp entrypoint with inlined metadata slug encoding and result POST (no rllm dependency)
- base_url override in config.extra for proxy reachability from ACR
- Example Dockerfile, requirements, and step-by-step deployment docs using the agentcore CLI (configure/deploy workflow)
- IAM permissions documentation for InvokeAgentRuntime
- Fallback rllm_types.py for agent environments without rllm installed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion reuse Add exponential-backoff retries to result submission across all worker variants, replace per-request aiohttp sessions with shared sessions, and introduce async event-based waiting in ExecutionResultStore to eliminate thread-blocking polling. Also parallelizes persistent pool worker creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
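The exponential-backoff retry pattern referenced here can be sketched as follows. This is a generic illustration, not the PR's code: the `submit` callable stands in for whatever async result POST the workers perform, and the delay constants are arbitrary.

```python
import asyncio
import random

async def submit_with_retry(submit, max_attempts: int = 5, base_delay: float = 0.05):
    """Retry an async submission callable with exponential backoff plus jitter.

    `submit` is any zero-arg coroutine function (e.g. a result POST);
    the last failure is re-raised once attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return await submit()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Doubling delay per attempt, with jitter to avoid thundering herds.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)

# Usage: a flaky submission that fails twice, then succeeds.
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = asyncio.run(submit_with_retry(flaky))
```

Combined with a shared HTTP session (rather than one per request), this keeps transient network failures from dropping results while bounding retry load.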
Add the rllm CLI (`rllm setup`, `rllm dataset`, `rllm agent`, `rllm eval`) built on Click, including interactive provider configuration, dataset pull/list/info/inspect/remove, agent listing, and LiteLLM proxy-based eval.
- Add CLI entry point and commands (setup, dataset, agent, eval)
- Add a v2 dataset registry with parquet storage and v1 migration
- Add DatasetMetadata and DatasetConfig types
- Add click and simple-term-menu dependencies
- Include registry/*.json as package data
- Add tests for CLI commands and dataset registry migration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce a two-stage eval pipeline that separates agent execution from evaluation. AgentFlows produce Episodes (trajectories without rewards), and Evaluators score them independently, enabling swappable evaluation logic, multiple signals, and diverse agent programs (multi-agent, ADK, OpenAI SDK).
- Add AgentFlow/Evaluator protocols and AgentConfig, Signal, EvalOutput types
- Add built-in evaluators: MathEvaluator, CountdownEvaluator, CodeEvaluator, F1Evaluator, CompoundEvaluator
- Rewrite built-in agents as AgentFlow classes (no reward fn imports)
- Update EvalRunner for the two-stage pipeline (agent.run → evaluator.evaluate)
- Add evaluator_loader with registry and catalog auto-resolution
- Add signals to EvalItem/EvalResult, artifacts to Episode, auto-generated Episode.id
- Add --evaluator CLI option with auto-resolve from datasets.json
- Update eval-framework.md documentation
- Add comprehensive tests (test_eval_types, test_evaluator_loader, test_eval_runner)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
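The two-stage split can be sketched with Python Protocols. The Episode and Signal dataclasses here are simplified stand-ins (their real field sets are not shown in this PR text), and EchoFlow/ExactMatchEvaluator are toy implementations for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol

@dataclass
class Episode:
    """Stand-in: an agent run's output, with no reward attached."""
    messages: list[dict]
    artifacts: dict[str, Any] = field(default_factory=dict)

@dataclass
class Signal:
    """Stand-in: one named evaluation score."""
    name: str
    value: float

class AgentFlow(Protocol):
    def run(self, task: dict[str, Any]) -> Episode: ...

class Evaluator(Protocol):
    def evaluate(self, task: dict[str, Any], episode: Episode) -> list[Signal]: ...

class EchoFlow:
    """Toy flow: answers with the question text verbatim."""
    def run(self, task: dict[str, Any]) -> Episode:
        return Episode(messages=[{"role": "assistant", "content": task["question"]}])

class ExactMatchEvaluator:
    """Toy evaluator: 1.0 if the last message equals the reference answer."""
    def evaluate(self, task: dict[str, Any], episode: Episode) -> list[Signal]:
        answer = episode.messages[-1]["content"]
        return [Signal("exact_match", float(answer == task["answer"]))]

# Two-stage pipeline: execution produces an Episode; scoring happens separately.
task = {"question": "4", "answer": "4"}
episode = EchoFlow().run(task)
signals = ExactMatchEvaluator().evaluate(task, episode)
```

Because the evaluator never runs inside the agent, the same flow can be re-scored with different evaluators, or with several at once.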
…ucture

Expand the rllm eval framework from 5 to 19 supported benchmarks by adding MCQ (mmlu_pro, mmlu_redux, gpqa_diamond, supergpqa, ceval, mmmlu), math (hmmt), code (humaneval, mbpp, livecodebench), instruction-following (ifeval, ifbench), long-context (longbench_v2), and agentic (bfcl, multichallenge) benchmarks. All datasets pull from upstream HuggingFace repos with row-level transforms for field normalization.
Key additions:
- 14 dataset transform functions in rllm/data/transforms.py
- 4 new agents: MCQ, IFEval, BFCL function-calling, multi-turn
- 4 new evaluators: MCQ, IFEval (vendored verification), BFCL, LLM judge
- Extended _pull.py with hf_config, field_map, and transform support
- 216 tests passing across all new and existing components
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… keys Replace `rllm setup` with `rllm model setup/swap/show`. Config now stores per-provider API keys so swapping providers doesn't re-prompt for known keys. Add GPT-5 family, o3-pro, Gemini 3 family, and gemini-2.5-flash-lite to supported models. Old config format and `rllm setup` alias are preserved for backward compatibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ngBench configs New benchmarks: HMMT Nov 25, AA-LCR, HLE, MMLU-ProX, INCLUDE, Global PIQA, PolyMATH, WMT24++. Fix LongBench v2 (was QA+F1, now MCQ), fix HMMT split (test→train). Add aggregate_configs support in pull to merge all language configs into a single dataset with a language column. New infrastructure: reasoning agent (CoT), translation agent, LLM equality evaluator, ChrF translation evaluator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
First vision-capable extension of the eval framework:
- Image extraction in the pull pipeline (PIL Images → on-disk PNGs, replaced with relative paths)
- 3 VLM agent flows (vlm_mcq, vlm_math, vlm_open) with multimodal OpenAI API messages
- 9 dataset transforms (MMMU, MMMU-Pro, MathVision, MathVista, DynaMath, ZEROBench, ZEROBench-Sub, VLMs Are Blind, BabyVision)
- Registry entries for 10 datasets and 3 agents under a new 'vlm' category
- 54 new tests (agent flows, transforms, protocol conformance)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements `rllm train <benchmark>`, which reuses the eval framework's AgentFlows and Evaluators to run RL training via the Tinker backend. Wraps AgentFlow + Evaluator into an agent_run_func and hands it to the experimental AgentTrainer.
Key fixes beyond the initial implementation:
- Inject session routing metadata into the base URL so plain OpenAI clients (used by AgentFlows) propagate session_uids to the LiteLLM proxy, enabling trace collection for training episodes.
- Pass workflow timeout/gamma/reward_bonus_coeff through SdkWorkflowFactory.get_workflow_args() to prevent training hangs.
- Set val_before_train=false in the base.yaml default config.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
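The "metadata in the base URL" trick can be illustrated as below. This is a hypothetical encoding scheme, not the PR's actual slug format: the point is only that a plain OpenAI client, given a doctored base URL, carries routing metadata in its request path without any client-side changes.

```python
import base64
import json

def encode_metadata_slug(metadata: dict) -> str:
    """Pack routing metadata into a URL-safe slug that rides along in the
    base URL path, so an unmodified OpenAI client transmits it."""
    raw = json.dumps(metadata, separators=(",", ":"), sort_keys=True).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def decode_metadata_slug(slug: str) -> dict:
    """Inverse of encode_metadata_slug; restores the stripped base64 padding."""
    padded = slug + "=" * (-len(slug) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# A proxy (URL and port are hypothetical) would parse the slug out of the
# request path before forwarding the chat completion upstream.
meta = {"session_uid": "abc123"}
base_url = f"http://localhost:4000/meta/{encode_metadata_slug(meta)}/v1"
roundtrip = decode_metadata_slug(encode_metadata_slug(meta))
```

The proxy side then attributes every trace produced under that base URL to the right training episode.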
Replace the two-hop architecture (LiteLLM proxy → TinkerBackendServer → TinkerEngine) with a single lightweight TinkerProxy that calls TinkerEngine directly in-process. This eliminates LiteLLM overhead and one HTTP round-trip per inference call during `rllm train`.
- Add rllm/sdk/proxy/tinker_proxy.py: a FastAPI server handling OpenAI-compatible chat completions, metadata-slug routing, and trace persistence via SqliteTracer
- Rewrite TinkerProxyManager to start TinkerProxy instead of a LiteLLM subprocess + TinkerBackendServer
- Simplify SdkWorkflowFactory._setup_tinker_proxy() to match the new API
- Fix trace output format: nest token_ids/response_logprobs under provider_specific_fields so data_process.py extractors find them
- Fix flush_tracer: use asyncio.to_thread for the queue drain instead of loop.run_until_complete, which fails inside a running event loop
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Defer torch, pandas, polars, litellm, and training module imports from top-level to point-of-use in CLI commands and Dataset class. Replace eager subcommand registration with a _LazyGroup that imports modules on first invocation. Update test mock patch paths to match new import sites. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
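The lazy-group idea can be sketched without Click: defer each subcommand's module import until the command is actually invoked. The class and mapping below are illustrative stand-ins for the PR's Click-based _LazyGroup, using `math:sqrt` as a harmless demo target.

```python
import importlib

class LazyCommandGroup:
    """Sketch of deferred subcommand loading: a command's module is imported
    on first invocation rather than at CLI startup, so `--help` and light
    commands don't pay for heavy dependencies (torch, pandas, etc.)."""

    def __init__(self, lazy_subcommands: dict[str, str]):
        # Maps command name -> "module.path:attribute".
        self._lazy = lazy_subcommands
        self._loaded: dict[str, object] = {}

    def get_command(self, name: str):
        if name not in self._loaded:
            module_path, attr = self._lazy[name].split(":")
            module = importlib.import_module(module_path)  # deferred import
            self._loaded[name] = getattr(module, attr)
        return self._loaded[name]

group = LazyCommandGroup({"sqrt": "math:sqrt"})
value = group.get_command("sqrt")(9.0)
```

With Click, the same shape is achieved by overriding `Group.get_command` (and `list_commands`) on a Group subclass, which is what a `_LazyGroup` typically does.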
…-point discovery, and dataset CLI

- Agent loader: persistent registry (~/.rllm/agents.json), auto-instantiation of classes, entry-point discovery (rllm.agents group), register/unregister/list_agents API
- Evaluator loader: persistent registry (~/.rllm/evaluators.json), entry-point discovery (rllm.evaluators group), register/unregister/list_evaluators API
- Dataset CLI: add `rllm dataset register` command (JSON, JSONL, CSV, Parquet)
- Agent CLI: show a Source column (built-in/registered/plugin) in `rllm agent list`
- Examples: agent_plugin (pyproject.toml entry points) and agent_python_api (Python API registration)
- Tests: 62 new tests for the agent loader, evaluator loader, and dataset commands
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the PIL→PNG file pipeline with direct raw-byte extraction from HuggingFace using Image(decode=False). Store binary image columns in Arrow IPC files instead of writing thousands of individual PNGs to disk. At eval time, base64-encode directly from in-memory bytes instead of reading files from disk.
- Add _disable_image_decoding() for Image and Sequence(Image()) columns
- Add _flatten_image_dicts() to extract bytes from HF image dicts
- Add Arrow IPC save/load to DatasetRegistry with format-aware dispatch
- Update VLM agents with _detect_mime_type() and _image_to_data_uri() to handle both bytes (new) and str paths (legacy) transparently
- Update the dataset inspect CLI to display <N bytes (image)> for binary data
- Clean up the legacy images/ directory on dataset removal
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port upstream background worker (#419) and merge with our eval-specific additions (session_type, log_eval_result, error handling improvements).
feat(cli): port non-blocking UILogger, simplify --ui flag, and support eval UI logging
Replace the LiteLLM proxy with rllm-model-gateway for AgentFlow-based training. Agents write standard OpenAI client code; the gateway transparently captures token IDs and logprobs for training via post-hoc enrichment.
New files:
- GatewayManager: manages the gateway lifecycle (thread/process) and worker setup
- AgentFlowEngine: runs AgentFlows in parallel with gateway trace capture
- trace_converter: converts gateway TraceRecord → training Step
Key changes:
- CLI passes agent_flow + evaluator directly to AgentTrainer
- UnifiedTrainer routes to AgentFlowEngine when agent_flow + evaluator are provided
- TinkerBackendServer outputs logprobs in the vLLM standard format for the gateway
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New `rllm login` command that validates the API key against the UI backend
- Persists the key in ~/.rllm/config.json with 0o600 permissions
- Shows login status if already logged in; --relogin to force
- Handles a pasted RLLM_API_KEY=... prefix gracefully
- UILogger falls back to the stored key (env var > config > None)
- Fix raw string for the banner to suppress a SyntaxWarning
…training

- Wrap create_session and get_traces in run_in_executor to prevent blocking the asyncio event loop in AgentFlowEngine
- Preserve per-trajectory rewards from multi-agent evaluators
- Widen the Step.chat_completions type to dict[str, Any] for VLM content blocks
- Reduce the default train_batch_size from 64 to 32 to match the CLI default
- Downgrade per-task agent flow logs from INFO to DEBUG
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat(cli): add `rllm login` command for UI authentication
Add optional async `arun` method to AgentFlow. All three execution paths (eval runner, gateway training engine, tinker CLI) now prefer `arun` when available, falling back to sync `run` in a thread executor. Centralizes dispatch logic in `run_agent_flow()` helper. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
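The centralized dispatch described here can be sketched as follows; the helper name matches the commit message, but its exact signature and the flow classes are illustrative.

```python
import asyncio
import inspect

async def run_agent_flow(flow, task):
    """Prefer the flow's async `arun` when it defines one; otherwise run the
    sync `run` in a thread so it never blocks the event loop. (Sketch of the
    dispatch pattern; the real helper may differ.)"""
    arun = getattr(flow, "arun", None)
    if arun is not None and inspect.iscoroutinefunction(arun):
        return await arun(task)
    return await asyncio.to_thread(flow.run, task)

class SyncFlow:
    def run(self, task):
        return f"sync:{task}"

class AsyncFlow:
    def run(self, task):
        return f"sync:{task}"

    async def arun(self, task):
        return f"async:{task}"

sync_result = asyncio.run(run_agent_flow(SyncFlow(), "t1"))
async_result = asyncio.run(run_agent_flow(AsyncFlow(), "t2"))
```

Putting the check in one helper means the eval runner, the gateway training engine, and the tinker CLI all get identical fallback behavior for free.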
…ntegrations
Add SDK tracing integrations and plugin agent packages for three popular
agent frameworks, enabling `rllm eval <benchmark> --agent {smolagents,strands,langgraph}`.
SDK integrations (rllm/sdk/integrations/):
- smolagents.py: RLLMSmolAgentsTracer — model wrapper that intercepts __call__
- langgraph.py: RLLMTrajectoryCallbackHandler — LangChain BaseCallbackHandler
Plugin packages (plugins/):
- smolagents_agent: ToolCallingAgent + OpenAIServerModel with VLM image support
- strands_agent: Strands Agent + OpenAIModel with VLM ContentBlock support
- langgraph_agent: LangGraph StateGraph + ChatOpenAI with native multimodal support
All plugins follow the react_agent convention, adapt to any benchmark via
TaskSpec, and are discovered via rllm.agents entry points.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent_run_func training path is redundant now that the CLI uses AgentFlow + Evaluator natively. This removes Path 2 (SdkWorkflowFactory) from UnifiedTrainer, AgentTrainer, and all launcher classes, along with the make_agent_run_func() bridge function and its tests. The underlying rllm/sdk/ modules are kept intact for standalone SDK usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused variables, chain ImportError exceptions with `from err`, use `X | Y` union syntax in isinstance calls, rename ambiguous variable, and remove unused imports. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace the plain ASCII banner with block-letter Unicode art using a cyan-to-blue gradient, wrapped in a Rich Panel
- Redesign `rllm dataset list` to show the full catalog by default, with datasets grouped by category, emoji icons, and color-coded status indicators (● pulled, ○ available, ◆ local)
- Add a --local flag to show only pulled datasets (replaces the old default)
- Use Rich Tables with rounded borders and a consistent color theme
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded provider dicts with a ProviderInfo registry supporting 14 providers: openai, anthropic, gemini, openrouter, deepseek, together, fireworks, groq, cerebras, xai, zhipu, kimi, minimax, and custom OpenAI-compatible endpoints. Add base_url config field for custom endpoints, use display labels in CLI menus, and route through correct LiteLLM prefixes. Custom provider bypasses LiteLLM proxy entirely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change --ui from a boolean flag to --ui/--no-ui with auto-detection. When neither is passed, UI logging is automatically enabled if the user has a stored ui_api_key (via `rllm login`) or RLLM_API_KEY env var. Users can explicitly disable with --no-ui. Applied to both eval and train commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove minimal, langchain, openai-agents, crewai, and google-adk templates from `rllm init`. The react template is now the sole option and is selected automatically without prompting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move agent plugins from plugins/ to agenthub/ for clearer naming. Archive legacy examples (geo3k_tinker, ocr) and remove outdated CLI examples (agent_plugin, agent_python_api). Update imports, docs, and pyproject.toml references accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nudge

- Add progressive episode uploads during eval via an on_episode_complete callback with a thread-safe buffer (batch size 50), instead of sending everything after the run
- Add POST /api/episodes/batch support in UILogger with fallback to individual POSTs
- Print a clickable wandb-style session URL on UILogger init ("rllm-ui: View run at ...") with local dev detection (localhost → frontend port 5173)
- Add a registration nudge in the eval/train CLI when the user is not logged in
- Add a nudge in Tracking.__init__ when 'ui' is not in the logger list
Same pattern as episodes — POST /api/trajectory-groups/batch with fallback to individual POSTs for backward compatibility.
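The batch-with-fallback pattern used for both episodes and trajectory groups can be sketched transport-agnostically; the two callables below stand in for the batch and single-item POST endpoints, which are injected so the sketch stays free of HTTP specifics.

```python
def upload_in_batches(items, post_batch, post_one, batch_size=50):
    """Try the batch endpoint first; if it fails (e.g. an older backend
    without /batch routes), fall back to individual uploads so no item
    is silently dropped on a backend that predates batching."""
    uploaded = 0
    for i in range(0, len(items), batch_size):
        chunk = items[i:i + batch_size]
        try:
            post_batch(chunk)
            uploaded += len(chunk)
        except Exception:
            for item in chunk:
                try:
                    post_one(item)
                    uploaded += 1
                except Exception:
                    pass  # item failed both paths; logging elided in this sketch
    return uploaded

# Usage: a backend with no batch endpoint still receives every item.
received = []

def no_batch(chunk):
    raise RuntimeError("404: batch endpoint not found")

count = upload_in_batches(list(range(7)), no_batch, received.append, batch_size=3)
```

Falling back per-chunk (rather than aborting the whole upload) is what keeps the new client backward compatible with old servers.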
Action and SWEAction are now Pydantic BaseModels which require keyword arguments. Updated all positional argument instantiations across agents and workflows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat(ui): progressive batched uploads, session URL, and registration nudge
Summary
This is a major release that introduces a full-featured CLI (rllm eval, rllm train, rllm init, rllm login), a comprehensive eval framework with 40+ benchmarks, a model gateway for RL agent training, an agent/evaluator plugin system, and sandboxed execution support. It also slims core dependencies, deprecates legacy APIs, and adds SDK integrations for popular agent frameworks.

Key Changes

CLI (rllm/experimental/cli/)
- rllm eval <benchmark> --model <name> — run evaluations against any supported benchmark
- rllm train <benchmark> --model <name> — train with session-aware proxy tracing via the tinker backend
- rllm init — scaffold new agent projects from templates (ReAct, OpenAI Agents, ADK, LangChain, CrewAI)
- rllm login — authenticate with the rLLM UI
- rllm dataset list/pull — browse and pull datasets from the HuggingFace catalog
- rllm model setup — configure model providers with per-provider API keys
- --ui flag on eval/train for auto-enabling UI logging when logged in

AgentFlow Abstraction for Eval (rllm/experimental/eval/)
- AgentFlow/Evaluator protocol abstractions with async support (arun)
- EvalRunner with thread pool concurrency and async AgentFlow support
- Task spec (TaskSpec) and eval config management

AgentHub Plugins (agenthub/)
- Plugin agent packages discovered via entry points (example: examples/cli/agent_plugin/)

Training & RL Improvements
- AgentFlow + Workflow training path via the model gateway
- ChatTemplateParser in the SFT trainer and OpenAI engine

Packaging & Housekeeping
- Optional dependency extras ([verl], [tinker], [sdk], [dev])
- StepView/TrajectoryView aliases removed — use Step/Trajectory everywhere
- Legacy examples moved to examples/archive/
- Legacy integrations (rllm/integrations/) deprecated and removed

Test Plan
- pytest — comprehensive test suite added for CLI commands, eval framework, data pipeline, model gateway, and SDK workflows
- ruff check . — linting passes
- rllm eval gsm8k --model gpt-4o-mini runs end-to-end
- rllm init scaffolds a project correctly
- rllm dataset list displays the catalog
- uv pip install -e . installs with slim core deps