
fix(metrics): include dropped workflow episodes in denominators #442

Merged
jeffreysijuntan merged 1 commit into rllm-org:main from rajatbeladiya:fix/workflow-metrics-include-dropped-episodes-382 on Mar 16, 2026


@rajatbeladiya
Contributor

Issue #382 reports that workflow episodes that terminate before producing any steps (e.g. max_prompt_length_exceeded) are silently dropped (repeat_counts=0) and not counted in metrics, inflating pass@k and hiding termination reasons.
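To make the bias concrete, here is a minimal sketch (not the rLLM implementation) of a per-task pass@k computed with and without dropped episodes. The function `pass_at_k`, its arguments, and the task ids are hypothetical names chosen for illustration; the only behavior taken from this PR is that a dropped episode counts as an incorrect attempt.

```python
from collections import defaultdict


def pass_at_k(results, dropped_task_ids=()):
    """Fraction of tasks with at least one correct attempt.

    results: iterable of (task_id, is_correct) for episodes that produced steps.
    dropped_task_ids: task ids of episodes that produced no steps; each one is
        counted as an incorrect attempt (hypothetical input format).
    """
    per_task = defaultdict(list)
    for task_id, is_correct in results:
        per_task[task_id].append(bool(is_correct))
    # A dropped episode still occupies a slot in the denominator.
    for task_id in dropped_task_ids:
        per_task[task_id].append(False)
    if not per_task:
        return 0.0
    solved = sum(any(attempts) for attempts in per_task.values())
    return solved / len(per_task)


results = [("t1", True), ("t2", False), ("t3", True)]
dropped = ["t4"]  # e.g. terminated with max_prompt_length_exceeded

biased = pass_at_k(results)              # denominator ignores t4
corrected = pass_at_k(results, dropped)  # denominator includes t4
```

With four tasks of which one is dropped, the biased estimate is 2/3 while the corrected one is 2/4, which is the inflation the issue describes.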

This PR:

  • Tracks dropped episodes in AgentWorkflowEngine.transform_results_for_verl and returns them in meta_info.dropped_episodes.
  • Updates AgentWorkflowTrainer to:
    • include dropped termination reasons in termination_counts
    • use num_tasks as denominator for termination metrics
    • account for dropped episodes in validation pass@k by treating them as incorrect.
  • Adds a regression test to ensure dropped episodes are surfaced in meta_info.
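The termination-metric change can be sketched as follows. This is an illustrative sketch under assumptions, not the code in `AgentWorkflowTrainer`: the helper `termination_fractions` and the reason names `env_done` and `max_steps` are hypothetical, and the actual layout of `meta_info.dropped_episodes` is not shown in this PR description. Only `max_prompt_length_exceeded` and the idea of dividing by `num_tasks` come from the text above.

```python
from collections import Counter


def termination_fractions(kept_reasons, dropped_reasons):
    """Per-reason termination fractions over all tasks.

    kept_reasons: termination reason for each episode that produced steps.
    dropped_reasons: termination reason for each dropped episode
        (assumed here to be a flat list of strings).
    """
    counts = Counter(kept_reasons)
    counts.update(dropped_reasons)  # dropped reasons are counted too
    # Denominator is the total number of tasks, not just surviving episodes.
    num_tasks = len(kept_reasons) + len(dropped_reasons)
    if num_tasks == 0:
        return {}
    return {reason: n / num_tasks for reason, n in counts.items()}


kept = ["env_done", "env_done", "max_steps"]  # hypothetical reason names
dropped = ["max_prompt_length_exceeded"]
fractions = termination_fractions(kept, dropped)
```

Because every task lands in exactly one bucket, the fractions sum to 1, and a reason such as `max_prompt_length_exceeded` stays visible even when none of its episodes produce steps.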

Fixes #382

Workflow episodes that terminate before producing any steps were previously dropped (repeat_counts=0) and excluded from termination and pass@k metrics, inflating accuracy and hiding max_prompt_length_exceeded. Track dropped episodes in meta_info and include them in trainer metrics.

Fixes rllm-org#382.
@jeffreysijuntan merged commit 65f728d into rllm-org:main on Mar 16, 2026
1 of 2 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug] Biased pass@1 metrics due to silent episode dropping

2 participants