fix: snapshot workspace before parallel grading to prevent race condition#376
fix: snapshot workspace before parallel grading to prevent race condition#376ScuttleBot wants to merge 2 commits intopinchbench:mainfrom
Conversation
…tion When parallel grading is enabled, grade_task() runs in a background thread while the main thread prepares the next task's workspace. The background grader then reads the rebuilt workspace instead of the completed task's files, causing transcript mismatches and incorrect automated scores. Fix: before submitting grade_task to the ThreadPoolExecutor, deep-copy the workspace directory into an isolated temp directory via shutil.copytree. A shallow copy of execution_result with the snapshot path is passed to the background grader. The snapshot is cleaned up in _wait_for_pending_grade() after grading completes. The synchronous grading path is unchanged. Closes pinchbench#365
| workspace_path = result.get("workspace", "") | ||
| if workspace_path and os.path.isdir(workspace_path): | ||
| snapshot_dir = tempfile.mkdtemp(prefix="pinchbench_grade_") | ||
| shutil.copytree(workspace_path, os.path.join(snapshot_dir, "ws"), dirs_exist_ok=True) |
There was a problem hiding this comment.
WARNING: shutil.copytree is not wrapped in a try/except here. If it raises (e.g. disk full, broken symlink, permission error), the exception propagates uncaught through the task loop and crashes the entire benchmark run — losing all results collected so far. The temp dir created on the line above would also be leaked.
Consider wrapping the snapshot block in a try/except that falls back to using the original result (accepting the small race-condition risk) rather than aborting the run:
try:
snapshot_dir = tempfile.mkdtemp(prefix="pinchbench_grade_")
shutil.copytree(workspace_path, os.path.join(snapshot_dir, "ws"), dirs_exist_ok=True)
snapshot_workspace = os.path.join(snapshot_dir, "ws")
grade_result_copy = dict(result)
grade_result_copy["workspace"] = snapshot_workspace
grade_kwargs["execution_result"] = grade_result_copy
pending_grade_snapshot_dir = snapshot_dir
except Exception as exc:
logger.warning("Workspace snapshot failed, grading with live path: %s", exc)
pending_grade_snapshot_dir = None
Code Review SummaryStatus: 1 Issue Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Files Reviewed (2 files)
The race-condition fix is well-reasoned and correctly scoped. The snapshot + cleanup pattern is clean. The only concern is the missing error guard around Fix these issues in Kilo Cloud Reviewed by claude-4.6-sonnet-20260217 · 159,069 tokens |
|
Closing in favor of #372 which has a cleaner implementation (dedicated helper function, better documentation, no unrelated README changes). Both PRs solved the same race condition the same way. |
Problem
When parallel grading is enabled (
ThreadPoolExecutor, introduced in #351),grade_task()runs in a background thread while the main thread immediately moves on to the next task and callsprepare_task_workspace(), which rebuilds the shared agent workspace directory. The background grader then reads the next task's workspace files instead of the current one's, causing:task_sanityscoring 0 because its workspace state is gone)Fix
Before submitting
grade_taskto theThreadPoolExecutor, snapshot the workspace directory into an isolated temp directory viashutil.copytree. A shallow copy ofexecution_resultwith the snapshot path replaces the live workspace reference, so the background grader reads stable files regardless of what the main thread does next.Cleanup happens in
_wait_for_pending_grade()after grading completes (shutil.rmtree).The synchronous grading path is unchanged.
Changes
shutilandtempfileimports_snapshot_workspace_for_gradinglogic inline in the parallel submission blockpending_grade_snapshot_dirtracking variable_wait_for_pending_grade()Closes #365