[Jobs] Fix git commit metadata not captured in managed jobs#9184
[Jobs] Fix git commit metadata not captured in managed jobs#9184kevinzwang merged 3 commits intomasterfrom
Conversation
The dashboard's "Git Commit" field for managed jobs was not being populated reliably. Three issues: 1. Server-side re-validation overwrites client value: when the API server calls expand_and_validate_workdir() on the remapped blob directory (not a git repo), get_git_commit() returns None and overwrites the valid commit captured client-side. 2. No git commit captured from YAML file location: when launching with a YAML file tracked in a git repo but no workdir, load_chain_dag_from_yaml had no mechanism to capture the commit. 3. No git commit without workdir: bare CLI commands like `sky jobs launch -- echo hi` never captured any commit. Fix (1) by only writing git_commit when get_git_commit() returns a non-None value, so the server can't clobber the client's value. Fix (2) by capturing git commit from the YAML file's directory in load_chain_dag_from_yaml, with workdir commit taking priority. Tested: unit tests for all metadata capture paths and serialization round-trip; smoke test that launches a job and verifies git_commit appears in the queue response. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the traceability of managed jobs by ensuring that git commit metadata is consistently and correctly captured. It resolves a bug where server-side processing could lose client-provided commit information and introduces a mechanism to derive commit hashes from job YAML file locations, with clear rules for prioritizing different sources of commit data. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new feature to automatically capture and store the git commit hash associated with a task. The commit is determined either from the task's work directory or the location of its defining YAML file, and this metadata is then made available in the job queue. The changes include modifications to sky/task.py to handle workdir-based commit capture, updates to sky/utils/dag_utils.py for YAML-based commit capture, and the addition of comprehensive unit and smoke tests in tests/smoke_tests/test_managed_job.py and tests/unit_tests/test_sky/test_git_commit_metadata.py to validate this functionality. Feedback suggests improving the robustness of git commit discovery by using os.path.realpath for YAML file paths and refactoring a job record lookup loop for conciseness.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
/smoke-test -k test_managed_job_git_commit_in_queue |
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
lloyd-brown
left a comment
There was a problem hiding this comment.
Implementation looks good, verified that workdir commit takes priority over YAML file commit, just have one quick question I left inline.
|
@lloyd-brown thanks! could I get your approval for merge? |
|
/smoke-test -k test_managed_job_git_commit_in_queue |
Summary
Currently, git commit info for a managed job is only captured when a workdir is specified. This PR also adds it if the YAML is from a git repo. Workdir commit takes priority over YAML file commit when both are available.
Also fixes server-side
expand_and_validate_workdir()overwriting the client-captured git commit withNonewhen re-validating on a non-git blob directoryTest plan
test_git_commit_metadata.py): 11 tests covering workdir capture, YAML file capture, server-side re-validation preservation, serialization round-trip, and priority orderingtest_managed_job_git_commit_in_queue): launches a managed job from a YAML in a git repo and verifiesmetadata.git_commitin the queue response matches the repo HEADsky jobs launch test.yamlwith Tilt dev environment, confirmed git commit appears on dashboard managed jobs page🤖 Generated with Claude Code