[Jobs] Fix git commit metadata not captured in managed jobs #9184

Merged: kevinzwang merged 3 commits into master from kev/fix-git-commit-metadata on Mar 25, 2026

@kevinzwang (Collaborator) commented Mar 25, 2026

Summary

Currently, git commit info for a managed job is captured only when a workdir is specified. This PR also captures it when the task YAML file lives in a git repo. When both are available, the workdir commit takes priority over the YAML file commit.

Also fixes server-side expand_and_validate_workdir() overwriting the client-captured git commit with None when it re-validates against the remapped blob directory, which is not a git repo.
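
The overwrite fix can be illustrated with a minimal sketch. It assumes the commit is stored under a `git_commit` key in a plain metadata dict; `record_git_commit` and the injected lookup callable are hypothetical stand-ins, not the actual code in sky/task.py.

```python
from typing import Any, Callable, Dict, Optional


def record_git_commit(
    metadata: Dict[str, Any],
    workdir: str,
    get_git_commit: Callable[[str], Optional[str]],
) -> None:
    """Store the commit for `workdir`, but never replace a value with None.

    `get_git_commit` is injected here for illustration only; its real
    signature in the codebase is an assumption.
    """
    commit = get_git_commit(workdir)
    if commit is not None:
        metadata['git_commit'] = commit


# Client side: the workdir is a git checkout, so a commit is recorded.
metadata: Dict[str, Any] = {}
record_git_commit(metadata, '/home/me/repo', lambda _path: 'abc1234')

# Server side: the remapped blob directory is not a git repo, so the
# lookup returns None and the client's value is left untouched.
record_git_commit(metadata, '/tmp/blob', lambda _path: None)
print(metadata)  # {'git_commit': 'abc1234'}
```

Without the `is not None` guard, the second call would replace `'abc1234'` with `None`, which is the behavior described above.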

Test plan

  • Unit tests (test_git_commit_metadata.py): 11 tests covering workdir capture, YAML file capture, server-side re-validation preservation, serialization round-trip, and priority ordering
  • Smoke test (test_managed_job_git_commit_in_queue): launches a managed job from a YAML in a git repo and verifies metadata.git_commit in the queue response matches the repo HEAD
  • Manual: sky jobs launch test.yaml with Tilt dev environment, confirmed git commit appears on dashboard managed jobs page
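
The check performed by the smoke test could look roughly like the sketch below. It assumes a queue record is a dict with a nested `metadata` dict holding `git_commit`; `head_commit` and `queue_record_matches_head` are hypothetical helpers, not the test's actual code.

```python
import subprocess
from typing import Any, Dict, Optional


def head_commit(directory: str) -> Optional[str]:
    """Return the HEAD commit of the repo containing `directory`, or None."""
    try:
        result = subprocess.run(
            ['git', 'rev-parse', 'HEAD'],
            cwd=directory,
            capture_output=True,
            text=True,
            check=False,
        )
    except (FileNotFoundError, NotADirectoryError):
        # git is not installed, or `directory` does not exist.
        return None
    if result.returncode != 0:
        # Not inside a git repository (or git refused to read it).
        return None
    return result.stdout.strip()


def queue_record_matches_head(record: Dict[str, Any], repo_dir: str) -> bool:
    """Compare the record's metadata.git_commit with the repo's HEAD."""
    expected = head_commit(repo_dir)
    actual = (record.get('metadata') or {}).get('git_commit')
    return expected is not None and actual == expected
```

Returning False when HEAD cannot be determined keeps the check from passing vacuously when both sides are None.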

🤖 Generated with Claude Code

The dashboard's "Git Commit" field for managed jobs was not being
populated reliably. Three issues:

1. Server-side re-validation overwrites client value: when the API
   server calls expand_and_validate_workdir() on the remapped blob
   directory (not a git repo), get_git_commit() returns None and
   overwrites the valid commit captured client-side.

2. No git commit captured from YAML file location: when launching
   with a YAML file tracked in a git repo but no workdir,
   load_chain_dag_from_yaml had no mechanism to capture the commit.

3. No git commit without workdir: bare CLI commands like
   `sky jobs launch -- echo hi` never captured any commit.

Fix (1) by only writing git_commit when get_git_commit() returns a
non-None value, so the server can't clobber the client's value.
Fix (2) by capturing git commit from the YAML file's directory in
load_chain_dag_from_yaml, with workdir commit taking priority.
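
A minimal sketch of the priority rule in fix (2), under stated assumptions: `resolve_git_commit` is a hypothetical helper, the commit lookup is injected rather than calling the real get_git_commit(), and falling back to the YAML directory when the workdir yields no commit is an interpretation of "workdir commit taking priority", not confirmed behavior.

```python
import os
from typing import Callable, List, Optional


def resolve_git_commit(
    workdir: Optional[str],
    yaml_path: Optional[str],
    get_commit: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the first available commit, checking the workdir first."""
    candidates: List[str] = []
    if workdir is not None:
        candidates.append(workdir)
    if yaml_path is not None:
        # Use the directory that contains the YAML file.
        candidates.append(os.path.dirname(os.path.abspath(yaml_path)))
    for directory in candidates:
        commit = get_commit(directory)
        if commit is not None:
            return commit
    return None


# Illustration with a fake lookup table instead of real repositories.
fake_commits = {'/repo/work': 'workdir-sha', '/repo/cfg': 'yaml-sha'}
both = resolve_git_commit('/repo/work', '/repo/cfg/task.yaml',
                          fake_commits.get)
yaml_only = resolve_git_commit(None, '/repo/cfg/task.yaml',
                               fake_commits.get)
```

On a POSIX system, `both` resolves to the workdir commit and `yaml_only` to the YAML directory's commit.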

Tested: unit tests for all metadata capture paths and serialization
round-trip; smoke test that launches a job and verifies git_commit
appears in the queue response.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the traceability of managed jobs by ensuring that git commit metadata is consistently and correctly captured. It resolves a bug where server-side processing could lose client-provided commit information and introduces a mechanism to derive commit hashes from job YAML file locations, with clear rules for prioritizing different sources of commit data.

Highlights

  • Server-side Git Commit Preservation: Fixed an issue where the server-side expand_and_validate_workdir() could inadvertently overwrite a client-captured git commit with None during re-validation, especially when the work directory was not a git repository on the server.
  • YAML-based Git Commit Capture: Implemented logic to capture the git commit from the directory containing the YAML file when load_chain_dag_from_yaml is used, ensuring jobs launched from YAML in a git repo record the commit even without an explicit workdir.
  • Git Commit Priority: Established a clear priority rule: if both a workdir and a YAML file provide a git commit, the workdir's commit will take precedence.

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces a new feature to automatically capture and store the git commit hash associated with a task. The commit is determined either from the task's work directory or the location of its defining YAML file, and this metadata is then made available in the job queue. The changes include modifications to sky/task.py to handle workdir-based commit capture, updates to sky/utils/dag_utils.py for YAML-based commit capture, and the addition of comprehensive unit and smoke tests in tests/smoke_tests/test_managed_job.py and tests/unit_tests/test_sky/test_git_commit_metadata.py to validate this functionality. Feedback suggests improving the robustness of git commit discovery by using os.path.realpath for YAML file paths and refactoring a job record lookup loop for conciseness.
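
The two review suggestions can be sketched as follows. Both helpers are hypothetical, and the `job_name` field on a job record is an assumption made for illustration; the actual record schema is not shown in this conversation.

```python
import os
from typing import Any, Dict, Iterable, Optional


def yaml_repo_dir(yaml_path: str) -> str:
    """Directory of the YAML file with symlinks resolved.

    os.path.realpath follows symlinks, so git discovery starts from the
    file's real location rather than from wherever a link to it lives.
    """
    return os.path.dirname(os.path.realpath(os.path.expanduser(yaml_path)))


def find_job(records: Iterable[Dict[str, Any]],
             job_name: str) -> Optional[Dict[str, Any]]:
    """Return the first record with a matching name, or None."""
    return next((r for r in records if r.get('job_name') == job_name), None)
```

`next()` with a default replaces an explicit loop with a `break`, and it makes the "not found" case explicit as `None`.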

@kevinzwang (Collaborator, Author)

/smoke-test -k test_managed_job_git_commit_in_queue

@lloyd-brown (Collaborator) left a comment

Implementation looks good. I verified that the workdir commit takes priority over the YAML file commit. I just have one quick question, which I left inline.

@kevinzwang (Collaborator, Author)

@lloyd-brown thanks! could I get your approval for merge?

@kevinzwang kevinzwang enabled auto-merge (squash) March 25, 2026 21:47
@lloyd-brown (Collaborator)

/smoke-test -k test_managed_job_git_commit_in_queue

@kevinzwang kevinzwang requested a review from lloyd-brown March 25, 2026 21:56
@kevinzwang kevinzwang merged commit ada7f22 into master Mar 25, 2026
25 of 31 checks passed
@kevinzwang kevinzwang deleted the kev/fix-git-commit-metadata branch March 25, 2026 22:14

2 participants