[Jobs] Allow custom job recovery strategy configuration#9154
Michaelvll wants to merge 2 commits into master
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the job recovery system by making it more extensible and flexible. It allows custom, strategy-specific configuration parameters to be defined and handled within the job recovery schema, and ensures that client-side validation gracefully handles strategies that might only be known to the server. This change primarily supports plugin-based extensions to job recovery mechanisms without requiring core code modifications for each new strategy.
Code Review
This pull request introduces a mechanism to allow custom job recovery strategy configurations. The changes include adding a registration function for strategy-specific schema properties, a new method in StrategyExecutor to handle these custom configurations, and making the client-side validation of strategy names more lenient. The implementation is clean and the plugin-based approach for extending the schema is a good design choice. The changes look correct and well-integrated.
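The more lenient client-side check mentioned in the review can be pictured with a minimal sketch. This is not the code from the PR: `check_strategy_name`, its `strict` flag, and the registry contents are hypothetical names chosen for illustration, assuming the client simply skips the error for names it does not recognize and leaves the final decision to the server.

```python
# Illustrative stand-in for the strategies registered on the client.
CLIENT_KNOWN_STRATEGIES = {"FAILOVER", "EAGER_NEXT_REGION"}


def check_strategy_name(name: str, known: set, *, strict: bool) -> bool:
    """Return True if `name` is registered in `known`.

    strict=True (server side in this sketch): an unregistered name raises.
    strict=False (client side in this sketch): an unregistered name returns
    False so the caller can defer the decision to the server.
    """
    if name.upper() in known:
        return True
    if strict:
        raise ValueError(f"Unknown job recovery strategy: {name!r}")
    return False
```

With this shape, a plugin-provided strategy passes through the client untouched and is only rejected if the server's registry does not contain it either.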
402ab3e to 06b34a0
Add extension points so plugins can register custom recovery strategies with strategy-specific configuration:

1. schemas.py: Add register_job_recovery_property() for plugins to extend the job_recovery JSON schema with custom fields while keeping additionalProperties: False.
2. recovery_strategy.py: Add set_strategy_config() method on StrategyExecutor base class. After make() pops common keys (strategy, max_restarts_on_errors, recover_on_exit_codes), remaining dict keys are passed to the executor via set_strategy_config(). Subclasses override to handle strategy-specific config.
3. resources.py: Make _try_validate_managed_job_attributes lenient for unknown strategy names, deferring validation to the server where plugins may have registered additional strategies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
06b34a0 to e9fd20f
Add warm_nodes field to ProvisionConfig. When set, the K8s provisioner:

- Creates additional warm pods with skypilot.co/role=warm label
- Only waits for active pods to be Running (warm pods provision async)
- Skips Ray worker start on warm pods in instance_setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Add register_job_recovery_property() in schemas.py so additional strategy-specific fields can be registered for the job_recovery schema while keeping additionalProperties: False
- Add set_strategy_config() on StrategyExecutor: make() now passes remaining dict keys (after common ones like strategy, max_restarts_on_errors, recover_on_exit_codes) to the executor via this method. Subclasses can override to accept custom parameters.
- Make _try_validate_managed_job_attributes lenient for strategy names not yet in the registry, deferring full validation to the server

Test plan

- set_strategy_config() is a no-op for empty config by default

🤖 Generated with Claude Code
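For readers who want to picture the make() / set_strategy_config() flow, here is a self-contained sketch. It is not the code from recovery_strategy.py: the constructor arguments, the `REGISTRY` dict, the `CheckpointAwareExecutor` subclass, and its `checkpoint_interval` parameter are hypothetical. The source only states that the default is a no-op for empty config; raising on leftover keys in the base class is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


class StrategyExecutor:
    """Simplified stand-in for the base executor."""

    def __init__(self, max_restarts_on_errors: int = 0,
                 recover_on_exit_codes: Optional[List[int]] = None) -> None:
        self.max_restarts_on_errors = max_restarts_on_errors
        self.recover_on_exit_codes = recover_on_exit_codes or []

    def set_strategy_config(self, config: Dict[str, Any]) -> None:
        # No-op for empty config. Rejecting leftover keys is an assumption
        # of this sketch, not behavior confirmed by the PR.
        if config:
            raise ValueError(f"Unsupported strategy config: {sorted(config)}")


class CheckpointAwareExecutor(StrategyExecutor):
    """Hypothetical plugin strategy that accepts one custom parameter."""

    def set_strategy_config(self, config: Dict[str, Any]) -> None:
        self.checkpoint_interval = int(config.get("checkpoint_interval", 60))


# Hypothetical name-to-class registry.
REGISTRY = {
    "DEFAULT": StrategyExecutor,
    "CHECKPOINT_AWARE": CheckpointAwareExecutor,
}


def make(job_recovery: Dict[str, Any]) -> StrategyExecutor:
    remaining = dict(job_recovery)  # copy so the caller's dict is untouched
    name = remaining.pop("strategy", "DEFAULT")
    max_restarts = remaining.pop("max_restarts_on_errors", 0)
    exit_codes = remaining.pop("recover_on_exit_codes", None)
    executor = REGISTRY[name](max_restarts, exit_codes)
    # Whatever is left after popping the common keys is strategy-specific.
    executor.set_strategy_config(remaining)
    return executor
```

For example, `make({"strategy": "CHECKPOINT_AWARE", "checkpoint_interval": 30})` hands `{"checkpoint_interval": 30}` to the subclass, while `make({"strategy": "DEFAULT"})` calls the base method with an empty dict and nothing happens.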