Identify working alternative instructions through semantic variation testing #16
Draft
shailja-thakur wants to merge 1 commit into generative-computing:main from
…placement

- Add variation_types parameter to run_benchdrift_pipeline() to let users choose which semantic variation types to generate (generic, cluster_variations, persona, long_context)
- Update test/2_test_instruction_replacement.py to demonstrate variation_types usage
- Add docs/INSTRUCTION_REPLACEMENT.md with documentation for the instruction replacement workflow
- Enable discovery of validated alternative instructions with customizable variation strategies

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
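The commit message says variation_types selects among four generation strategies. Because benchdrift is not yet open-sourced, the sketch below does not call it; it only illustrates one plausible way such an argument could be normalized and validated. The helper resolve_variation_types and the "None means all types" default are hypothetical assumptions, not the PR's actual behavior.

```python
from typing import Iterable, Optional, Union

# The four variation types listed in the commit message.
SUPPORTED_VARIATION_TYPES = ("generic", "cluster_variations", "persona", "long_context")


def resolve_variation_types(
    variation_types: Optional[Union[str, Iterable[str]]] = None,
) -> list[str]:
    """Hypothetical helper: normalize a variation_types argument.

    None selects every supported type (an assumed default); a single string is
    treated as one type; unknown names or an empty selection raise ValueError.
    The result follows the order of SUPPORTED_VARIATION_TYPES, without duplicates.
    """
    if variation_types is None:
        return list(SUPPORTED_VARIATION_TYPES)
    if isinstance(variation_types, str):
        variation_types = [variation_types]
    requested = set(variation_types)
    if not requested:
        raise ValueError("variation_types must name at least one type")
    unknown = requested - set(SUPPORTED_VARIATION_TYPES)
    if unknown:
        raise ValueError(f"Unsupported variation types: {sorted(unknown)}")
    return [t for t in SUPPORTED_VARIATION_TYPES if t in requested]


print(resolve_variation_types(["persona", "generic", "persona"]))  # ['generic', 'persona']
```

Rejecting unknown names early keeps a typo such as "personas" from silently producing no variations of that type.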
Member: Marking as draft; please mark ready for review when benchdrift is open-sourced.
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🔴 Enforce conventional commit: This rule is failing. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
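The failing rule requires commit headers in the Conventional Commits form type(optional scope)!: description. The sketch below is a rough local pre-check of that header shape. The exact pattern the repository's merge protection enforces is not shown here, so the allowed type list (the spec itself only mandates feat and fix; the rest are the common Angular-style set) is an assumption.

```python
import re

# Assumed type list; the repository's actual rule may accept a different set.
ALLOWED_TYPES = ("build", "chore", "ci", "docs", "feat", "fix", "perf",
                 "refactor", "revert", "style", "test")

HEADER_RE = re.compile(
    r"^(?:" + "|".join(ALLOWED_TYPES) + r")"  # commit type
    r"(?:\([^()\s]+\))?"                         # optional scope, e.g. (pipeline)
    r"!?"                                        # optional breaking-change marker
    r": \S.*$"                                   # colon, space, non-empty description
)


def is_conventional_header(message: str) -> bool:
    """Return True if the first line of a commit message looks like a Conventional Commits header."""
    first_line = message.splitlines()[0] if message else ""
    return bool(HEADER_RE.match(first_line))


print(is_conventional_header("feat: add variation_types parameter"))  # True
print(is_conventional_header("Add variation_types parameter"))        # False
```

Only the first line is checked because the spec places the type and description in the header; the body and footers (such as Co-Authored-By) are free-form.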
This PR enables discovering validated alternative instructions by testing Mellea m-programs against semantically equivalent variations of a problem. Users can find alternative phrasings that work reliably with their m-programs.
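To make "work reliably" concrete, the sketch below runs a program on each variation several times and keeps only the phrasings whose pass rate meets a threshold. It is not the benchdrift API: the m-program is modeled as a plain Python callable, success is approximated by an exact string match, and validated_alternatives, trials, and min_pass_rate are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable


def validated_alternatives(
    program: Callable[[str], str],
    variations: Iterable[str],
    expected: str,
    trials: int = 3,
    min_pass_rate: float = 1.0,
) -> dict[str, float]:
    """Hypothetical helper: keep variations whose pass rate meets min_pass_rate.

    Success is an exact match after stripping whitespace; this stands in for
    whatever correctness check the real pipeline uses.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    kept: dict[str, float] = {}
    for variation in variations:
        passes = sum(program(variation).strip() == expected for _ in range(trials))
        rate = passes / trials
        if rate >= min_pass_rate:
            kept[variation] = rate
    return kept


# Toy stand-in for an m-program: it only handles phrasings that mention "2 + 2".
def toy_program(instruction: str) -> str:
    return "4" if "2 + 2" in instruction else "unknown"


variations = ["What is 2 + 2?", "Compute 2 + 2.", "Add two and two."]
print(validated_alternatives(toy_program, variations, expected="4"))
# {'What is 2 + 2?': 1.0, 'Compute 2 + 2.': 1.0}
```

Repeating trials matters only for nondeterministic programs (for example, LLM calls with sampling); for a deterministic program like the toy above, one trial gives the same result.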
What This Enables
Key Components
- run_benchdrift_pipeline(): Orchestrates semantic variation generation and m-program testing
- extract_replacement_instructions(): Extracts the variations on which the m-program succeeded

Benefits