[Teams Bot] Evaluation for AI search knowledge and the reference #13694
base: main
Pull request overview
Adds offline evaluation support for checking whether the bot's generated answers include the expected references, and whether the expected knowledge items appear in the retrieved AI Search context.
Changes:
- Extend the evaluation pipeline to run two new evaluators: `reference_match` and `knowledge_match`.
- Update test datasets to provide structured `expected_references` and `expected_knowledges` (title/link pairs) instead of URL-only expectations.
- Add a new `AzureBotReferenceEvaluator` and wire it into the eval runner and result reporting.
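For illustration, a migrated test record might look roughly like the sketch below. Only the `expected_references` and `expected_knowledges` keys (lists of title/link pairs) come from this PR; the `question` field, the `example.com` links, and the helper `load_expected_pairs` are hypothetical placeholders, and the real `.jsonl` files may carry additional fields.

```python
import json
from typing import Any, Dict, List

# Hypothetical record in the new structured format. Only the
# expected_references / expected_knowledges keys come from this PR;
# "question" and all values are illustrative placeholders.
record: Dict[str, Any] = {
    "question": "How do I version an API in TypeSpec?",
    "expected_references": [
        {"title": "Versioning", "link": "https://example.com/typespec/versioning"},
    ],
    "expected_knowledges": [
        {"title": "Versioning", "link": "https://example.com/typespec/versioning"},
        {"title": "Decorators", "link": "https://example.com/typespec/decorators"},
    ],
}

# A .jsonl dataset stores one JSON object per line.
line = json.dumps(record)


def load_expected_pairs(jsonl_line: str, key: str) -> List[Dict[str, str]]:
    """Parse one JSONL line and return the title/link pairs stored under `key`."""
    data = json.loads(jsonl_line)
    return [{"title": item["title"], "link": item["link"]} for item in data.get(key, [])]
```

Compared with URL-only expectations, keeping the title next to each link makes mismatches easier to read in the rendered results.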
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tools/sdk-ai-bots/offline-evaluation.yml | Adds reference_match and knowledge_match to the CI eval run. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_versioning.jsonl | Migrates test records to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_typespec-client.jsonl | Migrates test record to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_tsv-check.jsonl | Migrates test records to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_tspconfig.jsonl | Migrates test record to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_pipeline-failure.jsonl | Migrates test records to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_model.jsonl | Migrates test records to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_migration.jsonl | Migrates test records to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_decorator.jsonl | Migrates test record to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_core.jsonl | Migrates test records to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_breaking_change.jsonl | Migrates test record to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/python_core.jsonl | Migrates test records to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/onboarding_core.jsonl | Migrates test records to expected_references / expected_knowledges. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/results/typespec-test.json | Updates cached results but currently contains unresolved conflict markers. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/evals_run.py | Registers the reference_match and knowledge_match evaluators and wires them into the run. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/evaluator/azure_bot_reference_evaluator.py | New evaluator that compares expected vs actual reference/knowledge entries. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/evaluator/azure_bot_evaluator.py | Removes old URL-based reference matching from the composite evaluator. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/__init__.py` | Exports AzureBotReferenceEvaluator. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_runner.py | Extracts {title, link} from bot references and from AI Search context for evaluation inputs. |
| tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_result.py | Updates result rendering to show expected/actual references + knowledges. |
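The comparison the new evaluator performs can be pictured roughly as follows. This is a hedged sketch, not the code in `azure_bot_reference_evaluator.py`: it assumes entries are matched by normalized `link` and that the score is the fraction of expected entries found among the actual ones. The function name `match_entries`, the normalization rules, and the result keys are all hypothetical.

```python
from typing import Dict, List


def _normalize(link: str) -> str:
    # Hypothetical normalization: ignore case, surrounding whitespace and a trailing slash.
    return link.strip().rstrip("/").lower()


def match_entries(
    expected: List[Dict[str, str]], actual: List[Dict[str, str]]
) -> Dict[str, object]:
    """Compare expected vs actual {title, link} entries by normalized link.

    Returns the fraction of expected entries present in `actual` plus the
    matched and missing entries (assumed scoring, for illustration only).
    """
    actual_links = {_normalize(item["link"]) for item in actual if item.get("link")}
    matched = [e for e in expected if _normalize(e["link"]) in actual_links]
    missing = [e for e in expected if _normalize(e["link"]) not in actual_links]
    # With nothing expected, treat the case as a full match.
    score = len(matched) / len(expected) if expected else 1.0
    return {"score": score, "matched": matched, "missing": missing}
```

Under this sketch, `reference_match` would compare `expected_references` against the references extracted from the bot's answer, and `knowledge_match` would compare `expected_knowledges` against the title/link pairs extracted from the AI Search context.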
Comments suppressed due to low confidence (1)
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_runner.py:29
- The docstring for `extract_title_and_link_from_references` is now misleading: it says it returns a string array of links, but the function returns a list of `{title, link}` dicts. Please update the docstring/return description to match the new behavior.

```python
def extract_title_and_link_from_references(references: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Map an array of reference objects to a string array of their 'link' properties.
    Args:
        references: List of reference objects, each containing a 'link' field
    Returns:
        List of link strings extracted from the reference objects
    """
```
Fix https://github.com/Azure/azure-sdk-pr/issues/2508
Fix https://github.com/Azure/azure-sdk-pr/issues/2382