
@chunyu3 (Member) commented Jan 23, 2026

Fix https://github.com/Azure/azure-sdk-pr/issues/2508
Fix https://github.com/Azure/azure-sdk-pr/issues/2382

  • Implement an evaluator that checks document completeness
  • Evaluate the completeness of the knowledge returned from AI Search
  • Evaluate the references the LLM cites when generating an answer
  • Implement suppression logic: exclude suppressed evaluators or test cases from failure counting
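The suppression rule in the last bullet can be sketched as a failure counter that ignores results from suppressed evaluators or suppressed test cases. This is a minimal illustration, not the PR's implementation: the `EvalResult` shape, `count_failures`, and the test case IDs are hypothetical names, and only the evaluator names `reference_match` and `knowledge_match` come from the PR.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable


@dataclass(frozen=True)
class EvalResult:
    """One evaluator's verdict on one test case (hypothetical shape)."""
    testcase: str
    evaluator: str
    passed: bool


def count_failures(
    results: Iterable[EvalResult],
    suppressed_evaluators: AbstractSet[str] = frozenset(),
    suppressed_testcases: AbstractSet[str] = frozenset(),
) -> int:
    """Count failed results, skipping any that come from a suppressed
    evaluator or belong to a suppressed test case."""
    return sum(
        1
        for r in results
        if not r.passed
        and r.evaluator not in suppressed_evaluators
        and r.testcase not in suppressed_testcases
    )


results = [
    EvalResult("tc1", "reference_match", passed=False),
    EvalResult("tc1", "knowledge_match", passed=False),
    EvalResult("tc2", "reference_match", passed=True),
    EvalResult("tc3", "reference_match", passed=False),
]
print(count_failures(results))                                            # 3
print(count_failures(results, suppressed_evaluators={"knowledge_match"}))  # 2
print(count_failures(results, suppressed_testcases={"tc3"}))               # 2
```

In this sketch suppressed failures are still present in `results` (so they can be reported) but do not contribute to the count that decides whether the run fails.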

Copilot AI review requested due to automatic review settings January 23, 2026 05:11
@chunyu3 changed the title from "Evaluation for AI search knowledge and the reference" to "[Teams Bot] Evaluation for AI search knowledge and the reference"
Copilot AI (Contributor) left a comment


Pull request overview

Adds offline evaluation support for checking whether the bot’s generated answers include the expected references and whether the knowledge retrieved from the AI Search context includes the expected knowledge items.

Changes:

  • Extend the evaluation pipeline to run two new evaluators: reference_match and knowledge_match.
  • Update test datasets to provide structured expected_references and expected_knowledges (title/link pairs) instead of URL-only expectations.
  • Add a new AzureBotReferenceEvaluator and wire it into the eval runner + result reporting.
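The comparison that `reference_match` and `knowledge_match` perform could look roughly like the following: score the fraction of expected `{title, link}` entries whose link appears among the actual entries. This is a hedged sketch, not `AzureBotReferenceEvaluator` itself; `match_entries`, `_normalize_link`, the link-only matching rule, the `threshold` parameter, and the `example.com` URLs are all assumptions.

```python
from typing import Any, Dict, List


def _normalize_link(link: str) -> str:
    """Trim surrounding whitespace and a trailing slash so that
    'https://x/a/' and 'https://x/a' compare equal."""
    return link.strip().rstrip("/")


def match_entries(
    expected: List[Dict[str, str]],
    actual: List[Dict[str, str]],
    threshold: float = 1.0,
) -> Dict[str, Any]:
    """Score how many expected {title, link} entries appear in `actual`,
    matching on the normalized link only (titles are ignored)."""
    actual_links = {_normalize_link(a["link"]) for a in actual if a.get("link")}
    with_links = [e for e in expected if e.get("link")]
    if not with_links:
        # Nothing was expected, so there is nothing to miss.
        return {"score": 1.0, "result": "pass", "missing": []}
    missing = [e for e in with_links if _normalize_link(e["link"]) not in actual_links]
    score = (len(with_links) - len(missing)) / len(with_links)
    return {
        "score": score,
        "result": "pass" if score >= threshold else "fail",
        "missing": missing,
    }


expected = [
    {"title": "Versioning", "link": "https://example.com/versioning/"},
    {"title": "Decorators", "link": "https://example.com/decorators"},
]
actual = [{"title": "Versioning guide", "link": "https://example.com/versioning"}]
print(match_entries(expected, actual))  # score 0.5, result "fail", Decorators missing
```

The same function could serve both evaluators: pass the bot's references for `reference_match`, or the entries extracted from the AI Search context for `knowledge_match`. Matching on links rather than titles is a design choice here, since titles are more likely to be reworded.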

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `tools/sdk-ai-bots/offline-evaluation.yml` | Adds reference_match and knowledge_match to the CI eval run. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_versioning.jsonl` | Migrates test records to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_typespec-client.jsonl` | Migrates test record to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_tsv-check.jsonl` | Migrates test records to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_tspconfig.jsonl` | Migrates test record to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_pipeline-failure.jsonl` | Migrates test records to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_model.jsonl` | Migrates test records to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_migration.jsonl` | Migrates test records to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_decorator.jsonl` | Migrates test record to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_core.jsonl` | Migrates test records to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_breaking_change.jsonl` | Migrates test record to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/python_core.jsonl` | Migrates test records to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/onboarding_core.jsonl` | Migrates test records to expected_references / expected_knowledges. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/results/typespec-test.json` | Updates cached results but currently contains unresolved conflict markers. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/evals_run.py` | Registers and wires the reference_match and knowledge_match evaluators. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/evaluator/azure_bot_reference_evaluator.py` | New evaluator that compares expected vs. actual reference/knowledge entries. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/evaluator/azure_bot_evaluator.py` | Removes the old URL-based reference matching from the composite evaluator. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/__init__.py` | Exports AzureBotReferenceEvaluator. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_runner.py` | Extracts {title, link} from bot references and from AI Search context for evaluation inputs. |
| `tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_result.py` | Updates result rendering to show expected/actual references and knowledges. |
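Most of the changed files migrate JSONL test records from URL-only expectations to structured `expected_references` / `expected_knowledges` lists of `{title, link}` objects. A small loader that rejects records still in the old shape could look like this. It is a hypothetical helper (`load_test_records` is not part of the PR), and other fields a real record carries are not shown or checked.

```python
import json
from typing import Any, Dict, Iterable, List

# Field names taken from the PR; everything else here is illustrative.
EXPECTATION_FIELDS = ("expected_references", "expected_knowledges")


def load_test_records(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL test records and check that each expectation field,
    when present, is a list of objects with 'title' and 'link' keys."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines in the .jsonl file
        record = json.loads(line)
        for field in EXPECTATION_FIELDS:
            entries = record.get(field, [])
            if not isinstance(entries, list):
                raise ValueError(f"line {lineno}: {field} must be a list")
            for entry in entries:
                if not (isinstance(entry, dict) and all(k in entry for k in ("title", "link"))):
                    raise ValueError(
                        f"line {lineno}: {field} entry needs 'title' and 'link': {entry!r}"
                    )
        records.append(record)
    return records


sample = [
    '{"expected_references": [{"title": "Versioning", "link": "https://example.com/versioning"}],'
    ' "expected_knowledges": []}',
    "",
]
records = load_test_records(sample)
print(len(records))  # 1
```

An open file object also works as `lines`, for example `with open(path, encoding="utf-8") as f: load_test_records(f)`. A record that still lists bare URL strings raises `ValueError`, which flags records the migration missed.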
Comments suppressed due to low confidence (1)

tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_runner.py:29

  • The docstring for extract_title_and_link_from_references is now misleading: it says it returns a string array of links, but the function returns a list of {title, link} dicts. Please update the docstring/return description to match the new behavior.
```python
def extract_title_and_link_from_references(references: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Map an array of reference objects to a string array of their 'link' properties.

    Args:
        references: List of reference objects, each containing a 'link' field

    Returns:
        List of link strings extracted from the reference objects
    """
```
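One way to resolve this comment is to make the docstring describe the `{title, link}` dicts the function actually returns. The body below is a hypothetical sketch, since the PR's real implementation is not shown in the review; in particular, skipping references that have no `link` and defaulting a missing title to `""` are assumptions.

```python
from typing import Any, Dict, List


def extract_title_and_link_from_references(references: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Map an array of reference objects to a list of {title, link} dicts.

    Args:
        references: List of reference objects, each expected to contain
            'title' and 'link' fields; other fields are ignored.

    Returns:
        List of dicts holding only the 'title' and 'link' of each reference.
        References without a 'link' are skipped.
    """
    # Assumption of this sketch: a reference with no link cannot be matched,
    # so it is dropped; a missing title becomes an empty string.
    return [
        {"title": ref.get("title", ""), "link": ref["link"]}
        for ref in references
        if ref.get("link")
    ]
```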

