[Teams Bot] Evaluation for AI search knowledge and the reference #13694

chunyu3 · 2026-01-23T05:11:15Z

Fix https://github.com/Azure/azure-sdk-pr/issues/2508
Fix https://github.com/Azure/azure-sdk-pr/issues/2382

Implement an evaluator to evaluate document completion:
Evaluate completion for the knowledge returned from AI Search.
Evaluate the references the LLM refers to when generating an answer.
Implement suppression logic: exclude suppressed evaluators or test cases from failure counting.
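
The suppression rule above can be illustrated with a minimal sketch. The result shape (`testcase`, `evaluator`, `passed`), the two suppression sets, and the `similarity` evaluator name are assumptions made for illustration, not the PR's actual data model. The idea is only that a suppressed failure is still collected for reporting but does not add to the failure count.

```python
from typing import Dict, Iterable, List, Set, Tuple


def count_failures(
    results: Iterable[Dict[str, object]],
    suppressed_evaluators: Set[str],
    suppressed_testcases: Set[str],
) -> Tuple[int, List[Dict[str, object]]]:
    """Count failures, skipping suppressed evaluators and test cases.

    Returns (failure_count, suppressed_failures) so that suppressed
    failures can still be surfaced, for example as warnings.
    """
    failures = 0
    suppressed: List[Dict[str, object]] = []
    for result in results:
        if result["passed"]:
            continue
        if (
            result["evaluator"] in suppressed_evaluators
            or result["testcase"] in suppressed_testcases
        ):
            suppressed.append(result)  # kept for reporting, not counted
            continue
        failures += 1
    return failures, suppressed


# Hypothetical results; names are placeholders.
results = [
    {"testcase": "tc1", "evaluator": "reference_match", "passed": False},
    {"testcase": "tc2", "evaluator": "knowledge_match", "passed": False},
    {"testcase": "tc3", "evaluator": "reference_match", "passed": True},
    {"testcase": "tc4", "evaluator": "similarity", "passed": False},
]
failures, suppressed = count_failures(
    results,
    suppressed_evaluators={"knowledge_match"},
    suppressed_testcases={"tc1"},
)
print(failures, len(suppressed))
```

Here only `tc4` counts as a failure; the failures for `tc1` and `tc2` land in the suppressed list.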

…bot-evals

Copilot

Pull request overview

Adds offline evaluation support for checking whether the bot’s generated answers include the expected references and retrieve the expected knowledge items from AI Search context.

Changes:

Extend the evaluation pipeline to run two new evaluators: reference_match and knowledge_match.
Update test datasets to provide structured expected_references and expected_knowledges (title/link pairs) instead of URL-only expectations.
Add a new AzureBotReferenceEvaluator and wire it into the eval runner + result reporting.
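
As a rough illustration of what comparing expected and actual title/link pairs could look like, here is a minimal sketch. It is not the code of `AzureBotReferenceEvaluator`: the function names, the normalization rule (trim whitespace, lowercase the link), the score definition, and the `example.com` URLs are all assumptions. Using sets means duplicate entries in the actual output are counted once.

```python
from typing import Dict, List, Set, Tuple

Entry = Dict[str, str]
Key = Tuple[str, str]


def entry_key(entry: Entry) -> Key:
    """Normalize an entry to a comparable (title, link) key (assumed rule)."""
    return (entry.get("title", "").strip(), entry.get("link", "").strip().lower())


def match_entries(expected: List[Entry], actual: List[Entry]) -> Dict[str, object]:
    """Compare expected vs actual entries and report what is missing or extra."""
    expected_keys: Set[Key] = {entry_key(e) for e in expected}
    actual_keys: Set[Key] = {entry_key(a) for a in actual}
    matched = expected_keys & actual_keys
    return {
        # Fraction of expected entries that were found; 1.0 if nothing is expected.
        "score": len(matched) / len(expected_keys) if expected_keys else 1.0,
        "missing": sorted(expected_keys - actual_keys),
        "unexpected": sorted(actual_keys - expected_keys),
    }


# Placeholder data in the {title, link} shape described above.
expected_references = [
    {"title": "Versioning", "link": "https://example.com/versioning"},
    {"title": "Decorators", "link": "https://example.com/decorators"},
]
actual_references = [
    {"title": "Versioning", "link": "https://example.com/versioning"},
    {"title": "Versioning", "link": "https://example.com/versioning"},  # duplicate
    {"title": "Models", "link": "https://example.com/models"},
]
result = match_entries(expected_references, actual_references)
print(result)
```

The same comparison could serve both `reference_match` and `knowledge_match`, since the overview describes both inputs as title/link pairs.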

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.


File	Description
tools/sdk-ai-bots/offline-evaluation.yml	Adds `reference_match` and `knowledge_match` to the CI eval run.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_versioning.jsonl	Migrates test records to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_typespec-client.jsonl	Migrates test record to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_tsv-check.jsonl	Migrates test records to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_tspconfig.jsonl	Migrates test record to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_pipeline-failure.jsonl	Migrates test records to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_model.jsonl	Migrates test records to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_migration.jsonl	Migrates test records to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_decorator.jsonl	Migrates test record to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_core.jsonl	Migrates test records to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/typespec_breaking_change.jsonl	Migrates test record to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/python_core.jsonl	Migrates test records to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/tests/onboarding_core.jsonl	Migrates test records to `expected_references` / `expected_knowledges`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/results/typespec-test.json	Updates cached results but currently contains unresolved conflict markers.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/evals_run.py	Registers and wires `reference_match` + `knowledge_match` evaluators.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/evaluator/azure_bot_reference_evaluator.py	New evaluator that compares expected vs actual reference/knowledge entries.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/evaluator/azure_bot_evaluator.py	Removes old URL-based reference matching from the composite evaluator.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/__init__.py	Exports `AzureBotReferenceEvaluator`.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_runner.py	Extracts `{title, link}` from bot references and from AI Search context for evaluation inputs.
tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_result.py	Updates result rendering to show expected/actual references + knowledges.

Comments suppressed due to low confidence (1)

tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_runner.py:29

The docstring for extract_title_and_link_from_references is now misleading: it says it returns a string array of links, but the function returns a list of {title, link} dicts. Please update the docstring/return description to match the new behavior.

def extract_title_and_link_from_references(references: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Map an array of reference objects to a string array of their 'link' properties.

    Args:
        references: List of reference objects, each containing a 'link' field

    Returns:
        List of link strings extracted from the reference objects
    """

tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/evaluator/azure_bot_evaluator.py

tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/results/typespec-test.json

tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/evaluator/azure_bot_reference_evaluator.py

tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/_evals_runner.py

tools/sdk-ai-bots/azure-sdk-qa-bot-evaluation/eval/evaluator/azure_bot_reference_evaluator.py

chunyu3 added 13 commits December 18, 2025 10:30

bot reference evaluator

95793bf

Merge branch 'main' of https://github.com/Azure/azure-sdk-tools into …bot-evals

a65cf4a

Merge branch 'main' of https://github.com/Azure/azure-sdk-tools into …bot-evals

a07d99e

use title and url to refer a reference

8b15b6d

remove unused code

931257e

change url to link

4883380

Merge branch 'main' of https://github.com/Azure/azure-sdk-tools into …bot-evals

4615d51

add knowledges

b81a72f

record knowledges

8e01809

enable evaluation for knowledges from ai search

019df12

handle duplicate references

375afb4

handle matched reference in unexpected_references issue

d85dc4d

update the evaluation tests to include expected knowledge and references

25397c3

chunyu3 requested review from JiaqiZhang-Dev and lirenhe as code owners January 23, 2026 05:11

Copilot AI review requested due to automatic review settings January 23, 2026 05:11

chunyu3 changed the title ~~Evaluation for AI search knowledge and the reference~~ [Teams Bot] Evaluation for AI search knowledge and the reference Jan 23, 2026

Copilot started reviewing on behalf of chunyu3 January 23, 2026 05:11

Copilot AI reviewed Jan 23, 2026


chunyu3 added 4 commits January 23, 2026 14:42

add suppress logic

8dfcadb

resolve comment

406444e

show warning

3722257

update test cases

259b0d7
