Python: AzureAISearchContextProvider agentic mode: source_data is always None on references — missing include_reference_source_data parameter #5095

@dantelmomsft

Bug Report

Description

When using AzureAISearchContextProvider in agentic mode, the source_data field on KnowledgeBaseReference objects in the retrieval response is always None, even when the knowledge source has source_data_fields correctly configured (e.g. id, content, source_file_name, offset, page_number).

This prevents consumers from accessing structured citation metadata (document names, page numbers, content excerpts) from agentic retrieval references.

Root Cause

The Azure AI Search agentic retrieval API requires callers to opt in to receiving source data on references at request time by passing include_reference_source_data=True via knowledge_source_params on the KnowledgeBaseRetrievalRequest.

The source_data_fields setting on the knowledge source definition only controls which fields are returned; the caller must also set this runtime flag on each request.

_agentic_search() (line ~808 in _context_provider.py) builds the KnowledgeBaseRetrievalRequest without knowledge_source_params, so the flag is never set:

# Current code — no knowledge_source_params
retrieval_request = KnowledgeBaseRetrievalRequest(
    messages=kb_messages,
    retrieval_reasoning_effort=reasoning_effort,
    output_mode=output_mode,
    include_activity=True,
    # knowledge_source_params is missing
)

Expected Behavior

When a knowledge source has source_data_fields configured, ref.source_data on each KnowledgeBaseSearchIndexReference should contain a dict with the configured fields, e.g.:

{
    'id': 'aHR0cHM6Ly8...',
    'content': 'Overview\nIntroducing PerksPlus...',
    'source_file_name': 'integration-test/PerksPlus.pdf',
    'offset': 0,
    'page_number': 3,
}

Actual Behavior

ref.source_data is always None for every reference returned by agentic retrieval.

Steps to Reproduce

  1. Create a search index with a knowledge source that has source_data_fields configured
  2. Create a knowledge base referencing that knowledge source
  3. Use AzureAISearchContextProvider in agentic mode with knowledge_base_name pointing to the existing KB
  4. Run a query and inspect ref.source_data on the returned references: it is always None
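
A quick way to confirm the symptom in step 4 is to count how many references actually carry source_data. The sketch below is illustrative only: summarize_source_data is a hypothetical helper (not part of agent-framework or azure-search-documents), it assumes nothing beyond each reference exposing a source_data attribute, and it uses SimpleNamespace stand-ins so it runs without a live search service.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def summarize_source_data(references: Iterable[Any]) -> tuple[int, int]:
    """Return (populated, missing) counts of source_data across references.

    Hypothetical diagnostic helper; only assumes a `source_data` attribute.
    """
    populated = 0
    missing = 0
    for ref in references:
        # getattr guards against reference types that lack the attribute.
        if getattr(ref, "source_data", None):
            populated += 1
        else:
            missing += 1
    return populated, missing


# Stand-ins mimicking what the bug produces: every source_data is None.
buggy_refs = [
    SimpleNamespace(id="0", source_data=None),
    SimpleNamespace(id="1", source_data=None),
]
print(summarize_source_data(buggy_refs))  # prints (0, 2)
```

Against a real retrieval response, the same helper could be pointed at the list of references it returns; with the bug present, the populated count stays at zero.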

Verified Fix

Passing knowledge_source_params with include_reference_source_data=True on the retrieval request resolves the issue:

from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    SearchIndexKnowledgeSourceParams,
)

retrieval_request = KnowledgeBaseRetrievalRequest(
    messages=kb_messages,
    retrieval_reasoning_effort=reasoning_effort,
    output_mode=output_mode,
    include_activity=True,
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name="<knowledge-source-name>",
            include_reference_source_data=True,
        )
    ],
)

With this change, ref.source_data is correctly populated with all configured fields.

Suggested Enhancement

AzureAISearchContextProvider should:

  1. Accept an optional knowledge_source_name (or knowledge_source_names: list[str]) parameter
  2. Accept an optional include_reference_source_data: bool = False parameter
  3. When enabled, inject knowledge_source_params with include_reference_source_data=True into the KnowledgeBaseRetrievalRequest built by _agentic_search()

Note: when _use_existing_knowledge_base=False (auto-creation path), the knowledge source name is already computed as f"{self.index_name}-source" and could be reused automatically.

Additionally, _parse_references_to_annotations() already handles source_data correctly:

if ref.source_data:
    extra["source_data"] = ref.source_data

So the downstream parsing is ready — only the request construction is missing the opt-in flag.
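
As a rough sketch of the enhancement proposed above, the parameter-building step could be isolated in a small function. Everything here is an assumption for illustration: build_knowledge_source_params, its params_factory argument, and FakeParams are hypothetical names, not existing agent-framework APIs. In the provider itself, params_factory would presumably be SearchIndexKnowledgeSourceParams, and default_name would be the f"{self.index_name}-source" value from the auto-creation path.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Sequence


def build_knowledge_source_params(
    knowledge_source_names: Sequence[str] | None,
    include_reference_source_data: bool,
    params_factory: Callable[..., Any],
    default_name: str | None = None,
) -> list[Any] | None:
    """Build the knowledge_source_params list, or None when the flag is off.

    Hypothetical helper: `params_factory` stands in for the SDK's
    SearchIndexKnowledgeSourceParams class so this sketch runs without it.
    """
    if not include_reference_source_data:
        # Default behaviour stays unchanged: no params on the request.
        return None
    names = list(knowledge_source_names or [])
    if not names and default_name:
        # Fall back to the name computed on the auto-creation path.
        names = [default_name]
    if not names:
        # Nothing to target; a real implementation might raise instead.
        return None
    return [
        params_factory(knowledge_source_name=name, include_reference_source_data=True)
        for name in names
    ]


@dataclass
class FakeParams:
    """Stand-in for SearchIndexKnowledgeSourceParams (assumption)."""

    knowledge_source_name: str
    include_reference_source_data: bool


params = build_knowledge_source_params(
    None, True, FakeParams, default_name="my-index-source"
)
print(params)
```

The result would then be passed as knowledge_source_params when _agentic_search() constructs the KnowledgeBaseRetrievalRequest. Keeping the flag off by default preserves current behaviour for existing callers.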

Current Workaround

Subclass AzureAISearchContextProvider and override _agentic_search to inject the parameter:

from agent_framework.azure import AzureAISearchContextProvider
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    SearchIndexKnowledgeSourceParams,
)

class AgenticSearchContextProvider(AzureAISearchContextProvider):
    def __init__(self, *, knowledge_source_name: str | None = None, **kwargs):
        super().__init__(**kwargs)
        self._knowledge_source_name = knowledge_source_name

    async def _agentic_search(self, messages):
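        # ... same as parent but with knowledge_source_params added, i.e. pass
        # knowledge_source_params=[SearchIndexKnowledgeSourceParams(
        #     knowledge_source_name=self._knowledge_source_name,
        #     include_reference_source_data=True)]
        # to KnowledgeBaseRetrievalRequest. The parent body is elided here.
        raise NotImplementedError("copy the parent implementation and add knowledge_source_params")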

Environment

  • agent-framework: 1.0.0
  • agent-framework-azure-ai-search: 1.0.0b260402
  • azure-search-documents: 11.7.0b2
  • API version: 2025-11-01-preview
  • Python: 3.13
  • OS: Windows 11
