Python: Fix FoundryAgent telemetry gaps for name and model (Fixes #5088)#5094
Python: Fix FoundryAgent telemetry gaps for name and model (Fixes #5088)#5094Charanvardhan wants to merge 2 commits intomicrosoft:mainfrom
Conversation
…rosoft#5088) Resolves gen_ai.agent.name rendering as UUID and gen_ai.request.model rendering as 'unknown' by falling back to the client agent definition and lazily resolving the Azure AI Projects SDK request model on first fetch.
|
@Charanvardhan please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
There was a problem hiding this comment.
Pull request overview
This PR aims to close telemetry gaps in the Python Foundry agent integration by ensuring gen_ai.agent.name uses a stable, human-readable fallback (agent name) and by resolving gen_ai.request.model from the Azure AI Projects agent definition (rather than reporting "unknown").
Changes:
- Add a fallback for
FoundryAgent.nameto use the underlying client’sagent_namewhenname=None. - Add lazy model resolution via
project_client.agents.get_agent(...)and attempt to patch the current OTel span with the resolved model. - Add a unit test validating the name fallback and lazy model fetch behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
python/packages/foundry/agent_framework_foundry/_agent.py |
Adds name fallback behavior and introduces lazy model resolution + span mutation logic. |
python/packages/foundry/tests/foundry/test_foundry_agent.py |
Adds a test covering the new name/model telemetry defaults behavior. |
|
|
||
| @model.setter | ||
| def model(self, value: str) -> None: | ||
| pass |
There was a problem hiding this comment.
model is defined as a property, but the setter is a no-op. Because RawFoundryAgentChatClient subclasses RawOpenAIChatClient, RawOpenAIChatClient.__init__ assigns self.model = ... (openai/_chat_client.py:446); with the current no-op setter, that assignment is silently discarded, and any later client.model = ... will also do nothing. Implement the setter to persist the value (e.g., set the backing field used by the getter) or remove the property and use a normal attribute/backing field so assignments behave correctly.
| pass | |
| self._fetched_model = value |
| # Lazily fetch the model name if not already cached | ||
| if not hasattr(self, "_fetched_model"): | ||
| try: | ||
| agent = await self.project_client.agents.get_agent(self.agent_name) | ||
| self._fetched_model = getattr(agent, "model", "unknown") | ||
| except Exception: | ||
| self._fetched_model = "unknown" | ||
|
|
There was a problem hiding this comment.
The lazy model fetch uses if not hasattr(self, "_fetched_model") and then awaits project_client.agents.get_agent(...). If two calls enter _prepare_options concurrently before _fetched_model is set, both will perform the network call. Consider guarding the lazy initialization with an asyncio.Lock or an in-flight Task/Future so only one fetch occurs and others await it.
| try: | ||
| agent = await self.project_client.agents.get_agent(self.agent_name) | ||
| self._fetched_model = getattr(agent, "model", "unknown") | ||
| except Exception: | ||
| self._fetched_model = "unknown" | ||
|
|
||
| # Try to update the current OpenTelemetry span directly | ||
| if hasattr(self, "_fetched_model") and self._fetched_model != "unknown": | ||
| try: | ||
| from opentelemetry import trace | ||
| from agent_framework.observability import OtelAttr | ||
|
|
||
| current_span = trace.get_current_span() | ||
| if current_span and current_span.is_recording(): | ||
| current_span.set_attribute(OtelAttr.REQUEST_MODEL, self._fetched_model) | ||
| span_name_parts = current_span.name.split(" ", 1) | ||
| if len(span_name_parts) > 0: | ||
| current_span.update_name(f"{span_name_parts[0]} {self._fetched_model}") | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
The except Exception blocks here swallow all failures (including auth/config/service errors) and then suppress them again when updating OTEL, with no logging. This can make real regressions very hard to diagnose and permanently cache "unknown" after a transient error. Prefer catching expected exception types (e.g., ImportError separately, and the specific Azure SDK error types for get_agent) and logging at least a debug/warning message once when resolution fails.
| # Try to update the current OpenTelemetry span directly | ||
| if hasattr(self, "_fetched_model") and self._fetched_model != "unknown": | ||
| try: | ||
| from opentelemetry import trace | ||
| from agent_framework.observability import OtelAttr | ||
|
|
||
| current_span = trace.get_current_span() | ||
| if current_span and current_span.is_recording(): | ||
| current_span.set_attribute(OtelAttr.REQUEST_MODEL, self._fetched_model) | ||
| span_name_parts = current_span.name.split(" ", 1) | ||
| if len(span_name_parts) > 0: | ||
| current_span.update_name(f"{span_name_parts[0]} {self._fetched_model}") | ||
| except Exception: |
There was a problem hiding this comment.
Updating the OpenTelemetry span via trace.get_current_span() won’t work for streaming requests: ChatTelemetryLayer intentionally creates streaming spans without context attachment (core/agent_framework/observability.py:1293-1299), so there may be no “current span” to update. As a result, gen_ai.request.model can still remain unknown for stream=True. Consider a mechanism that ensures the span used for the request is the one being updated (e.g., providing the model before span creation, or passing/propagating the span explicitly).
| agent = FoundryAgent( | ||
| project_client=mock_project, | ||
| agent_name="my-telemetry-agent", | ||
| name=None # Explicitly None to test fallback |
There was a problem hiding this comment.
This multiline FoundryAgent(...) call is missing a trailing comma after the last argument. In this repo most multiline call sites include trailing commas (and formatters like Black will typically add them), so adding it will avoid churn / formatting diffs.
| name=None # Explicitly None to test fallback | |
| name=None, # Explicitly None to test fallback |
|
@copilot apply changes based on the comments in this thread |
Motivation and Context
This PR addresses two telemetry gaps present in the
FoundryAgentimplementation where essential OpenTelemetry instrumentation fields were either rendering erroneously as a UUID or not populating at all.Fixes #5088
Description
This change ensures that
gen_ai.agent.nameandgen_ai.request.modeloutput appropriately for telemetry and Application Insights tracing._agent.py,RawFoundryAgent.__init__now correctly falls back togetattr(client, 'agent_name')ifnameis omitted, rather than allowing the base classes to arbitrarily generate a randomUUID.RawFoundryAgentChatClientnow implements an asynchronous lazy invocation toproject_client.agents.get_agentinside_prepare_options. It securely caches the return response locally, directly updates theopentelemetrycurrent span dynamically before tracing is flushed, and exposes a.modelproperty for all subsequent calls.test_foundry_agent_telemetry_defaultstest usingAsyncMockto rigorously assert these edge configurations.Contribution Checklist