Skip to content

Conversation

@Chibionos
Copy link
Contributor

Summary

This PR adds support for multiple model providers (OpenAI, Anthropic, Gemini, etc.) to the LLM-as-a-judge evaluator, enabling evaluations to work with any model configured in AllowedNormalizedModels.

Changes

  • Normalized API Integration: Switches from OpenAI-specific API to uipath.llm.chat_completions (normalized API) for multi-vendor support
  • Model Detection: Detects OpenAI models to conditionally apply the response_format parameter (unsupported by Anthropic/Gemini)
  • Prompt Engineering: Adds explicit JSON format instructions in system prompt for non-OpenAI models
  • Robust JSON Parsing: Implements cleanup logic for markdown code blocks and extra text that non-OpenAI models may include
  • Enhanced Logging: Adds comprehensive debug and error logging to troubleshoot multi-vendor model integration

Testing Required

This PR requires testing with multiple model providers to ensure compatibility:

  • Test with OpenAI models (gpt-4, gpt-4o, etc.)
  • Test with Anthropic models (claude-3-5-sonnet, claude-3-opus, etc.)
  • Test with Google Gemini models (gemini-pro, gemini-ultra, etc.)
  • Verify JSON parsing works correctly with all model responses
  • Confirm error handling and logging work as expected

Test Files

Sample test configurations have been created in samples/calculator/evaluations/:

  • evaluators/llm-judge-semantic-similarity-claude.json
  • evaluators/llm-judge-semantic-similarity-gemini.json
  • evaluators/llm-judge-strict-json-similarity-claude.json
  • evaluators/llm-judge-strict-json-similarity-gemini.json
  • eval-sets/test-claude-evaluator.json
  • eval-sets/test-gemini-evaluator.json
  • eval-sets/test-gpt-evaluator.json

Backward Compatibility

✅ Fully backward compatible with existing OpenAI-based evaluators. No breaking changes.

🤖 Generated with Claude Code

This change enables the LLM-as-a-judge evaluator to work with multiple
model providers (OpenAI, Anthropic, Gemini, etc.) instead of only OpenAI.

Key changes:
- Switch to normalized API (uipath.llm.chat_completions) for multi-vendor support
- Detect OpenAI models to conditionally apply response_format parameter
- Add explicit JSON format instructions for non-OpenAI models in system prompt
- Implement robust JSON parsing with markdown code block cleanup
- Add comprehensive error and debug logging for troubleshooting

This allows evaluators to use Anthropic Claude, Google Gemini, and other
models configured in AllowedNormalizedModels.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@github-actions github-actions bot added test:uipath-langchain Triggers tests in the uipath-langchain-python repository test:uipath-llamaindex Triggers tests in the uipath-llamaindex-python repository labels Jan 25, 2026
- Fix bare except clause to catch Exception explicitly
- Add type annotation for request_data dict to resolve mypy error
- Apply ruff formatting fixes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:uipath-langchain Triggers tests in the uipath-langchain-python repository test:uipath-llamaindex Triggers tests in the uipath-llamaindex-python repository

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant