feat: Add multi-model support for LLM-as-a-judge evaluator #1196

Chibionos · 2026-01-25T20:39:20Z

Summary

This PR adds support for multiple model providers (OpenAI, Anthropic, Gemini, etc.) to the LLM-as-a-judge evaluator, enabling evaluations to work with any model configured in AllowedNormalizedModels.

Changes

Normalized API Integration: Switches from OpenAI-specific API to uipath.llm.chat_completions (normalized API) for multi-vendor support
Model Detection: Detects OpenAI models to conditionally apply the response_format parameter (unsupported by Anthropic/Gemini)
Prompt Engineering: Adds explicit JSON format instructions in system prompt for non-OpenAI models
Robust JSON Parsing: Implements cleanup logic for markdown code blocks and extra text that non-OpenAI models may include
Enhanced Logging: Adds comprehensive debug and error logging to troubleshoot multi-vendor model integration

Testing Required

This PR requires testing with multiple model providers to ensure compatibility:

Test with OpenAI models (gpt-4, gpt-4o, etc.)
Test with Anthropic models (claude-3-5-sonnet, claude-3-opus, etc.)
Test with Google Gemini models (gemini-pro, gemini-ultra, etc.)
Verify JSON parsing works correctly with all model responses
Confirm error handling and logging work as expected

Test Files

Sample test configurations have been created in samples/calculator/evaluations/:

evaluators/llm-judge-semantic-similarity-claude.json
evaluators/llm-judge-semantic-similarity-gemini.json
evaluators/llm-judge-strict-json-similarity-claude.json
evaluators/llm-judge-strict-json-similarity-gemini.json
eval-sets/test-claude-evaluator.json
eval-sets/test-gemini-evaluator.json
eval-sets/test-gpt-evaluator.json

Backward Compatibility

✅ Fully backward compatible with existing OpenAI-based evaluators. No breaking changes.

🤖 Generated with Claude Code

This change enables the LLM-as-a-judge evaluator to work with multiple model providers (OpenAI, Anthropic, Gemini, etc.) instead of only OpenAI. Key changes: - Switch to normalized API (uipath.llm.chat_completions) for multi-vendor support - Detect OpenAI models to conditionally apply response_format parameter - Add explicit JSON format instructions for non-OpenAI models in system prompt - Implement robust JSON parsing with markdown code block cleanup - Add comprehensive error and debug logging for troubleshooting This allows evaluators to use Anthropic Claude, Google Gemini, and other models configured in AllowedNormalizedModels. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

- Fix bare except clause to catch Exception explicitly - Add type annotation for request_data dict to resolve mypy error - Apply ruff formatting fixes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

github-actions bot added test:uipath-langchain Triggers tests in the uipath-langchain-python repository test:uipath-llamaindex Triggers tests in the uipath-llamaindex-python repository labels Jan 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add multi-model support for LLM-as-a-judge evaluator #1196

feat: Add multi-model support for LLM-as-a-judge evaluator #1196

Chibionos commented Jan 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

feat: Add multi-model support for LLM-as-a-judge evaluator #1196

Are you sure you want to change the base?

feat: Add multi-model support for LLM-as-a-judge evaluator #1196

Conversation

Chibionos commented Jan 25, 2026

Summary

Changes

Testing Required

Test Files

Backward Compatibility

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant