Link LLM evals to model inventory #3725
Merged
gorkem-bwl merged 12 commits into develop on Apr 14, 2026
Spec for linking LLM evaluations (experiments + bias audits) to model inventory records via an optional dropdown at eval creation time. Includes evaluations tab on model detail page and risk nudge for flagged results.
9 tasks covering: Alembic migration, EvalServer CRUD + API updates, frontend dropdowns on both experiment and bias audit modals, Servers endpoint for model evaluations, Evaluations tab on model detail page with risk nudge banner, and unlinked indicators on eval lists.
## Changes
- bias_audits.py: add model_inventory_id param to create_bias_audit; include it in the INSERT, all SELECT/RETURNING clauses, and _row_to_dict (as modelInventoryId)
- evaluation_logs.py: add model_inventory_id param to create_experiment; include it in the INSERT/RETURNING, the get_experiment_by_id SELECT/return dict, and the get_experiments SELECT/return dict
## Changes
- Added model_inventory_id field to Experiment interface and createExperiment method in evaluationLogsService.ts
- Added getAllEntities import to NewExperimentModal for fetching model inventory
- Added modelInventories and selectedModelInventoryId state variables
- Added useEffect to fetch model inventories when modal opens
- Added optional "Link to model inventory" dropdown in Step 1 (model config section)
- Included model_inventory_id in the experiment creation payload
- Reset selectedModelInventoryId on form reset

## Benefits
- Users can now link experiments to a specific model in their inventory
- Dropdown is optional and defaults to None if not selected
- Non-critical fetch failure is silently handled (dropdown stays empty)
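The silent-failure behavior described above could look roughly like the following. This is a minimal sketch, not the modal's actual code: `ModelInventoryOption` and `loadModelInventoryOptions` are hypothetical names, and the `fetchAll` parameter stands in for whatever call the modal makes through getAllEntities.

```typescript
// Hypothetical option shape for the dropdown; the real entity has more fields.
interface ModelInventoryOption {
  id: number;
  name: string;
}

// Load dropdown options; on any failure return an empty list so the
// optional dropdown simply stays empty instead of blocking the form.
async function loadModelInventoryOptions(
  fetchAll: () => Promise<ModelInventoryOption[]>,
): Promise<ModelInventoryOption[]> {
  try {
    return await fetchAll();
  } catch {
    return []; // non-critical: linking a model is optional
  }
}
```

Inside a useEffect, the resolved list would be passed to the state setter for modelInventories.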
## Changes
- Added `modelInventoryId?: number` to `CreateBiasAuditConfig` interface in biasAuditService.ts
- Imported `getAllEntities` from entity.repository in NewBiasAuditModal
- Added state for model inventories list and selected model inventory ID
- Added useEffect to fetch model inventories when modal opens
- Added model inventory link Select dropdown in Step 2 (system info step)
- Included `modelInventoryId` in the submit config
- Reset `selectedModelInventoryId` on modal close
## Changes
- Create Servers/utils/modelEvaluations.utils.ts: raw SQL queries for llm_evals_experiments and llm_evals_bias_audits filtered by model_inventory_id + organization_id
- Create Servers/controllers/modelEvaluations.ctrl.ts: thin controller with logProcessing/logSuccess/logFailure and org-scoped auth
- Modify Servers/routes/modelInventory.route.ts: register GET /:id/evaluations before /:id to prevent param shadowing

## Benefits
- Enables the model inventory page to show linked evaluations and bias audits in a single authenticated, org-isolated request
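The route-ordering concern comes from first-match routing: the first registered pattern that matches a request path wins. The sketch below imitates that with a simplified segment matcher rather than Express itself. `matches`, `resolve`, and the handler name `getModelInventoryById` are illustrative assumptions, not code from this PR.

```typescript
type Route = { pattern: string; handler: string };

// Compare pattern and path segment by segment; ":name" matches any single segment.
function matches(pattern: string, path: string): boolean {
  const p = pattern.split("/").filter(Boolean);
  const s = path.split("/").filter(Boolean);
  return p.length === s.length && p.every((seg, i) => seg.startsWith(":") || seg === s[i]);
}

// Return the handler of the first registered route that matches the path.
function resolve(routes: Route[], path: string): string | undefined {
  return routes.find((r) => matches(r.pattern, path))?.handler;
}

const paramFirst: Route[] = [
  { pattern: "/:id", handler: "getModelInventoryById" }, // hypothetical handler name
  { pattern: "/evaluations", handler: "getAllModelEvaluations" },
];
const staticFirst: Route[] = [...paramFirst].reverse();

// With "/:id" registered first, "/evaluations" is captured as id = "evaluations".
const shadowed = resolve(paramFirst, "/evaluations"); // "getModelInventoryById"
// Registering the static route first lets it win.
const correct = resolve(staticFirst, "/evaluations"); // "getAllModelEvaluations"
```

In this simplified matcher a two-segment path such as `/7/evaluations` never matches `/:id`, so registering the more specific route first is mainly a defensive convention there; the ordering is decisive for a single static segment like `/evaluations`.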
## Changes
- Added getAllLinkedEvaluations() util joining llm_evals_experiments and llm_evals_bias_audits with model_inventories
- Added getAllModelEvaluations controller following existing logProcessing/logSuccess/logFailure pattern
- Added GET /modelInventory/evaluations route before /:id to avoid Express ID collision
- Created modelEvaluations.repository.ts with ModelEvaluation and ModelEvaluationsResponse types
- Created ModelEvaluationsTab component with combined sorted table, flagged-risk banner, and empty state
- Integrated Evaluations tab into ModelInventory/index.tsx (TabBar, URL detection, isBuiltInTab, content render)

## Benefits
- Shows all linked LLM experiments and bias audits in one place on the Model Inventory page
- Flags evaluations with failed metrics or flagged bias groups with an amber warning banner
- Follows existing tab pattern (model-risks, evidence-hub); no new patterns introduced
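The combined sorted table and the flagged-risk banner can be sketched as below. The row shape is an assumption for illustration: `EvaluationRow`, `failedMetrics`, and `flaggedGroups` are hypothetical and may not match the actual ModelEvaluation type.

```typescript
// Hypothetical row shape; the real ModelEvaluation type may differ.
interface EvaluationRow {
  id: string;
  kind: "experiment" | "bias_audit";
  createdAt: string; // ISO 8601 timestamp
  failedMetrics: number; // experiments: number of failed metrics
  flaggedGroups: number; // bias audits: number of flagged groups
}

// Merge both lists into one table, newest first.
function combineEvaluations(
  experiments: EvaluationRow[],
  biasAudits: EvaluationRow[],
): EvaluationRow[] {
  return [...experiments, ...biasAudits].sort(
    (a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt),
  );
}

// A row is flagged if it has any failed metric or any flagged bias group.
function isFlagged(row: EvaluationRow): boolean {
  return row.failedMetrics > 0 || row.flaggedGroups > 0;
}

// Show the amber banner when at least one row is flagged.
function shouldShowRiskBanner(rows: EvaluationRow[]): boolean {
  return rows.some(isFlagged);
}
```

Keeping `isFlagged` as a separate predicate lets the same rule drive both the banner and any per-row indicator.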
## Changes
- Added LINKED MODEL column to experiments table in ProjectExperiments.tsx
- Added LINKED MODEL column to bias audits table in BiasAuditsList.tsx
- Extended IEvaluationRow interface with linkedModel field
- Extended BiasAuditSummary interface with modelInventoryId field
- Updated EvaluationTable column widths to accommodate new column
- Rendered green "Linked" badge or grey "Unlinked" label based on model_inventory_id / modelInventoryId presence

## Benefits
- Users can see at a glance whether an experiment or bias audit is linked to a model inventory record
- Consistent visual indicator across both eval list views
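The Linked/Unlinked decision reduces to a presence check on the ID. A minimal sketch, assuming a hypothetical `linkBadge` helper and `tone` values that a component would map to palette tokens:

```typescript
// Hypothetical badge descriptor; "tone" would map to palette tokens in the UI.
type LinkBadge = { label: "Linked" | "Unlinked"; tone: "success" | "neutral" };

// Treat only null/undefined as unlinked, so any numeric ID counts as linked.
function linkBadge(modelInventoryId: number | null | undefined): LinkBadge {
  return modelInventoryId != null
    ? { label: "Linked", tone: "success" }
    : { label: "Unlinked", tone: "neutral" };
}
```

The explicit `!= null` check is a design choice in this sketch: a plain truthiness test would treat an ID of 0 as unlinked.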
Fix Select onChange type mismatch in NewBiasAuditModal and NewExperimentModal by removing explicit type annotation and using String() cast for value comparison.
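One way to read this fix: the handler accepts whatever value type the Select emits and compares it as a string. The sketch below assumes a hypothetical minimal event shape (`SelectLikeEvent`) rather than the real component's event type, and `resolveSelectedId` is an illustrative name.

```typescript
// Hypothetical minimal event shape standing in for the Select change event.
interface SelectLikeEvent {
  target: { value: unknown };
}

interface ModelInventoryOption {
  id: number;
  name: string;
}

// Normalize the emitted value with String() and match it against option IDs.
// Returns null when nothing matches, for example the empty "None" choice.
function resolveSelectedId(
  event: SelectLikeEvent,
  options: ModelInventoryOption[],
): number | null {
  const raw = String(event.target.value);
  const match = options.find((o) => String(o.id) === raw);
  return match ? match.id : null;
}
```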
- Use Promise.all for parallel DB queries in modelEvaluations.utils.ts
- Use palette.status.success tokens instead of hardcoded hex colors
- Use apiServices from networkServices instead of raw customAxios
- Remove redundant JSDoc comments that restate the function name
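The Promise.all change can be illustrated as follows. This is a sketch under assumptions: the two query functions are injected as parameters, and `QueryFn` and `loadModelEvaluations` are hypothetical names, not the ones in modelEvaluations.utils.ts.

```typescript
type Row = Record<string, unknown>;

// Hypothetical query signature: both queries filter by model and organization.
type QueryFn = (modelInventoryId: number, organizationId: number) => Promise<Row[]>;

// Start both queries before awaiting either, so they run concurrently
// instead of one after the other.
async function loadModelEvaluations(
  queryExperiments: QueryFn,
  queryBiasAudits: QueryFn,
  modelInventoryId: number,
  organizationId: number,
): Promise<{ experiments: Row[]; biasAudits: Row[] }> {
  const [experiments, biasAudits] = await Promise.all([
    queryExperiments(modelInventoryId, organizationId),
    queryBiasAudits(modelInventoryId, organizationId),
  ]);
  return { experiments, biasAudits };
}
```

Promise.all rejects as soon as either query rejects, so a controller calling this would still need its own failure handling.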
MuhammadKhalilzadeh (Collaborator) approved these changes on Apr 13, 2026
Code-wise, the files look good to me, @gorkem-bwl.
@HarshP4585, would you please run this branch to check functionality and behavior, and also the migrations?
Thank you.
## Summary
- `model_inventory_id` column to `llm_evals_experiments` and `llm_evals_bias_audits`
- `GET /api/modelInventory/evaluations` endpoint (Servers reads from `llm_evals_*` tables, no cross-service writes)
- `model_inventory_id`

## Design decisions
- `llm_evals_*` tables directly (same Postgres)