[OPIK-6177] [BE] fix: dedupe by stable dataset_item_id in experiment comparison#6507
JetoPistola wants to merge 30 commits into main from
Conversation
…comparison

When comparing experiments whose runs span multiple dataset versions of the same item, the comparison endpoint returned one row per (dataset_item_id, dataset_version_id) instead of one row per stable dataset_item_id. The FE then rendered each row separately, showing "skipped" for the experiments not present in that row's run_summaries_by_experiment.

Collapse dataset_items_resolved and dataset_items_aggr_resolved CTEs to one row per stable dataset_item_id (latest version) via LIMIT 1 BY dataset_item_id, and drop the now-redundant di.dataset_version_id = ei.resolved_dataset_version_id constraint from the LEFT JOINs so each ei row matches the canonical di row regardless of which version the experiment was linked to.

Implements OPIK-6177: Experiment Comparison is duplicating rows in direct comparison of two test suites
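As a rough illustration of the shape this commit describes — a minimal sketch only, with the join key reduced to a single equality, simplified column lists, and assumed parameter names (`:workspace_id`, `:dataset_id`); the real CTEs in DatasetItemVersionDAO carry far more columns and filters:

```sql
-- Hypothetical, simplified sketch: collapse to one canonical row per stable item,
-- then join experiment items without the per-version constraint.
WITH dataset_items_resolved AS (
    SELECT *
    FROM dataset_item_versions
    WHERE workspace_id = :workspace_id
      AND dataset_id = :dataset_id
    ORDER BY dataset_version_id DESC, last_updated_at DESC
    LIMIT 1 BY dataset_item_id   -- one row per stable dataset_item_id (latest version)
)
SELECT
    di.dataset_item_id,
    groupArray(ei.experiment_id) AS experiment_ids
FROM dataset_items_resolved AS di
LEFT JOIN experiment_items AS ei
    ON ei.workspace_id = di.workspace_id
   AND ei.dataset_item_id = di.dataset_item_id
    -- the "di.dataset_version_id = ei.resolved_dataset_version_id" predicate is gone,
    -- so items from every experiment land on the same canonical di row
GROUP BY di.dataset_item_id
```

The point is just the `LIMIT 1 BY dataset_item_id` collapse plus the version-agnostic join; everything else here is placeholder.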
…op join

Mirror the same change to the push-top-limit branch's `dataset_items_aggr_resolved` join. Without it, when comparing aggregated experiments linked to different dataset versions of the same item, items whose linked version is older than the canonical (latest) one in `dataset_items_aggr_resolved` produced NULL `di_t.*` in the JOIN, making `any(di_t.*)` sort expressions unstable and pagination non-deterministic.

Also removes the now-unused INNER JOIN to `experiment_aggregated_scope_ids` — its only purpose was to provide `eas_t.resolved_dataset_version_id` for the predicate. Caught by reviewer.
🔄 Test environment deployment process has started Phase 1: Deploying base version You can monitor the progress here. |
✅ Test environment is now available! To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml) Access Information
The deployment has completed successfully and the version has been verified. |
@andrescrz thanks for catching this — you're right that this PR's SQL changes are the inverse of #5852. I dug into OPIK-4518 to compare; both bugs are real, and the two PRs are pulling on opposite ends of the same OR-JOIN (

What I'd like to confirm before deciding the path forward: is OPIK-4518's case still reproducible against current production data, or did the SDK-side change in 1.10.51 make

A few signals pointing to the latter:
If new writes are stable-id-only, then the question is whether legacy rows with per-version physical row ids still need to work in the comparison endpoint. If yes, the right fix is probably an upstream resolution CTE that maps either form to the stable id once, then the JOIN simplifies to a single condition (Option 5 below). That's ~10 lines and structurally cleaner than today's OR-JOIN. If OPIK-4518's case is still actively reproducible today, I'll switch to keeping #5852's structure intact (no cross-version dedup, version-scoped JOIN) and add an outer GROUP BY by stable

I'm marking the PR as draft until we agree on direction. Happy to jump on a 10-min call if it's faster.
🤖 Reply posted via /address-github-pr-comments |
🌙 Nightly cleanup: The test environment for this PR (
Replying to your strategic question

@JetoPistola You called the gate correctly — "are new writes stable-id-only?" — but the mechanism is BE-side, not SDK-side. Stable-id ingestion was implemented entirely in the backend; no SDK upgrade is required, and all customers see the new behavior the moment their BE service rolls past the cutover version. So the bounded population of legacy

Why #5852 is load-bearing for legacy, not just guarding

The BE cutover normalized the write path; it didn't retroactively rewrite existing

Option 5 is the right shape

Resolution CTE upstream — look up
Why this fits better than keeping version-scoping forever (Option 2):
Apply to the cousin queries too
Acceptance test (independent of option choice)

The new regression test passes because
Without this, a future refactor (or a UUID-collision regression in the resolution CTE) could silently drop legacy rendering and we wouldn't notice.

Re @thiagohora's

Happy to hop on a call.

🤖 Review posted via /review-github-pr
…experiment-comparison-duplicate-rows

# Conflicts:
#	apps/opik-backend/src/main/java/com/comet/opik/domain/DatasetItemVersionDAO.java
🌙 Nightly cleanup: The test environment for this PR (
Note
Original content (struck through)
~~(c) Pivot to Option 2 (argMax wrap over #5852's existing structure, no new CTE). Verbose (~30-40 lines per query site, ~6 sites) but doesn't introduce the lookup JOIN at all. The verbosity is what made me skip it initially; given the analyzer issues, that may have been the wrong call.~~
🤖 Reply posted via /address-github-pr-comments |
…to stable id upstream

Replaces this PR's earlier OPIK-6177 approach (which regressed OPIK-4518's legacy row_id case per #5852) with andrescrz's Option 5 design from the review. The resolution CTE pattern produces a `stable_dataset_item_id` for every experiment_items row regardless of whether ei.dataset_item_id holds the stable id (modern writes, post-OPIK-4518 BE cutover) or a per-version dataset_item_versions.id (legacy writes pre-cutover).

Resolution shape: inline LEFT JOIN to dataset_item_versions FINAL keyed on (workspace_id, id) — the per-version row PK. For modern rows the JOIN misses and `if(notEmpty(lookup_div.dataset_item_id), …, ei.dataset_item_id)` falls back to ei.dataset_item_id (already the stable id). For legacy rows the JOIN matches and projects div.dataset_item_id (the stable id). A standalone `ei_id_lookup` CTE was tried first but caused ClickHouse's analyzer to drop left rows when the CTE materialized empty (deletion-cascade scenarios where dataset_item_versions has no rows for the dataset). Direct table reference avoids that.

Downstream JOIN simplifies from `(di.id = ei.dataset_item_id OR di.row_id = ei.dataset_item_id) AND di.dataset_version_id = ei.resolved_dataset_version_id` to the single condition `di.id = ei.stable_dataset_item_id`, removing the OR-JOIN structural smell from 5 sites in the comparison row + count + push-top + stats queries plus 2 sites in ExperimentAggregatesDAO, and dropping #5852's per-experiment version-equality predicate (it is no longer needed once resolution happens upstream).

GROUP BY in both UNION branches now keys on stable_dataset_item_id; the existing `LIMIT 1 BY dataset_item_id` in dataset_items_*resolved gives one canonical (latest-version) `di` row per stable id, so per-version `di.*` columns in GROUP BY are functionally determined by stable_dataset_item_id and don't split rows. `buildTopItemsSorting`/`getTopSortExpression` updated to project the resolved expression for sorting fields that reference the experiment-side key.

Implements Option 5 as agreed with @andrescrz in the PR review thread. Fixes the OPIK-6177 duplicate-row symptom AND restores OPIK-4518's legacy row_id rendering in a single, unified shape. All 549 comparison-endpoint + stats + ExperimentAggregatesIntegration tests pass (1 pre-existing flake unrelated to these changes).
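A minimal sketch of that resolution shape, assuming trimmed-down column lists and the `:workspace_id` parameter name (the real constants live in DatasetItemVersionDAO and ExperimentAggregatesDAO; only the join and fallback projection are taken from the commit above):

```sql
-- Hypothetical, simplified sketch of the Option 5 resolution JOIN.
SELECT
    ei.id,
    ei.experiment_id,
    -- Legacy rows: ei.dataset_item_id is a per-version dataset_item_versions.id,
    -- so the lookup matches and we project the stable dataset_item_id.
    -- Modern rows: the lookup misses, notEmpty() is false, and we fall back to
    -- ei.dataset_item_id, which already holds the stable id.
    if(notEmpty(lookup_div.dataset_item_id),
       lookup_div.dataset_item_id,
       ei.dataset_item_id) AS stable_dataset_item_id
FROM experiment_items AS ei
LEFT JOIN dataset_item_versions AS lookup_div FINAL
    ON lookup_div.workspace_id = ei.workspace_id
   AND lookup_div.id = ei.dataset_item_id     -- keyed on the per-version row PK
WHERE ei.workspace_id = :workspace_id
```

Downstream, the comparison JOIN then reduces to the single condition `di.id = ei.stable_dataset_item_id`, which is what removes the OR-JOIN.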
… CTE

Address PR #6507 baz-reviewer finding (comment 3190434359) and thiagohora's confirmation (comment 3191033224): after collapsing eligible_dataset_item_lookup, count's arrayJoin([id, row_id]) FROM dataset_items_agg_resolved (LIMIT 1 BY dataset_item_id) only emits the LATEST version's row_id per item, so legacy EIA rows referencing an OLDER version's row_id (pre-OPIK-6177 + post-version-bump) fail the IN-check and undercount. Push-top row branch is unaffected because its lookup_div FINAL handles all versions.

Add a lookup_for_count CTE that narrows by <dataset_item_filters> first then INNER JOINs dataset_item_versions FINAL, emitting both div.id (every version's row_id) and the stable id. Bloom skip-index on dataset_item_id helps because the inner join is dataset_item_id-narrowed (unlike the previously-removed eligible CTE which was version-only-narrowed and couldn't prune). Bench (per thiagohora): count 1.964s vs 1.827s (+0.14s) and full cross-version correctness.

Same shape applied to row query !push_top_limit branch (CTE name shared because the queries are separate StringTemplate constants).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
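Roughly, the count-side shape described in this commit looks like the sketch below. This is an assumption-laden illustration, not the shipped SQL: the `experiment_item_aggregates` table name is inferred from the skip-index name mentioned in the thread, the `<dataset_item_filters>` placeholder stands in for the real filter rendering, and parameter names are assumed.

```sql
WITH lookup_for_count AS (
    -- Step 1 (inner): resolve which stable items match <dataset_item_filters>.
    -- Step 2 (outer): INNER JOIN back to dataset_item_versions FINAL so that EVERY
    -- version's row_id of a matching item is emitted, plus the stable id itself.
    SELECT arrayJoin([div.id, div.dataset_item_id]) AS lookup_id
    FROM dataset_item_versions AS div FINAL
    INNER JOIN (
        SELECT DISTINCT dataset_item_id
        FROM dataset_item_versions FINAL
        WHERE workspace_id = :workspace_id
          AND dataset_id = :dataset_id
          -- <dataset_item_filters> applied here first, dataset_item_id-narrowed
    ) AS filtered ON filtered.dataset_item_id = div.dataset_item_id
    WHERE div.workspace_id = :workspace_id
      AND div.dataset_id = :dataset_id
)
SELECT count(DISTINCT eia.dataset_item_id) AS matching_items
FROM experiment_item_aggregates AS eia      -- assumed EIA table name
WHERE eia.workspace_id = :workspace_id
  -- plain-column IN keeps the dataset_item_id skip index usable; legacy rows whose
  -- dataset_item_id is an older version's row_id still pass, because lookup_for_count
  -- emits every version's row_id
  AND eia.dataset_item_id IN (SELECT lookup_id FROM lookup_for_count)
```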
Backend Tests - Integration Group 5 — 145 tests, 142 ✅, 2m 54s ⏱️ Results for commit 1c34a84. ♻️ This comment has been updated with latest results.
🔄 Test environment deployment process has started Phase 1: Deploying base version You can monitor the progress here. |
✅ Test environment is now available! To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml) Access Information
The deployment has completed successfully and the version has been verified. |
The eligible_dataset_item_lookup CTE was replaced by lookup_for_count in 1c34a84. Update the two stale Javadoc/inline comment blocks in DatasetItemVersionDAO to reference the new CTE name and capture the narrow-by-filter / arrayJoin shape that covers all-version row_ids. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andrescrz
left a comment
Let's clarify how eligible_dataset_item_lookup is referenced and used; the logic in DatasetItemVersionDAO seems to rely on it.
/**
 * Get experiment items with cursor pagination
 */
// OPIK-6177: resolves ei.dataset_item_id (legacy per-version DIV row id or modern
// stable id) to the stable dataset_item_id at aggregation time. Compare reads still
// resolve at read time too (see DatasetItemVersionDAO), so this is best-effort hygiene
// — over time it phases out legacy values from EIA without a backfill migration.
Nit: better integrate this new comment into the javadoc above.
This applies to other comments below, which fit better as javadocs, whether we have a previous one or not.
Commit 5a56e2f addressed this comment by moving the inline note into the Javadoc for GET_EXPERIMENT_ITEMS (and similar comments), properly integrating the explanation about OPIK-6177 into the existing documentation block.
Fixed in 5a56e2f — promoted the OPIK-6177 // inline comment blocks above the relevant SQL constants into Javadoc in both DAOs:
- `DatasetItemVersionDAO.SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS` and `_COUNT`
- `ExperimentAggregatesDAO.GET_EXPERIMENT_ITEMS`, `SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT`, and the row variant
Each Javadoc now captures the OPIK-6177 stable-id resolution rationale, the lookup_for_count CTE shape, and (post your #2 nit) the no-FINAL immutability assumption.
🤖 Reply posted via /address-github-pr-comments
    ei.last_updated_by AS last_updated_by,
    ei.execution_policy AS execution_policy
FROM experiment_items AS ei
LEFT JOIN dataset_item_versions AS lookup_div FINAL
Minor: you could try to avoid the FINAL in dataset_item_versions if you verify and assume that dataset_item_id is not mutable in practice. The rest of the retrieved fields are enforced to be immutable as they're part of the sort/primary key. Anyway, this is the safest approach from the functional point of view.
This comment applies to other similar FINAL calls below for this particular table.
Commit 5a56e2f addressed this comment by removing the FINAL qualifiers from the dataset_item_versions joins/queries in ExperimentAggregatesDAO, aligning with the assumption that dataset_item_id is effectively stable and avoiding the unnecessary finalization in those lookups.
Fixed in 5a56e2f — verified the immutability assumption holds for dataset_item_versions and dropped FINAL on every read in both DAOs.
Why it's safe:
- The table is a `ReplicatedReplacingMergeTree` ordered by `(workspace_id, dataset_id, dataset_version_id, id)` with `last_updated_at` as version.
- Every writer generates a fresh per-row `id`:
  - `BATCH_INSERT_ITEMS` — `:id<index>` from a fresh UUID each
  - `EDIT_ITEM_VIA_SELECT_INSERT` — `:newId` for `id`, `:newVersionId` for `dataset_version_id`
  - `COPY_VERSION_ITEMS` (version copy / fork / restore) — fresh `:uuids` pool, original `id` is NOT preserved
- The legacy seed `COPY_ITEMS_FROM_LEGACY` does preserve `id` from the `dataset_items` draft table, but it's double-guarded by `DatasetVersioningMigrationService.ensureDatasetMigrated` (fast `hasVersions` check + Redis lock + transactional re-check) and runs at most once per dataset, ever.
So no path re-INSERTs the same (workspace_id, dataset_id, dataset_version_id, id) tuple. dataset_item_id (the only retrieved non-key column) is immutable per row PK and pre-merge duplicates can't arise — FINAL was theoretical defense.
Documented the assumption in the SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS Javadoc with a note for future maintainers to restore FINAL if a future writer ever introduces an idempotent re-INSERT that updates non-key columns.
13 FINAL calls dropped (9 in DatasetItemVersionDAO, 4 in ExperimentAggregatesDAO). Tests still green: 113/113 compare-experiments + 138/138 EAI.
🤖 Reply posted via /address-github-pr-comments
Correction — my prior reply was wrong, and CI caught it. Restored FINAL in b86e7a0.
What I missed: PUT /datasets/items is an upsert flow that re-INSERTs the same (workspace_id, dataset_id, dataset_version_id, id) tuple when a client PUTs the same item ids twice with changed data / description / tags / evaluators / executionPolicy. The endpoint contract says: "Each item's id field is the stable identifier and upsert key" — BATCH_INSERT_ITEMS binds :id<i> = item.id().toString(), reusing the input id, so re-INSERTs of the same row PK are routine. Pre-merge duplicates exist on this table by design; FINAL is required so reads see only the latest row per PK.
The CI failure was DatasetsResourceTest$GetDatasetItemsByDatasetId.getDatasetItemsByDatasetId__whenItemsWereUpdated__thenReturnCorrectItemsCount — PUTs items twice with changed data and asserts SELECT_COLUMNS_BY_VERSION returns only the latest version's columns. Without FINAL, the query saw both pre-merge rows and unioned their column_types (5 expected → 13 actual).
Refined rule for this table: drop FINAL only where the read manually replicates ReplacingMergeTree's dedup via ORDER BY (..., id) DESC, last_updated_at DESC + LIMIT 1 BY <id-or-stable-id> (matches the engine's last_updated_at tiebreaker). Everywhere else (raw LEFT JOIN on id, raw aggregations without that idiom — like SELECT_COLUMNS_BY_VERSION), FINAL is load-bearing. Updated the SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS Javadoc to capture this.
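To make that rule concrete, here are the two read shapes side by side — a simplified sketch with assumed parameter names and trimmed column lists, not the literal DAO queries:

```sql
-- (1) Manual dedup that mirrors ReplacingMergeTree semantics: FINAL can be dropped,
--     because only the latest row per stable item survives the read anyway.
SELECT *
FROM dataset_item_versions
WHERE workspace_id = :workspace_id
  AND dataset_id = :dataset_id
ORDER BY dataset_version_id DESC, last_updated_at DESC   -- same tiebreaker as the engine
LIMIT 1 BY dataset_item_id;

-- (2) A raw aggregation with no such idiom: FINAL is load-bearing, otherwise the
--     pre-merge duplicates produced by the PUT /datasets/items upsert are double-counted.
SELECT dataset_item_id, count() AS visible_rows
FROM dataset_item_versions FINAL
WHERE workspace_id = :workspace_id
  AND dataset_version_id = :dataset_version_id
GROUP BY dataset_item_id;
```

Without FINAL in (2), a row PK that was upserted twice contributes two pre-merge rows to the aggregate, which is the class of failure the CI test surfaced.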
Net diff vs e45d82b (parent of the failed FINAL drop): kept the dead-CTE removal in ExperimentAggregatesDAO and the Javadoc promotions, restored every FINAL keyword that was there before. Tests now green: 113/113 compare-experiments + 27/27 GetDatasetItemsByDatasetId (incl. the previously-failing test) + 138/138 EAI.
🤖 Reply posted via /address-github-pr-comments
        LIMIT 1 BY div.id
    ) AS div_dedup
),
eligible_dataset_item_lookup AS (
Commit 5a56e2f addressed this comment by removing the eligible_dataset_item_lookup CTE altogether and wiring the queries to the dataset_item_versions lookup directly, so the previously questioned CTE is no longer present anywhere in the diff.
Fixed in 5a56e2f — good catch. The eligible_dataset_item_lookup CTE here was dead code — defined but never referenced in the count query's IN predicate. It was a leftover from the Alt 7 shape that was iterated on in DatasetItemVersionDAO (where the IN predicate did use it, until it was further refactored into lookup_for_count), but in this file the corresponding wire-up never landed.
Removed the dead CTE definition (lines 1143-1155 pre-fix). The actual stable-id resolution in this query is the lookup_div FINAL LEFT JOIN at the bottom + the if(notEmpty(...)) ... AS stable_dataset_item_id projection, which the new Javadoc now captures. (FINAL was also dropped per your #2 comment after verifying the immutability assumption.)
This file is the dev/test parity harness (countDatasetItemsWithExperimentItemsFromAggregates, only called by ExperimentAggregatesIntegrationTest); the production count path lives in DatasetItemVersionDAO.
🤖 Reply posted via /address-github-pr-comments
        LIMIT 1 BY div.id
    ) AS div_dedup
),
eligible_dataset_item_lookup AS (
Same, where is this CTE used from?
Commit e45d82b addressed this comment by adding documentation that spells out how the lookup_for_count CTE is consumed by the aggregated branch (and why the push_top_limit branch skips it), clarifying where that CTE is used.
Fixed in 5a56e2f — same dead-code situation as the count-query CTE you flagged in your previous comment. The row-query eligible_dataset_item_lookup here was also defined but never referenced — leftover from the Alt 7 iteration that didn't end up wiring an IN predicate in this file. Removed.
The actual stable-id resolution in this row query is lookup_div LEFT JOIN + if(notEmpty(...)) projection, captured in the new Javadoc that points back to _COUNT's rationale.
This row method (getDatasetItemsWithExperimentItemsFromAggregates) is also only called by the parity harness (ExperimentAggregatesIntegrationTest).
🤖 Reply posted via /address-github-pr-comments
Backend Tests - Integration Group 1 — 420 tests, 420 ✅, 13m 33s ⏱️ Results for commit dea6798. ♻️ This comment has been updated with latest results.
Address PR #6507 andrescrz feedback (comments 3194973428, 3195000400, 3195054140, 3195055228):

1. Remove dead eligible_dataset_item_lookup CTE definitions in ExperimentAggregatesDAO (the count + row parity-harness queries) that were never referenced in their SQL — leftover from the Alt 7 shape that didn't end up wiring the IN predicate in this file.

2. Promote OPIK-6177 // inline comment blocks above SQL constants into proper Javadoc in both files (per andrescrz's nit).

3. Drop FINAL on every dataset_item_versions read in both files. The table is a ReplicatedReplacingMergeTree where every writer generates a fresh per-row id (BATCH_INSERT_ITEMS, EDIT_ITEM_VIA_SELECT_INSERT, COPY_VERSION_ITEMS) and the one-shot COPY_ITEMS_FROM_LEGACY is double-guarded by DatasetVersioningMigrationService.ensureDatasetMigrated, so no write path re-INSERTs the same (workspace_id, dataset_id, dataset_version_id, id) tuple. Pre-merge duplicates can't arise; FINAL was theoretical defense. Documented in the SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS Javadoc with a note for future maintainers to restore FINAL if any non-key column becomes mutable via re-INSERT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔄 Test environment deployment process has started Phase 1: Deploying base version You can monitor the progress here. |
✅ Test environment is now available! To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml) Access Information
The deployment has completed successfully and the version has been verified. |
Revert the FINAL drop from 5a56e2f. The immutability assumption documented there was wrong: PUT /datasets/items is a real upsert flow that re-INSERTs the same (workspace_id, dataset_id, dataset_version_id, id) tuple when a client PUTs the same item ids twice with changed data, description, tags, evaluators, or execution_policy. Pre-merge duplicates are routine on this table; FINAL is required so reads see only the latest row per PK.

CI surfaced this via DatasetsResourceTest$GetDatasetItemsByDatasetId.getDatasetItemsByDatasetId__whenItemsWereUpdated__thenReturnCorrectItemsCount, which PUTs items twice with changed data and asserts that SELECT_COLUMNS_BY_VERSION returns only the latest version's columns — without FINAL the query saw both pre-merge rows.

Restored FINAL only where reads don't already dedupe via LIMIT 1 BY dataset_item_id ORDER BY (..., dvid) DESC, last_updated_at DESC (those subqueries pick the latest row themselves so FINAL is redundant on the inner scan). Updated the SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS Javadoc to capture that FINAL is load-bearing for this table given the upsert contract.

Keeps the dead eligible_dataset_item_lookup CTE removal and Javadoc promotions from 5a56e2f — those remain correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔄 Test environment deployment process has started Phase 1: Deploying base version You can monitor the progress here. |
✅ Test environment is now available! To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml) Access Information
The deployment has completed successfully and the version has been verified. |
FROM experiment_items AS ei
LEFT JOIN dataset_item_versions AS lookup_div FINAL
    ON lookup_div.workspace_id = ei.workspace_id
    AND lookup_div.id = ei.dataset_item_id
WHERE ei.workspace_id = :workspace_id
lookup_div FINAL in GET_EXPERIMENT_ITEMS uses an unbounded bare FINAL scan (violating .agents/skills/opik-backend/clickhouse.md) and duplicates the join+fallback logic from DatasetItemVersionDAO — should we refactor into a bounded/pre-deduped CTE shared with DatasetItemVersionDAO, or document an explicit exception if the bare FINAL is intentional?
Finding types: AI Coding Guidelines, Code Dedup and Conventions | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
apps/opik-backend/src/main/java/com/comet/opik/domain/experiments/aggregations/ExperimentAggregatesDAO.java
around lines 684-688 inside the `GET_EXPERIMENT_ITEMS` SQL string: 1. The `LEFT JOIN
dataset_item_versions AS lookup_div FINAL` performs an unbounded bare `FINAL` scan,
violating the monotonic-bound guidance from `.agents/skills/opik-backend/clickhouse.md`,
and can make cursor paging pathfully unbounded/expensive per page. 2. The same
join+fallback logic is duplicated from `DatasetItemVersionDAO`. Refactor this stable-id
normalization into a shared SQL fragment/CTE that: - Constrains by
workspace/dataset/version and uses `LIMIT 1 BY`/deduping instead of bare `FINAL` - Can
be reused by both `GET_EXPERIMENT_ITEMS` and `DatasetItemVersionDAO` to stay in sync If
a bare `FINAL` exception is truly required, add a Javadoc comment near the
`GET_EXPERIMENT_ITEMS` constant explicitly documenting why the monotonic-bound FINAL
rule doesn't apply here, referencing the specific clickhouse guideline.
Skipping — lookup_div FINAL here is load-bearing for upsert pre-merge dedup, and the suggested mitigations don't fit this case:
- Pre-deduped CTE / `LIMIT 1 BY dataset_item_id` breaks legacy id resolution. `lookup_div.id = ei.dataset_item_id` may match a per-version row_id from ANY past version (pre-OPIK-4518 backwards-compat), so we must keep all versions in the lookup; pre-deduping to "latest per stable item" loses the older rows we need to map back to the stable id.
- A `last_updated_at` time bound is impractical: legacy `experiment_items` rows can be arbitrarily old, so the bound would have to span the dataset's full lifetime.
- The JOIN is already row-PK-narrowed via `lookup_div.id = ei.dataset_item_id` (equality on the row PK), so this isn't a full-table FINAL — the FINAL is bounded by skip-index pushdown on the PK.
- Empirical: we just learned the hard way that dropping FINAL on this table breaks correctness. CI surfaced it via check-runs/74608249821 — `SELECT_COLUMNS_BY_VERSION` returned 13 columns instead of 5 because the upsert flow re-INSERTs the same `(ws, ds, dvid, id)` tuple with new `data`, so pre-merge duplicates exist by design on this table. Restored in b86e7a0b62.
@andrescrz — flagging this for your re-review since you raised the original FINAL concern. We could add a Javadoc note here explicitly documenting the exception to the clickhouse.md rule if you'd prefer that over the current state, or look into a different approach entirely.
🤖 Reply posted via /address-github-pr-comments
FROM experiment_items ei
INNER JOIN experiments_resolved e ON e.id = ei.experiment_id
LEFT JOIN dataset_item_versions AS lookup_div FINAL
    ON lookup_div.workspace_id = ei.workspace_id
    AND lookup_div.id = ei.dataset_item_id
lookup_div FINAL reads dataset_item_versions without a monotonic time bound, but the clickhouse.md guidance says bare FINAL on large tables should be time-bounded—since this stable-id join for legacy ei.dataset_item_id would miss older legacy rows, can we either add an explicit exception for this path or switch to a pre-deduped lookup strategy?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
`apps/opik-backend/src/main/java/com/comet/opik/domain/DatasetItemVersionDAO.java`
around lines 443-447 inside the
`SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT` query (the
`experiment_items_scope` CTE), the new `LEFT JOIN dataset_item_versions AS lookup_div
FINAL` performs a bare `FINAL` scan on a large table without any monotonic/time bound,
violating the stated ClickHouse guidance. Refactor this join to avoid `FINAL` by
replacing `lookup_div` with a pre-deduped lookup subquery/CTE that returns the latest
row per `dataset_item_id` using `ORDER BY last_updated_at DESC LIMIT 1 BY
dataset_item_id`, and restrict that subquery to the smallest possible scope (at least
`workspace_id` and the relevant dataset/version resolution) so correctness of stable-id
resolution is preserved. If you cannot safely replace the join, add an explicit in-file
exception comment that cites the guideline and explains why this hot path is
intentionally exempt.
Skipping — same rationale as my reply on the GET_EXPERIMENT_ITEMS thread. The lookup_div FINAL here is load-bearing for upsert pre-merge dedup; pre-deduping the lookup table breaks legacy id resolution; a last_updated_at time bound doesn't fit because legacy ei.dataset_item_id rows can be arbitrarily old; the JOIN is already row-PK-narrowed via equality on lookup_div.id. CI (check-runs/74608249821) confirmed dropping FINAL is unsafe on this table; restored in b86e7a0b62.
@andrescrz — same flag for your re-review here; happy to add a Javadoc exception note or take a different approach if you'd prefer.
🤖 Reply posted via /address-github-pr-comments
@andrescrz — could you take another look when you have a moment? Three things since your last review:
Tests green: 113/113 compare-experiments (incl. cross-version + cross-branch regression tests) + 138/138 EAI + 27/27 GetDatasetItemsByDatasetId. 🤖 Reply posted via /address-github-pr-comments |

Details
When comparing experiments, the compare endpoint could return duplicate rows for the same stable
`dataset_item_id`. Three distinct causes addressed in this PR.

Bug 1 — within-branch duplication (cross-version)
When two experiments ran on different dataset versions of the same item, the dedup CTEs (
`dataset_items_resolved`, `dataset_items_aggr_resolved`) were keyed on `LIMIT 1 BY id` (per-version row PK), so they preserved one row per `(item, version)` pair. Combined with the per-experiment version-match constraint on the LEFT JOIN, each experiment's items joined to its own `di` row, and the GROUP BY split on the per-version `di.*` columns produced one output row per (stable item × version × experiment).

Bug 2 — cross-branch duplication (transient post-experiment window)
When one experiment had rows in
`experiment_aggregates` (the aggregated branch) and another did not yet (the raw branch), each branch correctly grouped by stable `dataset_item_id` internally — but the `UNION ALL` between branches did not dedupe across them. Result: the same stable id surfaced twice — once with experiment A's items, once with experiment B's — instead of a single merged row.

This was a transient window: visible between "experiment ran" and "aggregator caught up" (typically minutes). Pre-existing on
`main`; surfaced after Bug 1 was fixed because that's when the within-branch dedup started working consistently and exposed the cross-branch case.

Bug 3 — Bug 1's first-pass fix caused an ~11× perf regression
The first round of fixes resolved
`experiment_items.dataset_item_id` to its stable form via a per-query `LEFT JOIN dataset_item_versions AS lookup_div` in the compare query. Functionally correct — but `lookup_div.id = eia.dataset_item_id` matches only the innermost sort-key column on `dataset_item_versions`, so ClickHouse couldn't apply skip indexes; and the `if(notEmpty(lookup_div.dataset_item_id), ..., eia.dataset_item_id)` wrap on the IN predicate disabled the EIA-side `idx_experiment_item_aggregates_dataset_item_id` minmax skip index that drove #6567's 5.86× speedup. Measured by @thiagohora on a representative experiment (page-25 sorted by `id` DESC, DI-side filter): 0.99 s on main → 10.88 s on this branch before the fix.

Solution — Alternative 7
After 7 explored alternatives (see the alignment thread and andrescrz's CHANGES_REQUESTED blocking the migration approach), shipped a pure read-time resolution that satisfies all three constraints:
(a) OPIK-4518 backwards-compat for legacy
experiment_itemsrows that store per-versionrow_ids(b) OPIK-6311 perf path (no
if(notEmpty(lookup_div...))wraps that defeat the EIA-sideidx_experiment_item_aggregates_dataset_item_idminmax skip index)(c) No data migration
Bug 1 — slim id-resolution CTE + direct DIV LEFT JOIN
A new
eligible_dataset_item_lookupCTE scoped to current experiments' dataset versions emits both per-versionrow_ids AND stabledataset_item_ids viaarrayJoin([div.id, div.dataset_item_id]). Used in two ways:eia.dataset_item_id IN (SELECT lookup_id FROM eligible_dataset_item_lookup)is a direct IN against a plain column, same shape OPIK-6311 measured at 13× speedup.experiment_items_scopefrom a prior incident).The actual stable-id resolution uses
LEFT JOIN dataset_item_versions FINALagainst the direct table (same shape as Option 5 /experiment_items_scope). For modern rows the JOIN typically misses (or matches a v1 DIV row) and theif(notEmpty(...))fallback returns the raweia.dataset_item_id. For legacy rows whereeia.dataset_item_idis a per-version row id, the JOIN finds the DIV row and projects its stabledataset_item_id. For orphaned items (DIV row deleted), fallback returns the raw value, matching main's behavior of rendering them keyed on the raw value with NULLdi.*.Sites updated (all aggregated-branch reads):
DatasetItemVersionDAO: count queryitem_agg_count, row queryitem_agginner SELECTExperimentAggregatesDAO: both_COUNTand_WITH_EXPERIMENT_ITEMS(FromAggregates dev/test parity queries)top_dataset_items× 2: reverted to main's raweia.dataset_item_idshape (resolution happens in outer row query)eia_t.dataset_item_idAlso kept the write-time stable-id resolution in
GET_EXPERIMENT_ITEMS(commit08b184b619) as best-effort hygiene — over time it phases out legacy values from EIA without a backfill migration.Bug 2 — outer GROUP BY over UNION ALL
UNION ALLwith an outerGROUP BY u.idso a stable id appears once regardless of which branches contributed it.groupArrayArray(experiment_items_array)flattens both branches' per-experiment item arrays into one merged array.argMax(col, dataset_version_id)for dataset-derived scalars — picks the higher dataset version's metadata when branches differ, matching single-branch'sLIMIT 1 BY dataset_item_id ORDER BY dataset_version_id DESCsemantic. (Earlier iteration usedlast_updated_at, swapped todataset_version_idper @baz-reviewer's MEDIUM #5.)Inner branches stay unchanged. Count, stats, and push-top variants are unaffected: count already wraps with
COUNT(DISTINCT); stats union byei.idwhich is structurally disjoint between branches; push-top is gated by!hasRaw.Wider-scope check
The cross-branch UNION pattern also exists in
ExperimentDAOandExperimentItemDAO, but those DAOs key rows by the same dimension that partitions the branches (experiment_id), so cross-branch overlap is structurally impossible. WithinDatasetItemVersionDAO, only the row query has this issue — count/stats variants are safe by construction. No other DAO needs the fix.Comparison rows now show the latest dataset item snapshot, with all experiments' items merged into a single row regardless of dataset version OR aggregation status, and
run_summaries_by_experimentkeyed byexperiment_id.Change checklist
Issues
AI-WATERMARK
AI-WATERMARK: yes
Testing
mvn test -Dtest='ExperimentAggregatesIntegrationTest'— 138 tests passmvn test -Dtest='DatasetsResourceTest$FindDatasetItemsWithExperimentItems,DatasetsResourceTest$FindDatasetItemsWithExperimentItemsAssertionResults,DatasetsResourceTest$FindDatasetItemsWithExperimentItemsSortingTest,DatasetsResourceTest$GetDatasetExperimentItemsStats'— 113 tests passfind__twoExperimentsOnDifferentDatasetVersions__thenSingleMergedRow(Bug 1) — two experiments on V1+V2 of same dataset_item, asserts 1 merged row with V2's data.find__oneExperimentAggregatedOneRaw__thenSingleMergedRow(Bug 2) — one aggregated + one raw on shared dataset_items, asserts merged row per stable id with both experiments' items. Verified to fail with "Expected size: 5 but was: 10" without the outer GROUP BY fix.multiVersionExperimentsEiaFilterConsistentBeforeAndAfterAggregates(Bug 3 / MEDIUM [CM-7217] change requests to backend for experiment creation and asset upload in comet llm #6) — EIA-side filter + cross-version + push_top_limit, asserts before-aggregation == after-aggregation result. Locks in correctness for the perf-regression-class of bugs at integration level.INagainst plain column) but not yet validated with a liveEXPLAIN indexes=1. Asked @thiagohora to re-run his prod benchmark against this branch when convenient.Documentation
N/A — backend SQL change with no API contract changes beyond fixing the duplicate-row bug.