[OPIK-6177] [BE] fix: dedupe by stable dataset_item_id in experiment comparison#6507

Open
JetoPistola wants to merge 30 commits into main from
daniel/OPIK-6177-fix-experiment-comparison-duplicate-rows

Conversation

@JetoPistola
Contributor

@JetoPistola JetoPistola commented Apr 27, 2026

Details

image

When comparing experiments, the compare endpoint could return duplicate rows for the same stable dataset_item_id. This PR addresses three distinct causes.

Bug 1 — within-branch duplication (cross-version)

When two experiments ran on different dataset versions of the same item, the dedup CTEs (dataset_items_resolved, dataset_items_aggr_resolved) were keyed on LIMIT 1 BY id (the per-version row PK), so they preserved one row per (item, version) pair. Combined with the per-experiment version-match constraint on the LEFT JOIN, each experiment's items joined to its own di row, and because the GROUP BY split on the per-version di.* columns, the query produced one output row per (stable item × version × experiment).
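
A simplified sketch of the problematic dedup shape (the CTE name, the table, and LIMIT 1 BY id come from the description above; the scoping predicates and parameter names are assumptions):

```sql
-- Simplified: LIMIT 1 BY id keeps one row per (item, version), so two experiments
-- linked to V1 and V2 of the same dataset item still resolve to two distinct di rows.
WITH dataset_items_resolved AS (
    SELECT *
    FROM dataset_item_versions
    WHERE workspace_id = :workspace_id
      AND dataset_id = :dataset_id
    ORDER BY last_updated_at DESC
    LIMIT 1 BY id            -- per-version row PK, not the stable dataset_item_id
)
SELECT count()
FROM dataset_items_resolved;
```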

Bug 2 — cross-branch duplication (transient post-experiment window)

When one experiment had rows in experiment_aggregates (the aggregated branch) and another did not yet (the raw branch), each branch correctly grouped by stable dataset_item_id internally — but the UNION ALL between branches did not dedupe across them. Result: the same stable id surfaced twice — once with experiment A's items, once with experiment B's — instead of a single merged row.

This was a transient window, visible between "experiment ran" and "aggregator caught up" (typically minutes). The issue pre-existed on main; it surfaced after Bug 1 was fixed because that's when the within-branch dedup started working consistently and exposed the cross-branch case.

Bug 3 — Bug 1's first-pass fix caused an ~11× perf regression

The first round of fixes resolved experiment_items.dataset_item_id to its stable form via a per-query LEFT JOIN dataset_item_versions AS lookup_div in the compare query. Functionally correct — but lookup_div.id = eia.dataset_item_id matches only the innermost sort-key column on dataset_item_versions, so ClickHouse couldn't apply skip indexes; and the if(notEmpty(lookup_div.dataset_item_id), ..., eia.dataset_item_id) wrap on the IN predicate disabled the EIA-side idx_experiment_item_aggregates_dataset_item_id minmax skip index that drove #6567's 5.86× speedup. Measured by @thiagohora on a representative experiment (page-25 sorted by id DESC, DI-side filter): 0.99 s on main → 10.88 s on this branch before the fix.
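
For illustration, the regressing predicate looked roughly like this (the EIA table name and the IN list are assumptions; the join condition and the if(notEmpty(...)) wrap are as described above):

```sql
-- Wrapping the EIA column in a function hides it from the
-- idx_experiment_item_aggregates_dataset_item_id minmax skip index, so granules can't be pruned.
SELECT count()
FROM experiment_item_aggregates AS eia
LEFT JOIN dataset_item_versions AS lookup_div
    ON lookup_div.workspace_id = eia.workspace_id
   AND lookup_div.id = eia.dataset_item_id   -- matches only the innermost sort-key column of DIV
WHERE eia.workspace_id = :workspace_id
  AND if(notEmpty(lookup_div.dataset_item_id),
         lookup_div.dataset_item_id,
         eia.dataset_item_id) IN (:dataset_item_ids);
```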

Solution — Alternative 7

After exploring seven alternatives (see the alignment thread and andrescrz's CHANGES_REQUESTED review blocking the migration approach), this PR ships a pure read-time resolution that satisfies all three constraints:

(a) OPIK-4518 backwards-compat for legacy experiment_items rows that store per-version row_ids
(b) OPIK-6311 perf path (no if(notEmpty(lookup_div...)) wraps that defeat the EIA-side idx_experiment_item_aggregates_dataset_item_id minmax skip index)
(c) No data migration

Bug 1 — slim id-resolution CTE + direct DIV LEFT JOIN

A new eligible_dataset_item_lookup CTE scoped to current experiments' dataset versions emits both per-version row_ids AND stable dataset_item_ids via arrayJoin([div.id, div.dataset_item_id]). Used in two ways:

  1. Skip-index-friendly IN list when DI filters are active (sketched below) — eia.dataset_item_id IN (SELECT lookup_id FROM eligible_dataset_item_lookup) is a direct IN against a plain column, the same shape OPIK-6311 measured at a 13× speedup.
  2. NOT used as a JOIN target — a CTE-based LEFT JOIN drops rows on deletion-cascade in ClickHouse (known analyzer behavior, also documented inline on experiment_items_scope from a prior incident).
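
A rough sketch of the lookup CTE and the skip-index-friendly IN (the arrayJoin projection and the IN shape follow the description above; the EIA table name, scoping predicates, and parameters are assumptions):

```sql
WITH eligible_dataset_item_lookup AS (
    -- Emits each eligible item's per-version row id AND its stable id, so the IN
    -- matches legacy EIA rows (per-version row_ids) and modern rows (stable ids) alike.
    SELECT arrayJoin([div.id, div.dataset_item_id]) AS lookup_id
    FROM dataset_item_versions AS div
    WHERE div.workspace_id = :workspace_id
      AND div.dataset_version_id IN (:experiment_dataset_version_ids)
)
SELECT count()
FROM experiment_item_aggregates AS eia
WHERE eia.workspace_id = :workspace_id
  -- direct IN against a plain column: the minmax skip index on dataset_item_id still applies
  AND eia.dataset_item_id IN (SELECT lookup_id FROM eligible_dataset_item_lookup);
```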

The actual stable-id resolution uses LEFT JOIN dataset_item_versions FINAL against the direct table (same shape as Option 5 / experiment_items_scope). For modern rows the JOIN typically misses (or matches a v1 DIV row) and the if(notEmpty(...)) fallback returns the raw eia.dataset_item_id. For legacy rows where eia.dataset_item_id is a per-version row id, the JOIN finds the DIV row and projects its stable dataset_item_id. For orphaned items (DIV row deleted), fallback returns the raw value, matching main's behavior of rendering them keyed on the raw value with NULL di.*.
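
A minimal sketch of that resolution shape, with the EIA table name and the projected columns as placeholder assumptions:

```sql
SELECT
    -- Legacy rows: the JOIN matches a per-version DIV row and we take its stable id.
    -- Modern rows (and orphaned items): the JOIN misses and we fall back to the raw value.
    if(notEmpty(lookup_div.dataset_item_id),
       lookup_div.dataset_item_id,
       eia.dataset_item_id) AS stable_dataset_item_id,
    eia.experiment_id
FROM experiment_item_aggregates AS eia
LEFT JOIN dataset_item_versions AS lookup_div FINAL
    ON lookup_div.workspace_id = eia.workspace_id
   AND lookup_div.id = eia.dataset_item_id
WHERE eia.workspace_id = :workspace_id;
```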

Sites updated (all aggregated-branch reads):

  • DatasetItemVersionDAO: count query item_agg_count, row query item_agg inner SELECT
  • ExperimentAggregatesDAO: both _COUNT and _WITH_EXPERIMENT_ITEMS (FromAggregates dev/test parity queries)
  • top_dataset_items × 2: reverted to main's raw eia.dataset_item_id shape (resolution happens in outer row query)
  • Sorting helpers: reverted to eia_t.dataset_item_id

The write-time stable-id resolution in GET_EXPERIMENT_ITEMS (commit 08b184b619) is also kept as best-effort hygiene — over time it phases out legacy values from EIA without a backfill migration.

Bug 2 — outer GROUP BY over UNION ALL

  • Wrap the existing UNION ALL with an outer GROUP BY u.id (sketched below) so a stable id appears once regardless of which branches contributed it.
  • groupArrayArray(experiment_items_array) flattens both branches' per-experiment item arrays into one merged array.
  • argMax(col, dataset_version_id) for dataset-derived scalars — picks the higher dataset version's metadata when branches differ, matching single-branch's LIMIT 1 BY dataset_item_id ORDER BY dataset_version_id DESC semantic. (Earlier iteration used last_updated_at, swapped to dataset_version_id per @baz-reviewer's MEDIUM #5.)
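
A minimal sketch of the wrap; aggregated_branch and raw_branch stand in for the existing inner branches, and the column names are assumptions:

```sql
SELECT
    u.id,
    -- merge per-experiment item arrays contributed by either branch into one array
    groupArrayArray(u.experiment_items_array) AS experiment_items_array,
    -- dataset-derived scalars follow the higher dataset version when branches differ
    argMax(u.data, u.dataset_version_id)      AS data
FROM (
    SELECT id, experiment_items_array, data, dataset_version_id FROM aggregated_branch
    UNION ALL
    SELECT id, experiment_items_array, data, dataset_version_id FROM raw_branch
) AS u
GROUP BY u.id;
```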

Inner branches stay unchanged. Count, stats, and push-top variants are unaffected: count already wraps with COUNT(DISTINCT); stats unions by ei.id, which is structurally disjoint between branches; push-top is gated by !hasRaw.

Wider-scope check

The cross-branch UNION pattern also exists in ExperimentDAO and ExperimentItemDAO, but those DAOs key rows by the same dimension that partitions the branches (experiment_id), so cross-branch overlap is structurally impossible. Within DatasetItemVersionDAO, only the row query has this issue — count/stats variants are safe by construction. No other DAO needs the fix.

Comparison rows now show the latest dataset item snapshot, with all experiments' items merged into a single row regardless of dataset version OR aggregation status, and run_summaries_by_experiment keyed by experiment_id.

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-6177

AI-WATERMARK

AI-WATERMARK: yes

  • Tools: Claude Code
  • Model(s): Claude Opus 4.7 (1M context)
  • Scope: assisted (root-cause investigation, query change, regression tests)
  • Human verification: code review + targeted backend test runs (251 existing + 3 new regression tests pass) + manual verification on PR preview env

Testing

  • mvn test -Dtest='ExperimentAggregatesIntegrationTest' — 138 tests pass
  • mvn test -Dtest='DatasetsResourceTest$FindDatasetItemsWithExperimentItems,DatasetsResourceTest$FindDatasetItemsWithExperimentItemsAssertionResults,DatasetsResourceTest$FindDatasetItemsWithExperimentItemsSortingTest,DatasetsResourceTest$GetDatasetExperimentItemsStats' — 113 tests pass
  • New regression tests:
    • find__twoExperimentsOnDifferentDatasetVersions__thenSingleMergedRow (Bug 1) — two experiments on V1+V2 of same dataset_item, asserts 1 merged row with V2's data.
    • find__oneExperimentAggregatedOneRaw__thenSingleMergedRow (Bug 2) — one aggregated + one raw on shared dataset_items, asserts merged row per stable id with both experiments' items. Verified to fail with "Expected size: 5 but was: 10" without the outer GROUP BY fix.
    • multiVersionExperimentsEiaFilterConsistentBeforeAndAfterAggregates (Bug 3 / MEDIUM #6) — EIA-side filter + cross-version + push_top_limit, asserts before-aggregation == after-aggregation result. Locks in correctness for the perf-regression class of bugs at integration level.
  • Bug 3 perf: skip-index pushdown is preserved by SQL shape (direct IN against plain column) but not yet validated with a live EXPLAIN indexes=1. Asked @thiagohora to re-run his prod benchmark against this branch when convenient.
  • Pre-commit hook ran backend lint and Spotless (passed)
  • Not run: full repository-wide test suite, frontend tests (no FE changes)

Documentation

N/A — backend SQL change with no API contract changes beyond fixing the duplicate-row bug.

…comparison

When comparing experiments whose runs span multiple dataset versions of
the same item, the comparison endpoint returned one row per
(dataset_item_id, dataset_version_id) instead of one row per stable
dataset_item_id. The FE then rendered each row separately, showing
"skipped" for the experiments not present in that row's
run_summaries_by_experiment.

Collapse dataset_items_resolved and dataset_items_aggr_resolved CTEs to
one row per stable dataset_item_id (latest version) via LIMIT 1 BY
dataset_item_id, and drop the now-redundant
di.dataset_version_id = ei.resolved_dataset_version_id constraint from
the LEFT JOINs so each ei row matches the canonical di row regardless of
which version the experiment was linked to.

Implements OPIK-6177: Experiment Comparison is duplicating rows in
direct comparison of two test suites
@github-actions github-actions Bot added the java, Backend, and tests labels Apr 27, 2026
@github-actions
Contributor

github-actions Bot commented Apr 27, 2026

Python SDK E2E Tests Results (Python 3.14)

245 tests  ±0   243 ✅ ±0   3m 22s ⏱️ ±0s
  1 suites ±0     2 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 060454c. ± Comparison against base commit 3785e8a.

♻️ This comment has been updated with latest results.

…op join

Mirror the same change to the push-top-limit branch's `dataset_items_aggr_resolved`
join. Without it, when comparing aggregated experiments linked to different
dataset versions of the same item, items whose linked version is older than the
canonical (latest) one in `dataset_items_aggr_resolved` produced NULL `di_t.*`
in the JOIN, making `any(di_t.*)` sort expressions unstable and pagination
non-deterministic.

Also removes the now-unused INNER JOIN to `experiment_aggregated_scope_ids` —
its only purpose was to provide `eas_t.resolved_dataset_version_id` for the
predicate.

Caught by reviewer.
@JetoPistola JetoPistola marked this pull request as ready for review April 28, 2026 07:45
@JetoPistola JetoPistola requested a review from a team as a code owner April 28, 2026 07:45
@JetoPistola JetoPistola added the test-environment Deploy Opik adhoc environment label Apr 28, 2026
@github-actions
Contributor

🔄 Test environment deployment process has started

Phase 1: Deploying base version 2.0.17-5158 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch daniel/OPIK-6177-fix-experiment-comparison-duplicate-rows
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

@CometActions
Collaborator

Test environment is now available!

To configure additional Environment variables for your environment, run the [Deploy Opik AdHoc Environment workflow](https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml)

Access Information

The deployment has completed successfully and the version has been verified.

Member

@andrescrz andrescrz left a comment


These changes are undoing a previous fix for a confirmed bug. This requires more investigation into better alternatives: #5852

@JetoPistola JetoPistola marked this pull request as draft April 28, 2026 20:17
@JetoPistola
Contributor Author

JetoPistola commented Apr 28, 2026

@andrescrz thanks for catching this — you're right that this PR's SQL changes are the inverse of #5852. I dug into OPIK-4518 to compare; both bugs are real, and the two PRs are pulling on opposite ends of the same OR-JOIN (di.id = ei.dataset_item_id OR di.row_id = ei.dataset_item_id).

What I'd like to confirm before deciding the path forward: is OPIK-4518's case still reproducible against current production data, or did the SDK-side change in 1.10.51 make experiment_items.dataset_item_id always store the stable id going forward, leaving #5852's SQL as belt-and-suspenders for legacy rows?

A few signals pointing to the latter:

  • The OPIK-6177 user evidence shows populated data/evaluators for both duplicated rows — consistent with both experiments storing stable ids in experiment_items.dataset_item_id (my fix's "single canonical di per stable id" works for them, but they still see the duplicate-rows symptom because of the per-version GROUP BY split, which is what this PR fixes).
  • DatasetItem.java:32-42 documents id as "stable item identifier ... remains the same across dataset versions" and datasetItemId is deprecated, always equal to id.
  • ExperimentItemProcessor.java:71 already passes datasetItem.id() (stable) when creating items.

If new writes are stable-id-only, then the question is whether legacy rows with per-version physical row ids still need to work in the comparison endpoint. If yes, the right fix is probably an upstream resolution CTE that maps either form to the stable id once, then the JOIN simplifies to a single condition (Option 5 below). That's ~10 lines and structurally cleaner than today's OR-JOIN.

If OPIK-4518's case is still actively reproducible today, I'll switch to keeping #5852's structure intact (no cross-version dedup, version-scoped JOIN) and add an outer GROUP BY by stable ei.dataset_item_id with argMax(di.X, di.item_last_updated_at) to collapse the duplicates without breaking anything (Option 2, ~30-40 lines).

I'm marking the PR as draft until we agree on direction. Happy to jump on a 10-min call if it's faster.

image

🤖 Reply posted via /address-github-pr-comments

@CometActions
Collaborator

🌙 Nightly cleanup: The test environment for this PR (pr-6507) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.


@andrescrz
Member

andrescrz commented Apr 30, 2026

Replying to your strategic question

@JetoPistola You called the gate correctly — "are new writes stable-id-only?" — but the mechanism is BE-side, not SDK-side. Stable-id ingestion was implemented entirely in the backend; no SDK upgrade is required, all customers see the new behavior the moment their BE service rolls past the cutover version. So the bounded population of legacy experiment_items rows (those written before that BE deploy) is exactly what we still need to render correctly. That maps to your Option 5.

Why #5852 is load-bearing for legacy, not just guarding

The BE cutover normalized the write path; it didn't retroactively rewrite existing experiment_items.dataset_item_id values. So legacy experiments still reference V1 row_ids. PR #5852's version-scoped JOIN catches those today; this PR's LIMIT 1 BY dataset_item_id + dropped version-equality drops them — any legacy ei.dataset_item_id equal to an older version's row_id (not stable, not latest) joins to nothing and di.* goes NULL. That's the OPIK-4518 symptom returning.

Option 5 is the right shape

Resolution CTE upstream — look up ei.dataset_item_id against dataset_item_versions.id, project dataset_item_versions.dataset_item_id (the stable id), COALESCE to fall back to ei.dataset_item_id if no match. Then:

  • ei_resolved.stable_dataset_item_id is a single, normalized form for both legacy (row_id) and new (stable id) shapes.
  • The downstream JOIN simplifies from (di.id = ei.dataset_item_id OR di.row_id = ei.dataset_item_id) to di.id = ei_resolved.stable_dataset_item_id.
  • LIMIT 1 BY dataset_item_id cross-version dedup is safe because every experiment is keyed on the same stable id regardless of how it was originally written.
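
A minimal sketch of the resolution shape described above, with aliases and scoping as illustrative assumptions rather than the final SQL:

```sql
WITH ei_resolved AS (
    SELECT
        ei.*,
        -- Legacy rows: div matches on the per-version row id and supplies the stable id.
        -- Modern rows: no match, fall back to ei.dataset_item_id (already the stable id).
        COALESCE(div.dataset_item_id, ei.dataset_item_id) AS stable_dataset_item_id
    FROM experiment_items AS ei
    LEFT JOIN dataset_item_versions AS div
        ON div.workspace_id = ei.workspace_id
       AND div.id = ei.dataset_item_id
)
-- Downstream joins then key on the single resolved column instead of the OR-JOIN:
--   LEFT JOIN dataset_items_resolved AS di ON di.id = ei_resolved.stable_dataset_item_id
SELECT stable_dataset_item_id, count() AS experiment_item_count
FROM ei_resolved
GROUP BY stable_dataset_item_id;
```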

Why this fits better than keeping version-scoping forever (Option 2):

  1. Removes the OR-JOIN, which is the actual structural smell — that pattern shows up in 6+ places in the file.
  2. Separates "what does this ei row refer to?" (resolution) from "what's the canonical snapshot?" (dedup + JOIN).
  3. Same UX trade-off as your current diff (latest snapshot in the row's data column, per-experiment ei.input/ei.output preserved in experiment_items_array), but with correct legacy rendering instead of NULLs.
  4. The legacy concern is a bounded, no-inflow population (rows pre-BE-cutover). Locking version-scoping into the schema forever for a transitional concern is over-investment.

Apply to the cousin queries too

ExperimentAggregatesDAO.java:1136,1240 and the stats path at DatasetItemVersionDAO.java:1877 carry the same OR-JOIN + version-equality shape and the same legacy concern. Adopting the resolution CTE in the comparison query only leaves the codebase on two patterns; cleaner end state is to apply it in all three (one shared concept, three usage sites) and drop the version-equality predicates everywhere.

Acceptance test (independent of option choice)

The new regression test passes because createVersionWithDelta's "edited" branch reuses item.id(), so V1 and V2 share their physical id and divergence is never set up. To lock in legacy protection, add a test that drives V2 through editItemsViaSelectInsert or the bulk-edit PUT (calls withAssignedRowIds for editedItems) so V2's physical id ≠ V1's. Then:

  • Create experiment X linked by V1's row_id (ei.dataset_item_id = old row_id).
  • Create experiment Y linked by stable id.
  • Hit the comparison endpoint. Assert one merged row, non-NULL data for both X and Y, per-experiment input preserved.

Without this, a future refactor (or a UUID-collision regression in the resolution CTE) could silently drop legacy rendering and we wouldn't notice.

Re @thiagohora's ORDER BY question — under Option 5, the dedup LIMIT 1 BY dataset_item_id ordered by dataset_version_id DESC still picks "latest version" deterministically because dataset_version_id is a UUIDv7 (via IdGenerator). His underlying question — "why are there duplicated dataset_item_ids to dedup at all" — is answered by versioning, and worth a one-line -- comment in the SQL noting the UUIDv7 monotonicity assumption.

Happy to hop on a call.

🤖 Review posted via /review-github-pr

…experiment-comparison-duplicate-rows

# Conflicts:
#	apps/opik-backend/src/main/java/com/comet/opik/domain/DatasetItemVersionDAO.java
@CometActions
Collaborator

🌙 Nightly cleanup: The test environment for this PR (pr-6507) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.


@JetoPistola
Contributor Author

JetoPistola commented May 3, 2026

Note

Original "needs guidance" comment, superseded by #issuecomment-4365954666 (Option 5 successfully implemented).

Original content (struck through)

## Update on Option 5 implementation attempt

@andrescrz @thiagohora — I spent significant time across two sessions attempting Option 5 (the resolution CTE you recommended) and hit a hard ClickHouse interaction that I can't resolve alone. Honest writeup below; I'd appreciate guidance.

### Approach taken

Following your strategic comment, the intended design was:

1. New ei_id_lookup CTE projecting (div.id AS lookup_row_id, div.dataset_item_id AS lookup_stable_id) — deduped via LIMIT 1 BY div.id ordered by (workspace_id, dataset_id, dataset_version_id, id) DESC.
2. LEFT JOIN that CTE in experiment_items_scope (or experiment_items_final) to project COALESCE(lookup.lookup_stable_id, ei.dataset_item_id) AS stable_dataset_item_id.
3. Downstream: replace ei.dataset_item_id with ei.stable_dataset_item_id in the GROUP BY of both UNION branches; drop the per-version di.* columns from GROUP BY and replace with argMax(di.X, di.item_last_updated_at) so version-divergent attributes don't split rows.
4. Same pattern in the count query, stats path, and ExperimentAggregatesDAO lines 1136/1240.

### What works

With the resolution CTE in experiment_items_scope (single change, no downstream modifications): 92 of 93 comparison-endpoint tests pass, including all 30 sorting tests, all 7 assertion-result tests, and both deletion-cascade tests. The only failure is the OPIK-6177 regression test itself — expected, since I haven't yet propagated stable_dataset_item_id to the GROUP BY.

### What breaks

Once I add the GROUP BY changes (use ei.stable_dataset_item_id + drop di.* from GROUP BY + argMax wrapping), the OPIK-6177 regression test passes and all sorting/assertion tests still pass — but 2 deletion-cascade tests regress:

- find__whenTestSuiteIsDeleted__thenExperimentItemsShouldStillBeRetrievable
- findWithDeletedDatasetItems

Both fail with expected: 1 but was: 0 (no SQL error; the query succeeds and returns zero rows). The scenario these tests cover: dataset_item_versions has been physically deleted but experiment_items rows survive; the comparison endpoint should still return synthetic rows with NULL di.* fields.

### Root cause (best understanding)

The LEFT JOIN to ei_id_lookup drops rows when the lookup CTE produces zero rows (which is exactly what happens after a dataset is deleted: SELECT FROM dataset_item_versions WHERE dataset_id = :datasetId returns empty). Removing the LEFT JOIN — or replacing with stable_dataset_item_id = ei.dataset_item_id — makes the deletion tests pass again. So the JOIN to an empty derived CTE is doing something the analyzer treats as filtering rather than null-extending.

### Workarounds I tried (none worked)

1. ANY LEFT JOIN instead of LEFT JOIN
2. UNION ALL sentinel row in ei_id_lookup so the CTE is never empty (impossible-UUID row)
3. Inlined subquery in FROM clause instead of named CTE
4. Moved the JOIN from experiment_items_scope to experiment_items_final
5. Explicit ei.id AS id, ... aliasing on every projected column (helped fix one analyzer error, didn't fix this one)
6. SETTINGS allow_experimental_analyzer = 0 for this query — broke other tests

### What I need

I think this needs your eyes — you wrote #5852 and know this query intimately. Three concrete options:

(a) Accept Option 5 isn't viable in this exact code path. Ship the original OPIK-6177 fix (currently on the branch: LIMIT 1 BY dataset_item_id + drop version-equality predicates), file a follow-up ticket for the OPIK-4518 legacy regression with explicit risk acceptance based on prevalence of legacy row_id writes.

(b) Pair on it — there's likely a JOIN shape that works (maybe involving dictGet / joinGet or restructuring as IN + scalar lookup) that I'm not seeing. 30 min over Zoom would probably crack it.

~~(c) Pivot to Option 2 (argMax wrap over #5852's existing structure, no new CTE). Verbose (~30-40 lines per query site, ~6 sites) but doesn't introduce the lookup JOIN at all. The verbosity is what made me skip it initially; given the analyzer issues, that may have been the wrong call.~~

Currently the branch contains the original OPIK-6177 fix (the one that regresses OPIK-4518). PR is in draft. Happy to revert to a clean origin/main starting point if you'd prefer to take the wheel.

Sorry for the length of this thread; wanted to be transparent about where the time went rather than ship something I'm not confident in.

🤖 Reply posted via /address-github-pr-comments

…to stable id upstream

Replaces this PR's earlier OPIK-6177 approach (which regressed OPIK-4518's
legacy row_id case per #5852) with andrescrz's Option 5 design from the
review. The resolution CTE pattern produces a `stable_dataset_item_id` for
every experiment_items row regardless of whether ei.dataset_item_id holds
the stable id (modern writes, post-OPIK-4518 BE cutover) or a per-version
dataset_item_versions.id (legacy writes pre-cutover).

Resolution shape: inline LEFT JOIN to dataset_item_versions FINAL keyed on
(workspace_id, id) — the per-version row PK. For modern rows the JOIN
misses and `if(notEmpty(lookup_div.dataset_item_id), …, ei.dataset_item_id)`
falls back to ei.dataset_item_id (already the stable id). For legacy rows
the JOIN matches and projects div.dataset_item_id (the stable id).

A standalone `ei_id_lookup` CTE was tried first but caused ClickHouse's
analyzer to drop left rows when the CTE materialized empty (deletion-cascade
scenarios where dataset_item_versions has no rows for the dataset). Direct
table reference avoids that.

Downstream JOIN simplifies from
  `(di.id = ei.dataset_item_id OR di.row_id = ei.dataset_item_id)
   AND di.dataset_version_id = ei.resolved_dataset_version_id`
to the single condition
  `di.id = ei.stable_dataset_item_id`,
removing the OR-JOIN structural smell from 5 sites in the comparison row +
count + push-top + stats queries plus 2 sites in ExperimentAggregatesDAO,
and dropping #5852's per-experiment version-equality predicate (it is no
longer needed once resolution happens upstream).

GROUP BY in both UNION branches now keys on stable_dataset_item_id; the
existing `LIMIT 1 BY dataset_item_id` in dataset_items_*resolved gives one
canonical (latest-version) `di` row per stable id, so per-version `di.*`
columns in GROUP BY are functionally determined by stable_dataset_item_id
and don't split rows.

`buildTopItemsSorting`/`getTopSortExpression` updated to project the
resolved expression for sorting fields that reference the experiment-side
key.

Implements Option 5 as agreed with @andrescrz in the PR review thread.
Fixes the OPIK-6177 duplicate-row symptom AND restores OPIK-4518's legacy
row_id rendering in a single, unified shape.

All 549 comparison-endpoint + stats + ExperimentAggregatesIntegration tests
pass (1 pre-existing flake unrelated to these changes).
… CTE

Address PR #6507 baz-reviewer finding (comment 3190434359) and
thiagohora's confirmation (comment 3191033224): after collapsing
eligible_dataset_item_lookup, count's arrayJoin([id, row_id]) FROM
dataset_items_agg_resolved (LIMIT 1 BY dataset_item_id) only emits
the LATEST version's row_id per item, so legacy EIA rows referencing
an OLDER version's row_id (pre-OPIK-6177 + post-version-bump) fail
the IN-check and undercount. Push-top row branch is unaffected because
its lookup_div FINAL handles all versions.

Add a lookup_for_count CTE that narrows by <dataset_item_filters>
first then INNER JOINs dataset_item_versions FINAL, emitting both
div.id (every version's row_id) and the stable id. Bloom skip-index
on dataset_item_id helps because the inner join is dataset_item_id-
narrowed (unlike the previously-removed eligible CTE which was
version-only-narrowed and couldn't prune). Bench (per thiagohora):
count 1.964s vs 1.827s (+0.14s) and full cross-version correctness.

Same shape applied to row query !push_top_limit branch (CTE name
shared because the queries are separate StringTemplate constants).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Backend Tests - Integration Group 5

145 tests   142 ✅  2m 54s ⏱️
 28 suites    3 💤
 28 files      0 ❌

Results for commit 1c34a84.

♻️ This comment has been updated with latest results.

@JetoPistola JetoPistola added the test-environment Deploy Opik adhoc environment label May 6, 2026
@github-actions
Contributor

github-actions Bot commented May 6, 2026

🔄 Test environment deployment process has started

Phase 1: Deploying base version 2.0.22-5251 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch daniel/OPIK-6177-fix-experiment-comparison-duplicate-rows
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

@CometActions
Collaborator

Test environment is now available!

To configure additional Environment variables for your environment, run the [Deploy Opik AdHoc Environment workflow](https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml)

Access Information

The deployment has completed successfully and the version has been verified.

thiagohora
thiagohora previously approved these changes May 6, 2026
Contributor

@thiagohora thiagohora left a comment


Great job here!

The eligible_dataset_item_lookup CTE was replaced by lookup_for_count
in 1c34a84. Update the two stale Javadoc/inline comment blocks in
DatasetItemVersionDAO to reference the new CTE name and capture the
narrow-by-filter / arrayJoin shape that covers all-version row_ids.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member

@andrescrz andrescrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clarify how eligible_dataset_item_lookup is referenced and used; the logic in DatasetItemVersionDAO seems to rely on it.

Comment on lines +665 to +671
/**
* Get experiment items with cursor pagination
*/
// OPIK-6177: resolves ei.dataset_item_id (legacy per-version DIV row id or modern
// stable id) to the stable dataset_item_id at aggregation time. Compare reads still
// resolve at read time too (see DatasetItemVersionDAO), so this is best-effort hygiene
// — over time it phases out legacy values from EIA without a backfill migration.
Member


Nit: better integrate this new comment into the javadoc above.

This applies to other comments below, which fit better as javadocs, whether we have a previous one or not.

Contributor


Commit 5a56e2f addressed this comment by moving the inline note into the Javadoc for GET_EXPERIMENT_ITEMS (and similar comments), properly integrating the explanation about OPIK-6177 into the existing documentation block.

Contributor Author


Fixed in 5a56e2f — promoted the OPIK-6177 // inline comment blocks above the relevant SQL constants into Javadoc in both DAOs:

  • DatasetItemVersionDAO.SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS and _COUNT
  • ExperimentAggregatesDAO.GET_EXPERIMENT_ITEMS, SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT, and the row variant

Each Javadoc now captures the OPIK-6177 stable-id resolution rationale, the lookup_for_count CTE shape, and (post your #2 nit) the no-FINAL immutability assumption.

🤖 Reply posted via /address-github-pr-comments

ei.last_updated_by AS last_updated_by,
ei.execution_policy AS execution_policy
FROM experiment_items AS ei
LEFT JOIN dataset_item_versions AS lookup_div FINAL
Member


Minor: you could try to avoid the FINAL on dataset_item_versions if you verify and assume that dataset_item_id is not mutable in practice. The rest of the retrieved fields are enforced to be immutable as they're part of the sort/primary key. Anyway, this is the safest approach from the functional point of view.

This comment applies to other similar FINAL calls below for this particular table.

Contributor


Commit 5a56e2f addressed this comment by removing the FINAL qualifiers from the dataset_item_versions joins/queries in ExperimentAggregatesDAO, aligning with the assumption that dataset_item_id is effectively stable and avoiding the unnecessary finalization in those lookups.

Contributor Author


Fixed in 5a56e2f — verified the immutability assumption holds for dataset_item_versions and dropped FINAL on every read in both DAOs.

Why it's safe:

  • The table is a ReplicatedReplacingMergeTree ordered by (workspace_id, dataset_id, dataset_version_id, id) with last_updated_at as version.
  • Every writer generates a fresh per-row id:
    • BATCH_INSERT_ITEMS:id<index> from a fresh UUID each
    • EDIT_ITEM_VIA_SELECT_INSERT:newId for id, :newVersionId for dataset_version_id
    • COPY_VERSION_ITEMS (version copy / fork / restore) — fresh :uuids pool, original id is NOT preserved
  • The legacy seed COPY_ITEMS_FROM_LEGACY does preserve id from the dataset_items draft table, but it's double-guarded by DatasetVersioningMigrationService.ensureDatasetMigrated (fast hasVersions check + Redis lock + transactional re-check) and runs at most once per dataset, ever.

So no path re-INSERTs the same (workspace_id, dataset_id, dataset_version_id, id) tuple. dataset_item_id (the only retrieved non-key column) is immutable per row PK and pre-merge duplicates can't arise — FINAL was theoretical defense.

Documented the assumption in the SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS Javadoc with a note for future maintainers to restore FINAL if a future writer ever introduces an idempotent re-INSERT that updates non-key columns.

13 FINAL calls dropped (9 in DatasetItemVersionDAO, 4 in ExperimentAggregatesDAO). Tests still green: 113/113 compare-experiments + 138/138 EAI.

🤖 Reply posted via /address-github-pr-comments

Contributor Author


Correction — my prior reply was wrong, and CI caught it. Restored FINAL in b86e7a0.

What I missed: PUT /datasets/items is an upsert flow that re-INSERTs the same (workspace_id, dataset_id, dataset_version_id, id) tuple when a client PUTs the same item ids twice with changed data / description / tags / evaluators / executionPolicy. The endpoint contract says: "Each item's id field is the stable identifier and upsert key" — BATCH_INSERT_ITEMS binds :id<i> = item.id().toString(), reusing the input id, so re-INSERTs of the same row PK are routine. Pre-merge duplicates exist on this table by design; FINAL is required so reads see only the latest row per PK.

The CI failure was DatasetsResourceTest$GetDatasetItemsByDatasetId.getDatasetItemsByDatasetId__whenItemsWereUpdated__thenReturnCorrectItemsCount — PUTs items twice with changed data and asserts SELECT_COLUMNS_BY_VERSION returns only the latest version's columns. Without FINAL, the query saw both pre-merge rows and unioned their column_types (5 expected → 13 actual).

Refined rule for this table: drop FINAL only where the read manually replicates ReplacingMergeTree's dedup via ORDER BY (..., id) DESC, last_updated_at DESC + LIMIT 1 BY <id-or-stable-id> (matches the engine's last_updated_at tiebreaker). Everywhere else (raw LEFT JOIN on id, raw aggregations without that idiom — like SELECT_COLUMNS_BY_VERSION), FINAL is load-bearing. Updated the SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS Javadoc to capture this.
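
For illustration, the manual-dedup idiom that makes FINAL redundant on the inner scan looks roughly like this (column names assumed):

```sql
-- Replicates ReplacingMergeTree semantics by hand: latest version per stable item wins,
-- with last_updated_at as the tiebreaker for pre-merge duplicates of the same row PK,
-- so FINAL on this scan would be redundant.
SELECT *
FROM dataset_item_versions
WHERE workspace_id = :workspace_id
  AND dataset_id = :dataset_id
ORDER BY dataset_version_id DESC, last_updated_at DESC
LIMIT 1 BY dataset_item_id;
```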

Net diff vs e45d82b (parent of the failed FINAL drop): kept the dead-CTE removal in ExperimentAggregatesDAO and the Javadoc promotions, restored every FINAL keyword that was there before. Tests now green: 113/113 compare-experiments + 27/27 GetDatasetItemsByDatasetId (incl. the previously-failing test) + 138/138 EAI.

🤖 Reply posted via /address-github-pr-comments

LIMIT 1 BY div.id
) AS div_dedup
),
eligible_dataset_item_lookup AS (
Member


Where is this CTE used?

Contributor


Commit 5a56e2f addressed this comment by removing the eligible_dataset_item_lookup CTE altogether and wiring the queries to the dataset_item_versions lookup directly, so the previously questioned CTE is no longer present anywhere in the diff.

Contributor Author


Fixed in 5a56e2f — good catch. The eligible_dataset_item_lookup CTE here was dead code — defined but never referenced in the count query's IN predicate. It was a leftover from the Alt 7 shape that was iterated on in DatasetItemVersionDAO (where the IN predicate did use it, until it was further refactored into lookup_for_count), but in this file the corresponding wire-up never landed.

Removed the dead CTE definition (lines 1143-1155 pre-fix). The actual stable-id resolution in this query is the lookup_div FINAL LEFT JOIN at the bottom + the if(notEmpty(...)) ... AS stable_dataset_item_id projection, which the new Javadoc now captures. (FINAL was also dropped per your #2 comment after verifying the immutability assumption.)

This file is the dev/test parity harness (countDatasetItemsWithExperimentItemsFromAggregates, only called by ExperimentAggregatesIntegrationTest); the production count path lives in DatasetItemVersionDAO.

🤖 Reply posted via /address-github-pr-comments

LIMIT 1 BY div.id
) AS div_dedup
),
eligible_dataset_item_lookup AS (
Member


Same, where is this CTE used from?

Contributor


Commit e45d82b addressed this comment by adding documentation that spells out how the lookup_for_count CTE is consumed by the aggregated branch (and why the push_top_limit branch skips it), clarifying where that CTE is used.

Contributor Author


Fixed in 5a56e2f — same dead-code situation as the count-query CTE you flagged in your previous comment. The row-query eligible_dataset_item_lookup here was also defined but never referenced — leftover from the Alt 7 iteration that didn't end up wiring an IN predicate in this file. Removed.

The actual stable-id resolution in this row query is lookup_div LEFT JOIN + if(notEmpty(...)) projection, captured in the new Javadoc that points back to _COUNT's rationale.

This row method (getDatasetItemsWithExperimentItemsFromAggregates) is also only called by the parity harness (ExperimentAggregatesIntegrationTest).

🤖 Reply posted via /address-github-pr-comments

@github-actions
Contributor

github-actions Bot commented May 6, 2026

Backend Tests - Integration Group 1

420 tests   420 ✅  13m 33s ⏱️
 23 suites    0 💤
 23 files      0 ❌

Results for commit dea6798.

♻️ This comment has been updated with latest results.

Address PR #6507 andrescrz feedback (comments 3194973428, 3195000400,
3195054140, 3195055228):

1. Remove dead eligible_dataset_item_lookup CTE definitions in
   ExperimentAggregatesDAO (the count + row parity-harness queries)
   that were never referenced in their SQL — leftover from the Alt 7
   shape that didn't end up wiring the IN predicate in this file.
2. Promote OPIK-6177 // inline comment blocks above SQL constants
   into proper Javadoc in both files (per andrescrz's nit).
3. Drop FINAL on every dataset_item_versions read in both files. The
   table is a ReplicatedReplacingMergeTree where every writer generates
   a fresh per-row id (BATCH_INSERT_ITEMS, EDIT_ITEM_VIA_SELECT_INSERT,
   COPY_VERSION_ITEMS) and the one-shot COPY_ITEMS_FROM_LEGACY is
   double-guarded by DatasetVersioningMigrationService.ensureDatasetMigrated,
   so no write path re-INSERTs the same (workspace_id,
   dataset_id, dataset_version_id, id) tuple. Pre-merge duplicates
   can't arise; FINAL was theoretical defense. Documented in the
   SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS Javadoc with a
   note for future maintainers to restore FINAL if any non-key column
   becomes mutable via re-INSERT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JetoPistola JetoPistola added test-environment Deploy Opik adhoc environment and removed test-environment Deploy Opik adhoc environment labels May 6, 2026
@github-actions
Contributor

github-actions Bot commented May 6, 2026

🔄 Test environment deployment process has started

Phase 1: Deploying base version 2.0.22-5251 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch daniel/OPIK-6177-fix-experiment-comparison-duplicate-rows
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

@CometActions
Collaborator

Test environment is now available!

To configure additional Environment variables for your environment, run the [Deploy Opik AdHoc Environment workflow](https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml)

Access Information

The deployment has completed successfully and the version has been verified.

Revert the FINAL drop from 5a56e2f. The immutability assumption
documented there was wrong: PUT /datasets/items is a real upsert flow
that re-INSERTs the same (workspace_id, dataset_id, dataset_version_id,
id) tuple when a client PUTs the same item ids twice with changed data,
description, tags, evaluators, or execution_policy. Pre-merge
duplicates are routine on this table; FINAL is required so reads see
only the latest row per PK.

CI surfaced this via DatasetsResourceTest$GetDatasetItemsByDatasetId.
getDatasetItemsByDatasetId__whenItemsWereUpdated__thenReturnCorrectItemsCount
which PUTs items twice with changed data and asserts that
SELECT_COLUMNS_BY_VERSION returns only the latest version's columns —
without FINAL the query saw both pre-merge rows.

Restored FINAL only where reads don't already dedupe via
LIMIT 1 BY dataset_item_id ORDER BY (..., dvid) DESC, last_updated_at
DESC (those subqueries pick the latest row themselves so FINAL is
redundant on the inner scan). Updated the
SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS Javadoc to capture
that FINAL is load-bearing for this table given the upsert contract.

Keeps the dead eligible_dataset_item_lookup CTE removal and Javadoc
promotions from 5a56e2f — those remain correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JetoPistola JetoPistola added test-environment Deploy Opik adhoc environment and removed test-environment Deploy Opik adhoc environment labels May 6, 2026
@JetoPistola JetoPistola added test-environment Deploy Opik adhoc environment and removed test-environment Deploy Opik adhoc environment labels May 6, 2026
@github-actions
Contributor

github-actions Bot commented May 6, 2026

🔄 Test environment deployment process has started

Phase 1: Deploying base version 2.0.23-5256 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch daniel/OPIK-6177-fix-experiment-comparison-duplicate-rows
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

@CometActions
Collaborator

Test environment is now available!

To configure additional Environment variables for your environment, run the [Deploy Opik AdHoc Environment workflow](https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml)

Access Information

The deployment has completed successfully and the version has been verified.

Comment on lines +684 to +688
FROM experiment_items AS ei
LEFT JOIN dataset_item_versions AS lookup_div FINAL
ON lookup_div.workspace_id = ei.workspace_id
AND lookup_div.id = ei.dataset_item_id
WHERE ei.workspace_id = :workspace_id
Contributor


lookup_div FINAL in GET_EXPERIMENT_ITEMS uses an unbounded bare FINAL scan (violating .agents/skills/opik-backend/clickhouse.md) and duplicates the join+fallback logic from DatasetItemVersionDAO — should we refactor into a bounded/pre-deduped CTE shared with DatasetItemVersionDAO, or document an explicit exception if the bare FINAL is intentional?

Finding types: AI Coding Guidelines Code Dedup and Conventions | Severity: 🟢 Low



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
apps/opik-backend/src/main/java/com/comet/opik/domain/experiments/aggregations/ExperimentAggregatesDAO.java
around lines 684-688 inside the `GET_EXPERIMENT_ITEMS` SQL string: 1. The `LEFT JOIN
dataset_item_versions AS lookup_div FINAL` performs an unbounded bare `FINAL` scan,
violating the monotonic-bound guidance from `.agents/skills/opik-backend/clickhouse.md`,
and can make cursor paging unbounded/expensive per page. 2. The same
join+fallback logic is duplicated from `DatasetItemVersionDAO`. Refactor this stable-id
normalization into a shared SQL fragment/CTE that: - Constrains by
workspace/dataset/version and uses `LIMIT 1 BY`/deduping instead of bare `FINAL` - Can
be reused by both `GET_EXPERIMENT_ITEMS` and `DatasetItemVersionDAO` to stay in sync If
a bare `FINAL` exception is truly required, add a Javadoc comment near the
`GET_EXPERIMENT_ITEMS` constant explicitly documenting why the monotonic-bound FINAL
rule doesn't apply here, referencing the specific clickhouse guideline.

Contributor Author


Skipping — lookup_div FINAL here is load-bearing for upsert pre-merge dedup, and the suggested mitigations don't fit this case:

  • Pre-deduped CTE / LIMIT 1 BY dataset_item_id breaks legacy id resolution. lookup_div.id = ei.dataset_item_id may match a per-version row_id from ANY past version (pre-OPIK-4518 backwards-compat), so we must keep all versions in the lookup; pre-deduping to "latest per stable item" loses the older rows we need to map back to the stable id.
  • last_updated_at time bound is impractical: legacy experiment_items rows can be arbitrarily old, so the bound would have to span the dataset's full lifetime.
  • JOIN is already row-PK-narrowed via lookup_div.id = ei.dataset_item_id (equality on the row PK), so this isn't a full-table FINAL — the FINAL is bounded by skip-index pushdown on the PK.
  • Empirical: we just learned the hard way that dropping FINAL on this table breaks correctness. CI surfaced it via check-runs/74608249821: SELECT_COLUMNS_BY_VERSION returned 13 columns instead of 5 because the upsert flow re-INSERTs the same (ws, ds, dvid, id) tuple with new data, so pre-merge duplicates exist by design on this table. Restored in b86e7a0b62.

@andrescrz — flagging this for your re-review since you raised the original FINAL concern. We could add a Javadoc note here explicitly documenting the exception to the clickhouse.md rule if you'd prefer that over the current state, or look into a different approach entirely.

🤖 Reply posted via /address-github-pr-comments

Comment on lines 443 to +447
FROM experiment_items ei
INNER JOIN experiments_resolved e ON e.id = ei.experiment_id
LEFT JOIN dataset_item_versions AS lookup_div FINAL
ON lookup_div.workspace_id = ei.workspace_id
AND lookup_div.id = ei.dataset_item_id
Contributor


lookup_div FINAL reads dataset_item_versions without a monotonic time bound, but the clickhouse.md guidance says bare FINAL on large tables should be time-bounded—since this stable-id join for legacy ei.dataset_item_id would miss older legacy rows, can we either add an explicit exception for this path or switch to a pre-deduped lookup strategy?

Finding type: AI Coding Guidelines | Severity: 🟢 Low



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
`apps/opik-backend/src/main/java/com/comet/opik/domain/DatasetItemVersionDAO.java`
around lines 443-447 inside the
`SELECT_DATASET_ITEM_VERSIONS_WITH_EXPERIMENT_ITEMS_COUNT` query (the
`experiment_items_scope` CTE), the new `LEFT JOIN dataset_item_versions AS lookup_div
FINAL` performs a bare `FINAL` scan on a large table without any monotonic/time bound,
violating the stated ClickHouse guidance. Refactor this join to avoid `FINAL` by
replacing `lookup_div` with a pre-deduped lookup subquery/CTE that returns the latest
row per `dataset_item_id` using `ORDER BY last_updated_at DESC LIMIT 1 BY
dataset_item_id`, and restrict that subquery to the smallest possible scope (at least
`workspace_id` and the relevant dataset/version resolution) so correctness of stable-id
resolution is preserved. If you cannot safely replace the join, add an explicit in-file
exception comment that cites the guideline and explains why this hot path is
intentionally exempt.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping — same rationale as my reply on the GET_EXPERIMENT_ITEMS thread. The lookup_div FINAL here is load-bearing for upsert pre-merge dedup; pre-deduping the lookup table breaks legacy id resolution; a last_updated_at time bound doesn't fit because legacy ei.dataset_item_id rows can be arbitrarily old; the JOIN is already row-PK-narrowed via equality on lookup_div.id. CI (check-runs/74608249821) confirmed dropping FINAL is unsafe on this table; restored in b86e7a0b62.

@andrescrz — same flag for your re-review here; happy to add a Javadoc exception note or take a different approach if you'd prefer.

🤖 Reply posted via /address-github-pr-comments

@JetoPistola
Contributor Author

@andrescrz — could you take another look when you have a moment? Three things since your last review:

  1. #1 Javadoc nit (discussion_r3194973428) — promoted // OPIK-6177 inline blocks into Javadoc in both DAOs ✓
  2. #3 + #4 dead CTE (discussion_r3195054140, discussion_r3195055228) — the eligible_dataset_item_lookup CTE in ExperimentAggregatesDAO was indeed dead code (defined but never referenced). Removed ✓
  3. #2 FINAL (discussion_r3195000400) — I tried dropping FINAL (5a56e2fc5a), CI surfaced that the upsert flow (PUT /datasets/items) re-INSERTs the same row PK with new data, so pre-merge duplicates are routine on dataset_item_versions. Reverted to keep FINAL in b86e7a0b62. Two follow-up baz comments (discussion_r3195517027, discussion_r3195517034) cite the clickhouse.md "avoid bare FINAL" guideline; my replies there explain why the suggested mitigations don't fit (legacy id resolution can't pre-narrow; time-bound is impractical). Open to adding an explicit Javadoc exception note or a different approach if you prefer.

Tests green: 113/113 compare-experiments (incl. cross-version + cross-branch regression tests) + 138/138 EAI + 27/27 GetDatasetItemsByDatasetId.

🤖 Reply posted via /address-github-pr-comments

@JetoPistola JetoPistola requested a review from andrescrz May 6, 2026 12:59