Skip to content

[OPIK-4518] [BE] Restore stable dataset item IDs in versioned API#5370

Open
andrescrz wants to merge 20 commits intomainfrom
andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids
Open

[OPIK-4518] [BE] Restore stable dataset item IDs in versioned API#5370
andrescrz wants to merge 20 commits intomainfrom
andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids

Conversation

@andrescrz
Copy link
Member

@andrescrz andrescrz commented Feb 23, 2026

Details

Makes the id field in the DatasetItem API return the stable dataset_item_id instead of the version-specific physical ClickHouse row id. Previously, id changed every time an item was included in a new dataset version, which was confusing — users expect id to be a stable identifier. Now id remains the same across versions, and the deprecated dataset_item_id field always equals id.

Key changes:

  • All SQL queries alias dataset_item_id AS id so the stable ID is returned as the API's id field
  • Dual-join backward compatibility: all experiment-item queries match on both physical row ID and stable ID via OR conditions, so old experiment items (storing row IDs) and new ones (storing stable IDs) both resolve correctly — no data migration required
  • arrayJoin([id, row_id]) pattern avoids duplicate CTE evaluation in ClickHouse
  • Two-level dedup in all cross-version CTEs (inner LIMIT 1 BY id for ReplacingMergeTree, outer LIMIT 1 BY dataset_item_id for cross-version)
  • Removed the mapRowIdsToDatasetItemIds indirection layer, simplifying the service layer significantly (-269 lines net)
  • TraceDAO uses if(div.id != '', ...) instead of COALESCE due to ClickHouse FixedString NULL behavior on LEFT JOINs
  • Skip indexes (bloom_filter + minmax) on dataset_item_id for queries that lack full primary key prefix
  • New resolveDatasetIdsFromItemIds with explicit multi-dataset validation

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-4518

AI-WATERMARK

AI-WATERMARK: yes

  • If yes:
    • Tools: Claude Code CLI
    • Model(s): Claude Opus 4.6
    • Scope: Implementation, SQL query design, code review, test writing
    • Human verification: Extensive — multiple rounds of code review, all tests passing, manual analysis of ClickHouse query plans and CTE evaluation patterns

Testing

  • mvn test -Dtest="DatasetVersionResourceTest" — all tests pass including:
    • Stable IDs across versions (putItems__whenSameItemInMultipleVersions__thenStableIdsAcrossVersions)
    • CRUD operations with stable IDs (get by ID, patch, delete, batch update, applyChanges)
    • Cursor-based streaming pagination
    • Multi-dataset rejection for delete and batch update (deleteItems__whenItemIdsSpanMultipleDatasets__thenReturn400, batchUpdate__whenItemIdsSpanMultipleDatasets__thenReturn400)
    • datasetItemId == id invariant assertion
  • mvn test -Dtest="TracesResourceTest$ExperimentItemReferenceTest" — all 5 tests pass (experiment reference resolution via TraceDAO)
  • mvn test -Dtest="ExperimentsResourceTest$InsertExperimentItems" — workspace validation test passes
  • Full DatasetsResourceTest suite passes (experiment comparison view, stats, output columns, sorting, filtering)

Documentation

API schema descriptions updated in-code for DatasetItem.id, DatasetItem.datasetItemId (deprecated), DatasetItemEdit.id, and DatasetsResource operation description. No external documentation changes needed.

@github-actions github-actions bot added java Pull requests that update Java code Backend tests Including test files, or tests related like configuration. labels Feb 23, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 23, 2026

Backend Tests Results

  439 files    439 suites   1h 6m 52s ⏱️
6 894 tests 6 881 ✅ 13 💤 0 ❌
6 746 runs  6 733 ✅ 13 💤 0 ❌

Results for commit 16c7883.

♻️ This comment has been updated with latest results.

@andrescrz andrescrz force-pushed the andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids branch from 387959d to 7dceef3 Compare February 27, 2026 17:05
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 27, 2026
@comet-ml comet-ml deleted a comment from github-actions bot Feb 27, 2026
@comet-ml comet-ml deleted a comment from github-actions bot Feb 27, 2026
@andrescrz andrescrz force-pushed the andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids branch from 7dceef3 to 3611291 Compare February 27, 2026 17:26
@andrescrz andrescrz force-pushed the andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids branch from 3611291 to d168de1 Compare March 2, 2026 10:26
@andrescrz andrescrz force-pushed the andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids branch from d168de1 to 16c7883 Compare March 2, 2026 12:21
@andrescrz andrescrz force-pushed the andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids branch from 16c7883 to 211832c Compare March 12, 2026 17:24
@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Unit Tests

1 433 tests   1 431 ✅  54s ⏱️
  170 suites      2 💤
  170 files        0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 5

110 tests   110 ✅  2m 19s ⏱️
 22 suites    0 💤
 22 files      0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 11

159 tests   157 ✅  2m 42s ⏱️
 21 suites    2 💤
 21 files      0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 7

244 tests   244 ✅  1m 59s ⏱️
 24 suites    0 💤
 24 files      0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 16

 18 files   18 suites   2m 13s ⏱️
131 tests 131 ✅ 0 💤 0 ❌
110 runs  110 ✅ 0 💤 0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 15

 31 files   31 suites   3m 2s ⏱️
193 tests 193 ✅ 0 💤 0 ❌
175 runs  175 ✅ 0 💤 0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 8

288 tests   288 ✅  4m 16s ⏱️
 24 suites    0 💤
 24 files      0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 12

186 tests   185 ✅  3m 18s ⏱️
 32 suites    1 💤
 32 files      0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 6

1 129 tests   1 129 ✅  6m 41s ⏱️
    7 suites      0 💤
    7 files        0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 10

 22 files   22 suites   6m 50s ⏱️
220 tests 218 ✅ 2 💤 0 ❌
183 runs  181 ✅ 2 💤 0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 14

185 tests   185 ✅  1m 41s ⏱️
 19 suites    0 💤
 19 files      0 ❌

Results for commit b1c2367.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

Backend Tests - Integration Group 1

461 tests  ±0   461 ✅ ±0   14m 6s ⏱️ +27s
 25 suites ±0     0 💤 ±0 
 25 files   ±0     0 ❌ ±0 

Results for commit b62b504. ± Comparison against base commit aad988f.

♻️ This comment has been updated with latest results.

andrescrz and others added 14 commits March 23, 2026 11:02
The dual-join approach handles both old (physical row ID) and new
(stable dataset_item_id) experiment items at query time, making
the manual migration unnecessary.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
All queries joining experiment_items.dataset_item_id with
dataset_item_versions now match on both physical row id AND stable
dataset_item_id, so old experiment items (storing row IDs) and new
ones (storing stable IDs) both resolve correctly without requiring
a data migration.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Avoid collision with main's 000065-000069 added since last rebase.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Main now has 000070-000072, so bump to next available.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
ClickHouse FixedString(36) columns return null bytes (not SQL NULL)
on LEFT JOIN miss, so COALESCE never falls through. Use if(div.id != '')
to check whether the join actually matched.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Defer the dataset_item_id projection to a follow-up. The skip indexes
(000073) provide sufficient coverage for now. The projection can be
added later if large-version performance requires it.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Consolidate OR-expanded filter conditions that referenced the same CTE
twice into single arrayJoin([col1, col2]) calls. Each CTE is now
evaluated exactly once per usage site.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Use idx_{table}_{column} pattern: idx_dataset_item_versions_dataset_item_id_bf
and idx_dataset_item_versions_dataset_item_id_minmax.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Revert to main's simpler query that resolves trace_ids directly from
experiment_items without joining dataset_item_versions. The dual-join
treatment is unnecessary here — output column discovery only needs
trace_ids, not dataset item mapping.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add LIMIT 1 BY dataset_item_id to dataset_items_agg_resolved,
  dataset_items_aggr_resolved, and ExperimentAggregatesDAO's
  dataset_item_versions_resolved CTEs. Without this, the OR-condition
  joins could match one experiment item to multiple dataset item rows
  from different versions, inflating groupArray results.
- Remove orphaned Javadoc from deleted validateMappingsBelongToSameDataset
- Rename deletedRowIds to deletedIds (now holds stable IDs, not row IDs)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The dataset_item_filters in item_agg selected only physical id,
missing new experiment items that store stable dataset_item_id.
Use arrayJoin([id, dataset_item_id]) and add two-level dedup.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Main added 000073_add_minmax_index_trace_threads_last_updated_at.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…S row_id

All aggregated CTEs (dataset_items_agg_resolved, dataset_items_aggr_resolved,
dataset_item_versions_resolved) now use the same convention as
dataset_items_resolved: dataset_item_id AS id (stable), id AS row_id (physical).
Updated all join conditions, arrayJoin filters, and GROUP BY references.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
… update

Verify that delete and batch update requests with item IDs spanning
multiple datasets (without explicit datasetId) return 400.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@andrescrz andrescrz force-pushed the andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids branch from 97393f0 to 47b068f Compare March 23, 2026 10:38
@andrescrz andrescrz marked this pull request as ready for review March 23, 2026 10:39
@andrescrz andrescrz requested a review from a team as a code owner March 23, 2026 10:39
Copy link
Contributor

@thiagohora thiagohora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things worth addressing before merging.

@andrescrz andrescrz added the test-environment Deploy Opik adhoc environment label Mar 23, 2026
@github-actions
Copy link
Contributor

🔄 Test environment deployment process has started

Phase 1: Deploying base version 1.10.46-4599 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

@CometActions
Copy link
Collaborator

Test environment is now available!

To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml)

Access Information

The deployment has completed successfully and the version has been verified.

@CometActions
Copy link
Collaborator

🌙 Nightly cleanup: The test environment for this PR (pr-5370) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.

@CometActions CometActions removed the test-environment Deploy Opik adhoc environment label Mar 24, 2026
@AndreiCautisanu AndreiCautisanu added the test-environment Deploy Opik adhoc environment label Mar 24, 2026
@github-actions
Copy link
Contributor

🔄 Test environment deployment process has started

Phase 1: Deploying base version 1.10.46-4599 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch andrescrz/OPIK-4518-fix-dataset-versioning-dataset-items-ids
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

@CometActions
Copy link
Collaborator

Test environment is now available!

To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml)

Access Information

The deployment has completed successfully and the version has been verified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend java Pull requests that update Java code test-environment Deploy Opik adhoc environment tests Including test files, or tests related like configuration.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants