Skip to content

[issue-5494] [P-SDK] fix: reuse original trace/span IDs in project import for idempotency#6471

Open
octo-patch wants to merge 1 commit intocomet-ml:mainfrom
octo-patch:ximi/issue-5494-idempotent-import-trace-ids
Open

[issue-5494] [P-SDK] fix: reuse original trace/span IDs in project import for idempotency#6471
octo-patch wants to merge 1 commit intocomet-ml:mainfrom
octo-patch:ximi/issue-5494-idempotent-import-trace-ids

Conversation

@octo-patch
Copy link
Copy Markdown

Fixes #5494

Problem

When importing a project export, import_projects_from_directory generated a fresh random UUID for every trace and every span via id_helpers.generate_id(), even though the original IDs were available in the export file. This meant:

  • Running the same import twice created duplicate traces (different IDs, identical content).
  • Users had no way to make imports idempotent without the migration manifest, and even with the manifest a forced re-run (--force) still produced duplicates.

Solution

Pass the original id from the export file directly to client.trace() and client.span(), falling back to generate_id() only when the export contains no ID (for forward-compatibility with older export formats that predated ID storage).

# Before
trace = client.trace(
    id=id_helpers.generate_id(timestamp=original_start_time),
    ...
)

# After
trace = client.trace(
    id=original_trace_id or id_helpers.generate_id(timestamp=original_start_time),
    ...
)

The same pattern is applied to client.span(). The server handles repeat writes of the same ID gracefully (upsert or 409 dedup), so re-running import no longer creates duplicates.

The trace_id_map and span_id_map bookkeeping is unchanged — when original IDs are preserved the maps become identity maps, which is still correct for experiment recreation lookups.

Testing

  • Three new unit tests added to tests/unit/cli/test_import_project.py:
    • test_trace_uses_original_id — asserts client.trace() is called with the original trace ID
    • test_span_uses_original_id — asserts client.span() is called with the original span ID
    • test_import_twice_produces_same_ids — asserts two consecutive imports use identical trace IDs
  • Full CLI unit test suite (201 tests) passes without regressions.

🤖 AI-assisted contribution — code reviewed and tests verified by human author.

…port for idempotency

When importing traces from an export file, each run generated fresh random
IDs (via `id_helpers.generate_id()`), so importing the same data twice
created duplicate traces and spans instead of overwriting or deduplicating.

Reuse the `id` from the export file for both `client.trace()` and
`client.span()`, falling back to `generate_id()` only when the export
contains no ID (older export format). The server handles repeat writes of
the same ID gracefully (upsert / 409), so re-running import no longer
creates duplicates.

Also update comments on the span_id_map to clarify it is now an identity
map when original IDs are preserved.

Adds three unit tests that assert original IDs are passed through and that
consecutive imports produce identical IDs.

Fixes comet-ml#5494

Co-Authored-By: Octopus <liyuan851277048@icloud.com>
@octo-patch octo-patch requested a review from a team as a code owner April 24, 2026 02:43
@github-actions github-actions Bot added python Pull requests that update Python code tests Including test files, or tests related like configuration. Python SDK labels Apr 24, 2026

trace = client.trace(
id=id_helpers.generate_id(timestamp=original_start_time),
id=original_trace_id or id_helpers.generate_id(timestamp=original_start_time),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sdks/python/AGENTS.md enforces 88-char Python lines, but the new client.trace() id=original_trace_id or id_helpers.generate_id(timestamp=original_start_time), line breaks that limit—should we wrap the conditional id expression like the other keyword args?

Finding type: AI Coding Guidelines | Severity: 🟢 Low


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/imports/project.py around lines 205-228 inside
import_projects_from_directory(), update the client.trace() call so the keyword argument
`id=original_trace_id or id_helpers.generate_id(timestamp=original_start_time)` does not
exceed the 88-character limit from AGENTS.md. Refactor that single long `id=` expression
by wrapping it across multiple lines (using parentheses) and keep the formatting
consistent with the other keyword arguments in the same call. Re-run lint/format checks
to confirm the line length violation is resolved.

@@ -246,7 +246,10 @@ def import_projects_from_directory(
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The span import loop duplicates the span reconstruction flow in sdks/python/src/opik/cli/imports/experiment.py, should we refactor both to a shared helper (e.g. opik/cli/imports/utils.py)?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low


Want Baz to fix this for you? Activate Fixer

@dsblank
Copy link
Copy Markdown
Contributor

dsblank commented Apr 27, 2026

@octo-patch this makes a lot of sense; thank you for the PR!

Can you either address the baz-reviewer comments, or resolve them.

Also, there are linting errors. Please fix those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Python SDK python Pull requests that update Python code tests Including test files, or tests related like configuration.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Missing traces when importing into new workspace

2 participants