[issue-5494] [P-SDK] fix: reuse original trace/span IDs in project import for idempotency#6471
Conversation
…port for idempotency When importing traces from an export file, each run generated fresh random IDs (via `id_helpers.generate_id()`), so importing the same data twice created duplicate traces and spans instead of overwriting or deduplicating. Reuse the `id` from the export file for both `client.trace()` and `client.span()`, falling back to `generate_id()` only when the export contains no ID (older export format). The server handles repeat writes of the same ID gracefully (upsert / 409), so re-running import no longer creates duplicates. Also update comments on the span_id_map to clarify it is now an identity map when original IDs are preserved. Adds three unit tests that assert original IDs are passed through and that consecutive imports produce identical IDs. Fixes comet-ml#5494 Co-Authored-By: Octopus <liyuan851277048@icloud.com>
|
|
||
| trace = client.trace( | ||
| id=id_helpers.generate_id(timestamp=original_start_time), | ||
| id=original_trace_id or id_helpers.generate_id(timestamp=original_start_time), |
There was a problem hiding this comment.
sdks/python/AGENTS.md enforces 88-char Python lines, but the new client.trace() id=original_trace_id or id_helpers.generate_id(timestamp=original_start_time), line breaks that limit—should we wrap the conditional id expression like the other keyword args?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/imports/project.py around lines 205-228 inside
import_projects_from_directory(), update the client.trace() call so the keyword argument
`id=original_trace_id or id_helpers.generate_id(timestamp=original_start_time)` does not
exceed the 88-character limit from AGENTS.md. Refactor that single long `id=` expression
by wrapping it across multiple lines (using parentheses) and keep the formatting
consistent with the other keyword arguments in the same call. Re-run lint/format checks
to confirm the line length violation is resolved.
| @@ -246,7 +246,10 @@ def import_projects_from_directory( | |||
| ) | |||
There was a problem hiding this comment.
The span import loop duplicates the span reconstruction flow in sdks/python/src/opik/cli/imports/experiment.py, should we refactor both to a shared helper (e.g. opik/cli/imports/utils.py)?
Finding type: Code Dedup and Conventions | Severity: 🟢 Low
Want Baz to fix this for you? Activate Fixer
|
@octo-patch this makes a lot of sense; thank you for the PR! Can you either address the baz-reviewer comments, or resolve them. Also, there are linting errors. Please fix those. |
Fixes #5494
Problem
When importing a project export,
import_projects_from_directorygenerated a fresh random UUID for every trace and every span viaid_helpers.generate_id(), even though the original IDs were available in the export file. This meant:--force) still produced duplicates.Solution
Pass the original
idfrom the export file directly toclient.trace()andclient.span(), falling back togenerate_id()only when the export contains no ID (for forward-compatibility with older export formats that predated ID storage).The same pattern is applied to
client.span(). The server handles repeat writes of the same ID gracefully (upsert or 409 dedup), so re-running import no longer creates duplicates.The
trace_id_mapandspan_id_mapbookkeeping is unchanged — when original IDs are preserved the maps become identity maps, which is still correct for experiment recreation lookups.Testing
tests/unit/cli/test_import_project.py:test_trace_uses_original_id— assertsclient.trace()is called with the original trace IDtest_span_uses_original_id— assertsclient.span()is called with the original span IDtest_import_twice_produces_same_ids— asserts two consecutive imports use identical trace IDs