This repository was archived by the owner on Apr 2, 2026. It is now read-only.

Additional Tracing for Perf Debugging #7

Merged
kumaramit01 merged 7 commits into features/v2.6.6-patched-251123 from v2.6.6-patched-251123-trace
Dec 6, 2025

@kumaramit01
Collaborator

💸 TL;DR

This PR adds more tracing to help debug a performance bottleneck in Milvus. For non-failed queries we are seeing search latencies > 1.5 sec, with a lot of the time spent in schedule and segcore. The additional tracing should help us determine whether the issue is with:

  • MVCC timestamp filtering
  • Pipeline setup (LocalPlanner + drivers)
  • CGo overhead

Expected trace for search

With all tracing enabled, a search request should show events like this:

0ms     → [Go Layer] Search request starts
         ...
CGo Boundary →
         async_search_cgo_entry
         before_future_async
         async_lambda_start
         before_span_start
         after_span_start
         SegCoreSearch span begins
         before_lazy_check_schema
         after_lazy_check_schema
         segment_search_start
         obtained_segment_lock_mutex
         before_check_search
         after_check_search
         before_create_visitor
         after_create_visitor
         visitor_before_plan_fragment
         visitor_before_query_context
         visitor_before_execute_task
         execute_task_before_create
         execute_task_after_create
         task_next_start
         before_local_planner
         after_local_planner
         before_driver_factory_loop
         after_driver_factory_loop
         before_create_drivers
         after_create_drivers
         before_driver_execution_loop
         execute_task_before_loop
         mvcc_node_start
         mvcc_before_bitmap_creation
         mvcc_before_mask_timestamps
         mvcc_after_mask_timestamps    ← MVCC filtering time
         mvcc_after_mask_delete        ← Delete filtering time
         filter_bits_node_start
         filter_bits_before_eval
         filter_bits_after_eval        ← Expression evaluation time
         execute_task_after_loop
         visitor_after_execute_task
         before_segment_search
         start_knowhere_index_search   ← Actual vector search
         finish_knowhere_index_search
         finish_searching_vector_index
         after_segment_search
         segment_search_end
         async_lambda_end
         SegCoreSearch span ends (138ms total)
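
To read a trace like the one above, it helps to turn the event list into per-phase durations. The following is a minimal sketch, not part of this PR: it assumes the span events have already been exported as `(name, timestamp_ms)` pairs (the export format, the helper names `paired_durations` and `consecutive_gaps`, and the timestamps in the usage example are all hypothetical). It matches `before_X` / `after_X` events, including prefixed ones such as `mvcc_before_mask_timestamps`, and also reports the gap between consecutive events for markers that have no pair.

```python
from typing import Dict, List, Tuple

# Assumed export format: (event name, milliseconds since the span started).
Event = Tuple[str, float]


def _phase_key(name: str, marker: str) -> str:
    """Drop the first 'before_' / 'after_' token so both ends share one key."""
    return name.replace(marker, "", 1)


def paired_durations(events: List[Event]) -> Dict[str, float]:
    """Elapsed ms for every before_/after_ pair, keyed by the phase name."""
    started: Dict[str, float] = {}
    durations: Dict[str, float] = {}
    for name, ts_ms in events:
        if "before_" in name:
            started[_phase_key(name, "before_")] = ts_ms
        elif "after_" in name:
            key = _phase_key(name, "after_")
            if key in started:  # ignore 'after' events with no matching 'before'
                durations[key] = ts_ms - started.pop(key)
    return durations


def consecutive_gaps(events: List[Event]) -> List[Tuple[str, str, float]]:
    """Elapsed ms between each event and the next one, in trace order."""
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only (not measured values).
    sample = [
        ("before_local_planner", 10.0),
        ("after_local_planner", 12.5),
        ("mvcc_before_mask_timestamps", 20.0),
        ("mvcc_after_mask_timestamps", 95.0),
    ]
    for phase, ms in paired_durations(sample).items():
        print(f"{phase}: {ms} ms")
```

Sorting the output of `paired_durations` by duration would point at whichever of MVCC filtering, pipeline setup, or the CGo boundary dominates a slow request.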

📜 Details

Design Doc

Jira

🧪 Testing Steps / Validation

✅ Checks

  • CI tests (if present) are passing
  • Adheres to code style for repo
  • Contributor License Agreement (CLA) completed if not a Reddit employee

kumaramit01 requested a review from Xinyi7 on December 6, 2025 06:34
kumaramit01 merged commit 04e4c7a into features/v2.6.6-patched-251123 on Dec 6, 2025
9 of 14 checks passed