[Opt](cloud) Add rate limit for BE to MS rpc #60344

bobhan1 wants to merge 3 commits into apache:master
Conversation
- fix
- [improvement](be) Hook dynamic MS throttle configs to update callbacks
  - Issue Number: None
  - Related PR: None
  - Problem Summary: Newly added BE configs for per-RPC MS QPS limits and MS backpressure throttle upgrade/downgrade only changed config values at runtime, but did not propagate those changes into the in-memory rate limiter and backpressure handler state. This commit registers DEFINE_ON_UPDATE callbacks for those configs and refreshes the corresponding runtime objects only when the new value differs from the old value.
  - Release note: None
  - Test: No need to test (code change committed without rerunning build in this step)
  - Behavior changed: Yes (runtime config updates now take effect on the corresponding in-memory MS throttling state)
  - Does this need documentation: No
- update
- fix sync rowset retry, and fix that the MSBackpressureHandler state transition is not atomic
- fix wrong substitution
- [fix](be) Log actual throttle ticks on transition
  - Issue Number: None
  - Related PR: None
  - Problem Summary: Capture the actual elapsed tick counters before resetting them, so the ms-throttle upgrade and downgrade logs report real values instead of reset counters.
  - Release note: None
  - Test: No need to test (log-only change; attempted targeted BE UT but sandbox blocked submodule update)
  - Behavior changed: Yes (INFO logs now print the actual elapsed ticks for upgrade and downgrade triggers)
  - Does this need documentation: No
- [fix](be) Disable MS backpressure handling by default
  - Issue Number: None
  - Related PR: None
  - Problem Summary: Change the default value of `enable_ms_backpressure_handling` to false so MS backpressure response handling is opt-in instead of enabled by default.
  - Release note: MS backpressure handling is now disabled by default.
  - Test: No need to test (single default-config change only)
  - Behavior changed: Yes (`enable_ms_backpressure_handling` defaults to false)
  - Does this need documentation: No
- format change
- change `enable_ms_rpc_host_level_rate_limit` default to false
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: ExecEnv forward-declared doris::cloud MS RPC limiter types, which exposed doris::cloud through common include paths and made older headers resolve global cloud protobuf types incorrectly.
### Release note
None
### Check List (For Author)
- Test: Manual test
  - `./build.sh --be -j100`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Problem Summary:
This PR implements a two-layer MS (Meta Service) RPC rate limiting system for the Doris cloud-mode BE:

1. Host-level rate limiting: each BE applies a per-RPC-type QPS limit to the MS RPCs it sends (token bucket).
2. Table-level adaptive backpressure: when MS returns the `MS_BUSY` error code, BE dynamically identifies and throttles the top-k highest-QPS tables using a state machine, and automatically relaxes limits after the pressure subsides.

#### Part 1: BE Host-Level Rate Limiting
**Problem**
In cloud mode, all BE nodes send RPCs (get_tablet, prepare_rowset, commit_rowset, etc.) to a shared Meta Service. A single BE experiencing load spikes (e.g., large batch imports, compaction storms) can send excessive RPC traffic that overwhelms MS, degrading service for all BEs.
**Solution**

Introduce `HostLevelMSRpcRateLimiters`: a per-BE, per-RPC-type rate limiter built on the token bucket algorithm.

Architecture:
- `MetaServiceRPC` enum identifying each MS RPC type (defined via X-macro for maintainability)
- a `TokenBucketRateLimiterHolder` per RPC type, each with its own QPS limit (see the sketch below)
- `actual_qps = config_value × num_cores`
- `atomic_shared_ptr<RpcRateLimiter>` array for lock-free concurrent access during `limit()` calls
- `bvar::LatencyRecorder` to monitor sleep durations caused by rate limiting
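For illustration, a minimal self-contained sketch of the token-bucket-per-RPC idea. This is not the code added by this PR: `SimpleTokenBucket`, `HostLimitersSketch`, and the enum values are placeholder names, and a plain mutex plus `std::atomic_load` stand in for the real `TokenBucketRateLimiterHolder` and atomic limiter array.

```cpp
// Illustrative sketch only: a tiny token-bucket limiter keyed by an RPC enum.
// The PR's classes add bvar metrics, config reloading, and core-count scaling.
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>

enum class MetaServiceRPC { GET_TABLET, PREPARE_ROWSET, COMMIT_ROWSET, COUNT };

class SimpleTokenBucket {
public:
    explicit SimpleTokenBucket(double qps) : _qps(qps), _tokens(qps), _last(clock::now()) {}

    // Consumes one token; returns how many microseconds the caller should
    // sleep before issuing the RPC (0 if a token was immediately available).
    int64_t acquire() {
        std::lock_guard<std::mutex> lock(_mu);
        auto now = clock::now();
        double elapsed_s = std::chrono::duration<double>(now - _last).count();
        _last = now;
        _tokens += elapsed_s * _qps;
        if (_tokens > _qps) _tokens = _qps;  // burst capacity: one second's worth of tokens
        _tokens -= 1.0;
        if (_tokens >= 0.0) return 0;
        return static_cast<int64_t>(-_tokens / _qps * 1e6);  // time until the debt is repaid
    }

private:
    using clock = std::chrono::steady_clock;
    std::mutex _mu;
    double _qps;
    double _tokens;
    clock::time_point _last;
};

// Per-RPC-type limiters. Readers only load a shared_ptr, so reconfiguration can
// swap in a new limiter without blocking concurrent callers (the PR uses an
// atomic shared_ptr array for the same reason).
class HostLimitersSketch {
public:
    HostLimitersSketch() {
        for (auto& slot : _limiters) slot = std::make_shared<SimpleTokenBucket>(100.0);
    }
    std::shared_ptr<SimpleTokenBucket> get(MetaServiceRPC rpc) const {
        return std::atomic_load(&_limiters[static_cast<std::size_t>(rpc)]);
    }
    void reset(MetaServiceRPC rpc, double qps) {
        std::atomic_store(&_limiters[static_cast<std::size_t>(rpc)],
                          std::make_shared<SimpleTokenBucket>(qps));
    }

private:
    std::array<std::shared_ptr<SimpleTokenBucket>,
               static_cast<std::size_t>(MetaServiceRPC::COUNT)> _limiters;
};

int main() {
    HostLimitersSketch limiters;
    if (int64_t wait_us = limiters.get(MetaServiceRPC::COMMIT_ROWSET)->acquire(); wait_us > 0) {
        // Doris would use bthread_usleep here; a plain sleep stands in for it.
        std::this_thread::sleep_for(std::chrono::microseconds(wait_us));
    }
}
```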
Configuration:

| Config | Default | Notes |
| --- | --- | --- |
| `enable_ms_rpc_host_level_rate_limit` | `true` | Switch for host-level MS RPC rate limiting |
| `ms_rpc_qps_default` | `100` | Default per-RPC-type QPS limit |
| `ms_rpc_qps_<rpc_name>` | `-1` | Per-RPC override (`-1` = use default, `0` = disabled) |

All QPS configs are mutable (`DEFINE_mInt32`), allowing runtime adjustment without restart. `reset_all()` re-reads configs and recreates the rate limiters.
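To make the `-1` / `0` semantics concrete, a small sketch (a hypothetical helper, not a function from the PR):

```cpp
// Illustrative only: how the -1 / 0 semantics of ms_rpc_qps_<rpc_name> could
// resolve to an effective per-RPC limit.
#include <cstdint>
#include <cstdio>

struct ResolvedQps {
    bool enabled;  // false when the per-RPC config is 0 (rate limiting disabled)
    double qps;    // effective QPS when enabled
};

ResolvedQps resolve_qps(int32_t per_rpc_config, int32_t default_config, int num_cores) {
    // -1 means "fall back to ms_rpc_qps_default"; 0 means "disabled for this RPC".
    int32_t value = (per_rpc_config == -1) ? default_config : per_rpc_config;
    if (value == 0) return {false, 0.0};
    // The PR scales the configured value by the core count:
    // actual_qps = config_value * num_cores.
    return {true, static_cast<double>(value) * num_cores};
}

int main() {
    ResolvedQps r = resolve_qps(/*per_rpc_config=*/-1, /*ms_rpc_qps_default=*/100, /*num_cores=*/16);
    std::printf("enabled=%d qps=%.0f\n", r.enabled, r.qps);  // enabled=1 qps=1600
}
```

A `reset_all()`-style refresh would recompute this for every RPC type after a config change and swap in freshly built limiters; the atomic shared_ptr array above is what keeps that swap safe for concurrent `limit()` callers.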
Integration:

Rate limiting is applied inside the `retry_rpc()` template function in `cloud_meta_mgr.cpp`, which wraps all MS RPC calls. The `RpcRateLimitCtx` struct carries the rate limiter reference. Rate limiting executes before each RPC attempt (including retries), with the call to `apply_rate_limit()` performing a `bthread_usleep` if the token bucket requires waiting.
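A simplified sketch of this control flow. The real `retry_rpc()` is templated over request/response types and handles many more cases; `acquire_wait_us`, `call_ms`, and the status enum below are placeholders supplied by the caller.

```cpp
// Sketch of a rate-limited retry loop: the limiter is consulted before every
// attempt, retries included, so a retry storm cannot bypass the host-level
// QPS budget (this is where the PR's apply_rate_limit() + bthread_usleep sit).
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

enum class RpcStatus { OK, MS_BUSY, TRANSIENT_ERROR };

RpcStatus retry_rpc_sketch(const std::function<int64_t()>& acquire_wait_us,  // rate limiter
                           const std::function<RpcStatus()>& call_ms,        // the MS RPC
                           int max_retries) {
    RpcStatus status = RpcStatus::TRANSIENT_ERROR;
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        if (int64_t wait_us = acquire_wait_us(); wait_us > 0) {
            std::this_thread::sleep_for(std::chrono::microseconds(wait_us));
        }
        status = call_ms();
        if (status == RpcStatus::OK) return status;
        // On MAX_QPS_LIMIT the PR additionally notifies the backpressure
        // handler described in Part 2 before deciding whether to retry.
    }
    return status;
}
```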
New files:

- `be/src/cloud/cloud_ms_rpc_rate_limiters.h/.cpp`
- `be/test/cloud/cloud_ms_rpc_rate_limiters_test.cpp`

#### Part 2: BE Table-Level Adaptive Backpressure
**Problem**

Host-level rate limiting applies uniformly across all tables. When MS reports overload (`MAX_QPS_LIMIT`), it's often caused by a small number of high-traffic tables (e.g., tables with many concurrent stream load jobs). A uniform rate limit would unnecessarily penalize all tables, while the hot tables continue to dominate the RPC traffic.

**Solution**
Implement table-level adaptive throttling for load-related RPCs. When MS returns `MAX_QPS_LIMIT`, BE identifies the top-k highest-QPS tables and progressively reduces their QPS limits, while leaving other tables unaffected.

Scope: only 5 load-related RPC types participate in table-level throttling:
- `PREPARE_ROWSET`
- `COMMIT_ROWSET`
- `UPDATE_TMP_ROWSET`
- `UPDATE_PACKED_FILE_INFO`
- `UPDATE_DELETE_BITMAP`

Architecture (4 components with clear separation of concerns):
Component details:
- `TableRpcQpsRegistry`: tracks per-(rpc_type, table_id) QPS using `bvar::PerSecond<bvar::Adder>`. Supports efficient top-k queries via a min-heap. The time window is configurable via `ms_rpc_table_qps_window_sec` (immutable, default 10s).
- `RpcThrottleStateMachine`: pure state machine with no time awareness or side effects. Maintains upgrade history as a stack for clean rollback.
  - `on_upgrade(snapshot)`: for each top-k table in the QPS snapshot, calculates `new_limit = current_qps × ratio` (first time) or `current_limit × ratio` (already limited), with a floor of `ms_rpc_table_qps_limit_floor`. Returns `SET_LIMIT` actions.
  - `on_downgrade()`: pops the most recent upgrade from history. If the table had a prior limit, restores it (`SET_LIMIT`); if not, removes the limit (`REMOVE_LIMIT`).
- `RpcThrottleCoordinator`: timing control layer using tick counts (1 tick = 1 ms).
  - `report_ms_busy()`: returns true if enough ticks have passed since the last upgrade (cooldown).
  - `tick(n)`: advances time by n ticks. Returns true if a downgrade should trigger (no MS_BUSY for `downgrade_after_ticks`).
- `TableRpcThrottler`: enforces QPS limits using `StrictQpsLimiter` (strict fixed interval, no burst allowed). Each (rpc_type, table_id) pair has its own limiter. Returns the time point at which the request may execute; the caller sleeps until then.
- `MSBackpressureHandler`: the orchestrator that wires all components together.
  - `on_ms_busy()`: called when `retry_rpc` receives `MAX_QPS_LIMIT`. Consults the coordinator for cooldown, builds a QPS snapshot from the registry, feeds it to the state machine, and applies the resulting actions to the throttler.
  - `before_rpc()` / `after_rpc()`: called around each load-related RPC for throttle enforcement and QPS recording.

Upgrade/Downgrade lifecycle example:
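A hypothetical walk-through of the arithmetic described above, using made-up table IDs and QPS values and the default `ratio`/`floor` from the Configuration table below (illustrative only, not the PR's `RpcThrottleStateMachine`):

```cpp
// Made-up numbers demonstrating the upgrade/downgrade limit math.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <map>
#include <optional>
#include <vector>

int main() {
    const double ratio = 0.75;     // ms_backpressure_throttle_ratio
    const double floor_qps = 1.0;  // ms_rpc_table_qps_limit_floor

    std::map<int64_t, double> limits;  // table_id -> current QPS limit
    // One entry per upgrade: the limit each affected table had *before* it.
    std::vector<std::map<int64_t, std::optional<double>>> history;

    auto upgrade = [&](int64_t table_id, double observed_qps) {
        std::optional<double> prev;
        if (auto it = limits.find(table_id); it != limits.end()) prev = it->second;
        // First upgrade: new_limit = current_qps * ratio; later upgrades: current_limit * ratio.
        double base = prev ? *prev : observed_qps;
        limits[table_id] = std::max(floor_qps, base * ratio);  // SET_LIMIT
        history.push_back({{table_id, prev}});
    };

    auto downgrade = [&]() {
        if (history.empty()) return;
        for (const auto& [table_id, prev] : history.back()) {
            if (prev) limits[table_id] = *prev;  // SET_LIMIT back to the previous value
            else limits.erase(table_id);         // REMOVE_LIMIT: table was not limited before
        }
        history.pop_back();
    };

    upgrade(10001, /*observed_qps=*/40.0);  // MS_BUSY #1: limit = 40 * 0.75 = 30
    upgrade(10001, /*observed_qps=*/28.0);  // MS_BUSY #2 after cooldown: limit = 30 * 0.75 = 22.5
    downgrade();                            // quiet period elapses: limit restored to 30
    downgrade();                            // another quiet period: limit removed entirely
    std::printf("tables still throttled: %zu\n", limits.size());  // prints 0
}
```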
Configuration:
| Config | Default | Notes |
| --- | --- | --- |
| `enable_ms_backpressure_handling` | `false` | Switch for table-level backpressure handling |
| `ms_rpc_table_qps_window_sec` | `3` | QPS tracking window (seconds) |
| `ms_backpressure_upgrade_interval_ms` | `3000` | Minimum interval between upgrades (cooldown) |
| `ms_backpressure_upgrade_top_k` | `2` | Number of top-QPS tables throttled per upgrade |
| `ms_backpressure_throttle_ratio` | `0.75` | Ratio applied to the current QPS/limit on each upgrade |
| `ms_rpc_table_qps_limit_floor` | `1.0` | Lower bound for any table-level QPS limit |
| `ms_backpressure_downgrade_interval_ms` | `3000` | Quiet period without MS_BUSY before a downgrade |
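The two interval configs map directly onto the coordinator's tick counters (1 tick = 1 ms, so 3000 ticks = 3 s by default). A minimal sketch of that timing logic, under the simplifying assumption of a single busy/upgrade counter pair; the PR's `RpcThrottleCoordinator` may differ in detail:

```cpp
// Illustrative tick-based coordinator: cooldown between upgrades plus a
// quiet-period window that triggers downgrades. Not the PR's class.
#include <cstdint>

class TickCoordinatorSketch {
public:
    TickCoordinatorSketch(int64_t upgrade_cooldown_ticks, int64_t downgrade_after_ticks)
            : _upgrade_cooldown(upgrade_cooldown_ticks),
              _downgrade_after(downgrade_after_ticks) {}

    // Called when an MS_BUSY (MAX_QPS_LIMIT) response arrives. Returns true if
    // an upgrade is allowed, i.e. the cooldown since the last upgrade has passed.
    bool report_ms_busy() {
        _ticks_since_busy = 0;
        if (_ticks_since_upgrade < _upgrade_cooldown) {
            return false;  // still cooling down; swallow this busy signal
        }
        _ticks_since_upgrade = 0;
        return true;
    }

    // Called periodically; advances time by n ticks. Returns true when a
    // downgrade should trigger (no MS_BUSY seen for downgrade_after ticks).
    bool tick(int64_t n) {
        _ticks_since_upgrade += n;
        _ticks_since_busy += n;
        if (_ticks_since_busy >= _downgrade_after) {
            _ticks_since_busy = 0;  // restart the quiet-period window
            return true;
        }
        return false;
    }

private:
    const int64_t _upgrade_cooldown;
    const int64_t _downgrade_after;
    int64_t _ticks_since_upgrade = _upgrade_cooldown;  // allow the first upgrade immediately
    int64_t _ticks_since_busy = 0;
};
```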
Observability (bvar metrics):

- `ms_rpc_backpressure_upgrade_count` / `_60s`: upgrade event counts
- `ms_rpc_backpressure_downgrade_count` / `_60s`: downgrade event counts
- `ms_rpc_backpressure_ms_busy_count` / `_60s`: MS_BUSY signal counts
- `ms_rpc_backpressure_throttle_wait_<rpc_name>`: per-RPC-type throttle wait latency
- `ms_rpc_backpressure_throttled_tables_<rpc_name>`: number of throttled tables per RPC type

New files:
- `be/src/cloud/cloud_throttle_state_machine.h/.cpp`
- `be/src/cloud/cloud_ms_backpressure_handler.h/.cpp`
- `be/test/cloud/cloud_throttle_state_machine_test.cpp`
- `be/test/cloud/cloud_ms_backpressure_handler_test.cpp`

Also renamed (not part of the feature, cleanup):
- `common/cpp/s3_rate_limiter.h/.cpp` → `common/cpp/token_bucket_rate_limiter.h/.cpp` (more general naming since it's now used beyond S3)

#### Part 3: System Table for Table-Level Throttler Observability
**Problem**
The table-level backpressure system operates transparently inside BE. When issues arise, users and DBAs have no way to inspect which tables are being throttled, what their QPS limits are, or what their current QPS is — beyond checking raw bvar metrics.
**Solution**

Add a new system table `information_schema.backend_ms_rpc_table_throttlers` that exposes the real-time state of the `TableRpcThrottler` on each BE. This table is a Backend-Partitioned Schema Table, meaning each BE reports its own throttling data, and queries are distributed to all alive BEs and aggregated.
Schema:
- `BE_ID`
- `TABLE_ID`
- `RPC_TYPE` (e.g., `PREPARE_ROWSET`, `COMMIT_ROWSET`)
- `QPS_LIMIT`
- `CURRENT_QPS`

Usage examples:
### Release note

None

### Check List (For Author)
- Test
- Behavior changed:
  - Host-level MS RPC rate limiting (controlled via `enable_ms_rpc_host_level_rate_limit`)
  - Table-level throttling on the MS `MAX_QPS_LIMIT` response (disabled by default via `enable_ms_backpressure_handling`)
- Does this need documentation?
### Check List (For Reviewer who merge this PR)