
Commit ed5e350

github-actions[bot] and ti-chi-bot authored and committed
Auto-sync: Update English docs from Chinese PR
Synced from: pingcap/docs-cn#21196
Target PR: pingcap#22269
AI Provider: gemini

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 12b6d61 commit ed5e350

File tree

2 files changed: +43 -2 lines


best-practices/pd-scheduling-best-practices.md

Lines changed: 4 additions & 1 deletion
@@ -296,8 +296,11 @@ If a TiKV node fails, PD defaults to setting the corresponding node to the **dow
 Practically, if a node failure is considered unrecoverable, you can immediately take it offline. This makes PD replenish replicas on other nodes soon and reduces the risk of data loss. In contrast, if a node is considered recoverable, but the recovery cannot be done in 30 minutes, you can temporarily adjust `max-store-down-time` to a larger value to avoid unnecessary replica replenishment and resource waste after the timeout.

-In TiDB v5.2.0, TiKV introduces the mechanism of slow TiKV node detection. By sampling the requests in TiKV, this mechanism works out a score ranging from 1 to 100. A TiKV node with a score higher than or equal to 80 is marked as slow. You can add [`evict-slow-store-scheduler`](/pd-control.md#scheduler-show--add--remove--pause--resume--config--describe) to detect and schedule slow nodes. If only one TiKV is detected as slow, and the slow score reaches the limit (80 by default), the Leader in this node will be evicted (similar to the effect of `evict-leader-scheduler`).
+In TiDB v5.2.0, TiKV introduces the mechanism of **disk-based** slow TiKV node detection. By sampling the requests in TiKV, this mechanism works out a score ranging from 1 to 100. A TiKV node with a score higher than or equal to 80 is marked as slow. You can add [`evict-slow-store-scheduler`](/pd-control.md#scheduler-show--add--remove--pause--resume--config--describe) to schedule slow nodes. If only one TiKV node is detected as slow, and its slow score reaches the limit (80 by default), the Leaders on this node are evicted (similar to the effect of `evict-leader-scheduler`).
+
+Starting from v8.5.5 and v9.0.0, TiKV introduces a network-based slow node detection mechanism. Similar to disk-based slow node detection, this mechanism detects network anomalies by probing network delays between TiKV nodes and calculating a score. You can enable this mechanism using [`enable-network-slow-store`](/pd-control.md#scheduler-config-evict-slow-store-scheduler).

 > **Note:**
 >
 > **Leader eviction** is accomplished by PD sending scheduling requests to TiKV slow nodes and then TiKV executing the received scheduling requests sequentially. Due to factors such as **slow I/O**, slow nodes might experience request accumulation, causing some Leaders to wait until the delayed requests are processed before handling **Leader eviction** requests. This results in an overall extended time for **Leader eviction**. Therefore, when you enable `evict-slow-store-scheduler`, it is recommended to enable [`store-io-pool-size`](/tikv-configuration-file.md#store-io-pool-size-new-in-v530) as well to mitigate this situation.
+
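The note above recommends enabling `store-io-pool-size` together with `evict-slow-store-scheduler`. As a hedged sketch (assuming a standard `tikv.toml`; the value `1` is illustrative, not a tuning recommendation from this commit), the TiKV-side change might look like:

```toml
# tikv.toml -- illustrative fragment, not part of this commit
[raftstore]
# The default value 0 disables the dedicated store I/O threads.
# Any value greater than 0 enables them, so Raft log writes no longer
# queue behind other raftstore work on a slow node, which shortens
# the time Leader eviction requests wait behind delayed requests.
store-io-pool-size = 1
```

After changing this item, restart the TiKV node (or apply an online configuration change, if your version supports one for this item) for it to take effect.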
pd-control.md

Lines changed: 39 additions & 1 deletion
@@ -940,7 +940,7 @@ Usage:
 >> scheduler config evict-leader-scheduler // Display the stores in which the scheduler is located since v4.0.0
 >> scheduler config evict-leader-scheduler add-store 2 // Add leader eviction scheduling for store 2
 >> scheduler config evict-leader-scheduler delete-store 2 // Remove leader eviction scheduling for store 2
->> scheduler add evict-slow-store-scheduler // When there is one and only one slow store, evict all Region leaders of that store
+>> scheduler add evict-slow-store-scheduler // Automatically detect disk or network slow stores and evict all Region leaders on that store when conditions are met
 >> scheduler remove grant-leader-scheduler-1 // Remove the corresponding scheduler, and `-1` corresponds to the store ID
 >> scheduler pause balance-region-scheduler 10 // Pause the balance-region scheduler for 10 seconds
 >> scheduler pause all 10 // Pause all schedulers for 10 seconds
@@ -964,6 +964,44 @@ The state of the scheduler can be one of the following:
 - `pending`: the scheduler cannot generate scheduling operators. For a scheduler in the `pending` state, brief diagnostic information is returned. The brief information describes the state of stores and explains why these stores cannot be selected for scheduling.
 - `normal`: there is no need to generate scheduling operators.
+
+### `scheduler config evict-slow-store-scheduler`
+
+The `evict-slow-store-scheduler` prevents PD from scheduling Leaders to abnormal TiKV nodes and actively evicts Leaders from them when necessary, thereby reducing the impact of slow nodes on the cluster when TiKV nodes experience disk I/O or network jitter.
+
+#### Disk Slow Stores
+
+Starting from v6.2.0, TiKV reports a `SlowScore` in store heartbeats to PD. This score is calculated based on disk I/O conditions. The score ranges from 1 to 100, and a higher value indicates a greater possibility of disk performance anomalies on that node.
+
+For disk slow stores, TiKV-side detection and PD-side scheduling based on `evict-slow-store-scheduler` are enabled by default and require no additional configuration.
+
+#### Network Slow Stores
+
+Starting from v8.5.5 and v9.0.0, TiKV supports reporting a `NetworkSlowScore` in store heartbeats. This score is calculated based on network probe results and is used to identify slow nodes caused by network jitter. The score ranges from 1 to 100, and a higher value indicates a greater possibility of network anomalies.
+
+For compatibility and resource consumption considerations, network slow store detection and scheduling are disabled by default. To enable them, complete both of the following configurations:
+
+1. On the PD side, enable the scheduler to handle network slow stores:
+
+    ```bash
+    scheduler config evict-slow-store-scheduler set enable-network-slow-store true
+    ```
+
+2. On the TiKV side, set the [`raftstore.inspect-network-interval`](/tikv-configuration-file.md#inspect-network-interval) configuration item to a value greater than `0` to enable network probing.
+
+#### Recovery Time Control
+
+You can use the `recovery-duration` parameter to control how long a slow node must remain stable before it is considered recovered.
+
+Example:
+
+```bash
+>> scheduler config evict-slow-store-scheduler
+{
+  "recovery-duration": "1800" // 30 minutes
+}
+>> scheduler config evict-slow-store-scheduler set recovery-duration 600
+```
 
 ### `scheduler config balance-leader-scheduler`
 
 Use this command to view and control the `balance-leader-scheduler` policy.
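Step 2 of the network slow store setup is a TiKV-side change. A hedged sketch of the corresponding `tikv.toml` fragment follows (the `100ms` interval is illustrative; the commit only states that the value must be greater than `0`, and `0` is assumed here to be the disabled default):

```toml
# tikv.toml -- illustrative fragment, not part of this commit
[raftstore]
# A value greater than 0 enables network probing between TiKV nodes,
# which produces the NetworkSlowScore reported in store heartbeats.
# A value of 0 (assumed default) keeps network probing disabled.
inspect-network-interval = "100ms"
```

Together with the pd-ctl command `scheduler config evict-slow-store-scheduler set enable-network-slow-store true`, this enables network-based slow store detection and scheduling.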
