[enhance](load) exclude version-gap replicas from success counting in quorum success #60953
sollhui wants to merge 1 commit into apache:master
Summary
When using majority write (quorum success), BE does not distinguish between replicas with continuous versions and replicas with version gaps (lastFailedVersion >= 0).
This causes inconsistency with FE's commit check, which correctly excludes version-gap replicas from success counting.
Bad Case
Consider 3 replicas on nodes 1, 2, 3 with load_required_replica_num = 2:
Node 3 now has a version gap (lastFailedVersion >= 0).
On the second write, FE excludes node 3 from the count, so successReplicaNum = 1 < 2 and the commit fails.
BE treats the quorum as reached, but FE rejects the transaction.
The correct behavior for the second write:
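The mismatch in this case can be illustrated with a small C++ sketch. It is not Doris code: ReplicaState and both counting functions are hypothetical, and the second-write outcome (node 1 fails, nodes 2 and 3 succeed) is an assumption chosen to match successReplicaNum = 1, since the text does not list which nodes succeed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one replica during a load (not a Doris type).
struct ReplicaState {
    int64_t backend_id;
    bool write_succeeded;
    int64_t last_failed_version;  // >= 0 means the replica has a version gap
};

// Counts every replica whose write succeeded, ignoring version gaps.
// This mirrors the BE behavior described as the problem.
int count_all_successes(const std::vector<ReplicaState>& replicas) {
    int n = 0;
    for (const auto& r : replicas) {
        if (r.write_succeeded) ++n;
    }
    return n;
}

// Counts only successful replicas with continuous versions.
// This mirrors the FE commit check described in the summary.
int count_continuous_successes(const std::vector<ReplicaState>& replicas) {
    int n = 0;
    for (const auto& r : replicas) {
        if (r.write_succeeded && r.last_failed_version < 0) ++n;
    }
    return n;
}

// Assumed outcome of the second write: node 1 fails, nodes 2 and 3 succeed,
// and node 3 still carries a gap from an earlier failed write.
// The value 10 is an arbitrary non-negative placeholder.
std::vector<ReplicaState> second_write_example() {
    return {
        {1, false, -1},
        {2, true, -1},
        {3, true, 10},
    };
}
```

Under these assumptions the first count is 2, which meets load_required_replica_num = 2, while the second is 1, which does not. That difference is the disagreement between BE and FE.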
Solution
Pass per-tablet version-gap backend information from FE to BE via a new thrift field map<tablet_id, list<backend_id>> tablet_version_gap_backends in TOlapTablePartition.
On the BE side, when counting successful replicas for majority write in both VTabletWriter (V1) and VTabletWriterV2, exclude version-gap backends from the finished_tablets_replica counter. This makes BE's quorum check consistent with FE's commit check.
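A minimal sketch of the exclusion step, assuming the per-tablet map has already been read from the thrift field into an in-memory map. The type aliases and the functions count_quorum_replicas and quorum_reached are hypothetical names for illustration and are not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using TabletId = int64_t;
using BackendId = int64_t;
// tablet_id -> list of backend_ids, the same shape as tablet_version_gap_backends.
using BackendsByTablet = std::unordered_map<TabletId, std::vector<BackendId>>;

// Number of finished backends for one tablet that are not listed as version-gap.
int count_quorum_replicas(TabletId tablet_id,
                          const std::vector<BackendId>& finished_backends,
                          const BackendsByTablet& version_gap_backends) {
    auto it = version_gap_backends.find(tablet_id);
    if (it == version_gap_backends.end()) {
        // No gap information for this tablet: every finished backend counts.
        return static_cast<int>(finished_backends.size());
    }
    const std::vector<BackendId>& gaps = it->second;
    auto has_no_gap = [&gaps](BackendId b) {
        return std::find(gaps.begin(), gaps.end(), b) == gaps.end();
    };
    return static_cast<int>(
        std::count_if(finished_backends.begin(), finished_backends.end(), has_no_gap));
}

// True if every tablet in `finished` has at least `required` non-gap replicas.
// Assumes each tablet of the load has an entry in `finished`, possibly empty.
bool quorum_reached(const BackendsByTablet& finished,
                    const BackendsByTablet& version_gap_backends,
                    int required) {
    assert(required > 0);
    for (const auto& entry : finished) {
        if (count_quorum_replicas(entry.first, entry.second, version_gap_backends) < required) {
            return false;
        }
    }
    return true;
}
```

With tablet 100 finished on backends 2 and 3 and backend 3 listed as a gap backend, only one replica counts, so a requirement of 2 is not met. The linear std::find is kept for brevity; replica lists are short, so a set is not needed in a sketch like this.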
Changes
tablet_version_gap_backends field to TOlapTablePartition
getPartitionVersionGapBackends() to compute gap backends per tablet
_quorum_success
_quorum_success and _create_commit_info