Improve scheduling VDiff streams to speed up VDiff time. #18227

@bluecrabs007

Issue Description
In a large MoveTables workflow (128 → 128 shards, over 140 billion rows), we encountered persistent "deadline exceeded" errors when running a VDiff operation. The VDiff job failed to complete even after running for over 24 hours.

Error: (shard 2e-30) deadline exceeded: /vitess/vitess-loadtest-db/global/keyspaces/customer_v8/locks
Error: (shard 38-3a) deadline exceeded: /vitess/vitess-loadtest-db/global/keyspaces/customer_v8/locks

Workaround
We were able to successfully complete the VDiff by manually:

  • Stopping all VDiff streams
  • Restarting each stream one at a time
  • Waiting for each individual stream to complete before starting the next

Using this approach, the full VDiff completed within approximately 5 hours.
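The manual procedure above can be sketched as a small client-side loop. This is only an illustration of the ordering, not Vitess code: `start_stream` and `wait_until_done` are hypothetical callables that would wrap whatever mechanism is used to start a single shard's stream and to poll its state.

```python
from typing import Callable, Iterable, List


def run_streams_sequentially(
    shards: Iterable[str],
    start_stream: Callable[[str], None],
    wait_until_done: Callable[[str], None],
) -> List[str]:
    """Run one VDiff stream at a time, in the order the shards are given.

    Both callables are assumptions made for this sketch; they stand in
    for "restart the stream on this shard" and "block until that stream
    has completed".
    """
    completed: List[str] = []
    for shard in shards:
        start_stream(shard)      # hypothetical: start/resume this shard's stream
        wait_until_done(shard)   # hypothetical: block until it reports completed
        completed.append(shard)  # only then move on to the next shard
    return completed
```

Because only one stream is active at any moment, at most one stream at a time can be contending for the keyspace lock.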

VDiff Summary for customer_v8.MigrateData (40e02a01-1fdb-11f0-b0e1-36db5969c9c9)
State:        completed
RowsCompared: 141427827192
HasMismatch:  false
StartedAt:    2025-04-23 18:39:56
CompletedAt:  2025-04-23 22:56:20

Suggestion
It would be beneficial for Vitess to support batched or sequential scheduling of VDiff streams automatically, to avoid lock contention, which seems to be a bottleneck in large-scale workflows.
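As a rough illustration of what batched scheduling could look like, the sketch below caps the number of streams running at once. It is not a proposal for where this logic would live inside Vitess: `run_stream` is a hypothetical callable that runs one shard's stream to completion and returns its final state, and `max_concurrent` is an assumed tuning knob.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_streams_bounded(
    shards: Iterable[str],
    run_stream: Callable[[str], str],
    max_concurrent: int = 4,
) -> Dict[str, str]:
    """Run VDiff streams with at most `max_concurrent` active at a time.

    `run_stream` is an assumption for this sketch: it should block until
    the given shard's stream finishes and return its final state.
    """
    shard_list = list(shards)
    # The pool size is the concurrency limit: a new stream starts only
    # when one of the running streams has finished.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        states = list(pool.map(run_stream, shard_list))
    return dict(zip(shard_list, states))
```

Setting `max_concurrent=1` reduces this to the fully sequential workaround, while larger values trade some lock contention for more parallelism.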

Environment
Vitess version: v16
