feat(retry): add JitterFactor to RetryPolicy #69

Open
javier-aliaga wants to merge 1 commit into dapr:main from javier-aliaga:add-jitter-option
javier-aliaga commented Mar 5, 2026

Context

When orchestrations retry failed activities or sub-orchestrations, the retry delay is calculated using exponential backoff (InitialRetryInterval * BackoffCoefficient^attempt). In systems running many concurrent orchestration instances, instances that fail at the same time and share the same policy configuration compute identical delays, so their retries fire at the same moment.

Problem

Without jitter, concurrent retries create a thundering herd: all failing instances retry simultaneously, overwhelming downstream services and increasing the likelihood of repeated failures. This is a well-known distributed systems anti-pattern. Additionally, the existing computeNextDelay function had a bug: the MaxRetryInterval comparison used < instead of >, so the delay was never actually capped before being returned.

Solution

Add a JitterFactor field to RetryPolicy (range [0.0, 1.0]) that introduces controlled randomness to retry delays:

  • Deterministic jitter: Seeded by firstAttempt + attempt so replays produce identical delays, preserving orchestrator replay safety.
  • Configurable reduction: A factor of 0.5 allows up to 50% reduction of the computed delay; 0.0 disables jitter entirely (backward compatible).
  • Validation clamping: Validate() clamps out-of-range values to [0.0, 1.0] instead of returning an error.
  • Bug fix: Corrected the MaxRetryInterval comparison (< → >) so the delay is properly capped before jitter is applied.
  • Nil-safety: WithActivityRetryPolicy and WithChildWorkflowRetryPolicy now handle nil policy gracefully.

javier-aliaga changed the title from "feat(retry): add JitterFactor to RetryPolicy with unit tests" to "feat(retry): add JitterFactor to RetryPolicy" on Mar 9, 2026
javier-aliaga marked this pull request as ready for review March 11, 2026 14:54
javier-aliaga requested a review from a team as a code owner March 11, 2026 14:54
  Introduce JitterFactor field to desynchronize concurrent retries using
  deterministic randomness (seeded by firstAttempt + attempt) for replay safety.
  Clamp value to [0.0, 1.0] in Validate(). Fix MaxRetryInterval comparison
  (was <, now >) so delay is properly capped before jitter is applied.

  Tests cover deterministic replay safety, delay reduction, zero-jitter
  passthrough, per-attempt variation, MaxRetryInterval capping, and
  validation clamping.

Signed-off-by: Javier Aliaga <[email protected]>