fix(api): add query-level retry with primary fallback to rls_transaction via execute_wrapper#10379
Conversation
…ion via execute_wrapper
|
✅ All necessary |
|
✅ Conflict Markers Resolved All conflict markers have been successfully resolved in this pull request. |
🔒 Container Security ScanImage: 📊 Vulnerability Summary
4 package(s) affected
|
There was a problem hiding this comment.
Pull request overview
This PR hardens rls_transaction against RDS read-replica recovery events by adding query-level retries and automatic primary fallback for mid-query OperationalErrors, without requiring call-site changes.
Changes:
- Add an
execute_wrapper-based mechanism to retry failed replicacursor.execute()calls and fall back to primary when retries are exhausted. - Fix an off-by-one in replica retry counting (now
REPLICA_MAX_ATTEMPTSreplica tries + 1 primary fallback) and close stale connections between pre-yield retries. - Add/adjust unit tests covering mid-query failover and the updated retry attempt semantics.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
api/src/backend/api/db_utils.py |
Implements mid-query retry/failover logic via a query wrapper; adjusts pre-yield retry loop and cleanup behavior. |
api/src/backend/api/tests/test_db_utils.py |
Updates existing retry-count assertions and adds new tests for mid-query failover behavior. |
api/CHANGELOG.md |
Documents the fix in the API changelog under 🐞 Fixed. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10379 +/- ##
===========================================
+ Coverage 56.86% 93.42% +36.55%
===========================================
Files 87 218 +131
Lines 2847 30742 +27895
===========================================
+ Hits 1619 28721 +27102
- Misses 1228 2021 +793
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…ngelog section - Fix no new line between sections
…replica-that-doesnt-use-the-write-replica-on-retries--second-attempt
Context
When the RDS read replica enters recovery mode, SQL queries inside
with rls_transaction(...)crash withOperationalError: SSL SYSCALL error: EOF detected. The existing retry loop only covers connection-setup failures (before the cursor is yielded). After yield, the yielded_cursor guard re-raises immediately with no retry and no primary fallback. Sentry CLOUD-API-3W7 shows 40 such crashes during a single replica recovery window.There was also an off-by-one:
REPLICA_MAX_ATTEMPTS=3gave 2 replica tries + 1 primary instead of 3 + 1.Description
Adds query-level retry to
rls_transactionusing Django'sexecute_wrapperAPI. When acursor.execute()call fails on the replica:REPLICA_MAX_ATTEMPTS/REPLICA_RETRY_BASE_DELAYsettings)fetchall()/fetchone()read from the new connection transparentlytransaction.atomic()cleanup error on the dead replica after a successful failoverAlso fixes the off-by-one in
max_attemptsand closes stale connections between pre-yield retries.Zero call-site changes. All 100+
with rls_transaction(...)usages work as before.Limitation:
.iterator()fetches rows viafetchmany(), which the wrapper does not intercept. ~4 of 28 replica call sites use.iterator()and need caller-side retry for mid-iteration failures. This is the same behavior as before, as the previous implementation also could not safely retry mid-iteration without risking duplicate rows.Steps to review
Deep review:
api/src/backend/api/db_utils.py: the _query_failover closure and changes to the retry loopapi/src/backend/api/tests/test_db_utils.py: 6 new tests for mid-query failover + 4 updated for off-by-oneAlso run tests and run a local instance,
API+UIto check is working.Checklist
API
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.