fix(java): remove Java-side inflight counter, let Rust be sole authority (#5656)
Conversation
Force-pushed 7253084 → 609813f, then 609813f → c3078bb.
shohamazon
left a comment
Code Review
Clean, well-motivated bugfix. The dual-counter desync is a real and critical issue — consolidating to Rust's AtomicIsize counter (Client::reserve_inflight_request / release_inflight_request) is the right architectural call. The diff is focused and the removals are consistent across all three files.
Suggestions
1. Dead maxInflightRequests field in GlideCoreClient
The PR removes all 7 usages of this.maxInflightRequests in AsyncRegistry.register() calls, but the field itself (line 92), the getter getMaxInflightRequests() (lines 94-96), and the constructor parameter maxInflight (line 114) remain. After this PR, nothing in GlideCoreClient reads the field anymore.
Consider removing the field and simplifying the constructor, or if it's intentionally kept (e.g. for the Rust-side config passed via ConnectionManager.connectToValkey()), a brief comment noting it's unused on the Java side would help future readers.
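A minimal sketch of what suggestion 1 is asking for. The class, field, and parameter names come from the review text; the body is a hypothetical reconstruction, not the actual GlideCoreClient source:

```java
// Hypothetical sketch for suggestion 1: if the field is intentionally kept
// (e.g. to forward the configured limit to the Rust-side config), a comment
// makes that explicit; otherwise both field and getter can be dropped.
public class GlideCoreClient {
    // Unused on the Java side after #5656: the limit is enforced solely by
    // Rust's InflightRequestTracker. Kept only to forward the configured
    // value during connection setup.
    private final int maxInflightRequests;

    public GlideCoreClient(int maxInflight) {
        this.maxInflightRequests = maxInflight;
    }

    public int getMaxInflightRequests() {
        return maxInflightRequests;
    }

    public static void main(String[] args) {
        // Default limit from the PR description (maxInflightRequests=1000).
        System.out.println(new GlideCoreClient(1000).getMaxInflightRequests());
    }
}
```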
2. Stale comments after cleanupClient removal
In GlideCoreClient.close() (line ~419) and CleanupAction.run() (line ~455), the comment // Clean up per-client inflight tracking is left behind but the AsyncRegistry.cleanupClient() call it described is removed. The remaining GlideNativeBridge.closeClient() is about closing the native handle, not inflight tracking. These comments are now misleading.
Similarly in ConnectionManager.closeConnectionSync() (line ~429), the comment // Clean up any pending async operations for this client remains with a blank line where the call used to be.
3. Verify no other callers of the removed cleanupClient method
The PR removes AsyncRegistry.cleanupClient() and its 4 call sites. Worth confirming there are no other callers in tests or other modules that would fail to compile after this change.
What looks good
- Javadoc updated correctly to document Rust as sole authority with an InflightRequestTracker reference
- estimateInitialCapacity() properly removed along with the env var / system property reading
- clientInflightCounts field and all its usages in register(), whenComplete, shutdown(), reset() cleanly removed
- AsyncRegistry import removed from ConnectionManager
- New register() Javadoc clearly documents the simplified contract
Overall: solid fix. The stale comments and dead field are minor nits.
Review response
Thanks for the thorough review. Addressing each point:
1. …
2. …
3. …
4. …
5. …
6. Javadoc — Updated: "Inflight request limits are enforced exclusively by the Rust core via …"
7. Import cleanup — …
All points addressed in the current commit.
f7eaca1 to
6ef3a01
Compare
Addressed all three suggestions in the latest push.
shohamazon
left a comment
All three suggestions addressed — verified against the updated diff. LGTM.
One tiny nit: ConnectionManager.closeConnectionSync() has a leftover double blank line (~line 427-429) where the removed code was. Not a blocker.
Force-pushed 6ef3a01 → 93e0aca, then 93e0aca → 236e1a7.
AsyncRegistry had its own per-client inflight counter that recycled immediately on timeout via a whenComplete callback. This defeated the Rust InflightRequestTracker: Java thought slots were free while Rust still held them for zombie sub-commands.

Remove the Java counter entirely. Rust's send_command() with InflightRequestTracker::try_new() is now the sole inflight authority.

Chaos test A/B on 4-core pods with 5s netem:
- Old v2.2: permanent freeze after 35 min, no recovery after chaos removed
- Fix: stable for 15 min, recovered to 23K/s in 15 seconds

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Force-pushed 236e1a7 → 3aa3ad6.
With Java's inflight counter removed (PR #5656), rejection now happens asynchronously in Rust's send_command() rather than synchronously in Java's AsyncRegistry. The tokio task ordering is non-deterministic, so any of the N+1 requests may be the one rejected — not necessarily the last one sent.

Update the test to wait for async processing and count rejected requests instead of asserting specific request ordering.

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
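The test pattern this commit describes — settle all futures, then count rejections rather than assert which request lost — can be sketched as below. This is a self-contained simulation, not the actual test: the "client" is a stand-in that rejects exactly one of N+1 requests at a non-deterministic position:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: with asynchronous rejection, a robust test waits for
// all requests to settle and counts how many were rejected, instead of
// asserting that the last request sent is the one that fails.
public class CountRejections {
    static final int LIMIT = 5; // stand-in inflight limit

    public static void main(String[] args) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        // Any of the N+1 requests may be the rejected one.
        int rejectAt = (int) (Math.random() * (LIMIT + 1));
        for (int i = 0; i <= LIMIT; i++) {
            CompletableFuture<String> f = new CompletableFuture<>();
            if (i == rejectAt) {
                f.completeExceptionally(new RuntimeException("inflight limit reached"));
            } else {
                f.complete("OK");
            }
            futures.add(f);
        }
        // Wait for all async processing to settle before counting.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .exceptionally(t -> null)
                .join();
        long rejected = futures.stream()
                .filter(CompletableFuture::isCompletedExceptionally)
                .count();
        System.out.println("rejected=" + rejected);
    }
}
```

The assertion becomes `rejected == 1`, which holds regardless of tokio task ordering.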
* Revert "fix(java): update inflight_requests_limit test for async Rust rejection (#5658)"
  This reverts commit 9331bf4.
* Revert "fix(java): remove Java-side inflight counter, let Rust be sole authority (#5656)"
  This reverts commit 072ccdf.

Signed-off-by: Avi Fenesh <aviarchi1994@gmail.com>
Remove Java-side inflight counter, let Rust be sole authority
Follow-up to #5642 (InflightRequestTracker cherry-pick). Separated into its own PR for clean review.
The Problem
Both Rust and Java had independent inflight counters that released on timeout, while zombie sub-commands were still alive in the event loop:
#5642 fixed Rust (the RAII tracker holds until the zombie finishes). But Java's AsyncRegistry counter still recycled on timeout, defeating the fix. The default config (maxInflightRequests=1000) meant this was active for all users.

The Fix
Remove Java's counter entirely (-95 lines, 3 files). Rust is the sole inflight authority.
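The desync the PR fixes can be illustrated with a small simulation. This is a hypothetical reconstruction of the failure mode, not the actual GLIDE code: one counter mimics the removed AsyncRegistry counter that released on timeout, the other mimics the Rust InflightRequestTracker that holds the slot until the zombie sub-command finishes:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of the dual-counter desync. "javaInflight" stands
// in for the removed AsyncRegistry counter; "rustInflight" stands in for
// Rust's InflightRequestTracker.
public class DualCounterDesync {
    public static void main(String[] args) {
        AtomicInteger javaInflight = new AtomicInteger();
        AtomicInteger rustInflight = new AtomicInteger();

        // A request is sent: both sides reserve a slot.
        javaInflight.incrementAndGet();
        rustInflight.incrementAndGet();

        // The request times out on the Java side. The old whenComplete
        // callback released the Java slot immediately...
        javaInflight.decrementAndGet();

        // ...but Rust still holds its slot for the zombie sub-command.
        System.out.println("java=" + javaInflight.get() + " rust=" + rustInflight.get());
    }
}
```

Java believes a slot is free that Rust has not released, so under sustained timeouts Java keeps admitting requests the core cannot take; with a single counter on the Rust side, admission and release can no longer diverge.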
Chaos Test A/B (4-core pods, 60 threads, sustained 5s netem)