Fix RDMA re-entrancy assertion and lost wakeup deadlocks with I/O threads #3335
Ada-Church-Closure wants to merge 4 commits into valkey-io:unstable
Conversation
When using RDMA with I/O threads and pipelining (e.g., `-P 32`), the server crashes due to a `c->cmd_queue.len == 0` assertion or hangs indefinitely (lost wakeup).

Root causes:
1. Crash: `connUpdateState(c->conn)` was called before the parsed commands were executed. For RDMA, this synchronously triggers the CQ, re-entering parsing while `cmd_queue` is heavily populated.
2. Hang: I/O threads leave unparsed residual data in `querybuf` due to batch limits. Because the RDMA CQ is edge-triggered, returning to the event loop without draining the buffer causes a lost wakeup.

Fix:
- Track handled clients in `processIOThreadsReadDone` using a list.
- After draining the command queue via `processClientsCommandsBatch`, synchronously drain any residual `querybuf` using `processPendingCommandAndInputBuffer`.
- Safely call `connUpdateState` only after the command queue is fully empty to prevent re-entrancy.

This change strictly respects the edge-triggered nature of RDMA while remaining 100% safe for standard TCP/TLS connections.

Fixes valkey-io#3112

Signed-off-by: Ada-Church-Closure <[email protected]>
Force-pushed from 1cab189 to 3fc6395 (compare)
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:

```
@@           Coverage Diff            @@
##           unstable    #3335   +/-  ##
============================================
- Coverage     75.05%   74.26%   -0.79%
============================================
  Files           129      130       +1
  Lines         71553    72730    +1177
============================================
+ Hits          53705    54016     +311
- Misses        17848    18714     +866
```
**pizhenwei** left a comment:
Thank you for your contribution and for discovering the root cause of this issue. Overall, the solution LGTM. I have only a few minor suggestions, and welcome further discussion.
Hi @zuiderkwast please correct me if I misunderstand anything.
…cture Signed-off-by: Ada-Church-Closure <[email protected]>
**pizhenwei** left a comment:
LGTM, thanks!
Co-authored-by: zhenwei pi [email protected]
Signed-off-by: zhenwei pi [email protected]
**zuiderkwast** left a comment:
Thanks for finding the root cause of the issue and solving it!
I don't know if the extra linked list affects performance in any way, with or without pipelining, so I kicked off benchmark tests. If it turns out to matter, we could try a fifo.{c,h} queue instead of a linked list to make it faster. Let's wait for the benchmark and see.
```c
 * for edge-triggered transports like RDMA). */
list *handled_clients = listCreate();

listNode *next = listFirst(server.clients_pending_io_read);
```
This communication between IO threads and main thread is being changed completely in #3324. Lock-free queues are used instead of looping over the clients and checking each one if their IO is finished.
These lines are being deleted: https://github.com/valkey-io/valkey/pull/3324/changes#diff-252bce0cc340542712f0c1adf62e9035ea47a4a064321fbf40ec3dd4b814aaf2L6378
I think we can merge this fix first, so we can backport it to 9.0 and 8.x. (When was RDMA introduced?) But it might need to be rewritten later when #3324 gets merged.
**src/networking.c** (outdated)

```c
/* Track the client so we can drain any leftover buffered data even if no
 * further network events arrive. */
listAddNodeTail(handled_clients, c);
```
I have some UAF concerns with storing direct client pointers. Is it possible that the client is already dead/freed but still gets added to the list?
On line 6448 we call processPendingCommandAndInputBuffer, which calls processCommandAndResetClient if the client is ready for execution.
The description of this function says:

```c
/* This function calls processCommand(), but also performs a few sub tasks
 * for the client that are useful in that context:
 *
 * 1. It sets the current client to the client 'c'.
 * 2. calls commandProcessed() if the command was handled.
 *
 * The function returns C_ERR in case the client was freed as a side effect
 * of processing the command, otherwise C_OK is returned. */
```
So is it possible that, in case of an error, the client is already freed and we still add the dead client to the list?
Please correct me if I am missing something.
Thanks! It's a problem.
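The hazard discussed above can be sketched with a toy id-indirection registry (all names here are hypothetical stand-ins; the real code resolves client ids via `lookupClientByID`): instead of keeping raw `client *` pointers that may dangle after the client is freed mid-batch, store the numeric id and re-resolve it afterwards, so a freed client simply fails the lookup instead of producing a use-after-free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy registry standing in for the server's client index:
 * maps a client id to a live client, or NULL once the client is freed. */
#define MAX_CLIENTS 8

typedef struct client {
    uint64_t id;
} client;

static client *registry[MAX_CLIENTS];

static void register_client(client *c) { registry[c->id] = c; }
static void free_client(client *c) { registry[c->id] = NULL; }

/* Re-resolving by id instead of dereferencing a stored pointer:
 * returns NULL (instead of a dangling pointer) if the client died. */
static client *lookup_client_by_id(uint64_t id) {
    return (id < MAX_CLIENTS) ? registry[id] : NULL;
}
```

The trade-off, raised later in this thread, is the extra lookup cost per handled client.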
Benchmark ran on this commit. Benchmark Comparison: HEAD vs HEAD (averaged), rps metrics. Run Summary:

Statistical Notes:
Note: Values with (n=X, σ=Y, CV=Z%, CI99%=±W%, PI99%=±V%) indicate averages from X runs with standard deviation Y, coefficient of variation Z%, 99% confidence interval margin of error ±W% of the mean, and 99% prediction interval margin of error ±V% of the mean. CI bounds [A, B] and PI bounds [C, D] show the actual interval ranges.

Configuration:
There is a crash in this job: https://github.com/valkey-io/valkey/actions/runs/22915719396/job/66500408197?pr=3335#step:9:25952 Stack trace (main thread):
Signed-off-by: Ada-Church-Closure <[email protected]>
Force-pushed from 5642a80 to 0dabb3d (compare)
**src/networking.c** (outdated)

```c
listRewind(handled_clients_ids, &handled_li);
while ((handled_ln = listNext(&handled_li))) {
    uint64_t id = (uint64_t)(uintptr_t)listNodeValue(handled_ln);
    client *c = lookupClientByID(id);
```
`client *c = lookupClientByID(id);` brings more overhead; I'm not sure about the performance impact.
I have another choice:
```c
int processIOThreadsReadDone(void) {
    ...
    int ret = addCommandToBatchAndProcessIfFull(c);
    /* If the command was not added to the commands batch, process it immediately */
    if (ret == C_ERR) {
        if (processPendingCommandAndInputBuffer(c) == C_OK) {
            beforeNextClient(c);
            listAddNodeTail(handled_clients, c); /* NEW CODE */
        }
    }
}
```
And modify processClientsCommandsBatch:

```c
void processClientsCommandsBatch(list *handled_clients) {
    ...
    for (size_t i = 0; i < batch->client_count; i++) {
        client *c = batch->clients[i];
        if (c == NULL) continue;
        /* Set the client to NULL immediately to avoid accessing it again
         * recursively when ProcessingEventsWhileBlocked. */
        batch->clients[i] = NULL;
        batch->executed_commands++;
        if (processPendingCommandAndInputBuffer(c) != C_ERR) {
            beforeNextClient(c);
            if (handled_clients) listAddNodeTail(handled_clients, c); /* NEW CODE */
        }
    }
}
```
Then we can handle the healthy clients as in the previous version, which avoids lookupClientByID.
Force-pushed from d38811e to 3e6aa49 (compare)
Signed-off-by: Ada-Church-Closure <[email protected]>
Force-pushed from 3e6aa49 to df67b5d (compare)
Hi @sarthakaggarwal97, would you please take a look at the latest version? Hi @zuiderkwast, would you please retry the test job to reproduce the crash?
I used a shell script to test this:

```bash
#!/bin/bash
# ==============================================================================
# Valkey Chaos Testing Script: Mixed TCP & RDMA workload with random disconnections
# Purpose: Verify the robustness of the I/O threads state machine and ensure
#          no Use-After-Free (UAF) or Lost Wakeups occur during heavy batching.
# ==============================================================================

# 1: Unlock memory limits for RDMA Memory Registration (MR)
echo "[INFO] Unlocking memory limits for RDMA..."
sudo prlimit --memlock=unlimited --pid $$

HOST="10.0.0.1"
PORT="6379"

# Aggressive parameters to maximize pipeline depth and force the batching logic.
# Adjust -P (pipeline) and -c (connections) to push your specific hardware to the limit.
COMMON_ARGS="-h $HOST -p $PORT -d 256 --threads 16 -c 200 -P 64 -n 50000000 -t get"

echo "[INFO] Starting 4 TCP benchmark processes in the background..."
TCP_PIDS=()
for i in {1..4}; do
    ./src/valkey-benchmark $COMMON_ARGS > /dev/null 2>&1 &
    TCP_PIDS+=($!)
done
echo "[INFO] TCP benchmarks started. PIDs: ${TCP_PIDS[*]}"

echo "[INFO] Starting 4 RDMA benchmark processes in the background..."
RDMA_PIDS=()
for i in {1..4}; do
    ./src/valkey-benchmark $COMMON_ARGS --rdma > /dev/null 2>&1 &
    RDMA_PIDS+=($!)
done
echo "[INFO] RDMA benchmarks started. PIDs: ${RDMA_PIDS[*]}"

echo "[INFO] Letting the workloads saturate CPU and I/O threads (waiting 4 seconds)..."
sleep 4

echo "[INFO] Chaos initiated: sending SIGINT (Ctrl+C) to random clients..."
kill -INT "${RDMA_PIDS[0]}" "${RDMA_PIDS[1]}"
kill -INT "${TCP_PIDS[0]}"
echo "[INFO] Killed RDMA PIDs: ${RDMA_PIDS[0]}, ${RDMA_PIDS[1]}"
echo "[INFO] Killed TCP PID: ${TCP_PIDS[0]}"
echo "[INFO] OS is now sending RST packets. Valkey server should survive without Segfault/UAF."

echo "[INFO] Waiting for the surviving benchmark processes to complete naturally..."
wait
echo "[SUCCESS] Chaos test completed! If the Valkey server is still running and responsive, the fix is rock solid."
```

and the server works fine:

```
~/Project/valkey fix-rdma-io-threads* 1m 23s
❯ sudo ./chaos_test.sh
[INFO] Unlocking memory limits for RDMA...
[INFO] Starting 4 TCP benchmark processes in the background...
[INFO] TCP benchmarks started. PIDs: 19370 19371 19372 19373
[INFO] Starting 4 RDMA benchmark processes in the background...
[INFO] RDMA benchmarks started. PIDs: 19374 19375 19376 19377
[INFO] Letting the workloads saturate CPU and I/O threads (waiting 4 seconds)...
[INFO] Chaos initiated: sending SIGINT (Ctrl+C) to random clients...
[INFO] Killed RDMA PIDs: 19374, 19375
[INFO] Killed TCP PID: 19370
[INFO] OS is now sending RST packets. Valkey server should survive without Segfault/UAF.
[INFO] Waiting for the surviving benchmark processes to complete naturally...
[SUCCESS] Chaos test completed! If the Valkey server is still running and responsive, the fix is rock solid.
```
Fixes #3112
When using RDMA with I/O threads and pipelining (e.g., `-P 32`), the server crashes due to a `c->cmd_queue.len == 0` assertion or hangs indefinitely (lost wakeup).

1. The Problem

When using RDMA (`--rdma`) with I/O threads and high pipelining (e.g., `-P 32`), the server suffers from two critical issues:

- Crash: Assertion `c->cmd_queue.len == 0` failed in networking.c.
- Hang (Lost Wakeup): The server completely stalls under heavy pipeline loads because leftover data in `querybuf` is not parsed, and RDMA's edge-triggered CQ does not fire again for already-buffered data.
2. Root Cause & The Fix
In `processIOThreadsReadDone`, the execution sequence conflicted with RDMA's synchronous and edge-triggered nature:

Fixing the Crash: Previously, `connUpdateState(c->conn)` was called before the parsed commands were executed by the batch. For RDMA, this synchronously triggers the CQ, re-entering the parse logic while `cmd_queue` is heavily populated, breaking the assert. Fix: we now defer `connUpdateState` until after `processClientsCommandsBatch()`.

Fixing the Hang: I/O threads may leave unparsed residual data in `querybuf` due to pipeline limits. If we just go back to the event loop, RDMA will not wake us up again. Fix: we track handled clients in a temporary list. After batch execution, we synchronously drain any residual `querybuf` using `processPendingCommandAndInputBuffer(c)` before calling `connUpdateState`.
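The lost-wakeup half of this is the classic edge-triggered contract: an edge-triggered source notifies once per arrival, never for data that is already buffered. A toy model (all names hypothetical, not Valkey code) showing why a handler that stops at its batch limit strands data unless it drains synchronously before returning:

```c
#include <assert.h>

/* Toy edge-triggered source: an event fires only when new data arrives,
 * never for data that is already buffered. */
typedef struct {
    int buffered;   /* items sitting in the query buffer */
    int processed;  /* total work completed */
} toy_conn;

/* Deliver new data: this is the only moment an edge event fires. */
static void arrive(toy_conn *c, int n, void (*handler)(toy_conn *)) {
    c->buffered += n;
    handler(c); /* the single edge notification */
}

enum { BATCH_LIMIT = 4 };

/* Buggy handler: respects the batch limit, then returns to the event
 * loop, silently stranding whatever is still buffered (lost wakeup). */
static void handle_partial(toy_conn *c) {
    int done = 0;
    while (c->buffered > 0 && done < BATCH_LIMIT) {
        c->buffered--; c->processed++; done++;
    }
}

/* Fixed handler: after the batch, keep draining synchronously until the
 * buffer is empty, since no further edge will fire for buffered data. */
static void handle_drain(toy_conn *c) {
    handle_partial(c);
    while (c->buffered > 0) { c->buffered--; c->processed++; }
}
```

With 10 items arriving at once, the buggy handler leaves 6 buffered forever, which is the stall the fix's residual-`querybuf` drain prevents.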
3. Impact on TCP / TLS
This change is 100% safe for standard TCP/TLS connections.
For TCP, connUpdateState simply modifies the epoll registration asynchronously, so deferring it past the batch execution is a functional no-op. Eagerly draining the querybuf synchronously also slightly reduces epoll round-trips for TCP pipelines.
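The re-entrancy half of the ordering argument can be modelled the same way (a toy sketch with hypothetical names, not the real functions): for an RDMA-like connection the state update synchronously re-enters the parse path, so it must run only after the command queue is drained.

```c
#include <assert.h>

/* Toy model of the re-entrancy crash: parsing asserts the command
 * queue is empty, mirroring the real `c->cmd_queue.len == 0` check. */
typedef struct {
    int cmd_queue_len;
    int assert_failed;
} toy_client;

static void parse_more(toy_client *c) {
    if (c->cmd_queue_len != 0) { c->assert_failed = 1; return; }
    c->cmd_queue_len += 2; /* parse some new commands */
}

/* RDMA-like state update: polls the CQ synchronously, re-entering parsing. */
static void conn_update_state(toy_client *c) { parse_more(c); }

static void drain_queue(toy_client *c) { c->cmd_queue_len = 0; }

/* Buggy order: update connection state while the queue is populated. */
static int read_done_buggy(toy_client *c) {
    conn_update_state(c);
    drain_queue(c);
    return c->assert_failed;
}

/* Fixed order: drain the queue first, then update the state. */
static int read_done_fixed(toy_client *c) {
    drain_queue(c);
    conn_update_state(c);
    return c->assert_failed;
}
```

For a TCP-like connection, `conn_update_state` would merely adjust poll registration with no synchronous callback, which is why deferring it is a functional no-op there.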
4. How to Reproduce & Verify (Linux Soft-RoCE)
You can easily verify this using a dummy interface and RXE, without needing physical RDMA hardware:
```
sudo prlimit --memlock=unlimited --pid $$
./src/valkey-benchmark -h 10.0.0.1 -p 6379 -d 256 --threads 16 -c 100 -P 32 -n 20000000 -t get --rdma
```

Before this patch: the benchmark crashes or hangs immediately.
After this patch: it completes 20,000,000 requests flawlessly at peak QPS.