privsep: Ensure we recv for real after a successful recv MSG_PEEK#556
privsep: Ensure we recv for real after a successful recv MSG_PEEK#556
Conversation
Adjust the code flow so that the same errors would be caught after the final recv. This ensures we read what is really meant for us and not something silly. Should fix #555.
|
Important Review skippedThis PR was authored by the user configured for CodeRabbit reviews. By default, CodeRabbit skips reviewing PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback. To trigger a single review, invoke the You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughReworked privsep IPC I/O to a two-phase recvmsg flow (peek to size, then mandatory consume), introduced optional heap allocation for per-message payloads (psr_mallocdata/psr_data), removed in-place mdata realloc/copy paths and the per-process data pointer/free callback fields and cleanup. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Root
participant KernelSocket
Client->>KernelSocket: sendmsg(error+payload)
Root->>KernelSocket: recvmsg(MSG_PEEK|MSG_WAITALL) // peek header + lengths
KernelSocket-->>Root: header + total_len (peeked)
alt allocate payload (psr_mallocdata)
Root->>Root: malloc(psr_data, total_len)
Root->>KernelSocket: recvmsg(MSG_WAITALL) // consume and read payload into psr_data
KernelSocket-->>Root: payload (consumed)
else use preallocated buffer
Root->>KernelSocket: recvmsg(MSG_WAITALL) // consume into existing buffer
KernelSocket-->>Root: payload (consumed)
end
Root->>Client: process/error callback invoked with read data
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/privsep-root.c(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: netbsd (--disable-ipv6)
- GitHub Check: netbsd (--disable-ipv6, -DSMALL)
- GitHub Check: netbsd (-DSMALL)
- GitHub Check: netbsd (--disable-dhcp6, -DSMALL)
- GitHub Check: netbsd (--disable-dhcp6)
- GitHub Check: netbsd (--disable-ipv4ll)
- GitHub Check: netbsd (--disable-ipv4ll, -DSMALL)
- GitHub Check: netbsd (--disable-ipv4)
- GitHub Check: netbsd (--disable-arp, -DSMALL)
- GitHub Check: netbsd (--disable-ipv4, -DSMALL)
- GitHub Check: netbsd (--disable-arp)
- GitHub Check: netbsd
- GitHub Check: freebsd
- GitHub Check: openbsd
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/privsep-root.c (1)
129-138: Critical: Fix MSG_WAITALL hang by using incoming message size.Lines 134 and 137 set
iov_lenfrom the context buffer sizes (psr_ctx->psr_mdatalen,psr_ctx->psr_datalen), which can be larger than the actual incoming message sizepsr_error->psr_datalen. WhenMSG_WAITALLis used with aniov_lenlarger than available data,recvmsgblocks indefinitely waiting for bytes that will never arrive, wedging the privileged proxy and causing cascading failures like "No buffer space available."This is the same critical issue identified in the past review and must be fixed.
Apply this diff to clamp
iov_lento the actual incoming message size:if (psr_error->psr_datalen != 0) { if (psr_ctx->psr_usemdata) { iov[1].iov_base = psr_ctx->psr_mdata; - /* psr_mdatalen could be smaller then psr_datalen - * if the above malloc failed. */ - iov[1].iov_len = - MIN(psr_ctx->psr_mdatalen, psr_ctx->psr_datalen); + /* Clamp to actual incoming size to avoid MSG_WAITALL hang. + * If buffer is too small, MSG_TRUNC will catch it. */ + iov[1].iov_len = psr_error->psr_datalen; + if (iov[1].iov_len > psr_ctx->psr_mdatalen) + iov[1].iov_len = psr_ctx->psr_mdatalen; } else { iov[1].iov_base = psr_ctx->psr_data; - iov[1].iov_len = psr_ctx->psr_datalen; + iov[1].iov_len = psr_error->psr_datalen; + if (iov[1].iov_len > psr_ctx->psr_datalen) + iov[1].iov_len = psr_ctx->psr_datalen; } }
🧹 Nitpick comments (1)
src/privsep-root.c (1)
102-111: Consider removing MSG_WAITALL from the MSG_PEEK operation.Using
MSG_WAITALLwithMSG_PEEKmay cause unnecessary blocking when the message header hasn't fully arrived yet. Since this is just a peek operation to determine buffer size, and you're checking for short reads immediately after (line 110),MSG_PEEKalone should suffice.Apply this diff:
- len = recvmsg(fd, &msg, MSG_PEEK | MSG_WAITALL); + len = recvmsg(fd, &msg, MSG_PEEK);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/privsep-root.c(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: netbsd (--disable-dhcp6, -DSMALL)
- GitHub Check: netbsd (--disable-ipv6, -DSMALL)
- GitHub Check: netbsd (--disable-dhcp6)
- GitHub Check: netbsd (--disable-ipv6)
- GitHub Check: netbsd (--disable-ipv4ll)
- GitHub Check: netbsd (--disable-ipv4ll, -DSMALL)
- GitHub Check: netbsd (--disable-arp)
- GitHub Check: netbsd (-DSMALL)
- GitHub Check: netbsd (--disable-ipv4)
- GitHub Check: netbsd (--disable-arp, -DSMALL)
- GitHub Check: netbsd
- GitHub Check: netbsd (--disable-ipv4, -DSMALL)
- GitHub Check: freebsd
- GitHub Check: openbsd
🔇 Additional comments (2)
src/privsep-root.c (2)
113-127: LGTM! Proper handling of allocation failure.The allocation failure handling is correct: when
reallocfails, the code leaves the smaller buffer in place and relies onMSG_TRUNCdetection in the subsequentrecvmsgto signalENOBUFS. This ensures the message is properly discarded from the queue.
141-151: LGTM! Comprehensive validation of received message.The final receive path correctly:
- Centralizes the actual message consumption for both early failures and normal flow
- Validates minimum header size
- Detects truncation via
MSG_TRUNCand returnsENOBUFS- Ensures exact size match between received data and expected header + payload
Allows us to distinguish from a shortened header. Log the result and expected sizes to try and diagnose.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/privsep-root.c (2)
118-140: Dynamic buffer growth andiov_lencapping avoid hangs and handle ENOBUFS cleanlyThe
reallocofpsr_ctx->psr_mdataand the subsequent setup:
- Preserve the old buffer if
reallocfails, so the pointer and size remain valid.- Use
iov[1].iov_basefrom eitherpsr_mdataor the caller’s buffer as before.- Critically, cap
iov[1].iov_lentopsr_error->psr_datalen, so the total requested length is never larger than the actual message size, eliminating the previous MSG_WAITALL hang scenario when the context buffer was larger than the incoming payload.- In the “realloc failed” case,
psr_mdatalen < psr_error->psr_datalen, which will force a truncated read in the finalrecvmsg, correctly surfacing ENOBUFS while draining the queue.Only tiny nit: the new comment still says “failed to malloc” even though
reallocis used here.- /* If we failed to malloc then psr_mdatalen will be smaller + /* If we failed to realloc then psr_mdatalen will be smaller
142-157: Finalrecvmsgvalidation is robust; comment could be tightened to match new invariantsThe final
recvmsg(fd, &msg, MSG_WAITALL)plus:
- Short‑header check (
len < sizeof(*psr_error)),- Explicit MSG_TRUNC → ENOBUFS mapping,
- Exact size check against
sizeof(*psr_error) + psr_error->psr_datalenwith EBADMSG and logging,gives a solid guard against partial or malformed records and guarantees the caller sees consistent errors while the message is fully consumed from the queue. With the new
iov_lencapping above, the “receive buffers are bigger than the actual message” case cannot actually occur anymore, so the comment at Line 143 is now slightly misleading; consider updating it to say explicitly that the total requested length is bounded by the announcedpsr_datalen, which is what prevents stalling with MSG_WAITALL.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/privsep-root.c(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: netbsd (--disable-dhcp6, -DSMALL)
- GitHub Check: netbsd (--disable-ipv4ll, -DSMALL)
- GitHub Check: netbsd (--disable-dhcp6)
- GitHub Check: netbsd (--disable-ipv6, -DSMALL)
- GitHub Check: netbsd (--disable-ipv4ll)
- GitHub Check: netbsd (--disable-ipv6)
- GitHub Check: netbsd (-DSMALL)
- GitHub Check: netbsd
- GitHub Check: netbsd (--disable-ipv4, -DSMALL)
- GitHub Check: netbsd (--disable-arp, -DSMALL)
- GitHub Check: netbsd (--disable-arp)
- GitHub Check: netbsd (--disable-ipv4)
- GitHub Check: freebsd
- GitHub Check: openbsd
🔇 Additional comments (1)
src/privsep-root.c (1)
89-112: Two-phaserecvmsgwith MSG_PEEK looks correct and keeps the queue in syncThe introduction of
struct msghdr msgand the initialrecvmsg(fd, &msg, MSG_PEEK | MSG_WAITALL)followed by an unconditional non‑peekrecvmsg(viagoto recvwhen needed) ensures you (a) size buffers based on the header, and (b) always actually consume the message from the SOCK_SEQPACKET queue. This addresses the prior risk of acting only on peeked data without draining the record, without introducing obvious new failure modes.
|
Does this one seem ready to merge too? |
I would like some feedback on it first as I still don't know if it fixes people issues as I cannot replicate the problem. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/privsep-root.c`:
- Around line 130-144: The error path after the second recvmsg can leak memory
when psr_mallocdata succeeded: ensure pc->psr_data is freed (or set to NULL)
before invoking PSR_ERROR in the recv handling block; modify the recv error
branches in the recv/psr_error handling so that before any PSR_ERROR call you
either free(pc->psr_data) or set pc->psr_data = NULL (so ps_root_mreaderror and
callers like ps_root_getifaddrs won't receive a dangling pointer), or factor out
a single cleanup label that clears pc->psr_data and then jumps to the PSR_ERROR
invocation to avoid repetition.
Adjust the code flow so that the same errors would be caught after the final recv.
This ensures we read what is really meant for us and not something silly.
Should fix #555.