-
Notifications
You must be signed in to change notification settings - Fork 192
Missing calls to od_route_signal causing clients to remain queued in od_router_attach()? #971
Description
Hello Team,
We're having some issues around the queueing semantics for incoming clients. Specifically, it seems like there are fewer usages of od_route_signal() than there should be?
For context:
od_route_wait()is called once duringod_router_attach()if a client fails to immediately get a backend slot (router.c:929):
Lines 926 to 929 in 8bd0ada
uint32_t timeout = route->rule->pool->timeout; if (timeout == 0) timeout = UINT32_MAX; rc = od_route_wait(route, timeout); od_route_signal()is called once at the end ofod_router_detach()if there are any queued clients (inrouter.c:1040):
Lines 1039 to 1042 in 8bd0ada
/* notify waiters */ if (signal) { od_route_signal(route); } - At the start of
od_router_attach(), we enqueue the current client in the pool queue to (atrouter.c:857).
Lines 857 to 858 in 8bd0ada
/* enqueue client (pending -> queue) */ od_client_pool_set(&route->client_pool, client, OD_CLIENT_QUEUE); - Finally, upon successful attach, we return
OD_ROUTER_OKon line 989. - However, if attach fails (lines 852, 875, 917, 931 or 943), we return from
od_router_attachwith the corresponding failure. In at least some of these cases, the caller ofod_router_attach()doesn't seem to callod_route_signal().- See, for example, calls to
od_frontend_attach() - Consider error handling in the call stack:
... -> od_frontend_setup() -> od_frontend_setup_params() -> od_frontend_attach() -> od_frontend_attach_to_endpoint() -> od_router_attach() (returning OD_ROUTER_ERROR), which would lead to callingod_frontend_cleanup()with the status passed asOD_ETTACH, which has no associatedod_route_signal()call atfrontend.c:2405.
- See, for example, calls to
In essence, shouldn't od_route_signal() be called for every error handling path from od_router_attach() because we eventually unset the client from the client pool, so we need to wake up a queued client if it is waiting when the current client hits an error.
The above is our current understanding of a potential issue. It is possible we are missing some invariant that prevents the mentioned scenario or that we are misunderstanding the intended code flow. In either case, please correct me if I am wrong.
Thanks