Skip to content

Missing calls to od_route_signal causing clients to remain queued in od_router_attach()? #971

@vdamle-yb

Description

@vdamle-yb

Hello Team,

We're having some issues around the queueing semantics for incoming clients. Specifically, it seems like there are fewer usages of od_route_signal() than there should be?
For context:

  • od_route_wait() is called once during od_router_attach() if a client fails to immediately get a backend slot (router.c:929):

    odyssey/sources/router.c

    Lines 926 to 929 in 8bd0ada

    uint32_t timeout = route->rule->pool->timeout;
    if (timeout == 0)
    timeout = UINT32_MAX;
    rc = od_route_wait(route, timeout);
  • od_route_signal() is called once at the end of od_router_detach() if there are any queued clients (in router.c:1040):

    odyssey/sources/router.c

    Lines 1039 to 1042 in 8bd0ada

    /* notify waiters */
    if (signal) {
    od_route_signal(route);
    }
  • At the start of od_router_attach(), we enqueue the current client in the pool queue to (at router.c:857).

    odyssey/sources/router.c

    Lines 857 to 858 in 8bd0ada

    /* enqueue client (pending -> queue) */
    od_client_pool_set(&route->client_pool, client, OD_CLIENT_QUEUE);
  • Finally, upon successful attach, we return OD_ROUTER_OK on line 989.
  • However, if attach fails (lines 852, 875, 917, 931 or 943), we return from od_router_attach with the corresponding failure. In at least some of these cases, the caller of od_router_attach() doesn't seem to call od_route_signal().
    • See, for example, calls to od_frontend_attach()
    • Consider error handling in the call stack: ... -> od_frontend_setup() -> od_frontend_setup_params() -> od_frontend_attach() -> od_frontend_attach_to_endpoint() -> od_router_attach() (returning OD_ROUTER_ERROR), which would lead to calling od_frontend_cleanup() with the status passed as OD_ETTACH, which has no associated od_route_signal() call at frontend.c:2405.

In essence, shouldn't od_route_signal() be called for every error handling path from od_router_attach() because we eventually unset the client from the client pool, so we need to wake up a queued client if it is waiting when the current client hits an error.

The above is our current understanding of a potential issue. It is possible we are missing some invariant that prevents the mentioned scenario or that we are misunderstanding the intended code flow. In either case, please correct me if I am wrong.
Thanks

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions