bug: Tasks with thousands of related nodes get stuck in RUNNING when Prefect rejects events for exceeding PREFECT_SERVER_EVENTS_MAXIMUM_RELATED_RESOURCES (default 500) #9068
Current Behavior
Prefect events carry a list of related resources (one entry per related node passed to a flow / emitted from a flow run). Prefect's server enforces an upper bound on this list through PREFECT_SERVER_EVENTS_MAXIMUM_RELATED_RESOURCES, which defaults to 500.
When an Infrahub flow runs over thousands of related nodes (e.g. a computed-attribute pipeline touching every instance of a kind), the terminal event emitted for the flow run exceeds this cap. The Prefect server raises a validation error while persisting the event, the flow run never receives a terminal state transition, and the task is stuck in RUNNING indefinitely from Infrahub's perspective.
The failure is caused solely by the size of the related-resources list on the emitted event; the underlying work has finished successfully.
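The failure mode can be illustrated with a toy model. The sketch below does not use Prefect's API: `MAX_RELATED_RESOURCES`, `persist_event`, `finish`, and `FlowRun` are hypothetical stand-ins. It assumes only what is described above, namely that the server rejects an event whose related list is longer than the cap, and that the run's state advances only when the terminal event is persisted.

```python
from dataclasses import dataclass

# Mirrors the default of PREFECT_SERVER_EVENTS_MAXIMUM_RELATED_RESOURCES.
MAX_RELATED_RESOURCES = 500


@dataclass
class FlowRun:
    """Hypothetical stand-in for a flow run as Infrahub sees it."""
    state: str = "RUNNING"


def persist_event(related: list[str]) -> None:
    """Hypothetical server-side check: reject oversized related lists."""
    if len(related) > MAX_RELATED_RESOURCES:
        raise ValueError(
            f"{len(related)} related resources exceed the cap of {MAX_RELATED_RESOURCES}"
        )


def finish(run: FlowRun, related: list[str]) -> None:
    """The state only advances if the terminal event is accepted."""
    try:
        persist_event(related)
    except ValueError:
        return  # event rejected: no terminal transition is recorded
    run.state = "COMPLETED"


small = FlowRun()
finish(small, [f"node-{i}" for i in range(500)])

large = FlowRun()
finish(large, [f"node-{i}" for i in range(5000)])

print(small.state, large.state)
```

In this model the work behind `large` is already done when `finish` is called; only the bookkeeping fails, which is why the run appears stuck rather than failed.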
Expected Behavior
Tasks with arbitrarily large related-node sets must reach a terminal state (COMPLETED / FAILED) reliably. Options to consider:
1. Cap or chunk related resources at emit time. Truncate the related-resources list to a safe size before the event is sent, or split it into multiple events.
2. Avoid attaching every related node to flow-run events. Move the per-node identity off events (e.g. into task logs or a dedicated relationship in the Infrahub graph) and only keep aggregate / summary related resources on the Prefect event.
3. Raise the Prefect limit. Document and ship a higher PREFECT_SERVER_EVENTS_MAXIMUM_RELATED_RESOURCES value with the Prefect server config. This only delays the failure, so it should be paired with (1) or (2).
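The first two options above could be sketched as follows. This is a minimal illustration, not Infrahub's or Prefect's implementation: `related_limit`, `cap_related`, `chunk_related`, and `summarize_related` are hypothetical helpers, and related resources are modelled as plain dicts with a `kind` key purely for the example. Only the environment variable name and its default of 500 come from this issue.

```python
import os
from collections import Counter
from typing import Iterator

ENV_VAR = "PREFECT_SERVER_EVENTS_MAXIMUM_RELATED_RESOURCES"


def related_limit(default: int = 500) -> int:
    """Read the cap from the environment, falling back to the default of 500."""
    return int(os.environ.get(ENV_VAR, default))


def cap_related(related: list[dict], limit: int) -> list[dict]:
    """Truncate the list so a single event stays within the cap."""
    return related[:limit]


def chunk_related(related: list[dict], limit: int) -> Iterator[list[dict]]:
    """Split the list into slices of at most `limit`, one per emitted event."""
    for start in range(0, len(related), limit):
        yield related[start:start + limit]


def summarize_related(related: list[dict]) -> list[dict]:
    """Keep one aggregate entry per kind instead of one entry per node."""
    counts = Counter(item["kind"] for item in related)
    return [{"kind": kind, "count": count} for kind, count in sorted(counts.items())]


# Example data (hypothetical): 1200 nodes of a single kind.
nodes = [{"kind": "Device", "id": f"device-{i}"} for i in range(1200)]
limit = 500  # the documented default; related_limit() would read it from the environment

capped = cap_related(nodes, limit)
chunks = list(chunk_related(nodes, limit))
summary = summarize_related(nodes)
```

Truncation drops the identity of the nodes beyond the cap, and chunking changes how many events a consumer sees, so either would need to be reconciled with whatever reads these events. The summary form keeps the event small regardless of how many nodes the flow touched.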
Steps to Reproduce
Use a schema with a kind that has more than 500 instances.
Observe in the UI / infrahubctl task list: the task remains in RUNNING.
Inspect Prefect server logs: validation error rejecting the event because related resources exceed PREFECT_SERVER_EVENTS_MAXIMUM_RELATED_RESOURCES (500). No terminal state is recorded for the flow run.
Component
API Server / GraphQL
Infrahub version
1.9 (develop)
Additional Information
PREFECT_SERVER_EVENTS_MAXIMUM_RELATED_RESOURCES = 500.