Open
Description
Problem
When a handler takes a long time to execute (e.g., 20-80 minutes), the step status remains pending in the database until the handler completes. This happens because the entire ExecuteNext operation runs within a single database transaction.
Observable behavior:
- Workflow is started
- Step actually begins executing (handler is called)
- Query the workflow_steps table from another connection → status is pending, started_at is NULL
- Handler runs for 40 minutes, database still shows pending
- Any application built on top of floxy cannot show the correct running status to end users
- After handler completes, status jumps directly from pending to completed
Root Cause
In engine.go, the ExecuteNext method wraps everything in a single ReadCommitted transaction:

    func (engine *Engine) ExecuteNext(ctx context.Context, workerID string) (empty bool, err error) {
        err = engine.txManager.ReadCommitted(ctx, func(ctx context.Context) error {
            // ... dequeue step ...
            return engine.executeStep(ctx, instance, step) // handler runs here
        })
    }

Inside executeStep:

    // Write "running" status - NOT COMMITTED YET
    engine.store.UpdateStep(ctx, step.ID, StepStatusRunning, nil, nil)
    // Execute handler - can take 20-80 minutes
    output, stepErr = engine.executeTask(handlerCtx, instance, step, stepDef)
    // Only after handler returns does the transaction commit

Under PostgreSQL's READ COMMITTED isolation, other connections see only committed data, so the running status stays invisible until the handler completes and the transaction commits.
Additional Issues
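- Connection pool exhaustion: each worker holds a DB connection for the entire handler duration, so a pool of N connections is fully occupied once N long-running handlers are active
- Long-lived transactions: a transaction held open for 40+ minutes keeps the lock on the updated step row for that whole time and can prevent VACUUM from cleaning up dead rows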
Environment
- floxy version: v1.16.4
- Use case: External API handlers that can run for 20-80 minutes