[Feature Request] Implement QRTZ_TRIGGERS Fine-Grained Row-Level Locking to replace Global QRTZ_LOCKS.TRIGGER_ACCESS Mutex #1481

@ivo500

1. Introduction: The Scalability Wall of Cluster-Wide Locking

Currently, Quartz relies on a pessimistic cluster-wide locking strategy via the QRTZ_LOCKS table (specifically the TRIGGER_ACCESS lock). While this ensures high data integrity, it creates a significant scalability bottleneck in clustered environments or high-throughput instances processing thousands of jobs per minute.

The official Quartz documentation acknowledges this architectural "speed limit" in the Quartz FAQ:

"...The clustering feature works best for scaling out long-running and/or cpu-intensive jobs (distributing the work-load over multiple nodes). If you need to scale out to support thousands of short-running (e.g 1 second) jobs, consider partitioning the set of jobs by using multiple distinct schedulers. Using one scheduler forces the use of a cluster-wide lock, a pattern that degrades performance as you add more clients."

The Problem:

  • Sequential Acquisition: Even with multiple nodes, the cluster-wide lock acts as a global mutex. Only one node can "acquire" jobs at a time; others sit idle in a database "wait" state.

  • Completion Bottleneck: Since the completion phase is thread-per-job, hundreds of worker threads across the cluster compete for the same single row in QRTZ_LOCKS to update trigger states.

  • Payload Tax: Jobs with large JobDataMaps (property bags) take longer to read and write, which lengthens the time each transaction holds the cluster-wide lock and, in turn, the queue of threads waiting behind it.
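
To put rough numbers on the three points above, here is a deliberately simple model. It assumes every contender arrives at the same moment and holds TRIGGER_ACCESS for the same duration; real workloads are burstier. The class and method names are hypothetical and are not part of Quartz.

```java
/** Back-of-the-envelope model of one global lock. Illustrative only, not a benchmark. */
final class GlobalLockModel {
    private GlobalLockModel() {}

    /**
     * Wait of the last thread in line when {@code contenders} threads arrive together
     * and each one holds the lock for {@code holdMillis}.
     */
    static long worstCaseWaitMillis(int contenders, long holdMillis) {
        if (contenders < 1 || holdMillis < 0) {
            throw new IllegalArgumentException("contenders >= 1 and holdMillis >= 0 required");
        }
        // Everyone ahead of the last thread must finish first.
        return (long) (contenders - 1) * holdMillis;
    }

    /** Ceiling on lock-protected operations per second for the whole cluster. */
    static double maxOpsPerSecond(long holdMillis) {
        if (holdMillis <= 0) {
            throw new IllegalArgumentException("holdMillis must be positive");
        }
        // Only one holder at a time, so adding nodes does not raise this ceiling.
        return 1000.0 / holdMillis;
    }
}
```

Under these assumptions, 200 worker threads with a 5 ms hold time leave the last thread waiting 995 ms, and doubling the hold time (for example because of a larger JobDataMap) doubles that wait.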


2. Conceptual Proposal: Row-Based Concurrency

We propose an optional high-performance locking mode that moves synchronization from a Cluster-Wide Mutex to Fine-Grained Row-Level Locking on the QRTZ_TRIGGERS table.

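By utilizing modern SQL features like FOR UPDATE SKIP LOCKED (available in PostgreSQL 9.5+, Oracle 11g+, and MySQL 8.0+; MS SQL Server has no FOR UPDATE clause but offers comparable behavior through the UPDLOCK and READPAST table hints), Quartz could let nodes acquire and complete triggers in parallel instead of one at a time: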

  • Parallel Acquisition: Multiple nodes can query the triggers table simultaneously. Node A grabs a batch; Node B skips those locked rows and immediately grabs the next available batch without blocking.

  • Parallel Completion: Completion logic would lock only the specific row being updated, allowing hundreds of threads to commit state changes simultaneously.
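
The skip-instead-of-wait behavior described above can be imitated in memory. The sketch below is only an analogy for SKIP LOCKED: each "row" carries its own flag, and a claimer takes the first unlocked rows without ever blocking. All types and names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** In-memory stand-in for SKIP LOCKED semantics. Illustrative only, not Quartz code. */
final class SkipLockedSketch {
    private SkipLockedSketch() {}

    /** One trigger row guarded by its own lock flag (hypothetical type). */
    static final class TriggerRow {
        final String name;
        final AtomicBoolean locked = new AtomicBoolean(false);

        TriggerRow(String name) {
            this.name = name;
        }
    }

    /**
     * Claims up to batchSize rows in table order, skipping rows that are already locked.
     * compareAndSet never blocks, so a busy row costs the caller nothing but a skip.
     */
    static List<TriggerRow> claimBatch(List<TriggerRow> table, int batchSize) {
        List<TriggerRow> claimed = new ArrayList<>();
        for (TriggerRow row : table) {
            if (claimed.size() == batchSize) {
                break;
            }
            if (row.locked.compareAndSet(false, true)) {
                claimed.add(row);
            }
        }
        return claimed;
    }

    /** Releases the rows, playing the role of the transaction commit. */
    static void release(List<TriggerRow> rows) {
        for (TriggerRow row : rows) {
            row.locked.set(false);
        }
    }
}
```

With five rows t1 to t5 and a batch size of 2, a first claimer gets t1 and t2, and a second claimer that runs before the first one releases gets t3 and t4 rather than waiting.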


3. Technical Example: Proposed SQL Implementation

The DriverDelegate would be updated to support a "Row-Locking" mode. Below is the conceptual SQL logic:

A. High-Concurrency Acquisition

Instead of locking a global semaphore, the acquisition query targets rows directly:

SQL

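-- Acquisition with SKIP LOCKED: rows locked by another node are skipped, not waited on.
-- LIMIT is placed before the locking clause, the order documented by PostgreSQL and MySQL.
SELECT TRIGGER_NAME, TRIGGER_GROUP, NEXT_FIRE_TIME, PRIORITY
FROM QRTZ_TRIGGERS
WHERE SCHED_NAME = 'MyScheduler'
  AND TRIGGER_STATE = 'WAITING'
  AND NEXT_FIRE_TIME <= :now
ORDER BY NEXT_FIRE_TIME ASC, PRIORITY DESC
LIMIT :batchSize
FOR UPDATE SKIP LOCKED;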
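
Because the locking syntax differs between databases, a delegate would probably need to choose the statement per dialect. The following is a minimal sketch of that choice, not the actual DriverDelegate API: the class, the enum, and the use of positional `?` parameters are assumptions made for illustration.

```java
/** Hypothetical helper that picks a skip-locked acquisition statement per database. */
final class AcquireSql {
    private AcquireSql() {}

    enum Dialect { POSTGRESQL, MYSQL, SQLSERVER }

    private static final String COLUMNS =
        "TRIGGER_NAME, TRIGGER_GROUP, NEXT_FIRE_TIME, PRIORITY";

    private static final String FILTER =
        " WHERE SCHED_NAME = ? AND TRIGGER_STATE = 'WAITING' AND NEXT_FIRE_TIME <= ?"
      + " ORDER BY NEXT_FIRE_TIME ASC, PRIORITY DESC";

    /**
     * Returns the acquisition SQL for the given dialect.
     * Note the bind order: SQL Server takes the batch size first, the others take it last.
     */
    static String forDialect(Dialect dialect) {
        switch (dialect) {
            case SQLSERVER:
                // T-SQL has no FOR UPDATE clause: READPAST skips locked rows,
                // UPDLOCK locks the rows that are read.
                return "SELECT TOP (?) " + COLUMNS
                     + " FROM QRTZ_TRIGGERS WITH (UPDLOCK, READPAST, ROWLOCK)" + FILTER;
            default:
                return "SELECT " + COLUMNS + " FROM QRTZ_TRIGGERS" + FILTER
                     + " LIMIT ? FOR UPDATE SKIP LOCKED";
        }
    }
}
```

Oracle is left out on purpose, since its row-limiting syntax differs again and would need its own variant.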

B. Row-Level State Transition (Completion)

During completion, we avoid the cluster-wide lock and target the specific trigger row to ensure atomicity.

Step 1: Lock the specific trigger row

SQL

-- Explicitly lock only the specific row being finalized
SELECT TRIGGER_STATE 
FROM QRTZ_TRIGGERS 
WHERE SCHED_NAME = 'MyScheduler' 
  AND TRIGGER_NAME = :name 
  AND TRIGGER_GROUP = :group 
FOR UPDATE;

Step 2: Update the state

SQL

-- Completion update using the held row-level lock (same transaction as Step 1)
UPDATE QRTZ_TRIGGERS
SET TRIGGER_STATE = 'WAITING',
    NEXT_FIRE_TIME = :nextTime,
    PREV_FIRE_TIME = :lastTime
WHERE SCHED_NAME = 'MyScheduler' 
  AND TRIGGER_NAME = :name 
  AND TRIGGER_GROUP = :group;
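
Steps 1 and 2 only protect each other if they run in one transaction, because the row lock taken by FOR UPDATE is released at commit or rollback. A rough plain-JDBC sketch of that pairing follows. The class and method are hypothetical, the statements use positional parameters, and fire times are assumed to be stored as epoch milliseconds in PREV_FIRE_TIME and NEXT_FIRE_TIME as in the stock Quartz schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical completion step using a row lock instead of TRIGGER_ACCESS. */
final class CompletionSketch {
    private CompletionSketch() {}

    static final String LOCK_ROW =
        "SELECT TRIGGER_STATE FROM QRTZ_TRIGGERS"
      + " WHERE SCHED_NAME = ? AND TRIGGER_NAME = ? AND TRIGGER_GROUP = ? FOR UPDATE";

    static final String UPDATE_ROW =
        "UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = 'WAITING', NEXT_FIRE_TIME = ?, PREV_FIRE_TIME = ?"
      + " WHERE SCHED_NAME = ? AND TRIGGER_NAME = ? AND TRIGGER_GROUP = ?";

    /** Returns false when the trigger row no longer exists. */
    static boolean complete(Connection con, String sched, String name, String group,
                            long nextTime, long lastTime) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // the row lock lives only as long as this transaction
        try {
            try (PreparedStatement lock = con.prepareStatement(LOCK_ROW)) {
                lock.setString(1, sched);
                lock.setString(2, name);
                lock.setString(3, group);
                try (ResultSet rs = lock.executeQuery()) {
                    if (!rs.next()) {
                        con.rollback();
                        return false;
                    }
                }
            }
            try (PreparedStatement update = con.prepareStatement(UPDATE_ROW)) {
                update.setLong(1, nextTime);
                update.setLong(2, lastTime);
                update.setString(3, sched);
                update.setString(4, name);
                update.setString(5, group);
                update.executeUpdate();
            }
            con.commit(); // releases the row lock
            return true;
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Only the one trigger row is locked for the duration of the call, so completions for different triggers do not wait on each other.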


4. Suggested Configuration

We suggest a new property to enable this behavior: org.quartz.jobStore.useRowLevelLocking = true
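
A sketch of how such a flag might be read, assuming it is opt-in and defaults to false so that existing clusters keep the current QRTZ_LOCKS behavior. The property name is the one proposed here and does not exist in Quartz today; the class is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical reader for the proposed opt-in flag. */
final class RowLockingConfig {
    private RowLockingConfig() {}

    /** Proposed property name; not recognized by any released Quartz version. */
    static final String KEY = "org.quartz.jobStore.useRowLevelLocking";

    /** Absent or non-"true" values keep the existing global-lock behavior. */
    static boolean isEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false"));
    }

    /** Parses quartz.properties-style text, mainly for demonstration. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

For example, a file containing `org.quartz.jobStore.isClustered = true` alone leaves the mode off, and adding the proposed line turns it on.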

This feature would allow a single clustered Quartz scheduler to handle the "thousands of short-running jobs" use case without requiring the partitioning into multiple distinct schedulers that the documentation currently recommends.
