Fix nodetool assassinate blocking the GOSSIP stage, causing the execu…#4713

Open
Runtian wants to merge 1 commit into apache:cassandra-4.1 from Runtian:CASSANDRA-21249-4.1


Runtian commented Apr 7, 2026

…ting node to be marked down and the liveness check to be ineffective

Since CASSANDRA-15059, assassinateEndpoint has run entirely inside runInGossipStageBlocking, including the 30-second RING_DELAY sleep. This blocks the single-threaded GOSSIP stage and causes two issues:

  1. The liveness check is ineffective: the target's heartbeat cannot be updated while the GOSSIP stage is sleeping, so the check always passes, even for live nodes.
  2. The executing node is marked DOWN: peers' failure detectors convict it because its GOSSIP stage is unresponsive for ~34s.

Fix: move the heartbeat snapshot and the sleep onto the caller (JMX) thread, keeping the GOSSIP stage free. The GOSSIP stage is entered only briefly, to verify the heartbeat and perform the assassination. The post-assassination propagation wait is also moved to the caller thread.
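
For reviewers who want the threading pattern in isolation, here is a minimal, self-contained sketch of the idea. It is not the code in this patch: `AssassinateSketch`, `GOSSIP_STAGE`, `HEARTBEATS`, `onGossip`, and the shortened delay are all hypothetical stand-ins, and a plain single-threaded executor plays the role of the GOSSIP stage. The snapshot and the wait happen on the calling thread, so heartbeat updates can still be processed during the wait. Only the final check and removal run on the stage.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

final class AssassinateSketch {
    // Daemon threads so the sketch never keeps the JVM alive.
    static ThreadFactory daemon(String name) {
        return r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        };
    }

    // Hypothetical stand-in for the single-threaded GOSSIP stage.
    static final ExecutorService GOSSIP_STAGE =
            Executors.newSingleThreadExecutor(daemon("gossip-stage-sketch"));

    // Hypothetical endpoint -> heartbeat version map.
    static final Map<String, Integer> HEARTBEATS = new ConcurrentHashMap<>();

    // Simulates an incoming gossip message: the heartbeat bump is processed on the stage.
    static void onGossip(String endpoint) {
        GOSSIP_STAGE.execute(() -> HEARTBEATS.merge(endpoint, 1, Integer::sum));
    }

    // Returns true if the endpoint was removed, false if it showed signs of life.
    static boolean assassinate(String endpoint, long ringDelayMillis) throws Exception {
        // 1. Snapshot the heartbeat on the caller thread.
        Integer before = HEARTBEATS.get(endpoint);

        // 2. Wait on the caller thread; the stage stays free to process gossip.
        Thread.sleep(ringDelayMillis);

        // 3. Enter the stage only briefly to verify the heartbeat and act.
        return GOSSIP_STAGE.submit(() -> {
            Integer now = HEARTBEATS.get(endpoint);
            if (before != null && !before.equals(now)) {
                return false; // heartbeat changed during the wait: do not assassinate
            }
            HEARTBEATS.remove(endpoint);
            return true;
        }).get();
    }

    public static void main(String[] args) throws Exception {
        HEARTBEATS.put("live", 1);
        HEARTBEATS.put("dead", 1);

        // A simulated peer keeps gossiping about the live endpoint.
        ScheduledExecutorService peer =
                Executors.newSingleThreadScheduledExecutor(daemon("peer-sketch"));
        peer.scheduleAtFixedRate(() -> onGossip("live"), 10, 10, TimeUnit.MILLISECONDS);

        System.out.println("live assassinated: " + assassinate("live", 200));
        System.out.println("dead assassinated: " + assassinate("dead", 200));
        peer.shutdownNow();
    }
}
```

If `Thread.sleep` were moved inside the task submitted to `GOSSIP_STAGE`, the queued `onGossip` updates could not run until after the check, which is the first failure mode described above.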


Jira: [CASSANDRA-21249](https://issues.apache.org/jira/browse/CASSANDRA-21249)
