Skip to content

Fix Statement.cancel()#37472

Open
MilanTyagi2004 wants to merge 16 commits intoapache:masterfrom
MilanTyagi2004:mysql-connector-j
Open

Fix Statement.cancel()#37472
MilanTyagi2004 wants to merge 16 commits intoapache:masterfrom
MilanTyagi2004:mysql-connector-j

Conversation

@MilanTyagi2004
Copy link

Fixes #37404

Changes proposed in this pull request:

  • In Apache ShardingSphere Proxy using the MySQL protocol, cancelling a running SQL statement does not correctly interrupt
    the underlying JDBC Statement in a cluster environment.

  • Although a Process instance is created for the execution, the MySQL thread ID is not bound to the connection session. As a
    result, KILL QUERY / Statement.cancel() cannot locate or terminate the actual running MySQL query.

  • This pull request binds the MySQL thread ID to the connection session after authentication, ensuring that cancellation signals
    are correctly propagated to the JDBC layer.

Expected behavior

  • Statement.cancel() or KILL QUERY should terminate the running MySQL query.

Actual behavior

  • The query continues running even after cancellation is requested.

Environment

  • ShardingSphere version: 5.5.3-SNAPSHOT

  • OS: Windows 11

  • Database: MySQL

  • Mode: Cluster mode


Before committing this PR, I'm sure that I have checked the following options:

  • My code follows the code of conduct of this project.
  • I have self-reviewed the commit code.
  • I have (or in comment I request) added corresponding labels for the pull request.
  • I have verified the changes locally with Maven (module-level and targeted builds)..
  • I have made corresponding changes to the documentation.
  • I have added corresponding unit tests for my changes.
  • I have updated the Release Notes of the current development version. For more details, see Update Release Note

@MilanTyagi2004
Copy link
Author

MilanTyagi2004 commented Dec 23, 2025

@terrymanu Sorry , the previous pr was accidently closed.
Please approve workflow and Review this when you have time.
thankyou for your understanding.

@MilanTyagi2004
Copy link
Author

@terrymanu Please review this.
Core changes are in place; a small part of the implementation is still left and will be added next.

Copy link
Member

@terrymanu terrymanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current PR has blocking issues and shouldn’t be merged:

  • Root cause not fixed: The handshake exposes a 32-bit connection ID while Process stores a 64-bit one. On MySQL, ProcessRegistry matches the full 64-bit ID and MySQLKillProcessExecutor relies on node bits, so the client’s 32-bit ID never matches and may be treated as
    “remote,” causing cancel/KILL to fail. On PostgreSQL, BackendKeyData only carries a 32-bit processId, but Process stores only a 64-bit connectionId and never sets postgresqlProcessId; CancelRequest looks up by 32-bit, so in a cluster it can’t find the target.
  • Cluster forwarding gap: KillProcessHandler parses IDs with Integer.parseInt; 64-bit IDs with node bits are dropped and never forwarded to the right node.
  • Persistence gap: YamlProcess/swapper only store mysqlThreadId/postgresqlProcessId and omit the 64-bit connection ID and secretKey, so cross-node cancel/kill lacks critical data.
  • Change surface too large: 71 files touched, with many unrelated format/alignment tweaks (e.g., Firebird/OpenGauss frontend), which inflates the diff and review cost.

Suggested direction: (1) Align the handshake identifier with what Process stores—either make the handshake ID reconstruct node bits or map the 32-bit handshake ID back to the 64-bit stored value. (2) PostgreSQL should store both 32-bit postgresqlProcessId and 64-bit
postgresqlConnectionId; Cancel should match on 32-bit, and both the 64-bit ID and secretKey should be persisted. (3) KillProcessHandler must accept and forward 64-bit connection IDs. (4) Extend YamlProcess/swapper to include 64-bit connection IDs and secretKey. (5) Trim
this PR to only the necessary changes, remove unrelated formatting tweaks, and add tests covering handshake → process bind → cancel/KILL for local and cross-node paths (32/64-bit, secretKey mismatch).

In its current form, the PR should not be merged.

@MilanTyagi2004
Copy link
Author

The current PR has blocking issues and shouldn’t be merged:

  • Root cause not fixed: The handshake exposes a 32-bit connection ID while Process stores a 64-bit one. On MySQL, ProcessRegistry matches the full 64-bit ID and MySQLKillProcessExecutor relies on node bits, so the client’s 32-bit ID never matches and may be treated as
    “remote,” causing cancel/KILL to fail. On PostgreSQL, BackendKeyData only carries a 32-bit processId, but Process stores only a 64-bit connectionId and never sets postgresqlProcessId; CancelRequest looks up by 32-bit, so in a cluster it can’t find the target.
  • Cluster forwarding gap: KillProcessHandler parses IDs with Integer.parseInt; 64-bit IDs with node bits are dropped and never forwarded to the right node.
  • Persistence gap: YamlProcess/swapper only store mysqlThreadId/postgresqlProcessId and omit the 64-bit connection ID and secretKey, so cross-node cancel/kill lacks critical data.
  • Change surface too large: 71 files touched, with many unrelated format/alignment tweaks (e.g., Firebird/OpenGauss frontend), which inflates the diff and review cost.

Suggested direction: (1) Align the handshake identifier with what Process stores—either make the handshake ID reconstruct node bits or map the 32-bit handshake ID back to the 64-bit stored value. (2) PostgreSQL should store both 32-bit postgresqlProcessId and 64-bit postgresqlConnectionId; Cancel should match on 32-bit, and both the 64-bit ID and secretKey should be persisted. (3) KillProcessHandler must accept and forward 64-bit connection IDs. (4) Extend YamlProcess/swapper to include 64-bit connection IDs and secretKey. (5) Trim this PR to only the necessary changes, remove unrelated formatting tweaks, and add tests covering handshake → process bind → cancel/KILL for local and cross-node paths (32/64-bit, secretKey mismatch).

In its current form, the PR should not be merged.

Thankyou for the review and i will surely look into this

@MilanTyagi2004
Copy link
Author

hi,@terrymanu i have an doubt ,connection id is pass through firebird/opengaus , and the function written in these are in int
and we have to take 64 bitId
either we have to lower the bits bit using (int) which makes multiple touch file.
any guidance or suggestion

@ShenFeng312
Copy link

Currently, with an auto-increment ID scheme, there might be a scenario where some clients use connection pools while others connect directly. Suppose the auto-increment ID overflows but some original connections haven’t been released yet. In that case, duplicate IDs could occur.
I think we can manage this by maintaining a bitmap based on the maximum number of connections. This would allow us to quickly find the next available ID. Do you agree? @terrymanu @MilanTyagi2004

@ShenFeng312
Copy link

Currently, with an auto-increment ID scheme, there might be a scenario where some clients use connection pools while others connect directly. Suppose the auto-increment ID overflows but some original connections haven’t been released yet. In that case, duplicate IDs could occur. I think we can manage this by maintaining a bitmap based on the maximum number of connections. This would allow us to quickly find the next available ID. Do you agree? @terrymanu @MilanTyagi2004

I will open a new issue to discuss this problem. In the meantime, I think we should pause any further work on this PR.

@MilanTyagi2004
Copy link
Author

Currently, with an auto-increment ID scheme, there might be a scenario where some clients use connection pools while others connect directly. Suppose the auto-increment ID overflows but some original connections haven’t been released yet. In that case, duplicate IDs could occur. I think we can manage this by maintaining a bitmap based on the maximum number of connections. This would allow us to quickly find the next available ID. Do you agree? @terrymanu @MilanTyagi2004

I will open a new issue to discuss this problem. In the meantime, I think we should pause any further work on this PR.

Please give me some time ,I am working on this issue

@MilanTyagi2004
Copy link
Author

MilanTyagi2004 commented Dec 24, 2025

Currently, with an auto-increment ID scheme, there might be a scenario where some clients use connection pools while others connect directly. Suppose the auto-increment ID overflows but some original connections haven’t been released yet. In that case, duplicate IDs could occur. I think we can manage this by maintaining a bitmap based on the maximum number of connections. This would allow us to quickly find the next available ID. Do you agree? @terrymanu @MilanTyagi2004

I will open a new issue to discuss this problem. In the meantime, I think we should pause any further work on this PR.

Currently, with an auto-increment ID scheme, there might be a scenario where some clients use connection pools while others connect directly. Suppose the auto-increment ID overflows but some original connections haven’t been released yet. In that case, duplicate IDs could occur. I think we can manage this by maintaining a bitmap based on the maximum number of connections. This would allow us to quickly find the next available ID. Do you agree? @terrymanu @MilanTyagi2004

Thanks for pointing this out. I agree this scenario can lead to duplicate IDs when auto-increment overflows and some connections are still active (especially with mixed pooled and direct connections).

To handle this safely, I used a BitSet to track allocated connection IDs up to the maximum allowed connections. On allocation, we pick the next clear bit, and on release, we clear it. This ensures an ID is reused only after the original connection is fully released, even after overflow.

This avoids collisions, works well with connection pools, and keeps the behavior deterministic.

@MilanTyagi2004
Copy link
Author

@terrymanu what are you thoughts about this

@terrymanu
Copy link
Member

terrymanu commented Dec 24, 2025

We’ve overcomplicated this. The real goal is just to make Kill/Cancel hit the running process. The chatter exploded because the scope ballooned: we mixed in 64-bit global IDs, cross-node forwarding, persistence, and even Firebird/OpenGauss while the client protocol still exposes 32-bit IDs, and we never nailed a clear mapping. On top of that, we touched 70+ files before agreeing on the design, so formatting/compat issues drowned out the core fix.

Let’s simplify and get back on target:

  • Pick one rule and make “handshake ID == lookup ID” work end to end. Either stay with 32-bit and align MySQL/PostgreSQL handshake, ProcessRegistry, and Kill/Cancel, or if we keep 64-bit, define an exact mapping so the 32-bit client ID can resolve to the stored 64-bit.
  • Fix the main MySQL/PostgreSQL path only: handshake → process bind → registry lookup → Kill/Cancel. Don’t fan out to other protocols yet.
  • Pause unrelated formatting/refactors and shrink the diff.
  • Add a few focused tests: local/(if needed) cross-node, secret key checks, and the ID mapping.

First make the core path reliable; discuss global IDs/bitset allocation separately afterward.

@MilanTyagi2004
Copy link
Author

I’ve reduced the PR to only the process ID binding during authentication, removing all unrelated changes to keep the scope minimal and focused.

Copy link
Member

@terrymanu terrymanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the effort—good direction, but the current patch doesn’t solve the root cause and introduces new issues. Please refine and resubmit:

  • Design/abstraction issue: Process is a cross-protocol infra class. Adding mysqlThreadId and MySQL-specific lookup into Process/ProcessRegistry bakes dialect details into shared code. Please revert these changes and handle MySQL handshake thread id binding at the MySQL protocol/session layer, or introduce a protocol-agnostic “native connection identifier” abstraction instead.
  • Root cause still unsolved: The handshake thread id remains node-local and not cluster-unique; processId is still not aligned with the handshake id. During execution, ProcessEngine.executeSQL/completeSQLExecution recreates Process with the same processId, dropping the thread id you set at handshake time, so KILL can still fail to find the running process. Please first ensure the handshake id is cluster-unique (or can route to the right node) and keep it associated with the processId across the full lifecycle before wiring cancel/kill.
  • Newly introduced risks: The numeric thread-id scan in ProcessRegistry.get may kill the wrong session if thread ids collide across nodes; the new test sets threadId manually and doesn’t cover the real handshake→execute→kill flow, and it leaves the singleton ProcessRegistry state dirty. Please fix these or roll them back.
  • Unrelated/incorrect changes to revert: Remove the MySQL-specific fields and lookup logic from the infra Process/ProcessRegistry layer; redo the binding and routing in the MySQL protocol layer.

I encourage you to continue: make the handshake connection/thread id cluster-unique, keep it consistently mapped to the processId through the execution lifecycle, and add an end-to-end test covering handshake → execute → KILL/cancel. Looking forward to the updated version.

@MilanTyagi2004
Copy link
Author

@ShenFeng312 @terrymanu All tests pass locally.
CI consistently fails, and I’m unable to reproduce the failure locally.

Before I resubmit, could you please help me identify:
Whether the CI failure is due to test isolation / singleton state expectations that I may be missing.

@badmasflana-netizen
Copy link

This is not CI flakiness.
This is a deterministic NPE introduced by your change. @MilanTyagi2004

@badmasflana-netizen
Copy link

Why it passes locally sometimes
Because:

  • Other tests initialize global/singleton state before these tests

  • CI runs tests in a different order

  • These tests expect handshake() to be null-safe

Classic test-order dependency.

Copy link
Member

@terrymanu terrymanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress here—thanks for tackling this! A few highlights and follow-ups:

  1. Good work
  • Binding the handshake connection ID to the process ID after auth and rewriting KILL/Statement.cancel to use processId is the right direction for #37404.
  • The protocol-layer MySQLConnectionIdRegistry with register/cleanup on auth/release makes the flow clearer.
  1. Needs refinement
  • Process.processStatements now keys by ExecutionUnit; since ExecutionUnit is equal by datasource + SQL, concurrent identical SQL on the same datasource can collide and some statements may not get canceled/killed. Please switch to a collision-free key (e.g.,
    executionUnit identity) or explain why collisions cannot happen.
  • ProcessEngine.executeSQL lazily creates a process when missing, masking call-order errors and leaving processes without proper binding, which still cannot be killed. Please restore an explicit check or ensure connect() always precedes execute.
  • Handshake IDs are still node-local; MySQLConnectionIdRegistry is local. If the goal is to make handshake IDs globally unique/cluster-routable (per the issue discussion), this patch doesn’t solve that yet. Please clarify or extend.
  • KILL and KILL QUERY now share the same logic (cancel query only). That may diverge from native MySQL semantics; please confirm intent or handle separately and document.
  1. Newly introduced issues
  • Possible cancel/kill omissions due to ExecutionUnit key collisions in Process.
  • Lazy process creation can leave unbound processes that can’t be killed—needs a fix.
  1. Unrelated changes
  • Test expansions like ProxyMetaDataInfoExporterTest are unrelated to this PR’s scope; please drop or split them for a focused review.

The direction is solid—addressing these items will make it much safer to merge. Keep going!

Copilot AI review requested due to automatic review settings December 27, 2025 15:41
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical issue in Apache ShardingSphere Proxy where Statement.cancel() and KILL QUERY commands failed to properly terminate running MySQL queries in cluster mode. The core problem was that MySQL protocol connection IDs were not bound to cluster-wide process IDs, preventing cancellation signals from reaching the actual JDBC statements.

Key changes:

  • Introduced MySQLConnectionIdRegistry to map MySQL handshake connection IDs to cluster-unique process IDs
  • Enhanced ConnectionIdGenerator to support cluster-wide unique ID generation with ephemeral node reservation
  • Modified authentication flow to bind connection IDs after successful authentication
  • Implemented KILL statement normalization to translate MySQL connection IDs to process IDs
  • Refactored Process class to use identity-based statement tracking and support statement merging

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
proxy/frontend/spi/src/main/java/org/apache/shardingsphere/proxy/frontend/spi/DatabaseProtocolFrontendEngine.java Adds hook for protocol-specific binding after authentication
proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/connection/MySQLConnectionIdRegistry.java New registry mapping MySQL connection IDs to cluster process IDs with persistence support
proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/MySQLFrontendEngine.java Implements binding and cleanup of connection ID mappings
proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/authentication/MySQLAuthenticationEngine.java Stores native connection ID in channel attributes during handshake
proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/command/query/text/query/MySQLComQueryPacketExecutor.java Normalizes KILL QUERY statements by translating connection IDs to process IDs
proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/command/MySQLCommandExecuteEngine.java Code formatting improvements
proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/command/MySQLCommandExecutorFactory.java Code formatting improvements
proxy/frontend/core/src/main/java/org/apache/shardingsphere/proxy/frontend/netty/FrontendConstants.java New constant for native connection ID attribute key
proxy/frontend/core/src/main/java/org/apache/shardingsphere/proxy/frontend/netty/FrontendChannelInboundHandler.java Invokes binding hook after authentication and adds connection ID cleanup
proxy/frontend/core/src/main/java/org/apache/shardingsphere/proxy/frontend/connection/ConnectionIdGenerator.java Adds cluster-aware ID generation and release mechanism
proxy/backend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/backend/mysql/handler/admin/executor/MySQLKillProcessExecutor.java Enhances validation and error handling for KILL commands
proxy/backend/core/src/main/java/org/apache/shardingsphere/proxy/backend/session/ConnectionSession.java Adds setter for processId field
infra/executor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/Process.java Refactors statement tracking with identity wrapper, adds merge support, and improves kill() error handling
infra/executor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/ProcessEngine.java Updates process lifecycle to support merging execution contexts
infra/executor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/ProcessRegistry.java Changes merge logic to use Map.merge with BiFunction
kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/job/progress/persist/PipelineJobProgressPersistService.java Changes event count reset from addAndGet to set
proxy/frontend/dialect/mysql/pom.xml Adds dependency on cluster-mode-repository-api
proxy/frontend/core/pom.xml Adds dependency on cluster-mode-repository-api
proxy/frontend/dialect/mysql/src/test/java/org/apache/shardingsphere/proxy/frontend/mysql/connection/MySQLConnectionIdRegistryTest.java Comprehensive tests for connection ID registry including cluster mode
proxy/frontend/dialect/mysql/src/test/java/org/apache/shardingsphere/proxy/frontend/mysql/MySQLFrontendEngineTest.java Tests for connection ID cleanup in release()
proxy/frontend/dialect/mysql/src/test/java/org/apache/shardingsphere/proxy/frontend/mysql/authentication/MySQLAuthenticationEngineTest.java Adds mock for native connection ID attribute
proxy/backend/dialect/mysql/src/test/java/org/apache/shardingsphere/proxy/backend/mysql/handler/admin/executor/MySQLKillProcessExecutorTest.java Test for KILL without scope validation
infra/executor/src/test/java/org/apache/shardingsphere/infra/executor/sql/process/ProcessEngineTest.java Updates tests to reflect new process lifecycle

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@MilanTyagi2004
Copy link
Author

@terrymanu could you please review this.

Copy link
Member

@terrymanu terrymanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to align on the design before touching more code. The current code keeps changing direction, which raises review cost and makes merging risky. Please first share a concise design note (target behavior, protocol compatibility, core flow: processId creation/
propagation, native connectionId registration and cross-node lookup, KILL/Statement.cancel handling, fallback paths). Once that’s agreed, we can adjust the code accordingly.

Clear change requests:

  1. Compatibility: MySQLKillProcessExecutor now forces scope=QUERY; KILL throws an exception. This breaks MySQL semantics and regresses existing behavior. The design must spell out how to support both KILL and KILL QUERY , then implement and test it.
  2. Process registration: ProcessEngine.executeSQL removed the null processId fallback and now returns immediately. If upstream doesn’t set processId (e.g., federated/pipeline paths), cancellations will stop working. The design must define when/how processId is created
    and propagated end-to-end, whether a safe fallback is needed, and tests should cover it.
  3. KILL normalization: currently only KILL QUERY is normalized; KILL is neither mapped nor accepted, so #37404 is only partially fixed. Define the full support matrix and behavior, then implement and test it.
  4. Unrelated changes: cluster connectionId reservation/persistence, ProxyMetaDataInfoExporterTest, PipelineJobProgressPersistService, etc., are not related to this issue. Please revert them to keep the PR focused. If cluster connectionId changes are truly needed,
    justify necessity/compatibility/risk in the design and consider a separate PR.
  5. Test plan: include in the design your validation for KILL/Statement.cancel in single-node and two-node cluster scenarios, covering cross-node native connectionId ↔ processId lookup and JDBC Statement.cancel.

Please provide the design/plan covering the above decisions and expected behavior first; that will lower review overhead and build confidence for merging.

Please be advised that I need to see a design proposal​ first. Any further direct code changes submitted without this will be assigned a lower review priority.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Statement.cancel() in MySQL JDBC fails in a cluster environment.

4 participants