Conversation
|
@terrymanu Sorry , the previous pr was accidently closed. |
|
@terrymanu Please review this. |
terrymanu
left a comment
There was a problem hiding this comment.
The current PR has blocking issues and shouldn’t be merged:
- Root cause not fixed: The handshake exposes a 32-bit connection ID while Process stores a 64-bit one. On MySQL, ProcessRegistry matches the full 64-bit ID and MySQLKillProcessExecutor relies on node bits, so the client’s 32-bit ID never matches and may be treated as
“remote,” causing cancel/KILL to fail. On PostgreSQL, BackendKeyData only carries a 32-bit processId, but Process stores only a 64-bit connectionId and never sets postgresqlProcessId; CancelRequest looks up by 32-bit, so in a cluster it can’t find the target. - Cluster forwarding gap: KillProcessHandler parses IDs with Integer.parseInt; 64-bit IDs with node bits are dropped and never forwarded to the right node.
- Persistence gap: YamlProcess/swapper only store mysqlThreadId/postgresqlProcessId and omit the 64-bit connection ID and secretKey, so cross-node cancel/kill lacks critical data.
- Change surface too large: 71 files touched, with many unrelated format/alignment tweaks (e.g., Firebird/OpenGauss frontend), which inflates the diff and review cost.
Suggested direction: (1) Align the handshake identifier with what Process stores—either make the handshake ID reconstruct node bits or map the 32-bit handshake ID back to the 64-bit stored value. (2) PostgreSQL should store both 32-bit postgresqlProcessId and 64-bit
postgresqlConnectionId; Cancel should match on 32-bit, and both the 64-bit ID and secretKey should be persisted. (3) KillProcessHandler must accept and forward 64-bit connection IDs. (4) Extend YamlProcess/swapper to include 64-bit connection IDs and secretKey. (5) Trim
this PR to only the necessary changes, remove unrelated formatting tweaks, and add tests covering handshake → process bind → cancel/KILL for local and cross-node paths (32/64-bit, secretKey mismatch).
In its current form, the PR should not be merged.
Thankyou for the review and i will surely look into this |
|
hi,@terrymanu i have an doubt ,connection id is pass through firebird/opengaus , and the function written in these are in int |
|
Currently, with an auto-increment ID scheme, there might be a scenario where some clients use connection pools while others connect directly. Suppose the auto-increment ID overflows but some original connections haven’t been released yet. In that case, duplicate IDs could occur. |
I will open a new issue to discuss this problem. In the meantime, I think we should pause any further work on this PR. |
Please give me some time ,I am working on this issue |
Thanks for pointing this out. I agree this scenario can lead to duplicate IDs when auto-increment overflows and some connections are still active (especially with mixed pooled and direct connections). To handle this safely, I used a BitSet to track allocated connection IDs up to the maximum allowed connections. On allocation, we pick the next clear bit, and on release, we clear it. This ensures an ID is reused only after the original connection is fully released, even after overflow. This avoids collisions, works well with connection pools, and keeps the behavior deterministic. |
|
@terrymanu what are you thoughts about this |
|
We’ve overcomplicated this. The real goal is just to make Kill/Cancel hit the running process. The chatter exploded because the scope ballooned: we mixed in 64-bit global IDs, cross-node forwarding, persistence, and even Firebird/OpenGauss while the client protocol still exposes 32-bit IDs, and we never nailed a clear mapping. On top of that, we touched 70+ files before agreeing on the design, so formatting/compat issues drowned out the core fix. Let’s simplify and get back on target:
First make the core path reliable; discuss global IDs/bitset allocation separately afterward. |
4420755 to
0d59385
Compare
|
I’ve reduced the PR to only the process ID binding during authentication, removing all unrelated changes to keep the scope minimal and focused. |
There was a problem hiding this comment.
Thanks for the effort—good direction, but the current patch doesn’t solve the root cause and introduces new issues. Please refine and resubmit:
- Design/abstraction issue: Process is a cross-protocol infra class. Adding mysqlThreadId and MySQL-specific lookup into Process/ProcessRegistry bakes dialect details into shared code. Please revert these changes and handle MySQL handshake thread id binding at the MySQL protocol/session layer, or introduce a protocol-agnostic “native connection identifier” abstraction instead.
- Root cause still unsolved: The handshake thread id remains node-local and not cluster-unique; processId is still not aligned with the handshake id. During execution, ProcessEngine.executeSQL/completeSQLExecution recreates Process with the same processId, dropping the thread id you set at handshake time, so KILL can still fail to find the running process. Please first ensure the handshake id is cluster-unique (or can route to the right node) and keep it associated with the processId across the full lifecycle before wiring cancel/kill.
- Newly introduced risks: The numeric thread-id scan in ProcessRegistry.get may kill the wrong session if thread ids collide across nodes; the new test sets threadId manually and doesn’t cover the real handshake→execute→kill flow, and it leaves the singleton ProcessRegistry state dirty. Please fix these or roll them back.
- Unrelated/incorrect changes to revert: Remove the MySQL-specific fields and lookup logic from the infra Process/ProcessRegistry layer; redo the binding and routing in the MySQL protocol layer.
I encourage you to continue: make the handshake connection/thread id cluster-unique, keep it consistently mapped to the processId through the execution lifecycle, and add an end-to-end test covering handshake → execute → KILL/cancel. Looking forward to the updated version.
|
@ShenFeng312 @terrymanu All tests pass locally. Before I resubmit, could you please help me identify: |
|
This is not CI flakiness. |
|
Why it passes locally sometimes
Classic test-order dependency. |
terrymanu
left a comment
There was a problem hiding this comment.
Nice progress here—thanks for tackling this! A few highlights and follow-ups:
- Good work
- Binding the handshake connection ID to the process ID after auth and rewriting KILL/Statement.cancel to use processId is the right direction for #37404.
- The protocol-layer MySQLConnectionIdRegistry with register/cleanup on auth/release makes the flow clearer.
- Needs refinement
- Process.processStatements now keys by ExecutionUnit; since ExecutionUnit is equal by datasource + SQL, concurrent identical SQL on the same datasource can collide and some statements may not get canceled/killed. Please switch to a collision-free key (e.g.,
executionUnit identity) or explain why collisions cannot happen. - ProcessEngine.executeSQL lazily creates a process when missing, masking call-order errors and leaving processes without proper binding, which still cannot be killed. Please restore an explicit check or ensure connect() always precedes execute.
- Handshake IDs are still node-local; MySQLConnectionIdRegistry is local. If the goal is to make handshake IDs globally unique/cluster-routable (per the issue discussion), this patch doesn’t solve that yet. Please clarify or extend.
- KILL and KILL QUERY now share the same logic (cancel query only). That may diverge from native MySQL semantics; please confirm intent or handle separately and document.
- Newly introduced issues
- Possible cancel/kill omissions due to ExecutionUnit key collisions in Process.
- Lazy process creation can leave unbound processes that can’t be killed—needs a fix.
- Unrelated changes
- Test expansions like ProxyMetaDataInfoExporterTest are unrelated to this PR’s scope; please drop or split them for a focused review.
The direction is solid—addressing these items will make it much safer to merge. Keep going!
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical issue in Apache ShardingSphere Proxy where Statement.cancel() and KILL QUERY commands failed to properly terminate running MySQL queries in cluster mode. The core problem was that MySQL protocol connection IDs were not bound to cluster-wide process IDs, preventing cancellation signals from reaching the actual JDBC statements.
Key changes:
- Introduced
MySQLConnectionIdRegistryto map MySQL handshake connection IDs to cluster-unique process IDs - Enhanced
ConnectionIdGeneratorto support cluster-wide unique ID generation with ephemeral node reservation - Modified authentication flow to bind connection IDs after successful authentication
- Implemented KILL statement normalization to translate MySQL connection IDs to process IDs
- Refactored
Processclass to use identity-based statement tracking and support statement merging
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| proxy/frontend/spi/src/main/java/org/apache/shardingsphere/proxy/frontend/spi/DatabaseProtocolFrontendEngine.java | Adds hook for protocol-specific binding after authentication |
| proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/connection/MySQLConnectionIdRegistry.java | New registry mapping MySQL connection IDs to cluster process IDs with persistence support |
| proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/MySQLFrontendEngine.java | Implements binding and cleanup of connection ID mappings |
| proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/authentication/MySQLAuthenticationEngine.java | Stores native connection ID in channel attributes during handshake |
| proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/command/query/text/query/MySQLComQueryPacketExecutor.java | Normalizes KILL QUERY statements by translating connection IDs to process IDs |
| proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/command/MySQLCommandExecuteEngine.java | Code formatting improvements |
| proxy/frontend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/command/MySQLCommandExecutorFactory.java | Code formatting improvements |
| proxy/frontend/core/src/main/java/org/apache/shardingsphere/proxy/frontend/netty/FrontendConstants.java | New constant for native connection ID attribute key |
| proxy/frontend/core/src/main/java/org/apache/shardingsphere/proxy/frontend/netty/FrontendChannelInboundHandler.java | Invokes binding hook after authentication and adds connection ID cleanup |
| proxy/frontend/core/src/main/java/org/apache/shardingsphere/proxy/frontend/connection/ConnectionIdGenerator.java | Adds cluster-aware ID generation and release mechanism |
| proxy/backend/dialect/mysql/src/main/java/org/apache/shardingsphere/proxy/backend/mysql/handler/admin/executor/MySQLKillProcessExecutor.java | Enhances validation and error handling for KILL commands |
| proxy/backend/core/src/main/java/org/apache/shardingsphere/proxy/backend/session/ConnectionSession.java | Adds setter for processId field |
| infra/executor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/Process.java | Refactors statement tracking with identity wrapper, adds merge support, and improves kill() error handling |
| infra/executor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/ProcessEngine.java | Updates process lifecycle to support merging execution contexts |
| infra/executor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/ProcessRegistry.java | Changes merge logic to use Map.merge with BiFunction |
| kernel/data-pipeline/core/src/main/java/org/apache/shardingsphere/data/pipeline/core/job/progress/persist/PipelineJobProgressPersistService.java | Changes event count reset from addAndGet to set |
| proxy/frontend/dialect/mysql/pom.xml | Adds dependency on cluster-mode-repository-api |
| proxy/frontend/core/pom.xml | Adds dependency on cluster-mode-repository-api |
| proxy/frontend/dialect/mysql/src/test/java/org/apache/shardingsphere/proxy/frontend/mysql/connection/MySQLConnectionIdRegistryTest.java | Comprehensive tests for connection ID registry including cluster mode |
| proxy/frontend/dialect/mysql/src/test/java/org/apache/shardingsphere/proxy/frontend/mysql/MySQLFrontendEngineTest.java | Tests for connection ID cleanup in release() |
| proxy/frontend/dialect/mysql/src/test/java/org/apache/shardingsphere/proxy/frontend/mysql/authentication/MySQLAuthenticationEngineTest.java | Adds mock for native connection ID attribute |
| proxy/backend/dialect/mysql/src/test/java/org/apache/shardingsphere/proxy/backend/mysql/handler/admin/executor/MySQLKillProcessExecutorTest.java | Test for KILL without scope validation |
| infra/executor/src/test/java/org/apache/shardingsphere/infra/executor/sql/process/ProcessEngineTest.java | Updates tests to reflect new process lifecycle |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...utor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/ProcessRegistry.java
Show resolved
Hide resolved
.../main/java/org/apache/shardingsphere/proxy/frontend/netty/FrontendChannelInboundHandler.java
Show resolved
Hide resolved
...hardingsphere/data/pipeline/core/job/progress/persist/PipelineJobProgressPersistService.java
Show resolved
Hide resolved
...src/main/java/org/apache/shardingsphere/proxy/frontend/connection/ConnectionIdGenerator.java
Show resolved
Hide resolved
infra/executor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/Process.java
Show resolved
Hide resolved
...nd/core/src/main/java/org/apache/shardingsphere/proxy/backend/session/ConnectionSession.java
Outdated
Show resolved
Hide resolved
...ecutor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/ProcessEngine.java
Show resolved
Hide resolved
...hardingsphere/proxy/frontend/mysql/command/query/text/query/MySQLComQueryPacketExecutor.java
Outdated
Show resolved
Hide resolved
infra/executor/src/main/java/org/apache/shardingsphere/infra/executor/sql/process/Process.java
Show resolved
Hide resolved
.../mysql/src/main/java/org/apache/shardingsphere/proxy/frontend/mysql/MySQLFrontendEngine.java
Show resolved
Hide resolved
|
@terrymanu could you please review this. |
There was a problem hiding this comment.
We need to align on the design before touching more code. The current code keeps changing direction, which raises review cost and makes merging risky. Please first share a concise design note (target behavior, protocol compatibility, core flow: processId creation/
propagation, native connectionId registration and cross-node lookup, KILL/Statement.cancel handling, fallback paths). Once that’s agreed, we can adjust the code accordingly.
Clear change requests:
- Compatibility: MySQLKillProcessExecutor now forces scope=QUERY; KILL throws an exception. This breaks MySQL semantics and regresses existing behavior. The design must spell out how to support both KILL and KILL QUERY , then implement and test it.
- Process registration: ProcessEngine.executeSQL removed the null processId fallback and now returns immediately. If upstream doesn’t set processId (e.g., federated/pipeline paths), cancellations will stop working. The design must define when/how processId is created
and propagated end-to-end, whether a safe fallback is needed, and tests should cover it. - KILL normalization: currently only KILL QUERY is normalized; KILL is neither mapped nor accepted, so #37404 is only partially fixed. Define the full support matrix and behavior, then implement and test it.
- Unrelated changes: cluster connectionId reservation/persistence, ProxyMetaDataInfoExporterTest, PipelineJobProgressPersistService, etc., are not related to this issue. Please revert them to keep the PR focused. If cluster connectionId changes are truly needed,
justify necessity/compatibility/risk in the design and consider a separate PR. - Test plan: include in the design your validation for KILL/Statement.cancel in single-node and two-node cluster scenarios, covering cross-node native connectionId ↔ processId lookup and JDBC Statement.cancel.
Please provide the design/plan covering the above decisions and expected behavior first; that will lower review overhead and build confidence for merging.
Please be advised that I need to see a design proposal first. Any further direct code changes submitted without this will be assigned a lower review priority.
Fixes #37404
Changes proposed in this pull request:
In Apache ShardingSphere Proxy using the MySQL protocol, cancelling a running SQL statement does not correctly interrupt
the underlying JDBC Statement in a cluster environment.
Although a Process instance is created for the execution, the MySQL thread ID is not bound to the connection session. As a
result, KILL QUERY / Statement.cancel() cannot locate or terminate the actual running MySQL query.
This pull request binds the MySQL thread ID to the connection session after authentication, ensuring that cancellation signals
are correctly propagated to the JDBC layer.
Expected behavior
Actual behavior
Environment
ShardingSphere version: 5.5.3-SNAPSHOT
OS: Windows 11
Database: MySQL
Mode: Cluster mode
Before committing this PR, I'm sure that I have checked the following options: