Skip to content

Add LFBP Instrumentation for Tracking Ownership#1632

Open
vazois wants to merge 13 commits intodevfrom
vazois/fix-lfbp-dispose-hang-dev
Open

Add LFBP Instrumentation for Tracking Ownership#1632
vazois wants to merge 13 commits intodevfrom
vazois/fix-lfbp-dispose-hang-dev

Conversation

@vazois
Copy link
Contributor

@vazois vazois commented Mar 16, 2026

Summary

Adds instrumentation for tracking owners and type of buffers acquired from LimitedFixedBufferPool (LFBP) to diagnose dispose issues and buffer leaks. In addition, refactors GarnetTcpNetworkSender (and ClientTcpNetworkSender) to utilize the new WorkerMonitor synchronization primitive.

Changes

Buffer Pool Tracking Instrumentation

  • New enums (PoolEntryTypes.cs): PoolEntryBufferType (identifies buffer role, e.g. NetworkReceiveBuffer, SaeaSendBuffer) and PoolOwnerType (identifies pool creator, e.g. ServerNetwork, Replication, ClientSession).
  • PoolEntry.source field: packed int where the low byte stores PoolEntryBufferType and byte 1 stores PoolOwnerType, set when the entry is acquired via LimitedFixedBufferPool.Get().
  • LimitedFixedBufferPool now accepts PoolOwnerType at construction and PoolEntryBufferType in Get(). Under #if DEBUG, a ConcurrentDictionary tracks all outstanding (checked-out) entries.
  • Dispose diagnostics (DEBUG only): after a 5-second timeout, LimitedFixedBufferPool.Dispose() logs all unreturned buffer details (owner type, buffer type, size) to aid leak diagnosis.
  • All call sites updated to pass appropriate PoolOwnerType and PoolEntryBufferType values (GarnetClientSession, ReplicationManager, GarnetServerTcp, NetworkHandler, TcpNetworkHandlerBase, GarnetSaeaBuffer).

WorkerMonitor Synchronization Primitive

  • New WorkerMonitor class (Synchronization/WorkerMonitor.cs): tracks active workers within a critical region with TryEnter/Exit/Dispose API. On dispose, atomically prevents new workers from entering and blocks efficiently via ManualResetEventSlim (no spin-waiting) until all in-flight workers exit.

GarnetTcpNetworkSender / ClientTcpNetworkSender Refactoring

  • Replaced raw Interlocked counter (throttleCount) with WorkerMonitor sendMonitor for tracking active send operations and providing efficient disposal waiting.
  • Send(), SendResponse(), SeaaBuffer_Completed(), Throttle(), and DisposeNetworkSender() all updated to use sendMonitor.TryEnter()/Exit()/Dispose().

Misc

  • Moved SingleWriterMultiReaderLock.cs into Synchronization/ folder for better code organization.
  • Upgraded DisposeActiveHandlers() diagnostics from HANGDETECT to DEBUG with Stopwatch-based timeout logging of stuck handlers.

Copilot AI review requested due to automatic review settings March 16, 2026 22:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a teardown hang caused by incomplete network-handler disposal and leaked pooled buffers, and adds DEBUG-only diagnostics to identify unreturned pool entries and stuck handlers.

Changes:

  • Ensure TcpNetworkHandlerBase.Dispose() performs full cleanup by invoking DisposeImpl().
  • Fix pooled buffer leak by disposing GarnetTcpNetworkSender’s held responseObject during sender disposal.
  • Add pool ownership/buffer-role tagging and DEBUG diagnostics for pool/handler disposal hangs.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
libs/server/Servers/GarnetServerTcp.cs Tags the server network pool with an owner type for diagnostics.
libs/server/Servers/GarnetServerBase.cs Adds DEBUG hang diagnostics for stuck active handler disposal (but introduces a Release build issue via unused using).
libs/common/Networking/TcpNetworkHandlerBase.cs Calls DisposeImpl() from public Dispose() and tags receive-buffer allocations.
libs/common/Networking/NetworkHandler.cs Tags transport/network buffer allocations to aid leak diagnostics.
libs/common/Networking/GarnetTcpNetworkSender.cs Disposes held responseObject during sender disposal to avoid pool reference leaks.
libs/common/Networking/GarnetSaeaBuffer.cs Tags SAEA send-buffer allocations.
libs/common/NetworkBufferSettings.cs Extends buffer-pool creation to accept an owner type.
libs/common/Memory/PoolEntryTypes.cs Introduces enums for pool owner and buffer role tagging.
libs/common/Memory/PoolEntry.cs Adds packed source field used for diagnostics.
libs/common/Memory/LimitedFixedBufferPool.cs Adds owner tagging, DEBUG outstanding-entry tracking, and disposal-time diagnostics (but introduces a Release build issue via unused using).
libs/cluster/Server/Replication/ReplicationManager.cs Tags replication-created pools with owner type.
libs/client/ClientSession/GarnetClientSession.cs Tags client-session-created pools with owner type.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Two bugs caused LimitedFixedBufferPool.Dispose() to hang indefinitely
during server teardown (blocking ClusterResetHardDuringDisklessReplicationAttach):

1. TcpNetworkHandlerBase.Dispose() never called DisposeImpl(), so when a
   handler thread was blocked synchronously (e.g. in TryBeginDisklessSync),
   the CTS was never cancelled and activeHandlerCount was never decremented.
   DisposeActiveHandlers() would spin forever waiting for it to reach 0.

2. GarnetTcpNetworkSender.DisposeNetworkSender() disposed the saeaStack
   but not the current responseObject, leaking a PoolEntry that was never
   returned to the pool. LimitedFixedBufferPool.Dispose() then spun
   forever waiting for totalReferences to reach 0.

Also adds PoolEntry source tracking infrastructure (PoolEntryBufferType
and PoolOwnerType enums) with DEBUG-only diagnostics that log unreturned
buffer details after a 5-second timeout during pool disposal.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vazois vazois force-pushed the vazois/fix-lfbp-dispose-hang-dev branch from 59ce319 to ce26903 Compare March 16, 2026 22:24
vazois and others added 4 commits March 16, 2026 15:26
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Use Interlocked.Exchange to atomically take ownership of responseObject
and ReturnBuffer it back to the saeaStack before disposal. This ensures
the PoolEntry is disposed exactly once when saeaStack.Dispose() iterates
all items, and avoids a race with ReturnResponseObject() on the handler
thread that could cause Debug.Assert(!disposed) to fail.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vazois vazois changed the title Fix dispose hang in network handler and buffer pool cleanup Add LFBP Instrumentation for Tracking Ownership Mar 18, 2026
@vazois vazois force-pushed the vazois/fix-lfbp-dispose-hang-dev branch from 9a4db3b to e113972 Compare March 19, 2026 19:59
@vazois vazois force-pushed the vazois/fix-lfbp-dispose-hang-dev branch from a4dd6eb to e65a0e9 Compare March 20, 2026 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants