Skip to content

Intern queued query text in shared HTAB to bound DSA usage#92

Open
iskakaushik wants to merge 1 commit intomainfrom
dsa-query-intern
Open

Intern queued query text in shared HTAB to bound DSA usage#92
iskakaushik wants to merge 1 commit intomainfrom
dsa-query-intern

Conversation

@iskakaushik
Copy link
Copy Markdown
Collaborator

Summary

  • Add a shared, partition-locked HTAB whose entries point at refcount-managed DSA bodies, and route queued query text through it. Live DSA usage drops from queued_events * query_len to distinct_live_query_texts * query_len.
  • Pattern mirrors pg_stat_statements (shared HTAB sized via hash_estimate_size + ShmemInitHash) and pgstat_shmem (refcounted DSA bodies freed only after the HTAB entry is removed).
  • Error messages stay per-event for now (separate optimization).

Why

Without interning, every queued event owned a private DSA copy of the normalized query text. With query_len ~= 2047 bytes, the DSA string pool was exhausted well before the queue itself reached capacity, especially under workloads that repeatedly executed the same long normalized query.

What changed

  • src/queue/query_intern.{h,c} — pure-C interner: 32-partition LWLock HTAB → DSA-allocated PschQueryInternObject (key + magic + bytes), refcount on entry. Acquire allocates outside the partition lock and re-checks under it; ResolveAndRelease copies bytes (caller's slot is the live reference) then drops the refcount; on last release the entry is removed and the DSA body freed outside the partition lock.
  • src/queue/shmem.cc — split shmem sizing into [state + ring + DSA] (passed to ShmemInitStruct) and the HTAB pool (allocated by ShmemInitHash from the same RequestAddinShmemSpace reservation). Request 1 + 32 LWLocks in the existing pg_stat_ch named tranche. Init the interner under AddinShmemInitLock. Replace the per-event PschDsaAllocString / PschDsaResolveString calls for query text with the new Acquire / ResolveAndRelease.
  • src/queue/psch_dsa.h — add typedef struct PschSharedState PschSharedState; so the bare type is usable from pure C.

Failure modes (best-effort telemetry preserved)

  • DSA OOM on a new body → InvalidDsaPointer, caller sets query_len = 0 (numeric data preserved). dsa_oom_count still bumped.
  • Hash full (HASH_ENTER_NULL) → free loser allocation, InvalidDsaPointer, query_len = 0.
  • Insert race won by another backend → free loser, return existing body, refcount++.
  • Hash key collision (different bytes for same (dbid, queryid, query_hash, query_len)) → treat as miss, return InvalidDsaPointer so we export empty rather than wrong SQL.

Test plan

  • t/032_query_intern.pl — drives 6000 EXECUTEs of a long normalized query through an 8MB DSA pool. Asserts enqueued >= 5000 and dsa_oom_count == 0. The same workload without interning would push ~12MB through an 8MB pool and OOM.
  • All 11 SQL regression tests pass.
  • Non-Docker TAP tests pass: 001-009, 015, 017, 020, 022.
  • ClickHouse-dependent TAP tests (010, 011, 021, 023-025, 027, 031, 016 single-cycle) verified failing locally on a clean main worktree with the same deterministic checksum error — pre-existing local container/version issue, not introduced here. CI should be authoritative.
  • 028, 029 reference a pg_stat_ch.debug_throw_in_export GUC that doesn't exist in the tree — pre-existing, unrelated.

🤖 Generated with Claude Code

Without interning, every queued event owned a private DSA copy of the
normalized query text — live DSA usage grew as `queued_events * query_len`
and exhausted the bounded DSA pool well before the queue reached capacity.
Repeated long normalized queries were the worst case.

Add a shared, partition-locked HTAB whose entries point at refcount-managed
DSA bodies, and route TryEnqueueLocked / PschDequeueEvent through it for
query text. Live DSA usage drops to `distinct_live_query_texts * query_len`.
Error messages stay per-event for now (separate optimization).

The pattern mirrors pg_stat_statements (shared HTAB sized via
hash_estimate_size + ShmemInitHash) and pgstat_shmem (refcounted DSA bodies
freed only after the HTAB entry is removed).

Adds t/032_query_intern.pl: 6000 EXECUTEs of a long normalized query
through an 8MB DSA pool exit with dsa_oom_count == 0; the same workload
without interning would push ~12MB through an 8MB pool and OOM.
Copilot AI review requested due to automatic review settings May 7, 2026 19:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a shared-memory query-text interner for the pg_stat_ch queue so repeated identical normalized queries share a single DSA-backed copy, bounding DSA usage to the number of distinct live query texts instead of the number of queued events.

Changes:

  • Add a partition-locked shared HTAB + refcounted DSA objects for interned query text (query_intern.{h,c}).
  • Route queue slot query text through the interner (acquire on enqueue, resolve+release on dequeue) and adjust shmem sizing/lock tranche usage (shmem.cc).
  • Add a TAP test that stresses tight DSA settings with many repeated long EXECUTEs (t/032_query_intern.pl).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
t/032_query_intern.pl New TAP test to validate that repeated long normalized query text does not exhaust a tight DSA pool.
src/queue/shmem.cc Integrates query interning into enqueue/dequeue and adjusts shared memory sizing + LWLock tranche allocation.
src/queue/query_intern.h Declares the shared query-text interner API and documents its design.
src/queue/query_intern.c Implements partition-locked HTAB interning with refcounted DSA-backed query bodies.
src/queue/psch_dsa.h Adds a forward typedef so PschSharedState can be referenced cleanly from C translation units.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/queue/query_intern.h
Comment on lines +37 to +39
// Add the HTAB size requirement to the running total of extension shmem.
// Caller passes the shared memory size accumulator; this function increases
// it to include hash_estimate_size for the interner.
Comment thread src/queue/query_intern.h
Comment on lines +61 to +66
// the DSA handle is unavailable, DSA allocation fails, the shared HTAB is
// full, or a hash collision is detected against a different query text
// (collisions are treated as a miss with no insert — exporting empty query
// text is preferable to exporting the wrong SQL).
//
// Must be called by a backend that has already attached to the DSA area.
Comment thread t/032_query_intern.pl
Comment on lines +11 to +15
# Strategy:
# 1. Configure an unreachable ClickHouse so the bgworker cannot drain the
# queue. Events accumulate.
# 2. Set `string_area_size = 8MB` (the minimum allowed) so the DSA pool is
# tight. 6000 × ~2KB unique copies would be ~12MB.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants