Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 13 additions & 20 deletions .github/workflows/ci-tap.yml
Original file line number Diff line number Diff line change
Expand Up @@ -73,28 +73,19 @@ jobs:
cmake --build build --parallel
cmake --install build

- name: Start ClickHouse
- name: Start ClickHouse and OTel collector
run: |
docker rm -f clickhouse-test 2>/dev/null || true
docker run -d --name clickhouse-test \
--network host \
clickhouse/clickhouse-server:26.1
# Wait for ClickHouse to be ready
for i in {1..30}; do
if curl -sf 'http://localhost:8123/' --data 'SELECT 1' >/dev/null 2>&1; then
echo "ClickHouse ready"
exit 0
docker compose -f docker/docker-compose.test.yml up -d --wait
docker compose -f docker/docker-compose.otel.yml up -d
# Poll OTel health endpoint (distroless image has no shell for healthcheck)
for i in $(seq 1 30); do
if curl -sf http://localhost:13133/ >/dev/null 2>&1; then
echo "OTel collector ready"
break
fi
echo "Waiting for ClickHouse... ($i/30)"
echo "Waiting for OTel collector... ($i/30)"
sleep 1
done
Comment on lines +78 to 88
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readiness polling loop for the OTel collector never fails the step if the collector doesn’t become ready within 30s (it just falls through). Please add a post-loop check (or track a flag) and exit 1 if the health endpoint never responds, so CI fails fast with a clear error when the collector can’t start.

Copilot uses AI. Check for mistakes.
echo "ClickHouse not ready after 30s"
docker logs clickhouse-test
exit 1

- name: Initialize ClickHouse schema
run: |
docker exec clickhouse-test clickhouse-client --multiquery < docker/init/00-schema.sql

- name: Run TAP tests
run: |
Expand All @@ -110,6 +101,8 @@ jobs:
name: tap-test-logs
path: t/tmp_check/

- name: Cleanup ClickHouse
- name: Cleanup
if: always()
run: docker rm -f clickhouse-test 2>/dev/null || true
run: |
docker compose -f docker/docker-compose.test.yml down --volumes 2>/dev/null || true
docker compose -f docker/docker-compose.otel.yml down --volumes 2>/dev/null || true
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ jobs:
-DCMAKE_CXX_COMPILER_LAUNCHER=ccache

- name: Build
run: cmake --build build_unit --target hostname_test --parallel
run: cmake --build build_unit --parallel

- name: Run unit tests
run: ctest --test-dir build_unit --output-on-failure
Expand Down
14 changes: 14 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ target_include_directories(pg_stat_ch SYSTEM PRIVATE
${CMAKE_SOURCE_DIR}/third_party/opentelemetry-cpp/api/include
${CMAKE_SOURCE_DIR}/third_party/opentelemetry-cpp/sdk/include
${CMAKE_SOURCE_DIR}/third_party/opentelemetry-cpp/exporters/otlp/include
${CMAKE_SOURCE_DIR}/third_party/opentelemetry-cpp/third_party/nlohmann-json/single_include
)
target_link_libraries(pg_stat_ch PRIVATE
PostgreSQLServer::PostgreSQLServer
Expand Down Expand Up @@ -149,6 +150,19 @@ if(PSCH_BUILD_UNIT_TESTS)
)
pg_stat_ch_set_warnings(hostname_test)

add_executable(sqlcommenter_parse_test
test/unit/sqlcommenter_parse_test.cc
src/export/sqlcommenter_parse.cc
)
target_include_directories(sqlcommenter_parse_test PRIVATE src include)
target_include_directories(sqlcommenter_parse_test SYSTEM PRIVATE
${CMAKE_SOURCE_DIR}/third_party/opentelemetry-cpp/third_party/nlohmann-json/single_include
)
target_compile_features(sqlcommenter_parse_test PRIVATE cxx_std_17)
target_link_libraries(sqlcommenter_parse_test PRIVATE GTest::gtest_main)
pg_stat_ch_set_warnings(sqlcommenter_parse_test)

include(GoogleTest)
gtest_discover_tests(hostname_test)
gtest_discover_tests(sqlcommenter_parse_test)
endif()
7 changes: 2 additions & 5 deletions docker/docker-compose.otel.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,5 @@ services:
- "4317:4317" # gRPC OTLP receiver (matches psch_otel_endpoint default)
- "9091:9090" # Prometheus metrics exporter (host:9091 → container:9090)
- "13133:13133" # Health check HTTP endpoint
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:13133/"]
interval: 5s
timeout: 5s
retries: 10
# Note: no healthcheck — the contrib image is distroless (no shell/curl/wget).
# Use host-side polling (curl localhost:13133) to verify readiness.
2 changes: 2 additions & 0 deletions docker/init/00-schema.sql
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,8 @@ CREATE TABLE pg_stat_ch.events_raw

query String COMMENT 'Full SQL query text (may be truncated). Used for debugging and query analysis.',

labels String DEFAULT '{}' COMMENT 'Query labels from sqlcommenter comments (key=value pairs in /* */ blocks). Access subpaths directly: labels.controller, labels.action. Empty {} when no labels present. See: https://google.github.io/sqlcommenter/',

-- ========================================================================
-- Shared buffer metrics (main buffer cache)
-- ========================================================================
Expand Down
15 changes: 15 additions & 0 deletions docker/migrations/001_add_labels_column.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
-- Migration 001: Add labels column for sqlcommenter support
--
-- This migration adds the `labels` column to `events_raw` for existing
-- installations. New installations already include this column via
-- docker/init/00-schema.sql.
--
-- Run with:
-- clickhouse-client < docker/migrations/001_add_labels_column.sql
--
-- Safe to re-run: ALTER TABLE ADD COLUMN IF NOT EXISTS is idempotent.

ALTER TABLE pg_stat_ch.events_raw
ADD COLUMN IF NOT EXISTS labels String DEFAULT '{}'
COMMENT 'Query labels from sqlcommenter comments (key=value pairs in /* */ blocks). Access subpaths directly: labels.controller, labels.action. Empty {} when no labels present. See: https://google.github.io/sqlcommenter/'
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration adds labels as a String, but the column COMMENT says you can access subpaths directly via labels.controller/labels.action. That dot-subpath syntax requires the ClickHouse JSON type; for a String column the documented access pattern should use JSONExtract* functions, or the column type should be switched to JSON to match the comment and docs.

Suggested change
COMMENT 'Query labels from sqlcommenter comments (key=value pairs in /* */ blocks). Access subpaths directly: labels.controller, labels.action. Empty {} when no labels present. See: https://google.github.io/sqlcommenter/'
COMMENT 'Query labels from sqlcommenter comments (key=value pairs in /* */ blocks). Extract fields with JSONExtractString(labels, ''controller'') and JSONExtractString(labels, ''action''). Empty {} when no labels present. See: https://google.github.io/sqlcommenter/'

Copilot uses AI. Check for mistakes.
AFTER query;
26 changes: 26 additions & 0 deletions docs/guides/clickhouse.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,32 @@ Four materialized views provide pre-aggregated analytics:

For view schemas, query patterns, and the `-State`/`-Merge` aggregation pattern, see [materialized views](/reference/materialized-views).

## Schema migrations

When upgrading pg_stat_ch, new columns or schema changes may be required. Migration scripts are provided in [`docker/migrations/`](https://github.com/ClickHouse/pg_stat_ch/tree/main/docker/migrations) and are safe to re-run (idempotent).

Apply all pending migrations:

```bash
for f in docker/migrations/*.sql; do
clickhouse-client < "$f"
done
```

Or apply a specific migration:

```bash
clickhouse-client < docker/migrations/001_add_labels_column.sql
```

| Migration | Version | Description |
|---|---|---|
| `001_add_labels_column.sql` | 0.2+ | Adds `labels JSON` column for [sqlcommenter](https://google.github.io/sqlcommenter/) query label support |
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration table says 001_add_labels_column.sql adds a labels JSON column, but the actual schema/migration defines labels as String DEFAULT '{}'. Please align the docs with the actual ClickHouse type (or update the schema if JSON is the real intent).

Suggested change
| `001_add_labels_column.sql` | 0.2+ | Adds `labels JSON` column for [sqlcommenter](https://google.github.io/sqlcommenter/) query label support |
| `001_add_labels_column.sql` | 0.2+ | Adds `labels String DEFAULT '{}'` column for [sqlcommenter](https://google.github.io/sqlcommenter/) query label support |

Copilot uses AI. Check for mistakes.

<Note>
New installations using `docker/init/00-schema.sql` already include all schema changes. Migrations are only needed for existing ClickHouse instances.
</Note>

## Data retention

The `events_raw` table has no TTL by default. To limit storage, add a TTL:
Expand Down
10 changes: 10 additions & 0 deletions docs/reference/events-schema.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,16 @@ The table is partitioned by date (`toDate(ts_start)`) and ordered by `ts_start`
Query normalization replaces literals with placeholders (`$N`). This means `SELECT * FROM users WHERE id = 42` becomes `SELECT * FROM users WHERE id = $1`. No passwords, tokens, or PII are exported in query text.
</Note>

## Query labels

| Column | Type | Description |
|---|---|---|
| `labels` | `String DEFAULT '{}'` | Key-value labels extracted from [sqlcommenter](https://google.github.io/sqlcommenter/) comments appended to the query. For example, `/* controller='users',action='show' */` produces `{"controller":"users","action":"show"}`. Access subpaths directly in ClickHouse: `labels.controller`, `labels.action`. Empty `{}` when no labels are present. |
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs claim you can access JSON subpaths as labels.controller / labels.action, but the schema defines labels as String (see docker/init/00-schema.sql and migration). With a String column, ClickHouse won't support dot-subpath access; you need functions like JSONExtractString(labels, 'controller') (as used in TAP tests), or change the column type to ClickHouse JSON to match the documentation.

Suggested change
| `labels` | `String DEFAULT '{}'` | Key-value labels extracted from [sqlcommenter](https://google.github.io/sqlcommenter/) comments appended to the query. For example, `/* controller='users',action='show' */` produces `{"controller":"users","action":"show"}`. Access subpaths directly in ClickHouse: `labels.controller`, `labels.action`. Empty `{}` when no labels are present. |
| `labels` | `String DEFAULT '{}'` | Key-value labels extracted from [sqlcommenter](https://google.github.io/sqlcommenter/) comments appended to the query. For example, `/* controller='users',action='show' */` produces `{"controller":"users","action":"show"}`. Since this column stores JSON as a string, extract values in ClickHouse with functions such as `JSONExtractString(labels, 'controller')` and `JSONExtractString(labels, 'action')`. Empty `{}` when no labels are present. |

Copilot uses AI. Check for mistakes.

<Note>
Labels are parsed from the **last** `/* */` comment block in the query text. The parser supports URL-encoded values and escaped single quotes per the sqlcommenter specification. Controlled by the [`pg_stat_ch.track_labels`](/reference/configuration) GUC (default: `true`).
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section introduces pg_stat_ch.track_labels and links to /reference/configuration, but that page currently doesn’t mention track_labels (so readers can’t discover details like context/reload semantics). Please add pg_stat_ch.track_labels to the configuration reference and consider adding a short warning that label values are user-supplied and may contain sensitive data if applications put PII/tokens in sqlcommenter comments.

Suggested change
Labels are parsed from the **last** `/* */` comment block in the query text. The parser supports URL-encoded values and escaped single quotes per the sqlcommenter specification. Controlled by the [`pg_stat_ch.track_labels`](/reference/configuration) GUC (default: `true`).
Labels are parsed from the **last** `/* */` comment block in the query text. The parser supports URL-encoded values and escaped single quotes per the sqlcommenter specification. Collection is controlled by the [`pg_stat_ch.track_labels`](/reference/configuration) GUC, which defaults to `true` and can be changed with a configuration reload (no server restart required).
Label values are user-supplied application metadata. If applications include PII, tokens, session identifiers, or other secrets in sqlcommenter comments, those values may be exported in `labels`. Only enable and use labels with trusted, non-sensitive metadata.

Copilot uses AI. Check for mistakes.
</Note>
Comment on lines +35 to +41
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The labels column is documented as String, but the description claims you can access subpaths via labels.controller / labels.action. That dot-notation requires a ClickHouse JSON-typed column; with String you need functions like JSONExtractString(labels, 'controller'). Please either change the column type to JSON(...) (and keep docs as-is) or update the docs/examples to match a String column.

Copilot uses AI. Check for mistakes.

## Shared buffer usage

These columns show how queries interact with PostgreSQL's `shared_buffers` cache. The cache hit ratio for a query is `shared_blks_hit / (shared_blks_hit + shared_blks_read)`. Target above 99% for OLTP workloads.
Expand Down
1 change: 1 addition & 0 deletions include/config/guc.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ extern int psch_otel_log_batch_size;
extern int psch_otel_log_max_bytes;
extern int psch_otel_log_delay_ms;
extern int psch_otel_metric_interval_ms;
extern bool psch_track_labels;
extern bool psch_debug_force_locked_overflow;

// Initialize GUC variables
Expand Down
12 changes: 12 additions & 0 deletions src/config/guc.cc
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ int psch_otel_log_batch_size = 8192;
int psch_otel_log_max_bytes = 3 * 1024 * 1024; // 3 MiB: gRPC default max is 4 MiB
int psch_otel_log_delay_ms = 100;
int psch_otel_metric_interval_ms = 5000;
bool psch_track_labels = true;
bool psch_debug_force_locked_overflow = false;

// Log level options (matches PostgreSQL's server_message_level_options pattern)
Expand Down Expand Up @@ -307,6 +308,17 @@ void PschInitGuc(void) {
PGC_SUSET,
0,
nullptr, nullptr, nullptr);
DefineCustomBoolVariable(
"pg_stat_ch.track_labels",
"Extract sqlcommenter labels from query comments.",
"When enabled, the background worker parses /* key='value' */ comments "
"and exports structured labels to ClickHouse (JSON) or OTel (attributes).",
&psch_track_labels,
true,
PGC_SIGHUP,
0,
nullptr, nullptr, nullptr);

DefineCustomBoolVariable(
"pg_stat_ch.debug_force_locked_overflow",
"Force HandleOverflow in locked path (debug/test only).",
Expand Down
6 changes: 6 additions & 0 deletions src/export/clickhouse_exporter.cc
Original file line number Diff line number Diff line change
Expand Up @@ -75,10 +75,15 @@ class ClickHouseExporter : public StatsExporter {
shared_ptr<Column<string>> DbOperationColumn() final { return TagString("cmd_type"); }
shared_ptr<Column<string_view>> DbQueryTextColumn() final { return RecordString("query"); }

void AppendLabels(const ParseResult& labels) final {
labels_col_->Append(SerializeLabelsJson(labels));
}

void BeginBatch() final {
block = std::make_unique<clickhouse::Block>();
columns.clear();
exported_count = 0;
labels_col_ = Wrap<clickhouse::ColumnString, string_view>("labels");
Comment on lines +79 to +86
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClickHouseExporter::BeginBatch() unconditionally adds a "labels" column to every inserted block. On upgrades where the ClickHouse schema hasn’t yet been migrated to include labels, inserts will fail continuously — even when pg_stat_ch.track_labels = off — because the block still contains an unknown column. Consider only creating/appending the labels column when label tracking is enabled and the destination schema supports it (or otherwise detect the missing column and emit a clear actionable error that points to the migration script, while allowing operation with labels disabled).

Suggested change
labels_col_->Append(SerializeLabelsJson(labels));
}
void BeginBatch() final {
block = std::make_unique<clickhouse::Block>();
columns.clear();
exported_count = 0;
labels_col_ = Wrap<clickhouse::ColumnString, string_view>("labels");
if (labels_col_) {
labels_col_->Append(SerializeLabelsJson(labels));
}
}
void BeginBatch() final {
block = std::make_unique<clickhouse::Block>();
columns.clear();
exported_count = 0;
labels_col_.reset();
if (pg_stat_ch_track_labels) {
labels_col_ = Wrap<clickhouse::ColumnString, string_view>("labels");
}

Copilot uses AI. Check for mistakes.
}
void BeginRow() final { ++exported_count; }
bool CommitBatch() final;
Expand Down Expand Up @@ -116,6 +121,7 @@ class ClickHouseExporter : public StatsExporter {
std::unique_ptr<clickhouse::Client> client;
std::unique_ptr<clickhouse::Block> block;
std::vector<shared_ptr<BasicColumn>> columns;
shared_ptr<Column<string_view>> labels_col_;
int consecutive_failures = 0;
int exported_count = 0;
};
Expand Down
6 changes: 6 additions & 0 deletions src/export/exporter_interface.h
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@
#include <string>
#include <string_view>

#include "export/sqlcommenter_parse.h"

class StatsExporter {
protected:
using string = std::string;
Expand Down Expand Up @@ -65,6 +67,10 @@ class StatsExporter {
virtual shared_ptr<Column<string>> DbOperationColumn() = 0;
// Query text. CH: RecordString "query"; OTel semconv: "db.query.text".
virtual shared_ptr<Column<string_view>> DbQueryTextColumn() = 0;
// Query labels from sqlcommenter comments. Called inside the event loop.
// CH: serializes to JSON, appends to a String "labels" column;
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says ClickHouse appends labels to a String labels column, but the schema change in this PR defines labels as a ClickHouse JSON column. Updating this description to match the actual column type will avoid confusion for future maintainers/users.

Suggested change
// CH: serializes to JSON, appends to a String "labels" column;
// CH: serializes to JSON, appends to a JSON "labels" column;

Copilot uses AI. Check for mistakes.
// OTel: sets per-label attributes prefixed with "db.query.label.".
virtual void AppendLabels(const ParseResult& labels) = 0;

virtual void BeginBatch() = 0;
virtual void BeginRow() = 0;
Expand Down
9 changes: 9 additions & 0 deletions src/export/otel_exporter.cc
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,15 @@ class OTelExporter : public StatsExporter {
shared_ptr<Column<string_view>> DbQueryTextColumn() final {
return Wrap<RecordOnlyColumn<string_view>>("db.query.text");
}
void AppendLabels(const ParseResult& labels) final {
for (int i = 0; i < labels.count; ++i) {
string attr_name = "db.query.label.";
attr_name.append(labels.labels[i].key.data(), labels.labels[i].key.size());
string val(labels.labels[i].value);
current_log_record->SetAttribute(attr_name, val);
current_row_tags[attr_name] = std::move(val);
}
Comment on lines +135 to +139
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppendLabels() currently inserts parsed sqlcommenter labels into current_row_tags (line 138), which means they become dimensions for all emitted metrics (HistogramColumn::Crunch records metrics with current_row_tags). Since these labels are user-controlled and high-cardinality by nature, this can explode metric cardinality and dramatically increase OTel backend cost/ingestion load. Consider keeping labels as log-record attributes only (do not add them to current_row_tags), or gate them behind a separate config specifically for metrics tagging.

Also, the label key is appended directly into the attribute name; after URL-decoding it may contain spaces/control characters (tests include "my key"), which can violate common OTel attribute-key expectations and may break downstream collectors/backends. Please sanitize/normalize keys (e.g., allowlist [A-Za-z0-9_.-] and replace others / skip invalid keys) before constructing the attribute name.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoshDreamland you've been dealing with otel perf here, does this copilot perf advice seem applicable here?

}

bool EstablishNewConnection() final;
bool IsConnected() const final { return metrics_provider && log_provider; }
Expand Down
Loading
Loading