
feat(analytics): attach opaque server-url hash to every event #113

Merged
ar2rsawseen merged 2 commits into main from feat/analytics-server-hash on Apr 23, 2026

@ar2rsawseen
Member

Why

We want to answer "how many distinct Countly servers use the MCP" (and the same question sliced per tool, per transport, per auth method, etc.), while respecting the privacy commitment the README makes. Raw URLs and domains should never leave the process.

What this PR does

Adds a short opaque server segment to every analytics event — a 16-hex-char SHA-256 prefix of the normalized Countly server URL. That gives Countly the aggregation signal without ever transmitting an identifiable URL.

Design

  • Field name: server
  • Hash length: 16 hex chars (64 bits): a large enough hash space to distinguish billions of servers with negligible collision impact on distinct-count aggregation, while keeping the per-event payload small.
  • Device ID: stays "mcp" (explicit choice — the hash lives in event segmentation, not device identity). This was discussed and agreed: keep the device-level anonymity and derive distinct-server counts from the server segment breakdown.
  • Normalization: scheme stripped, lowercased, trailing slashes trimmed. So https://Example.com/, http://example.com, and HTTPS://EXAMPLE.com all hash identically.
  • Per-request resolution: analytics.init() accepts a getServerUrl() callback invoked lazily at event-track time. In HTTP transport the callback reads the request-scoped URL from AsyncLocalStorage (same mechanism used for per-tenant auth isolation); in stdio mode it reads the static env-derived config. Multi-tenant deployments naturally emit per-tenant counts.
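A minimal TypeScript sketch of the hashing rules above, assuming Node's built-in `node:crypto` module. The function names mirror the PR, but the bodies only illustrate the stated rules (strip scheme, lowercase, trim trailing slashes, keep the first 16 hex chars of SHA-256); they are not the merged code. A later commit on this branch switched normalization to `new URL()` parsing and lowercases only the host.

```typescript
import { createHash } from "node:crypto";

// Normalization as stated in the PR description: strip the scheme,
// lowercase everything, trim trailing slashes.
function normalizeServerUrlForHash(url: string | undefined): string {
  if (!url) return "";
  return url
    .replace(/^[a-z][a-z0-9+.-]*:\/\//i, "") // drop "https://", "HTTP://", ...
    .toLowerCase()
    .replace(/\/+$/, ""); // "example.com/" -> "example.com"
}

// First 16 hex chars (64 bits) of SHA-256; undefined means "omit the segment".
function computeServerHash(url: string | undefined): string | undefined {
  const normalized = normalizeServerUrlForHash(url);
  if (!normalized) return undefined;
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}
```

With these rules `https://Example.com/`, `http://example.com`, and `HTTPS://EXAMPLE.com` produce the same 16-character value, and an empty or missing URL yields `undefined`, so no `server` segment is attached.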

Code changes

  • src/lib/analytics.ts:
    • New exported normalizeServerUrlForHash() and computeServerHash()
    • Analytics.init(enabled, getServerUrl?) — second arg is the new lazy resolver
    • New private withServerSegment() merges the hash onto the segmentation of every trackEvent/trackTimedEvent call
    • All specialized helpers (trackToolExecution, trackToolCategory, trackAuthMethod, trackApiEndpoint, trackHttpRequest, trackError) delegate to those two, so the segment flows through automatically
  • src/index.ts:
    • Wires the resolver: () => requestContext.getStore()?.serverUrl || this.config?.serverUrl
    • Defensive against this.config being undefined at analytics.init() time (it's populated later in the constructor; the resolver is only called lazily, at event-track time)
  • README.md: Analytics Tracking section updated — explicitly lists the server hash as tracked (and explains why it's coarse, so no one reads a secrecy guarantee into it), and explicitly calls out raw URLs / domains as NOT tracked
  • CHANGELOG.md: new "Changed" entry under [1.3.0] (nothing published under 1.3.0 yet so no new version bump needed)
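The lazy-resolver wiring can be sketched with a simplified, hypothetical `Analytics` class. Assumptions: the `queue` array stands in for actually sending events to Countly, the `"tool_execution"` event key and the `trackToolExecution(tool, success)` signature are invented, and `trackTimedEvent` is omitted. What it shows is that the resolver is stored at `init()` and called on every `trackEvent`, so a request-scoped URL is picked up per event.

```typescript
import { createHash } from "node:crypto";

type Segmentation = Record<string, string | number | boolean>;

// Hypothetical queued-event shape; the real Countly payload is not shown in the PR.
interface QueuedEvent {
  key: string;
  segmentation: Segmentation;
}

// Compact version of the hash: strip scheme, lowercase, trim slashes, SHA-256 prefix.
function computeServerHash(url: string | undefined): string | undefined {
  if (!url) return undefined;
  const normalized = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").toLowerCase().replace(/\/+$/, "");
  return normalized ? createHash("sha256").update(normalized).digest("hex").slice(0, 16) : undefined;
}

class Analytics {
  private enabled = false;
  private getServerUrl?: () => string | undefined;
  readonly queue: QueuedEvent[] = []; // stand-in for sending events to Countly

  init(enabled: boolean, getServerUrl?: () => string | undefined): void {
    this.enabled = enabled;
    this.getServerUrl = getServerUrl; // stored, not called: resolution is lazy
  }

  // Runs on every event, so the hash follows whatever URL is current right now.
  private withServerSegment(segmentation: Segmentation = {}): Segmentation {
    const server = computeServerHash(this.getServerUrl?.());
    return server ? { ...segmentation, server } : segmentation;
  }

  trackEvent(key: string, segmentation?: Segmentation): void {
    if (!this.enabled) return;
    this.queue.push({ key, segmentation: this.withServerSegment(segmentation) });
  }

  // Specialized helpers build their own segmentation and delegate to trackEvent.
  trackToolExecution(tool: string, success: boolean): void {
    this.trackEvent("tool_execution", { tool, success });
  }
}
```

Because `withServerSegment()` is the single place the hash is merged in, helpers need no changes to carry the `server` segment, and `init()` can run before any config exists as long as the resolver tolerates that.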

Privacy characterization (for the README)

The hash is coarse (64 bits) and server URLs are low-entropy: cloud patterns (*.count.ly) can be dictionary-bruteforced by anyone, Countly most easily of all. This is intended for aggregation, not secrecy. What it does buy:

  • No plaintext URL/domain in transit
  • No plaintext URL/domain in stored analytics data (so a breach of stats.count.ly doesn't leak every customer's on-prem URL)
  • Stable per-deployment aggregation without requiring file persistence or a random install ID
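To make the low-entropy point concrete, here is a hypothetical dictionary attack: anyone who sees a `server` value can hash candidate hostnames and compare. The wordlist and the `acme` name are invented for illustration, and normalization is reduced to a lowercase hostname because that is all these candidates need.

```typescript
import { createHash } from "node:crypto";

// 16-hex-char SHA-256 prefix, matching the PR's hash length.
function hash16(normalized: string): string {
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}

// Hypothetical reverse lookup: try each candidate *.count.ly host.
function reverseServerHash(target: string, candidates: string[]): string | undefined {
  for (const name of candidates) {
    const host = `${name.toLowerCase()}.count.ly`;
    if (hash16(host) === target) return host;
  }
  return undefined;
}

// Example: an observed segment value and a tiny wordlist.
const observed = hash16("acme.count.ly");
console.log(reverseServerHash(observed, ["demo", "test", "acme"])); // acme.count.ly
```

A real attacker would use a much larger wordlist (company names, common subdomains), which is why the hash is described as an aggregation key rather than a secret.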

For strict privacy (e.g. on-prem URLs operators don't want even Countly to see), analytics remain opt-in: they are disabled by default and require ENABLE_ANALYTICS=true.

Tests

+17 tests in tests/analytics.test.ts:

  • normalizeServerUrlForHash: scheme / case / trailing-slash equivalence, empty input
  • computeServerHash: same inputs → same hash, different inputs → different hash, 16-hex format, undefined/empty → undefined
  • Segment injection: trackEvent, trackTimedEvent, all specialized helpers propagate server segment
  • Resolver semantics: omitted when no resolver, omitted when resolver returns undefined, re-evaluated per event (HTTP multi-tenant)
  • device_id assertion: stays "mcp", hash is on events not device identity

Total: 356 tests passing (339 existing + 17 new). Lint clean.

Test plan

  • With ENABLE_ANALYTICS=true and a real Countly server URL, run a couple of tools via stdio; confirm events arrive at stats.count.ly with a server segment matching the expected SHA-256 prefix
  • With HTTP transport and two clients using different X-Countly-Server-Url headers, confirm the server segments differ (per-tenant) within the same process
  • Confirm the README's accuracy about what's tracked before merging

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 23, 2026 09:36

Copilot AI left a comment


Pull request overview

Adds an opaque, truncated SHA-256–based server hash segment to analytics events so telemetry can be aggregated by distinct Countly server without transmitting raw URLs/domains.

Changes:

  • Introduces URL normalization + 16-hex server hash computation and injects the server segment into all analytics event tracking.
  • Wires a lazy per-event server URL resolver (AsyncLocalStorage-aware for HTTP, config fallback for stdio).
  • Updates docs (README, CHANGELOG) and adds test coverage for hashing + segment injection behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/lib/analytics.ts | Adds server URL normalization/hash utilities and injects the server segment into event segmentation. |
| src/index.ts | Passes a per-event resolver for serverUrl (request-scoped when available). |
| tests/analytics.test.ts | Adds unit tests for normalization, hashing, resolver semantics, and segment propagation. |
| README.md | Documents the new server hash telemetry and clarifies what is/isn’t tracked. |
| CHANGELOG.md | Notes the telemetry change under 1.3.0. |

Adds a short SHA-256 hash of the Countly server URL as the `server`
segment on every analytics event, so stats.count.ly can answer "how
many distinct servers use MCP" — and the same question per tool, per
auth method, per error type, etc. — without ever receiving a raw URL
or domain.

How it works:

- New `computeServerHash(url)` (exported) and
  `normalizeServerUrlForHash(url)` in src/lib/analytics.ts. Normalize
  by stripping scheme, lowercasing, and trimming trailing slashes so
  e.g. https://Example.com/ and http://example.com hash identically.
  SHA-256, first 16 hex chars (a 64-bit hash space): enough to
  distinguish billions of servers with negligible aggregation
  collision risk, while keeping the event payload small.

- `analytics.init()` now accepts an optional
  `getServerUrl: () => string | undefined` resolver, called lazily on
  every event-track. `trackEvent` / `trackTimedEvent` merge the
  resolved hash into segmentation under key `server`. All specialized
  helpers (`trackToolExecution`, `trackToolCategory`, `trackAuthMethod`,
  `trackApiEndpoint`, `trackHttpRequest`, `trackError`, etc.) delegate
  to those, so the segment flows through automatically.

- `trackView` and `trackUserProperty` are unchanged — they aren't
  event-shaped in Countly.

- `index.ts` wires the resolver: `() => requestContext.getStore()?.serverUrl
  || this.config?.serverUrl`. That makes the hash track the
  per-request URL in HTTP multi-tenant mode (via the AsyncLocalStorage
  set by the HTTP middleware) while still falling back to the static
  env-derived URL in stdio mode. `this.config` is populated after
  `analytics.init` so the resolver is written defensively against
  `this.config === undefined`; in practice it's only called at event-
  track time by which point config is set.

- `device_id` stays `"mcp"` (explicit choice). Distinct-server counts
  come from `server` segmentation breakdown, not from Countly's
  built-in "users" metric.
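As a sanity check on the collision claim, the standard birthday approximation, assuming the truncated SHA-256 output is uniform over $2^{64}$ values, gives the probability that any two of $n$ distinct normalized URLs share a hash:

```latex
p(n) \approx 1 - \exp\left(-\frac{n(n-1)}{2^{65}}\right) \approx \frac{n^2}{2^{65}},
\qquad
p(10^4) \approx \frac{10^{8}}{3.69 \times 10^{19}} \approx 2.7 \times 10^{-12}
```

The 50% point sits near $n \approx 2^{32}\sqrt{2\ln 2} \approx 5 \times 10^{9}$, so collisions only become likely at several billion distinct servers.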

Privacy note included in the README "Analytics Tracking" section:
the hash is coarse (64 bits) and server URLs are low-entropy, so it
is intended for aggregation and NOT as a secret. Raw URLs and domains
are still never transmitted.

Coverage: +17 tests in tests/analytics.test.ts covering normalization
(scheme/case/trailing-slash equivalence), hash length, resolver
behavior (undefined/empty/varying URLs), injection into all specialized
helpers, omission when no resolver is set, per-event re-evaluation
(HTTP multi-tenant), and the explicit "device_id stays mcp" assertion.
Total 356 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ar2rsawseen force-pushed the feat/analytics-server-hash branch from c4d5255 to 4e20449 on April 23, 2026 13:30
Three review items, all valid, all fixed:

1. normalizeServerUrlForHash claimed to strip default ports but didn't.
   Was a simple scheme-strip + lowercase + slash-trim over the raw string.
   Semantically-equivalent URLs like `https://example.com` and
   `https://example.com:443` would hash differently and split the
   distinct-server aggregation. Now uses `new URL()` to parse,
   then explicitly strips :80 on http and :443 on https.

2. Lowercasing the full URL was wrong for paths. RFC 3986 says only
   the host is case-insensitive; paths, queries, and fragments are
   case-sensitive. The old code would merge e.g. `/api` and `/API`
   into the same hash. Now only `parsed.hostname` is lowercased; path
   / search / hash case is preserved.

3. `server_started` event fired from inside `analytics.init()` before
   `this.config` was populated, so its resolver call returned
   `undefined` and the very first event shipped without the `server`
   segment — contradicting the "every event" goal. Resolver now
   falls back to `process.env.COUNTLY_SERVER_URL` for the pre-config
   window. Priority order is now:
     1. HTTP per-request URL (AsyncLocalStorage)
     2. this.config.serverUrl (after constructor finishes)
     3. process.env.COUNTLY_SERVER_URL (pre-config fallback)
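The three-step priority above maps onto a single `||` chain. A minimal sketch, assuming a `requestContext` built on Node's `AsyncLocalStorage` that stores a `{ serverUrl }` object (the PR names `requestContext`, but its full shape isn't shown) and a hypothetical `Server` class whose `config` is assigned after `analytics.init()` has already run:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Per-request context; only serverUrl is modeled in this sketch.
interface RequestContext {
  serverUrl?: string;
}
const requestContext = new AsyncLocalStorage<RequestContext>();

// Hypothetical stand-in for the MCP server class.
class Server {
  config?: { serverUrl?: string }; // populated later, as in the constructor

  // Arrow field so `this` stays bound when passed to analytics.init().
  readonly resolveServerUrl = (): string | undefined =>
    requestContext.getStore()?.serverUrl || // 1. HTTP per-request URL
    this.config?.serverUrl || // 2. config, once populated
    process.env.COUNTLY_SERVER_URL; // 3. pre-config fallback
}
```

Using `||` rather than `??` means an empty-string URL at one level falls through to the next, matching the `||` expression the PR uses in `index.ts`.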

The URL parse has a safe fallback: if `new URL()` throws (e.g. a bare
hostname without a scheme, or otherwise malformed input), we prepend
`https://` and try once more; the ultimate fallback is the old-style
regex strip, preserving path case. So we still produce a stable hash
for non-URL-ish input instead of dropping the `server` segment.
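A minimal sketch of parse-based normalization, assuming Node's global WHATWG `URL` and `node:crypto`. One deliberate variation from the commit text: it prepends `https://` whenever the input lacks a `scheme://` prefix, not only when `new URL()` throws, because WHATWG parsing accepts strings like `example.com:8443` without error and treats `example.com` as the scheme. Dropping any `user:password@` part is also an assumption of this sketch.

```typescript
import { createHash } from "node:crypto";

const SCHEME_PREFIX = /^[a-z][a-z0-9+.-]*:\/\//i;
const DEFAULT_PORTS: Record<string, string> = { "http:": "80", "https:": "443" };

function tryParseUrl(input: string): URL | undefined {
  try {
    return new URL(input);
  } catch {
    return undefined;
  }
}

function normalizeServerUrlForHash(url: string | undefined): string {
  if (!url) return "";
  // Bare hosts ("example.com", "example.com:8443") get an https:// prefix.
  const parsed = tryParseUrl(SCHEME_PREFIX.test(url) ? url : `https://${url}`);
  if (!parsed) {
    // Last resort: regex strip of scheme and trailing slashes, case preserved.
    return url.replace(SCHEME_PREFIX, "").replace(/\/+$/, "");
  }
  const host = parsed.hostname.toLowerCase(); // only the host is case-insensitive
  const port = parsed.port === DEFAULT_PORTS[parsed.protocol] ? "" : parsed.port;
  const path = parsed.pathname.replace(/\/+$/, ""); // path case preserved
  // Userinfo is intentionally not included (assumption of this sketch).
  return `${host}${port ? `:${port}` : ""}${path}${parsed.search}${parsed.hash}`;
}

function computeServerHash(url: string | undefined): string | undefined {
  const normalized = normalizeServerUrlForHash(url);
  return normalized ? createHash("sha256").update(normalized).digest("hex").slice(0, 16) : undefined;
}
```

The WHATWG parser already lowercases `http(s)` hostnames and reports a default port as an empty string, so the explicit `DEFAULT_PORTS` comparison mostly documents intent; it would only matter if the parsing step were swapped out.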

Coverage: +5 tests in tests/analytics.test.ts covering the three
semantic fixes:
  - default port stripping (:80, :443) on both hash and normalize
  - non-default ports preserved
  - path case preserved (`/api` vs `/API` differ)
  - bare hostnames without scheme accepted
  - server_started event carries the `server` segment

Total 361 tests pass (up from 356).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ar2rsawseen merged commit ee4b7a3 into main on Apr 23, 2026
4 checks passed
3 participants