[INC-669] Add tail sampling implemented in OTEL SDK by witoszekdev · Pull Request #2164 · saleor/apps

witoszekdev · 2025-12-11T18:12:59Z

Add tests to OTEL package
Add deffered sampler
Add tail sampling processor
Add fetch instrumentation
Add tail sampling to search app

Scope of the PR

Related issues

Checklist

I added changesets and read good practices.

changeset-bot · 2025-12-11T18:13:03Z

⚠️ No Changeset found

Latest commit: 007b6e0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

vercel · 2025-12-11T18:13:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
saleor-app-avatax	Ready	Preview	Comment	Dec 12, 2025 5:45pm
saleor-app-cms	Ready	Preview	Comment	Dec 12, 2025 5:45pm
saleor-app-klaviyo	Ready	Preview	Comment	Dec 12, 2025 5:45pm
saleor-app-payment-np-atobarai	Ready	Preview	Comment	Dec 12, 2025 5:45pm
saleor-app-payment-stripe	Ready	Preview	Comment	Dec 12, 2025 5:45pm
saleor-app-products-feed	Ready	Preview	Comment	Dec 12, 2025 5:45pm
saleor-app-search	Ready	Preview	Comment	Dec 12, 2025 5:45pm
saleor-app-segment	Ready	Preview	Comment	Dec 12, 2025 5:45pm
saleor-app-smtp	Ready	Preview	Comment	Dec 12, 2025 5:45pm

codecov · 2025-12-11T18:16:31Z

Codecov Report

❌ Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.07%. Comparing base (b556fdb) to head (519ebe4).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
apps/search/src/instrumentations/otel-node.ts	0.00%	14 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2164      +/-   ##
==========================================
- Coverage   36.07%   36.07%   -0.01%     
==========================================
  Files         938      938              
  Lines       60362    60370       +8     
  Branches     2968     2968              
==========================================
  Hits        21778    21778              
- Misses      38224    38232       +8     
  Partials      360      360

Flag	Coverage Δ
avatax	`56.98% <ø> (ø)`
cms	`15.76% <ø> (ø)`
domain	`100.00% <ø> (ø)`
dynamo-config-repository	`79.29% <ø> (ø)`
errors	`91.66% <ø> (ø)`
logger	`28.81% <ø> (ø)`
np-atobarai	`71.99% <ø> (ø)`
products-feed	`5.23% <ø> (ø)`
search	`27.67% <0.00%> (-0.08%)`	⬇️
segment	`30.20% <ø> (ø)`
shared	`35.88% <ø> (ø)`
smtp	`34.96% <ø> (ø)`
stripe	`70.58% <ø> (ø)`
webhook-utils	`11.02% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions · 2025-12-11T18:17:07Z

Differences Found

✅ No packages or licenses were added.

Summary

Expand

License Name	Package Count	Packages
0BSD	1	Packages tslib
CC BY-SA 4.0	1	Packages @cspell/dict-en-common-misspellings
CC-BY-3.0	1	Packages spdx-exceptions
MIT (http://mootools.net/license.txt)	1	Packages slick
MIT/X11	1	Packages nub
Public Domain	1	Packages jsonify
Python-2.0	1	Packages argparse
SEE LICENSE IN LICENSE	1	Packages spawndamnit
SEE LICENSE IN LICENSE.md	1	Packages lightcookie
Unlicense	1	Packages @sinonjs/text-encoding
WTFPL	1	Packages opener
CC0-1.0	2	Packages spdx-license-ids type-fest
BlueOak-1.0.0	3	Packages jackspeak package-json-from-dist path-scurry
CC-BY-4.0	3	Packages @saleor/macaw-ui caniuse-lite saleor-apps
LGPL-3.0-or-later	11	Packages @img/sharp-libvips-darwin-arm64 @img/sharp-libvips-darwin-x64 @img/sharp-libvips-linux-arm @img/sharp-libvips-linux-arm64 @img/sharp-libvips-linux-s390x @img/sharp-libvips-linux-x64 @img/sharp-libvips-linuxmusl-arm64 @img/sharp-libvips-linuxmusl-x64 @img/sharp-wasm32 @img/sharp-win32-ia32 @img/sharp-win32-x64
BSD-2-Clause	23	Packages cheerio-select css-select css-what domelementtype domhandler domutils dotenv entities escodegen eslint-scope espree esprima esrecurse estraverse esutils glob-to-regexp normalize-package-data nth-check shimmer terser And 3 more...
<<missing>>	25	Packages @saleor/apps-domain @saleor/apps-logger @saleor/apps-otel @saleor/apps-shared @saleor/apps-trpc @saleor/apps-ui @saleor/dynamo-config-repository @saleor/errors @saleor/eslint-config-apps @saleor/react-hook-form-macaw @saleor/sentry-utils @saleor/typescript-config-apps @saleor/webhook-utils busboy json-query saleor-app-avatax saleor-app-cms saleor-app-klaviyo saleor-app-payment-np-atobarai saleor-app-payment-stripe And 5 more...
BSD-3-Clause	48	Packages @protobufjs/aspromise @protobufjs/base64 @protobufjs/codegen @protobufjs/eventemitter @protobufjs/fetch @protobufjs/float @protobufjs/inquire @protobufjs/path @protobufjs/pool @protobufjs/utf8 @saleor/app-sdk @saleor/eslint-plugin-saleor-app @sentry/cli @sentry/cli-darwin @sentry/cli-linux-arm @sentry/cli-linux-arm64 @sentry/cli-linux-i686 @sentry/cli-linux-x64 @sentry/cli-win32-i686 @sentry/cli-win32-x64 And 28 more...
ISC	56	Packages @bundled-es-modules/cookie @bundled-es-modules/statuses @bundled-es-modules/tough-cookie @isaacs/cliui abbrev anymatch boolbase cli-width cliui concat-with-sourcemaps electron-to-chromium fastq flatted foreground-child form-data-lite fs.realpath get-caller-file glob glob-parent graceful-fs And 36 more...
Apache-2.0	235	Packages @ampproject/remapping @aws-crypto/crc32 @aws-crypto/crc32c @aws-crypto/ie11-detection @aws-crypto/sha1-browser @aws-crypto/sha256-browser @aws-crypto/sha256-js @aws-crypto/supports-web-crypto @aws-crypto/util @aws-sdk/abort-controller @aws-sdk/chunked-blob-reader @aws-sdk/client-dynamodb @aws-sdk/client-s3 @aws-sdk/client-sso @aws-sdk/client-sso-oidc @aws-sdk/client-sts @aws-sdk/config-resolver @aws-sdk/core @aws-sdk/credential-provider-env @aws-sdk/credential-provider-http And 215 more...
MIT	1436	Packages @0no-co/graphql.web @adobe/css-tools @algolia/cache-browser-local-storage @algolia/cache-common @algolia/cache-in-memory @algolia/client-account @algolia/client-analytics @algolia/client-common @algolia/client-personalization @algolia/client-search @algolia/logger-common @algolia/logger-console @algolia/recommend @algolia/requester-browser-xhr @algolia/requester-common @algolia/requester-node-http @algolia/transporter @apidevtools/json-schema-ref-parser @ardatan/relay-compiler @ardatan/sync-fetch And 1416 more...

Copilot

Pull request overview

This PR implements SDK-level tail sampling for OpenTelemetry in the Saleor Apps monorepo, enabling applications to intelligently sample traces based on runtime characteristics (errors and latency) rather than upfront decisions. The implementation introduces a DeferredSampler that defers sampling decisions to span end, a TailSamplingProcessor that promotes spans meeting specific criteria (errors or slow responses), and custom fetch instrumentation to support recording non-sampled spans. The feature is integrated into the search app as a proof of concept.

Key Changes:

Added DeferredSampler and TailSamplingProcessor classes with comprehensive test coverage to enable tail sampling decisions based on span characteristics
Implemented TailSamplingFetchInstrumentation to support recording spans even when not initially sampled, allowing error information to be captured
Updated createHttpInstrumentation to support root span creation when using deferred sampling
Integrated tail sampling into the search app's OTEL configuration

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
`packages/otel/src/tail-sampling-processor.ts`	Core implementation of tail sampling processor that promotes non-sampled spans to sampled based on error status or latency thresholds
`packages/otel/src/tail-sampling-processor.test.ts`	Comprehensive unit tests covering error detection, slow span detection, configuration options, and resource attribute handling
`packages/otel/src/deferred-sampler.ts`	Sampler implementation that defers final sampling decisions to span end while respecting parent sampling decisions
`packages/otel/src/deferred-sampler.test.ts`	Unit tests for deferred sampler covering all sampling decision paths
`packages/otel/src/fetch-instrumentation.ts`	Custom fetch instrumentation extending @vercel/otel to continue recording non-sampled spans for tail sampling support
`packages/otel/src/http-instrumentation-factory.ts`	Updated to conditionally allow root span creation when using deferred sampler
`apps/search/src/instrumentations/otel-node.ts`	Integrated tail sampling into search app by configuring DeferredSampler and TailSamplingProcessor
`packages/otel/package.json`	Added test scripts and dependencies (@vercel/otel, vitest, coverage tools)
`pnpm-lock.yaml`	Updated lock file with new dependencies and removed unused Babel packages
`cspell.config.js`	Added "traceparent" to spell check dictionary

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

packages/otel/src/tail-sampling-processor.test.ts

packages/otel/src/http-instrumentation-factory.ts

Copilot · 2025-12-11T18:19:21Z

packages/otel/src/tail-sampling-processor.ts

+export interface TailSamplingProcessorConfig {
+  /**
+   * The downstream processor to send sampled spans to.
+   * Typically a BatchSpanProcessor.
+   */
+  processor: SpanProcessor;
+
+  /**
+   * Latency threshold in milliseconds. Spans slower than this are always exported.
+   * @default 5000 (5 seconds)
+   */
+  slowThresholdMs?: number;
+
+  /**
+   * Whether to always export error spans.
+   * @default true
+   */
+  exportErrors?: boolean;
+
+  /**
+   * Whether to always export slow spans.
+   * @default true
+   */
+  exportSlowSpans?: boolean;
+}


The TailSamplingProcessorConfig interface lacks documentation explaining the relationship between the configuration options. It would be helpful to document that when both exportErrors and exportSlowSpans are false, no spans will be promoted (effectively disabling tail sampling). Also document that this processor should be used with DeferredSampler for proper operation.

Copilot · 2025-12-11T18:19:21Z

packages/otel/package.json

+    "test": "vitest",
+    "test:ci": "vitest run --coverage"


The package.json defines test scripts but there's no vitest.config.ts file in the packages/otel directory. Based on other apps in the repository (like stripe and np-atobarai), a vitest configuration file is needed to properly set up the test environment, specify test file patterns, and configure test setup files.

Copilot · 2025-12-11T18:19:22Z

apps/search/src/instrumentations/otel-node.ts

-  ],
-  instrumentations: [createHttpInstrumentation()],
+  spanProcessors: [tailSamplingProcessor],
+  instrumentations: [createHttpInstrumentation({ usingDeferredSpanProcessor: true })],


The parameter name "usingDeferredSpanProcessor" is misleading. It refers to using a DeferredSampler, not a DeferredSpanProcessor. Consider using "usingDeferredSampler" or "allowRootSpans" for clarity.

Suggested change

instrumentations: [createHttpInstrumentation({ usingDeferredSpanProcessor: true })],

instrumentations: [createHttpInstrumentation({ usingDeferredSampler: true })],

Copilot · 2025-12-11T18:19:22Z

packages/otel/src/fetch-instrumentation.ts

+        ? resolveTemplate(resourceNameTemplate, attrs)
+        : removeSearch(req.url);
+
+      const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${req.url}`;


The default span name includes the full URL which could lead to very long span names and potential performance issues or data exposure. The URL might contain sensitive query parameters or be extremely long. Consider using just the method and hostname or path without query parameters for the default span name, similar to how resourceName uses removeSearch().

Suggested change

const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${req.url}`;

const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${removeSearch(req.url)}`;

Copilot · 2025-12-11T18:19:22Z

packages/otel/src/fetch-instrumentation.ts

+
+      const attrs = {
+        [SEMATTRS_HTTP_METHOD]: req.method,
+        [SEMATTRS_HTTP_URL]: req.url,


The full URL including query parameters is stored in the SEMATTRS_HTTP_URL attribute. This could expose sensitive information like API keys, tokens, or personal data that might be in query parameters. Consider sanitizing the URL to remove query parameters or use a more privacy-conscious approach for the http.url attribute.

Suggested change

[SEMATTRS_HTTP_URL]: req.url,

[SEMATTRS_HTTP_URL]: removeSearch(req.url),

Copilot · 2025-12-11T18:19:23Z

packages/otel/src/fetch-instrumentation.ts

+export class TailSamplingFetchInstrumentation extends FetchInstrumentation {
+  override instrumentationName = "@saleor/otel/fetch";
+
+  private tailSamplingConfig: FetchInstrumentationConfig;
+  private tailSamplingOriginalFetch: typeof fetch | undefined;
+  private tailSamplingTracerProvider: TracerProvider | undefined;
+
+  constructor(config: FetchInstrumentationConfig = {}) {
+    super(config);
+    this.tailSamplingConfig = config;
+  }
+
+  override setTracerProvider(tracerProvider: TracerProvider): void {
+    super.setTracerProvider(tracerProvider);
+    this.tailSamplingTracerProvider = tracerProvider;
+  }
+
+  /**
+   * Override enable() to remove the `!isSampled()` check.
+   *
+   * This is a copy of @vercel/otel's FetchInstrumentation.enable() with one key change:
+   * The original check `if (!span.isRecording() || !isSampled(span.spanContext().traceFlags))`
+   * is changed to just `if (!span.isRecording())` to support tail sampling.
+   */
+  override enable(): void {
+    this.disable();
+
+    const tracerProvider = this.tailSamplingTracerProvider;
+
+    if (!tracerProvider) {
+      return;
+    }
+
+    const tracer = tracerProvider.getTracer(this.instrumentationName, this.instrumentationVersion);
+
+    const config = this.tailSamplingConfig;
+    const ignoreUrls = config.ignoreUrls ?? [];
+
+    const shouldIgnore = (url: URL, init: InternalRequestInit | undefined): boolean => {
+      if (init?.opentelemetry?.ignore !== undefined) {
+        return init.opentelemetry.ignore;
+      }
+
+      if (ignoreUrls.length === 0) {
+        return false;
+      }
+
+      const urlString = url.toString();
+
+      return ignoreUrls.some((match) => {
+        if (typeof match === "string") {
+          if (match === "*") {
+            return true;
+          }
+
+          return urlString.startsWith(match);
+        }
+
+        return match.test(urlString);
+      });
+    };
+
+    const host = process.env.VERCEL_URL || process.env.NEXT_PUBLIC_VERCEL_URL || null;
+    const branchHost =
+      // eslint-disable-next-line turbo/no-undeclared-env-vars -- Vercel system env var
+      process.env.VERCEL_BRANCH_URL || process.env.NEXT_PUBLIC_VERCEL_BRANCH_URL || null;
+    const propagateContextUrls = config.propagateContextUrls ?? [];
+    const dontPropagateContextUrls = config.dontPropagateContextUrls ?? [];
+    const resourceNameTemplate = config.resourceNameTemplate;
+    const { attributesFromRequestHeaders, attributesFromResponseHeaders } = config;
+
+    const shouldPropagate = (url: URL, init: InternalRequestInit | undefined): boolean => {
+      if (init?.opentelemetry?.propagateContext) {
+        return init.opentelemetry.propagateContext;
+      }
+
+      const urlString = url.toString();
+
+      if (
+        dontPropagateContextUrls.length > 0 &&
+        dontPropagateContextUrls.some((match) => {
+          if (typeof match === "string") {
+            if (match === "*") {
+              return true;
+            }
+
+            return urlString.startsWith(match);
+          }
+
+          return match.test(urlString);
+        })
+      ) {
+        return false;
+      }
+
+      // Allow same origin.
+      if (host && url.protocol === "https:" && (url.host === host || url.host === branchHost)) {
+        return true;
+      }
+
+      // Allow localhost for testing in dev mode.
+      if (!host && url.protocol === "http:" && url.hostname === "localhost") {
+        return true;
+      }
+
+      return propagateContextUrls.some((match) => {
+        if (typeof match === "string") {
+          if (match === "*") {
+            return true;
+          }
+
+          return urlString.startsWith(match);
+        }
+
+        return match.test(urlString);
+      });
+    };
+
+    // Disable fetch tracing in Next.js to avoid double instrumentation.
+    // eslint-disable-next-line turbo/no-undeclared-env-vars -- Next.js internal env var
+    process.env.NEXT_OTEL_FETCH_DISABLED = "1";
+
+    const originalFetch = globalThis.fetch;
+
+    this.tailSamplingOriginalFetch = originalFetch;
+
+    const doFetch: typeof fetch = async (input, initArg) => {
+      const init = initArg as InternalRequestInit | undefined;
+
+      // Passthrough internal Next.js requests.
+      if (init?.next?.internal) {
+        return originalFetch(input, init);
+      }
+
+      const req = new Request(input instanceof Request ? input.clone() : input, init);
+      const url = new URL(req.url);
+
+      if (shouldIgnore(url, init)) {
+        return originalFetch(input, init);
+      }
+
+      const attrs = {
+        [SEMATTRS_HTTP_METHOD]: req.method,
+        [SEMATTRS_HTTP_URL]: req.url,
+        [SEMATTRS_HTTP_HOST]: url.host,
+        [SEMATTRS_HTTP_SCHEME]: url.protocol.replace(":", ""),
+        [SEMATTRS_NET_PEER_NAME]: url.hostname,
+        [SEMATTRS_NET_PEER_PORT]: url.port,
+      };
+      const resourceName = resourceNameTemplate
+        ? resolveTemplate(resourceNameTemplate, attrs)
+        : removeSearch(req.url);
+
+      const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${req.url}`;
+
+      const parentContext = context.active();
+
+      const span = tracer.startSpan(
+        spanName,
+        {
+          kind: SpanKind.CLIENT,
+          attributes: {
+            ...attrs,
+            "operation.name": `fetch.${req.method}`,
+            "resource.name": resourceName,
+            ...init?.opentelemetry?.attributes,
+          },
+        },
+        parentContext,
+      );
+
+      /*
+       * TAIL SAMPLING CHANGE: Only check isRecording(), NOT isSampled().
+       *
+       * Original @vercel/otel check was:
+       *   if (!span.isRecording() || !isSampled(span.spanContext().traceFlags))
+       *
+       * This caused non-sampled spans to be ended early without error info.
+       * For tail sampling, we need to continue recording so that errors
+       * can be captured and the span can be promoted at end time.
+       *
+       * We still skip non-recording spans (NOT_RECORD sampling decision).
+       */
+      if (!span.isRecording()) {
+        span.end();
+
+        return originalFetch(input, init);
+      }
+
+      /*
+       * Only propagate context if the span is actually sampled.
+       * Non-sampled spans should not propagate trace context to avoid
+       * forcing downstream services to sample.
+       */
+      if (shouldPropagate(url, init) && isSampled(span.spanContext().traceFlags)) {
+        const fetchContext = traceApi.setSpan(parentContext, span);
+
+        propagation.inject(fetchContext, req.headers, HEADERS_SETTER);
+      }
+
+      if (attributesFromRequestHeaders) {
+        headersToAttributes(span, attributesFromRequestHeaders, req.headers);
+      }
+
+      try {
+        const startTime = Date.now();
+
+        /*
+         * Remove "content-type" for FormData body because undici regenerates
+         * a new multipart separator each time.
+         */
+        if (init?.body && init.body instanceof FormData) {
+          req.headers.delete("content-type");
+        }
+
+        const res = await originalFetch(input, {
+          ...init,
+          headers: req.headers,
+        });
+        const duration = Date.now() - startTime;
+
+        span.setAttribute(SEMATTRS_HTTP_STATUS_CODE, res.status);
+        span.setAttribute("http.response_time", duration);
+
+        if (attributesFromResponseHeaders) {
+          headersToAttributes(span, attributesFromResponseHeaders, res.headers);
+        }
+
+        if (res.status >= 500) {
+          onError(span, `Status: ${res.status} (${res.statusText})`);
+        }
+
+        // Flush body, but non-blocking.
+        if (res.body) {
+          void pipeResponse(res).then(
+            (byteLength) => {
+              if (span.isRecording()) {
+                span.setAttribute(SEMATTRS_HTTP_RESPONSE_CONTENT_LENGTH_UNCOMPRESSED, byteLength);
+                span.end();
+              }
+            },
+            (err) => {
+              if (span.isRecording()) {
+                onError(span, err);
+                span.end();
+              }
+            },
+          );
+        } else {
+          span.end();
+        }
+
+        return res;
+      } catch (e) {
+        onError(span, e);
+        span.end();
+        throw e;
+      }
+    };
+
+    globalThis.fetch = doFetch;
+  }
+
+  override disable(): void {
+    if (this.tailSamplingOriginalFetch) {
+      globalThis.fetch = this.tailSamplingOriginalFetch;
+    }
+  }
+}


The TailSamplingFetchInstrumentation class lacks test coverage. Given that it overrides critical enable() logic and handles complex scenarios like context propagation, span recording decisions, and error handling, comprehensive unit tests are needed to ensure the tail sampling behavior works correctly. Consider adding tests for scenarios like: non-sampled spans recording errors, context propagation being skipped for non-sampled spans, and interaction with the TailSamplingProcessor.

packages/otel/src/deferred-sampler.ts

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

Copilot · 2025-12-11T18:31:38Z

packages/otel/package.json

+    "test": "vitest",
+    "test:ci": "vitest run --coverage"


The otel package now includes test scripts but is missing a vitest.config.ts file. Other apps in the repository (like apps/np-atobarai, apps/stripe) have vitest configuration files that define test workspaces, setup files, and environment settings. Without this configuration, vitest may not find the test files or run with the correct settings.

Create a vitest.config.ts file similar to other packages in the monorepo, defining the test workspace configuration for unit tests.

lkostrowski · 2025-12-12T07:36:58Z

packages/otel/src/vercel-resource-attributes.ts

+ * We replicate @vercel/otel's getResourceAttributes() logic here so that
+ * TailSamplingProcessor can add these attributes when promoting spans.


do you have to replicate it?

can you

class TailSamplingProcess { constrcutor(private vercelProcessor: CompositeSpanProcessor ){} promoteToSampled() { ... this.vercelPRocessor.something()... } }

Unfortunately, this is not exported method from Vercel SDK :/

https://github.com/vercel/otel/blob/a4a86622ddd186c1f32db2c832cab5f6b623c717/packages/otel/src/processor/composite-span-processor.ts?plain=1#L149

We also cannot call it again because this span processor is internal, it runs before we can do anything (because registerOTel wraps any processor we pass to it, and Next.js processors run first)

new version also doesn't export this: https://github.com/vercel/otel/blob/main/packages/otel/src/processor/composite-span-processor.ts#L150

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 10 comments.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

Copilot · 2025-12-12T12:38:52Z

packages/otel/src/fetch-instrumentation.ts

+        return originalFetch(input, init);
+      }
+
+      if (shouldPropagate(url, init) && isSampled(span.spanContext().traceFlags)) {


The logic for propagating context only for sampled spans may break distributed tracing. When a non-sampled span makes a fetch request and later gets promoted by TailSamplingProcessor, the downstream service won't have the trace context because it wasn't propagated. This can result in broken trace chains where error spans appear disconnected from their parent traces.

Consider either:

Always propagating context regardless of sampling (the parent trace decision should be maintained), or

Document this limitation clearly that tail-sampled spans will have broken distributed traces for outgoing requests

Suggested change

if (shouldPropagate(url, init) && isSampled(span.spanContext().traceFlags)) {

if (shouldPropagate(url, init)) {

Copilot · 2025-12-12T12:38:52Z

packages/otel/src/vercel-resource-attributes.ts

@@ -0,0 +1,115 @@
+/**
+ * DataDog resource attributes for @vercel/otel compatibility.


Typo in comment: "DataDog" should be "Datadog" (one word, capital D only at the start)

Suggested change

* DataDog resource attributes for @vercel/otel compatibility.

* Datadog resource attributes for @vercel/otel compatibility.

Copilot · 2025-12-12T12:38:52Z

packages/otel/src/vercel-resource-attributes.ts

+ * 3. TailSamplingProcessor.onEnd() runs → promotes span to SAMPLED
+ * 4. Promoted span goes to BatchSpanProcessor (never back through CompositeSpanProcessor)
+ *
+ * Result: promoted spans are missing DataDog attributes that normally sampled spans have.


Typo in comment: "DataDog attributes" should be "Datadog attributes" (one word, capital D only at the start)

Suggested change

* Result: promoted spans are missing DataDog attributes that normally sampled spans have.

* Result: promoted spans are missing Datadog attributes that normally sampled spans have.

Copilot · 2025-12-12T12:38:53Z

packages/otel/src/vercel-resource-attributes.ts

+import { ReadableSpan } from "@opentelemetry/sdk-trace-node";
+
+/**
+ * Compute DataDog-compatible resource attributes for a span.


Typo in comment: "DataDog-compatible" should be "Datadog-compatible" (one word, capital D only at the start)

Suggested change

* Compute DataDog-compatible resource attributes for a span.

* Compute Datadog-compatible resource attributes for a span.

Copilot · 2025-12-12T12:38:53Z

packages/otel/src/fetch-instrumentation.ts

+ *
+ * ## When to update
+ *
+ * If @vercel/otel updates their FetchInstrumentation, review their changes


This file is a nearly 400-line copy of @vercel/otel's FetchInstrumentation with minimal modifications (only the isSampled check on line 322). This creates significant maintenance burden as it must be manually synced with upstream changes. Consider alternative approaches like: 1) Contributing a configuration option to @vercel/otel to support tail sampling, 2) Creating a wrapper/proxy pattern that decorates the original instrumentation rather than copying it, or 3) Using monkey-patching to override only the specific behavior. Document the trade-offs of this approach in the file header.

Suggested change

* If @vercel/otel updates their FetchInstrumentation, review their changes

* If @vercel/otel updates their FetchInstrumentation, review their changes

*

* ## Trade-offs and alternatives considered

*

* This file is a near-verbatim copy of @vercel/otel's FetchInstrumentation, with only minimal

* modifications (notably, the isSampled check). This approach introduces significant maintenance

* burden, as any upstream changes must be manually reviewed and merged here.

*

* Alternatives considered:

* 1. **Contributing upstream**: Ideally, @vercel/otel would support a configuration option for tail sampling.

* This would eliminate the need for a fork. However, this requires upstream acceptance and release,

* which may not be timely or feasible for our needs.

* 2. **Wrapper/proxy pattern**: Creating a wrapper that decorates the original FetchInstrumentation

* could avoid code duplication, but the relevant logic is not easily overridable without changes

* to the upstream implementation.

* 3. **Monkey-patching**: Overriding only the specific behavior via monkey-patching is fragile and

* may break with upstream changes, especially if internal APIs are not stable.

*

* Given these constraints, we chose to copy the implementation and clearly mark our modifications.

* This file should be reviewed and updated whenever @vercel/otel's FetchInstrumentation changes.

* The trade-off is increased maintenance burden in exchange for correct tail sampling support.

Copilot · 2025-12-12T12:38:53Z

packages/otel/src/vercel-resource-attributes.ts

+ * ## Why this file exists
+ *
+ * @vercel/otel's CompositeSpanProcessor adds DataDog-specific attributes
+ * (`operation.name`, `resource.name`) in onEnd() - but ONLY for sampled spans.


Typo in comment: "DataDog-specific" should be "Datadog-specific" (one word, capital D only at the start)

Copilot · 2025-12-12T12:38:53Z

packages/otel/src/tail-sampling-processor.ts

+      enumerable: true,
+    });
+
+    // Build new attributes with promotion marker + Vercel/DataDog resource attributes


Typo in comment: "Vercel/DataDog" should be "Vercel/Datadog" (one word, capital D only at the start)

Suggested change

// Build new attributes with promotion marker + Vercel/DataDog resource attributes

// Build new attributes with promotion marker + Vercel/Datadog resource attributes

Copilot · 2025-12-12T12:38:54Z

packages/otel/package.json

+    "@vercel/otel": "catalog:",
    "next": "catalog:",
    "urql": "catalog:"


Adding @vercel/otel as a peer dependency means that any package consuming @saleor/apps-otel must install @vercel/otel even if they don't use the tail sampling features. Consider whether tail sampling functionality should be optional or if @vercel/otel should remain only a dev dependency. If TailSamplingFetchInstrumentation is the only export requiring @vercel/otel, you might want to make it an optional peer dependency or split it into a separate package.

Suggested change

"@vercel/otel": "catalog:",

"next": "catalog:",

"urql": "catalog:"

"next": "catalog:",

"urql": "catalog:"

},

"optionalDependencies": {

"@vercel/otel": "catalog:"

Copilot · 2025-12-12T12:38:54Z

packages/otel/src/http-instrumentation-factory.ts

+   * This allows root spans (requests without traceparent set by Saleor and others)
+   * to be sampled if app decides to sample.


The comment states "Set to true when using DeferredSampler + TailSamplingProcessor" but doesn't explain WHY this allows root spans to be sampled. Consider expanding the documentation to explain that with requireParentforIncomingSpans=true, incoming requests without a traceparent header are rejected, but with tail sampling we need to create root spans for these requests so TailSamplingProcessor can decide whether to export them based on errors/latency. This would help future maintainers understand the architectural decision.

Suggested change

* This allows root spans (requests without traceparent set by Saleor and others)

* to be sampled if app decides to sample.

*

* By default, with `requireParentforIncomingSpans=true`, incoming requests without a

* `traceparent` header are rejected and no root span is created. However, when using

* tail-based sampling (e.g., with TailSamplingProcessor), we want to create root spans

* for these requests so that the TailSamplingProcessor can later decide whether to export

* them based on errors, latency, or other criteria.

*

* This flag allows root spans (requests without traceparent set by Saleor and others)

* to be created and sampled if the app decides to sample, enabling tail-based sampling

* to function as intended.

Copilot · 2025-12-12T12:38:54Z

packages/otel/src/deferred-sampler.ts

+    if (parentSampled) {
+      // Parent decided to sample - we MUST sample too
+      return {
+        decision: SamplingDecision.RECORD_AND_SAMPLED,
+        attributes: {
+          [SALEOR_SAMPLING_DECISION_ATTR]: "sampled",
+        },
+      };
+    }


When a parent span is sampled, this returns RECORD_AND_SAMPLED for the child span, which bypasses TailSamplingProcessor entirely. This means child spans under a sampled parent won't benefit from tail sampling logic (e.g., they won't get the TAIL_SAMPLING_PROMOTED_ATTR attribute or Vercel/Datadog resource attributes added by TailSamplingProcessor). This creates inconsistency where some spans in a trace have these attributes and others don't, which could affect observability and filtering. Consider if TailSamplingProcessor should process all spans, or document this limitation.

Suggested change

if (parentSampled) {

// Parent decided to sample - we MUST sample too

return {

decision: SamplingDecision.RECORD_AND_SAMPLED,

attributes: {

[SALEOR_SAMPLING_DECISION_ATTR]: "sampled",

},

};

}

// Always return RECORD to allow TailSamplingProcessor to make the final export decision,

// even if the parent is sampled. This ensures all spans can be processed for tail sampling

// and receive consistent attributes (e.g., TAIL_SAMPLING_PROMOTED_ATTR).

return {

decision: SamplingDecision.RECORD,

attributes: {

[SALEOR_SAMPLING_DECISION_ATTR]: parentSampled ? "sampled" : (parentSpanContext ? "not_sampled" : "none"),

},

};

Copilot

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 5 comments.

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

Copilot · 2025-12-12T16:25:57Z

packages/otel/src/vercel-resource-attributes.ts

+      // SALEOR MODIFICATION: Added type assertion for stricter TypeScript config
+      kind === SpanKind.INTERNAL ? "" : SPAN_KIND_NAME[kind as SpanKind],


The type assertion kind as SpanKind is unnecessary and potentially dangerous. The kind variable is already typed as SpanKind from line 60 where it's destructured from span, and span.kind is of type SpanKind. Additionally, the check kind === SpanKind.INTERNAL means that when it's NOT INTERNAL, kind could be any of the other valid SpanKind values (SERVER, CLIENT, PRODUCER, CONSUMER), which are all valid keys in SPAN_KIND_NAME. The type assertion bypasses TypeScript's safety without providing any benefit.

Suggested change

// SALEOR MODIFICATION: Added type assertion for stricter TypeScript config

kind === SpanKind.INTERNAL ? "" : SPAN_KIND_NAME[kind as SpanKind],

// SALEOR MODIFICATION: Removed unnecessary type assertion for stricter TypeScript config

kind === SpanKind.INTERNAL ? "" : SPAN_KIND_NAME[kind],

Copilot · 2025-12-12T16:25:58Z

apps/search/src/instrumentations/otel-node.ts

-  ],
-  instrumentations: [createHttpInstrumentation()],
+  spanProcessors: [tailSamplingProcessor],
+  instrumentations: [createHttpInstrumentation({ usingDeferredSpanProcessor: true })],


The TailSamplingFetchInstrumentation is not being added to the instrumentations array. According to the documentation in fetch-instrumentation.ts, this instrumentation was created specifically to support tail sampling by recording span data for non-sampled spans. Without it, fetch calls won't be properly instrumented for tail sampling. The instrumentations array should include an instance of TailSamplingFetchInstrumentation.

Copilot · 2025-12-12T16:25:58Z

apps/search/src/instrumentations/otel-node.ts

+const batchProcessor = createBatchSpanProcessor({
+  accessToken: process.env.OTEL_ACCESS_TOKEN,
+});
+
+const tailSamplingProcessor = new TailSamplingProcessor({
+  processor: batchProcessor,
+  slowThresholdMs: 5000,
+  exportErrors: true,
+  exportSlowSpans: true,


The tail sampling configuration values (slowThresholdMs: 5000, exportErrors: true, exportSlowSpans: true) are hardcoded. Consider extracting these to environment variables or a configuration file to allow tuning in different environments (development, staging, production) without code changes.

Suggested change

const batchProcessor = createBatchSpanProcessor({

accessToken: process.env.OTEL_ACCESS_TOKEN,

});

const tailSamplingProcessor = new TailSamplingProcessor({

processor: batchProcessor,

slowThresholdMs: 5000,

exportErrors: true,

exportSlowSpans: true,

// Tail sampling configuration from environment variables with defaults

const TAIL_SAMPLING_SLOW_THRESHOLD_MS = process.env.TAIL_SAMPLING_SLOW_THRESHOLD_MS

? parseInt(process.env.TAIL_SAMPLING_SLOW_THRESHOLD_MS, 10)

: 5000;

const TAIL_SAMPLING_EXPORT_ERRORS = process.env.TAIL_SAMPLING_EXPORT_ERRORS

? process.env.TAIL_SAMPLING_EXPORT_ERRORS.toLowerCase() === "true"

: true;

const TAIL_SAMPLING_EXPORT_SLOW_SPANS = process.env.TAIL_SAMPLING_EXPORT_SLOW_SPANS

? process.env.TAIL_SAMPLING_EXPORT_SLOW_SPANS.toLowerCase() === "true"

: true;

const batchProcessor = createBatchSpanProcessor({

accessToken: process.env.OTEL_ACCESS_TOKEN,

});

const tailSamplingProcessor = new TailSamplingProcessor({

processor: batchProcessor,

slowThresholdMs: TAIL_SAMPLING_SLOW_THRESHOLD_MS,

exportErrors: TAIL_SAMPLING_EXPORT_ERRORS,

exportSlowSpans: TAIL_SAMPLING_EXPORT_SLOW_SPANS,

Copilot · 2025-12-12T16:25:58Z

packages/otel/src/tail-sampling-processor.ts

+  private isError(span: ReadableSpan): boolean {
+    // Check span status
+    if (span.status.code === SpanStatusCode.ERROR) {
+      return true;
+    }
+
+    // Check for recorded exceptions
+    if (span.events.some((event) => event.name === "exception")) {
+      return true;
+    }
+
+    // Check HTTP status code attribute (using semantic convention)
+    const httpStatusCode = span.attributes[ATTR_HTTP_RESPONSE_STATUS_CODE];
+
+    if (typeof httpStatusCode === "number" && httpStatusCode >= 500) {
+      return true;
+    }
+
+    return false;
+  }


The error detection logic checks for HTTP 500+ status codes, but HTTP 4xx client errors (like 401, 403, 429) might also indicate important issues worth sampling in some scenarios. Consider whether rate limiting (429) or authentication failures (401, 403) should trigger tail sampling, as these could indicate security issues or service degradation that might be valuable to trace.

Copilot · 2025-12-12T16:25:59Z

packages/otel/package.json

  "scripts": {
    "check-types": "tsc",
    "lint": "eslint .",
-    "lint:fix": "eslint --fix ."
+    "lint:fix": "eslint --fix .",
+    "test": "vitest",
+    "test:ci": "vitest run --coverage"


The test commands reference setup files (pattern seen in other packages), but there is no setup-tests.ts file in the otel package src directory. This file is needed to configure the test environment according to repository conventions.

witoszekdev · 2025-12-12T17:31:24Z

I found out that this current implementation doesn't actually export all spans in our trace when it matches our logic (error or timeout). This means that we would have only a single sparse spans on errors, but not the entire trace.

Going to implement different solution to address this.

witoszekdev added 5 commits December 11, 2025 18:58

Add tests to OTEL package

66b90b1

Add deffered sampler

c3c0867

Add tail sampling processor

461ecba

Add fetch instrumentation

d1e3e68

Add tail sampling to search app

c61247d

Copilot AI review requested due to automatic review settings December 11, 2025 18:13

github-actions bot assigned witoszekdev Dec 11, 2025

github-actions bot added the App: Search label Dec 11, 2025

Copilot started reviewing on behalf of witoszekdev December 11, 2025 18:13 View session

Remove useless test

0c2097e

Copilot AI reviewed Dec 11, 2025

View reviewed changes

witoszekdev added 2 commits December 11, 2025 19:24

Add better comments

e7771d8

CR: fix test abstraction

b1bf52e

Copilot AI review requested due to automatic review settings December 11, 2025 18:26

Copilot started reviewing on behalf of witoszekdev December 11, 2025 18:27 View session

Copilot AI reviewed Dec 11, 2025

View reviewed changes

Add better description for Vercel OTEL hack

519ebe4

lkostrowski reviewed Dec 12, 2025

View reviewed changes

Update packages/otel/src/deferred-sampler.ts

81d6231

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings December 12, 2025 12:30

Copilot started reviewing on behalf of witoszekdev December 12, 2025 12:30 View session

Copilot AI reviewed Dec 12, 2025

View reviewed changes

Add more explanation for hack

9779ee0

Copilot AI review requested due to automatic review settings December 12, 2025 16:18

Copilot started reviewing on behalf of witoszekdev December 12, 2025 16:19 View session

Copy-paste actual files from @vercel/sdk, and apply patches

afbf49a

witoszekdev force-pushed the inc-669-sample-more-spans-in-apps branch from 82bf650 to afbf49a Compare December 12, 2025 16:21

Copilot AI reviewed Dec 12, 2025

View reviewed changes

WIP: Add sampling processor with buffering

007b6e0

	instrumentations: [createHttpInstrumentation({ usingDeferredSpanProcessor: true })],
	instrumentations: [createHttpInstrumentation({ usingDeferredSampler: true })],

	const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${req.url}`;
	const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${removeSearch(req.url)}`;

	[SEMATTRS_HTTP_URL]: req.url,
	[SEMATTRS_HTTP_URL]: removeSearch(req.url),

		* We replicate @vercel/otel's getResourceAttributes() logic here so that
		* TailSamplingProcessor can add these attributes when promoting spans.

	if (shouldPropagate(url, init) && isSampled(span.spanContext().traceFlags)) {
	if (shouldPropagate(url, init)) {

		@@ -0,0 +1,115 @@
		/**
		* DataDog resource attributes for @vercel/otel compatibility.

	* Result: promoted spans are missing DataDog attributes that normally sampled spans have.
	* Result: promoted spans are missing Datadog attributes that normally sampled spans have.

	* Compute DataDog-compatible resource attributes for a span.
	* Compute Datadog-compatible resource attributes for a span.

- * If @vercel/otel updates their FetchInstrumentation, review their changes
+ * If @vercel/otel updates their FetchInstrumentation, review their changes
+ *
+ * ## Trade-offs and alternatives considered
+ *
+ * This file is a near-verbatim copy of @vercel/otel's FetchInstrumentation, with only minimal
+ * modifications (notably, the isSampled check). This approach introduces significant maintenance
+ * burden, as any upstream changes must be manually reviewed and merged here.
+ *
+ * Alternatives considered:
+ * 1. **Contributing upstream**: Ideally, @vercel/otel would support a configuration option for tail sampling.
+ *    This would eliminate the need for a fork. However, this requires upstream acceptance and release,
+ *    which may not be timely or feasible for our needs.
+ * 2. **Wrapper/proxy pattern**: Creating a wrapper that decorates the original FetchInstrumentation
+ *    could avoid code duplication, but the relevant logic is not easily overridable without changes
+ *    to the upstream implementation.
+ * 3. **Monkey-patching**: Overriding only the specific behavior via monkey-patching is fragile and
+ *    may break with upstream changes, especially if internal APIs are not stable.
+ *
+ * Given these constraints, we chose to copy the implementation and clearly mark our modifications.
+ * This file should be reviewed and updated whenever @vercel/otel's FetchInstrumentation changes.
+ * The trade-off is increased maintenance burden in exchange for correct tail sampling support.

	// Build new attributes with promotion marker + Vercel/DataDog resource attributes
	// Build new attributes with promotion marker + Vercel/Datadog resource attributes

		* This allows root spans (requests without traceparent set by Saleor and others)
		* to be sampled if app decides to sample.

-   * This allows root spans (requests without traceparent set by Saleor and others)
-   * to be sampled if app decides to sample.
+   *
+   * By default, with `requireParentforIncomingSpans=true`, incoming requests without a
+   * `traceparent` header are rejected and no root span is created. However, when using
+   * tail-based sampling (e.g., with TailSamplingProcessor), we want to create root spans
+   * for these requests so that the TailSamplingProcessor can later decide whether to export
+   * them based on errors, latency, or other criteria.
+   *
+   * This flag allows root spans (requests without traceparent set by Saleor and others)
+   * to be created and sampled if the app decides to sample, enabling tail-based sampling
+   * to function as intended.

		// SALEOR MODIFICATION: Added type assertion for stricter TypeScript config
		kind === SpanKind.INTERNAL ? "" : SPAN_KIND_NAME[kind as SpanKind],

Conversation

witoszekdev commented Dec 11, 2025

Scope of the PR

Related issues

Checklist

Uh oh!

changeset-bot bot commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

vercel bot commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions bot commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Differences Found

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

lkostrowski Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

witoszekdev Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

witoszekdev Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

changeset-bot bot commented Dec 11, 2025 •

edited

Loading

vercel bot commented Dec 11, 2025 •

edited

Loading

codecov bot commented Dec 11, 2025 •

edited

Loading

github-actions bot commented Dec 11, 2025 •

edited

Loading

witoszekdev Dec 12, 2025 •

edited

Loading

witoszekdev Dec 12, 2025 •

edited

Loading