Skip to content

[INC-669] Add tail sampling implemented in OTEL SDK#2164

Draft
witoszekdev wants to merge 13 commits intomainfrom
inc-669-sample-more-spans-in-apps
Draft

[INC-669] Add tail sampling implemented in OTEL SDK#2164
witoszekdev wants to merge 13 commits intomainfrom
inc-669-sample-more-spans-in-apps

Conversation

@witoszekdev
Copy link
Copy Markdown
Member

  • Add tests to OTEL package
  • Add deffered sampler
  • Add tail sampling processor
  • Add fetch instrumentation
  • Add tail sampling to search app

Scope of the PR

Related issues

Checklist

Copilot AI review requested due to automatic review settings December 11, 2025 18:13
@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Dec 11, 2025

⚠️ No Changeset found

Latest commit: 007b6e0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel
Copy link
Copy Markdown

vercel bot commented Dec 11, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
saleor-app-avatax Ready Ready Preview Comment Dec 12, 2025 5:45pm
saleor-app-cms Ready Ready Preview Comment Dec 12, 2025 5:45pm
saleor-app-klaviyo Ready Ready Preview Comment Dec 12, 2025 5:45pm
saleor-app-payment-np-atobarai Ready Ready Preview Comment Dec 12, 2025 5:45pm
saleor-app-payment-stripe Ready Ready Preview Comment Dec 12, 2025 5:45pm
saleor-app-products-feed Ready Ready Preview Comment Dec 12, 2025 5:45pm
saleor-app-search Ready Ready Preview Comment Dec 12, 2025 5:45pm
saleor-app-segment Ready Ready Preview Comment Dec 12, 2025 5:45pm
saleor-app-smtp Ready Ready Preview Comment Dec 12, 2025 5:45pm

@codecov
Copy link
Copy Markdown

codecov bot commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.07%. Comparing base (b556fdb) to head (519ebe4).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
apps/search/src/instrumentations/otel-node.ts 0.00% 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2164      +/-   ##
==========================================
- Coverage   36.07%   36.07%   -0.01%     
==========================================
  Files         938      938              
  Lines       60362    60370       +8     
  Branches     2968     2968              
==========================================
  Hits        21778    21778              
- Misses      38224    38232       +8     
  Partials      360      360              
Flag Coverage Δ
avatax 56.98% <ø> (ø)
cms 15.76% <ø> (ø)
domain 100.00% <ø> (ø)
dynamo-config-repository 79.29% <ø> (ø)
errors 91.66% <ø> (ø)
logger 28.81% <ø> (ø)
np-atobarai 71.99% <ø> (ø)
products-feed 5.23% <ø> (ø)
search 27.67% <0.00%> (-0.08%) ⬇️
segment 30.20% <ø> (ø)
shared 35.88% <ø> (ø)
smtp 34.96% <ø> (ø)
stripe 70.58% <ø> (ø)
webhook-utils 11.02% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 11, 2025

Differences Found

✅ No packages or licenses were added.

Summary

Expand
License Name Package Count Packages
0BSD 1
Packages
  • tslib
CC BY-SA 4.0 1
Packages
  • @cspell/dict-en-common-misspellings
CC-BY-3.0 1
Packages
  • spdx-exceptions
MIT (http://mootools.net/license.txt) 1
Packages
  • slick
MIT/X11 1
Packages
  • nub
Public Domain 1
Packages
  • jsonify
Python-2.0 1
Packages
  • argparse
SEE LICENSE IN LICENSE 1
Packages
  • spawndamnit
SEE LICENSE IN LICENSE.md 1
Packages
  • lightcookie
Unlicense 1
Packages
  • @sinonjs/text-encoding
WTFPL 1
Packages
  • opener
CC0-1.0 2
Packages
  • spdx-license-ids
  • type-fest
BlueOak-1.0.0 3
Packages
  • jackspeak
  • package-json-from-dist
  • path-scurry
CC-BY-4.0 3
Packages
  • @saleor/macaw-ui
  • caniuse-lite
  • saleor-apps
LGPL-3.0-or-later 11
Packages
  • @img/sharp-libvips-darwin-arm64
  • @img/sharp-libvips-darwin-x64
  • @img/sharp-libvips-linux-arm
  • @img/sharp-libvips-linux-arm64
  • @img/sharp-libvips-linux-s390x
  • @img/sharp-libvips-linux-x64
  • @img/sharp-libvips-linuxmusl-arm64
  • @img/sharp-libvips-linuxmusl-x64
  • @img/sharp-wasm32
  • @img/sharp-win32-ia32
  • @img/sharp-win32-x64
BSD-2-Clause 23
Packages
  • cheerio-select
  • css-select
  • css-what
  • domelementtype
  • domhandler
  • domutils
  • dotenv
  • entities
  • escodegen
  • eslint-scope
  • espree
  • esprima
  • esrecurse
  • estraverse
  • esutils
  • glob-to-regexp
  • normalize-package-data
  • nth-check
  • shimmer
  • terser
  • And 3 more...
<<missing>> 25
Packages
  • @saleor/apps-domain
  • @saleor/apps-logger
  • @saleor/apps-otel
  • @saleor/apps-shared
  • @saleor/apps-trpc
  • @saleor/apps-ui
  • @saleor/dynamo-config-repository
  • @saleor/errors
  • @saleor/eslint-config-apps
  • @saleor/react-hook-form-macaw
  • @saleor/sentry-utils
  • @saleor/typescript-config-apps
  • @saleor/webhook-utils
  • busboy
  • json-query
  • saleor-app-avatax
  • saleor-app-cms
  • saleor-app-klaviyo
  • saleor-app-payment-np-atobarai
  • saleor-app-payment-stripe
  • And 5 more...
BSD-3-Clause 48
Packages
  • @protobufjs/aspromise
  • @protobufjs/base64
  • @protobufjs/codegen
  • @protobufjs/eventemitter
  • @protobufjs/fetch
  • @protobufjs/float
  • @protobufjs/inquire
  • @protobufjs/path
  • @protobufjs/pool
  • @protobufjs/utf8
  • @saleor/app-sdk
  • @saleor/eslint-plugin-saleor-app
  • @sentry/cli
  • @sentry/cli-darwin
  • @sentry/cli-linux-arm
  • @sentry/cli-linux-arm64
  • @sentry/cli-linux-i686
  • @sentry/cli-linux-x64
  • @sentry/cli-win32-i686
  • @sentry/cli-win32-x64
  • And 28 more...
ISC 56
Packages
  • @bundled-es-modules/cookie
  • @bundled-es-modules/statuses
  • @bundled-es-modules/tough-cookie
  • @isaacs/cliui
  • abbrev
  • anymatch
  • boolbase
  • cli-width
  • cliui
  • concat-with-sourcemaps
  • electron-to-chromium
  • fastq
  • flatted
  • foreground-child
  • form-data-lite
  • fs.realpath
  • get-caller-file
  • glob
  • glob-parent
  • graceful-fs
  • And 36 more...
Apache-2.0 235
Packages
  • @ampproject/remapping
  • @aws-crypto/crc32
  • @aws-crypto/crc32c
  • @aws-crypto/ie11-detection
  • @aws-crypto/sha1-browser
  • @aws-crypto/sha256-browser
  • @aws-crypto/sha256-js
  • @aws-crypto/supports-web-crypto
  • @aws-crypto/util
  • @aws-sdk/abort-controller
  • @aws-sdk/chunked-blob-reader
  • @aws-sdk/client-dynamodb
  • @aws-sdk/client-s3
  • @aws-sdk/client-sso
  • @aws-sdk/client-sso-oidc
  • @aws-sdk/client-sts
  • @aws-sdk/config-resolver
  • @aws-sdk/core
  • @aws-sdk/credential-provider-env
  • @aws-sdk/credential-provider-http
  • And 215 more...
MIT 1436
Packages
  • @0no-co/graphql.web
  • @adobe/css-tools
  • @algolia/cache-browser-local-storage
  • @algolia/cache-common
  • @algolia/cache-in-memory
  • @algolia/client-account
  • @algolia/client-analytics
  • @algolia/client-common
  • @algolia/client-personalization
  • @algolia/client-search
  • @algolia/logger-common
  • @algolia/logger-console
  • @algolia/recommend
  • @algolia/requester-browser-xhr
  • @algolia/requester-common
  • @algolia/requester-node-http
  • @algolia/transporter
  • @apidevtools/json-schema-ref-parser
  • @ardatan/relay-compiler
  • @ardatan/sync-fetch
  • And 1416 more...

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements SDK-level tail sampling for OpenTelemetry in the Saleor Apps monorepo, enabling applications to intelligently sample traces based on runtime characteristics (errors and latency) rather than upfront decisions. The implementation introduces a DeferredSampler that defers sampling decisions to span end, a TailSamplingProcessor that promotes spans meeting specific criteria (errors or slow responses), and custom fetch instrumentation to support recording non-sampled spans. The feature is integrated into the search app as a proof of concept.

Key Changes:

  • Added DeferredSampler and TailSamplingProcessor classes with comprehensive test coverage to enable tail sampling decisions based on span characteristics
  • Implemented TailSamplingFetchInstrumentation to support recording spans even when not initially sampled, allowing error information to be captured
  • Updated createHttpInstrumentation to support root span creation when using deferred sampling
  • Integrated tail sampling into the search app's OTEL configuration

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
packages/otel/src/tail-sampling-processor.ts Core implementation of tail sampling processor that promotes non-sampled spans to sampled based on error status or latency thresholds
packages/otel/src/tail-sampling-processor.test.ts Comprehensive unit tests covering error detection, slow span detection, configuration options, and resource attribute handling
packages/otel/src/deferred-sampler.ts Sampler implementation that defers final sampling decisions to span end while respecting parent sampling decisions
packages/otel/src/deferred-sampler.test.ts Unit tests for deferred sampler covering all sampling decision paths
packages/otel/src/fetch-instrumentation.ts Custom fetch instrumentation extending @vercel/otel to continue recording non-sampled spans for tail sampling support
packages/otel/src/http-instrumentation-factory.ts Updated to conditionally allow root span creation when using deferred sampler
apps/search/src/instrumentations/otel-node.ts Integrated tail sampling into search app by configuring DeferredSampler and TailSamplingProcessor
packages/otel/package.json Added test scripts and dependencies (@vercel/otel, vitest, coverage tools)
pnpm-lock.yaml Updated lock file with new dependencies and removed unused Babel packages
cspell.config.js Added "traceparent" to spell check dictionary
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines +5 to +29
export interface TailSamplingProcessorConfig {
/**
* The downstream processor to send sampled spans to.
* Typically a BatchSpanProcessor.
*/
processor: SpanProcessor;

/**
* Latency threshold in milliseconds. Spans slower than this are always exported.
* @default 5000 (5 seconds)
*/
slowThresholdMs?: number;

/**
* Whether to always export error spans.
* @default true
*/
exportErrors?: boolean;

/**
* Whether to always export slow spans.
* @default true
*/
exportSlowSpans?: boolean;
}
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TailSamplingProcessorConfig interface lacks documentation explaining the relationship between the configuration options. It would be helpful to document that when both exportErrors and exportSlowSpans are false, no spans will be promoted (effectively disabling tail sampling). Also document that this processor should be used with DeferredSampler for proper operation.

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +10
"test": "vitest",
"test:ci": "vitest run --coverage"
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package.json defines test scripts but there's no vitest.config.ts file in the packages/otel directory. Based on other apps in the repository (like stripe and np-atobarai), a vitest configuration file is needed to properly set up the test environment, specify test file patterns, and configure test setup files.

Copilot uses AI. Check for mistakes.
],
instrumentations: [createHttpInstrumentation()],
spanProcessors: [tailSamplingProcessor],
instrumentations: [createHttpInstrumentation({ usingDeferredSpanProcessor: true })],
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name "usingDeferredSpanProcessor" is misleading. It refers to using a DeferredSampler, not a DeferredSpanProcessor. Consider using "usingDeferredSampler" or "allowRootSpans" for clarity.

Suggested change
instrumentations: [createHttpInstrumentation({ usingDeferredSpanProcessor: true })],
instrumentations: [createHttpInstrumentation({ usingDeferredSampler: true })],

Copilot uses AI. Check for mistakes.
? resolveTemplate(resourceNameTemplate, attrs)
: removeSearch(req.url);

const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${req.url}`;
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default span name includes the full URL which could lead to very long span names and potential performance issues or data exposure. The URL might contain sensitive query parameters or be extremely long. Consider using just the method and hostname or path without query parameters for the default span name, similar to how resourceName uses removeSearch().

Suggested change
const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${req.url}`;
const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${removeSearch(req.url)}`;

Copilot uses AI. Check for mistakes.

const attrs = {
[SEMATTRS_HTTP_METHOD]: req.method,
[SEMATTRS_HTTP_URL]: req.url,
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full URL including query parameters is stored in the SEMATTRS_HTTP_URL attribute. This could expose sensitive information like API keys, tokens, or personal data that might be in query parameters. Consider sanitizing the URL to remove query parameters or use a more privacy-conscious approach for the http.url attribute.

Suggested change
[SEMATTRS_HTTP_URL]: req.url,
[SEMATTRS_HTTP_URL]: removeSearch(req.url),

Copilot uses AI. Check for mistakes.
Comment on lines +130 to +398
export class TailSamplingFetchInstrumentation extends FetchInstrumentation {
override instrumentationName = "@saleor/otel/fetch";

private tailSamplingConfig: FetchInstrumentationConfig;
private tailSamplingOriginalFetch: typeof fetch | undefined;
private tailSamplingTracerProvider: TracerProvider | undefined;

constructor(config: FetchInstrumentationConfig = {}) {
super(config);
this.tailSamplingConfig = config;
}

override setTracerProvider(tracerProvider: TracerProvider): void {
super.setTracerProvider(tracerProvider);
this.tailSamplingTracerProvider = tracerProvider;
}

/**
* Override enable() to remove the `!isSampled()` check.
*
* This is a copy of @vercel/otel's FetchInstrumentation.enable() with one key change:
* The original check `if (!span.isRecording() || !isSampled(span.spanContext().traceFlags))`
* is changed to just `if (!span.isRecording())` to support tail sampling.
*/
override enable(): void {
this.disable();

const tracerProvider = this.tailSamplingTracerProvider;

if (!tracerProvider) {
return;
}

const tracer = tracerProvider.getTracer(this.instrumentationName, this.instrumentationVersion);

const config = this.tailSamplingConfig;
const ignoreUrls = config.ignoreUrls ?? [];

const shouldIgnore = (url: URL, init: InternalRequestInit | undefined): boolean => {
if (init?.opentelemetry?.ignore !== undefined) {
return init.opentelemetry.ignore;
}

if (ignoreUrls.length === 0) {
return false;
}

const urlString = url.toString();

return ignoreUrls.some((match) => {
if (typeof match === "string") {
if (match === "*") {
return true;
}

return urlString.startsWith(match);
}

return match.test(urlString);
});
};

const host = process.env.VERCEL_URL || process.env.NEXT_PUBLIC_VERCEL_URL || null;
const branchHost =
// eslint-disable-next-line turbo/no-undeclared-env-vars -- Vercel system env var
process.env.VERCEL_BRANCH_URL || process.env.NEXT_PUBLIC_VERCEL_BRANCH_URL || null;
const propagateContextUrls = config.propagateContextUrls ?? [];
const dontPropagateContextUrls = config.dontPropagateContextUrls ?? [];
const resourceNameTemplate = config.resourceNameTemplate;
const { attributesFromRequestHeaders, attributesFromResponseHeaders } = config;

const shouldPropagate = (url: URL, init: InternalRequestInit | undefined): boolean => {
if (init?.opentelemetry?.propagateContext) {
return init.opentelemetry.propagateContext;
}

const urlString = url.toString();

if (
dontPropagateContextUrls.length > 0 &&
dontPropagateContextUrls.some((match) => {
if (typeof match === "string") {
if (match === "*") {
return true;
}

return urlString.startsWith(match);
}

return match.test(urlString);
})
) {
return false;
}

// Allow same origin.
if (host && url.protocol === "https:" && (url.host === host || url.host === branchHost)) {
return true;
}

// Allow localhost for testing in dev mode.
if (!host && url.protocol === "http:" && url.hostname === "localhost") {
return true;
}

return propagateContextUrls.some((match) => {
if (typeof match === "string") {
if (match === "*") {
return true;
}

return urlString.startsWith(match);
}

return match.test(urlString);
});
};

// Disable fetch tracing in Next.js to avoid double instrumentation.
// eslint-disable-next-line turbo/no-undeclared-env-vars -- Next.js internal env var
process.env.NEXT_OTEL_FETCH_DISABLED = "1";

const originalFetch = globalThis.fetch;

this.tailSamplingOriginalFetch = originalFetch;

const doFetch: typeof fetch = async (input, initArg) => {
const init = initArg as InternalRequestInit | undefined;

// Passthrough internal Next.js requests.
if (init?.next?.internal) {
return originalFetch(input, init);
}

const req = new Request(input instanceof Request ? input.clone() : input, init);
const url = new URL(req.url);

if (shouldIgnore(url, init)) {
return originalFetch(input, init);
}

const attrs = {
[SEMATTRS_HTTP_METHOD]: req.method,
[SEMATTRS_HTTP_URL]: req.url,
[SEMATTRS_HTTP_HOST]: url.host,
[SEMATTRS_HTTP_SCHEME]: url.protocol.replace(":", ""),
[SEMATTRS_NET_PEER_NAME]: url.hostname,
[SEMATTRS_NET_PEER_PORT]: url.port,
};
const resourceName = resourceNameTemplate
? resolveTemplate(resourceNameTemplate, attrs)
: removeSearch(req.url);

const spanName = init?.opentelemetry?.spanName ?? `fetch ${req.method} ${req.url}`;

const parentContext = context.active();

const span = tracer.startSpan(
spanName,
{
kind: SpanKind.CLIENT,
attributes: {
...attrs,
"operation.name": `fetch.${req.method}`,
"resource.name": resourceName,
...init?.opentelemetry?.attributes,
},
},
parentContext,
);

/*
* TAIL SAMPLING CHANGE: Only check isRecording(), NOT isSampled().
*
* Original @vercel/otel check was:
* if (!span.isRecording() || !isSampled(span.spanContext().traceFlags))
*
* This caused non-sampled spans to be ended early without error info.
* For tail sampling, we need to continue recording so that errors
* can be captured and the span can be promoted at end time.
*
* We still skip non-recording spans (NOT_RECORD sampling decision).
*/
if (!span.isRecording()) {
span.end();

return originalFetch(input, init);
}

/*
* Only propagate context if the span is actually sampled.
* Non-sampled spans should not propagate trace context to avoid
* forcing downstream services to sample.
*/
if (shouldPropagate(url, init) && isSampled(span.spanContext().traceFlags)) {
const fetchContext = traceApi.setSpan(parentContext, span);

propagation.inject(fetchContext, req.headers, HEADERS_SETTER);
}

if (attributesFromRequestHeaders) {
headersToAttributes(span, attributesFromRequestHeaders, req.headers);
}

try {
const startTime = Date.now();

/*
* Remove "content-type" for FormData body because undici regenerates
* a new multipart separator each time.
*/
if (init?.body && init.body instanceof FormData) {
req.headers.delete("content-type");
}

const res = await originalFetch(input, {
...init,
headers: req.headers,
});
const duration = Date.now() - startTime;

span.setAttribute(SEMATTRS_HTTP_STATUS_CODE, res.status);
span.setAttribute("http.response_time", duration);

if (attributesFromResponseHeaders) {
headersToAttributes(span, attributesFromResponseHeaders, res.headers);
}

if (res.status >= 500) {
onError(span, `Status: ${res.status} (${res.statusText})`);
}

// Flush body, but non-blocking.
if (res.body) {
void pipeResponse(res).then(
(byteLength) => {
if (span.isRecording()) {
span.setAttribute(SEMATTRS_HTTP_RESPONSE_CONTENT_LENGTH_UNCOMPRESSED, byteLength);
span.end();
}
},
(err) => {
if (span.isRecording()) {
onError(span, err);
span.end();
}
},
);
} else {
span.end();
}

return res;
} catch (e) {
onError(span, e);
span.end();
throw e;
}
};

globalThis.fetch = doFetch;
}

override disable(): void {
if (this.tailSamplingOriginalFetch) {
globalThis.fetch = this.tailSamplingOriginalFetch;
}
}
}
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TailSamplingFetchInstrumentation class lacks test coverage. Given that it overrides critical enable() logic and handles complex scenarios like context propagation, span recording decisions, and error handling, comprehensive unit tests are needed to ensure the tail sampling behavior works correctly. Consider adding tests for scenarios like: non-sampled spans recording errors, context propagation being skipped for non-sampled spans, and interaction with the TailSamplingProcessor.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings December 11, 2025 18:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines +9 to +10
"test": "vitest",
"test:ci": "vitest run --coverage"
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The otel package now includes test scripts but is missing a vitest.config.ts file. Other apps in the repository (like apps/np-atobarai, apps/stripe) have vitest configuration files that define test workspaces, setup files, and environment settings. Without this configuration, vitest may not find the test files or run with the correct settings.

Create a vitest.config.ts file similar to other packages in the monorepo, defining the test workspace configuration for unit tests.

Copilot uses AI. Check for mistakes.
Comment on lines +34 to +35
* We replicate @vercel/otel's getResourceAttributes() logic here so that
* TailSamplingProcessor can add these attributes when promoting spans.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have to replicate it?

can you

class TailSamplingProcess {
  constrcutor(private vercelProcessor: CompositeSpanProcessor ){}

   promoteToSampled() {
  ... 

   this.vercelPRocessor.something()...
}
}

Copy link
Copy Markdown
Member Author

@witoszekdev witoszekdev Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is not exported method from Vercel SDK :/

https://github.com/vercel/otel/blob/a4a86622ddd186c1f32db2c832cab5f6b623c717/packages/otel/src/processor/composite-span-processor.ts?plain=1#L149

We also cannot call it again because this span processor is internal, it runs before we can do anything (because registerOTel wraps any processor we pass to it, and Next.js processors run first)

Copy link
Copy Markdown
Member Author

@witoszekdev witoszekdev Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 12, 2025 12:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 10 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

return originalFetch(input, init);
}

if (shouldPropagate(url, init) && isSampled(span.spanContext().traceFlags)) {
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for propagating context only for sampled spans may break distributed tracing. When a non-sampled span makes a fetch request and later gets promoted by TailSamplingProcessor, the downstream service won't have the trace context because it wasn't propagated. This can result in broken trace chains where error spans appear disconnected from their parent traces.

Consider either:

  1. Always propagating context regardless of sampling (the parent trace decision should be maintained), or
  2. Document this limitation clearly that tail-sampled spans will have broken distributed traces for outgoing requests
Suggested change
if (shouldPropagate(url, init) && isSampled(span.spanContext().traceFlags)) {
if (shouldPropagate(url, init)) {

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,115 @@
/**
* DataDog resource attributes for @vercel/otel compatibility.
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "DataDog" should be "Datadog" (one word, capital D only at the start)

Suggested change
* DataDog resource attributes for @vercel/otel compatibility.
* Datadog resource attributes for @vercel/otel compatibility.

Copilot uses AI. Check for mistakes.
* 3. TailSamplingProcessor.onEnd() runs → promotes span to SAMPLED
* 4. Promoted span goes to BatchSpanProcessor (never back through CompositeSpanProcessor)
*
* Result: promoted spans are missing DataDog attributes that normally sampled spans have.
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "DataDog attributes" should be "Datadog attributes" (one word, capital D only at the start)

Suggested change
* Result: promoted spans are missing DataDog attributes that normally sampled spans have.
* Result: promoted spans are missing Datadog attributes that normally sampled spans have.

Copilot uses AI. Check for mistakes.
import { ReadableSpan } from "@opentelemetry/sdk-trace-node";

/**
* Compute DataDog-compatible resource attributes for a span.
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "DataDog-compatible" should be "Datadog-compatible" (one word, capital D only at the start)

Suggested change
* Compute DataDog-compatible resource attributes for a span.
* Compute Datadog-compatible resource attributes for a span.

Copilot uses AI. Check for mistakes.
*
* ## When to update
*
* If @vercel/otel updates their FetchInstrumentation, review their changes
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is a nearly 400-line copy of @vercel/otel's FetchInstrumentation with minimal modifications (only the isSampled check on line 322). This creates significant maintenance burden as it must be manually synced with upstream changes. Consider alternative approaches like: 1) Contributing a configuration option to @vercel/otel to support tail sampling, 2) Creating a wrapper/proxy pattern that decorates the original instrumentation rather than copying it, or 3) Using monkey-patching to override only the specific behavior. Document the trade-offs of this approach in the file header.

Suggested change
* If @vercel/otel updates their FetchInstrumentation, review their changes
* If @vercel/otel updates their FetchInstrumentation, review their changes
*
* ## Trade-offs and alternatives considered
*
* This file is a near-verbatim copy of @vercel/otel's FetchInstrumentation, with only minimal
* modifications (notably, the isSampled check). This approach introduces significant maintenance
* burden, as any upstream changes must be manually reviewed and merged here.
*
* Alternatives considered:
* 1. **Contributing upstream**: Ideally, @vercel/otel would support a configuration option for tail sampling.
* This would eliminate the need for a fork. However, this requires upstream acceptance and release,
* which may not be timely or feasible for our needs.
* 2. **Wrapper/proxy pattern**: Creating a wrapper that decorates the original FetchInstrumentation
* could avoid code duplication, but the relevant logic is not easily overridable without changes
* to the upstream implementation.
* 3. **Monkey-patching**: Overriding only the specific behavior via monkey-patching is fragile and
* may break with upstream changes, especially if internal APIs are not stable.
*
* Given these constraints, we chose to copy the implementation and clearly mark our modifications.
* This file should be reviewed and updated whenever @vercel/otel's FetchInstrumentation changes.
* The trade-off is increased maintenance burden in exchange for correct tail sampling support.

Copilot uses AI. Check for mistakes.
* ## Why this file exists
*
* @vercel/otel's CompositeSpanProcessor adds DataDog-specific attributes
* (`operation.name`, `resource.name`) in onEnd() - but ONLY for sampled spans.
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "DataDog-specific" should be "Datadog-specific" (one word, capital D only at the start)

Copilot uses AI. Check for mistakes.
enumerable: true,
});

// Build new attributes with promotion marker + Vercel/DataDog resource attributes
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "Vercel/DataDog" should be "Vercel/Datadog" (one word, capital D only at the start)

Suggested change
// Build new attributes with promotion marker + Vercel/DataDog resource attributes
// Build new attributes with promotion marker + Vercel/Datadog resource attributes

Copilot uses AI. Check for mistakes.
Comment on lines +47 to 49
"@vercel/otel": "catalog:",
"next": "catalog:",
"urql": "catalog:"
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding @vercel/otel as a peer dependency means that any package consuming @saleor/apps-otel must install @vercel/otel even if they don't use the tail sampling features. Consider whether tail sampling functionality should be optional or if @vercel/otel should remain only a dev dependency. If TailSamplingFetchInstrumentation is the only export requiring @vercel/otel, you might want to make it an optional peer dependency or split it into a separate package.

Suggested change
"@vercel/otel": "catalog:",
"next": "catalog:",
"urql": "catalog:"
"next": "catalog:",
"urql": "catalog:"
},
"optionalDependencies": {
"@vercel/otel": "catalog:"

Copilot uses AI. Check for mistakes.
Comment on lines +6 to +7
* This allows root spans (requests without traceparent set by Saleor and others)
* to be sampled if app decides to sample.
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "Set to true when using DeferredSampler + TailSamplingProcessor" but doesn't explain WHY this allows root spans to be sampled. Consider expanding the documentation to explain that with requireParentforIncomingSpans=true, incoming requests without a traceparent header are rejected, but with tail sampling we need to create root spans for these requests so TailSamplingProcessor can decide whether to export them based on errors/latency. This would help future maintainers understand the architectural decision.

Suggested change
* This allows root spans (requests without traceparent set by Saleor and others)
* to be sampled if app decides to sample.
*
* By default, with `requireParentforIncomingSpans=true`, incoming requests without a
* `traceparent` header are rejected and no root span is created. However, when using
* tail-based sampling (e.g., with TailSamplingProcessor), we want to create root spans
* for these requests so that the TailSamplingProcessor can later decide whether to export
* them based on errors, latency, or other criteria.
*
* This flag allows root spans (requests without traceparent set by Saleor and others)
* to be created and sampled if the app decides to sample, enabling tail-based sampling
* to function as intended.

Copilot uses AI. Check for mistakes.
Comment on lines +41 to +49
if (parentSampled) {
// Parent decided to sample - we MUST sample too
return {
decision: SamplingDecision.RECORD_AND_SAMPLED,
attributes: {
[SALEOR_SAMPLING_DECISION_ATTR]: "sampled",
},
};
}
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a parent span is sampled, this returns RECORD_AND_SAMPLED for the child span, which bypasses TailSamplingProcessor entirely. This means child spans under a sampled parent won't benefit from tail sampling logic (e.g., they won't get the TAIL_SAMPLING_PROMOTED_ATTR attribute or Vercel/Datadog resource attributes added by TailSamplingProcessor). This creates inconsistency where some spans in a trace have these attributes and others don't, which could affect observability and filtering. Consider if TailSamplingProcessor should process all spans, or document this limitation.

Suggested change
if (parentSampled) {
// Parent decided to sample - we MUST sample too
return {
decision: SamplingDecision.RECORD_AND_SAMPLED,
attributes: {
[SALEOR_SAMPLING_DECISION_ATTR]: "sampled",
},
};
}
// Always return RECORD to allow TailSamplingProcessor to make the final export decision,
// even if the parent is sampled. This ensures all spans can be processed for tail sampling
// and receive consistent attributes (e.g., TAIL_SAMPLING_PROMOTED_ATTR).
return {
decision: SamplingDecision.RECORD,
attributes: {
[SALEOR_SAMPLING_DECISION_ATTR]: parentSampled ? "sampled" : (parentSpanContext ? "not_sampled" : "none"),
},
};

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings December 12, 2025 16:18
@witoszekdev witoszekdev force-pushed the inc-669-sample-more-spans-in-apps branch from 82bf650 to afbf49a Compare December 12, 2025 16:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 5 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines +115 to +116
// SALEOR MODIFICATION: Added type assertion for stricter TypeScript config
kind === SpanKind.INTERNAL ? "" : SPAN_KIND_NAME[kind as SpanKind],
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type assertion kind as SpanKind is unnecessary and potentially dangerous. The kind variable is already typed as SpanKind from line 60 where it's destructured from span, and span.kind is of type SpanKind. Additionally, the check kind === SpanKind.INTERNAL means that when it's NOT INTERNAL, kind could be any of the other valid SpanKind values (SERVER, CLIENT, PRODUCER, CONSUMER), which are all valid keys in SPAN_KIND_NAME. The type assertion bypasses TypeScript's safety without providing any benefit.

Suggested change
// SALEOR MODIFICATION: Added type assertion for stricter TypeScript config
kind === SpanKind.INTERNAL ? "" : SPAN_KIND_NAME[kind as SpanKind],
// SALEOR MODIFICATION: Removed unnecessary type assertion for stricter TypeScript config
kind === SpanKind.INTERNAL ? "" : SPAN_KIND_NAME[kind],

Copilot uses AI. Check for mistakes.
],
instrumentations: [createHttpInstrumentation()],
spanProcessors: [tailSamplingProcessor],
instrumentations: [createHttpInstrumentation({ usingDeferredSpanProcessor: true })],
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TailSamplingFetchInstrumentation is not being added to the instrumentations array. According to the documentation in fetch-instrumentation.ts, this instrumentation was created specifically to support tail sampling by recording span data for non-sampled spans. Without it, fetch calls won't be properly instrumented for tail sampling. The instrumentations array should include an instance of TailSamplingFetchInstrumentation.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +24
const batchProcessor = createBatchSpanProcessor({
accessToken: process.env.OTEL_ACCESS_TOKEN,
});

const tailSamplingProcessor = new TailSamplingProcessor({
processor: batchProcessor,
slowThresholdMs: 5000,
exportErrors: true,
exportSlowSpans: true,
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tail sampling configuration values (slowThresholdMs: 5000, exportErrors: true, exportSlowSpans: true) are hardcoded. Consider extracting these to environment variables or a configuration file to allow tuning in different environments (development, staging, production) without code changes.

Suggested change
const batchProcessor = createBatchSpanProcessor({
accessToken: process.env.OTEL_ACCESS_TOKEN,
});
const tailSamplingProcessor = new TailSamplingProcessor({
processor: batchProcessor,
slowThresholdMs: 5000,
exportErrors: true,
exportSlowSpans: true,
// Tail sampling configuration from environment variables with defaults
const TAIL_SAMPLING_SLOW_THRESHOLD_MS = process.env.TAIL_SAMPLING_SLOW_THRESHOLD_MS
? parseInt(process.env.TAIL_SAMPLING_SLOW_THRESHOLD_MS, 10)
: 5000;
const TAIL_SAMPLING_EXPORT_ERRORS = process.env.TAIL_SAMPLING_EXPORT_ERRORS
? process.env.TAIL_SAMPLING_EXPORT_ERRORS.toLowerCase() === "true"
: true;
const TAIL_SAMPLING_EXPORT_SLOW_SPANS = process.env.TAIL_SAMPLING_EXPORT_SLOW_SPANS
? process.env.TAIL_SAMPLING_EXPORT_SLOW_SPANS.toLowerCase() === "true"
: true;
const batchProcessor = createBatchSpanProcessor({
accessToken: process.env.OTEL_ACCESS_TOKEN,
});
const tailSamplingProcessor = new TailSamplingProcessor({
processor: batchProcessor,
slowThresholdMs: TAIL_SAMPLING_SLOW_THRESHOLD_MS,
exportErrors: TAIL_SAMPLING_EXPORT_ERRORS,
exportSlowSpans: TAIL_SAMPLING_EXPORT_SLOW_SPANS,

Copilot uses AI. Check for mistakes.
Comment on lines +117 to +136
private isError(span: ReadableSpan): boolean {
// Check span status
if (span.status.code === SpanStatusCode.ERROR) {
return true;
}

// Check for recorded exceptions
if (span.events.some((event) => event.name === "exception")) {
return true;
}

// Check HTTP status code attribute (using semantic convention)
const httpStatusCode = span.attributes[ATTR_HTTP_RESPONSE_STATUS_CODE];

if (typeof httpStatusCode === "number" && httpStatusCode >= 500) {
return true;
}

return false;
}
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error detection logic checks for HTTP 500+ status codes, but HTTP 4xx client errors (like 401, 403, 429) might also indicate important issues worth sampling in some scenarios. Consider whether rate limiting (429) or authentication failures (401, 403) should trigger tail sampling, as these could indicate security issues or service degradation that might be valuable to trace.

Copilot uses AI. Check for mistakes.
Comment on lines 5 to +10
"scripts": {
"check-types": "tsc",
"lint": "eslint .",
"lint:fix": "eslint --fix ."
"lint:fix": "eslint --fix .",
"test": "vitest",
"test:ci": "vitest run --coverage"
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test commands reference setup files (pattern seen in other packages), but there is no setup-tests.ts file in the otel package src directory. This file is needed to configure the test environment according to repository conventions.

Copilot uses AI. Check for mistakes.
@witoszekdev
Copy link
Copy Markdown
Member Author

I found out that this current implementation doesn't actually export all spans in our trace when it matches our logic (error or timeout). This means that we would have only a single sparse spans on errors, but not the entire trace.

Going to implement different solution to address this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants