Skip to content

feat(logger): disk-backed log store for bounded CLI memory (PER-7809)#2200

Open
pranavz28 wants to merge 1 commit intomasterfrom
feat/per-7809-disk-logs-minimal
Open

feat(logger): disk-backed log store for bounded CLI memory (PER-7809)#2200
pranavz28 wants to merge 1 commit intomasterfrom
feat/per-7809-disk-logs-minimal

Conversation

@pranavz28
Copy link
Copy Markdown
Contributor

@pranavz28 pranavz28 commented Apr 28, 2026

Summary

Replace the unbounded messages = new Set() in @percy/logger with a disk-backed JSONL store and a lazily-built in-memory cache for snapshot-tagged entries. CLI memory stays bounded across long debug-heavy builds (8h × 10k snapshots) without changing the /logs upload payload or per-snapshot log resource bytes.

Jira: PER-7809

Approach

  • JSONL is source of truth, sized down by a buffer. Every entry is appended to ${TMPDIR}/percy-logs/<pid>-<ts>-<r>.jsonl in batches of 500 entries or every 100 ms — so disk syscalls stay at ~10/sec regardless of log volume.
  • snapshotIndex holds offsets, not entries. ~16 bytes per tagged entry → ~5 MB worst case for an 8-hour 10k-snapshot debug build.
  • cache is built lazily on first snapshotLogs(...) call by reading the disk delta since the previous call. In normal mode the cache holds at most one snapshot's worth (evictSnapshot runs immediately after each snapshot uploads). In defer mode the cache fills at end-of-build and drains as snapshots upload sequentially.
  • query(filter) is sync, flushes the buffer, chunked-reads the disk in 64 KB chunks, JSON-parses linewise, filters. Used twice at end of build (checkForNoSnapshotCommandError, sendBuildLogs).
  • Redaction stays exactly where master has itredactSecrets lives in @percy/core/src/utils.js and runs only on ci-logs at upload. Disk content matches master's in-memory shape.
  • Graceful degradation. Any disk failure (ENOSPC, RO fs, unwritable tmpdir, or PERCY_LOGS_IN_MEMORY=1) transparently falls back to master's unbounded in-memory Set. Build continues either way.

Parity guarantee

A diff of the /logs HTTP payload and per-snapshot log resource bytes between a build run on master vs this branch should be empty:

  • Same { debug, level, message, meta, timestamp, error } shape.
  • Same insertion order (JSONL append-only, single-threaded JS).
  • Same redaction (only ci-logs at upload time).
  • Same JSON.stringify semantics (entries round-trip through disk before upload, but the shape is preserved for everything that lands in real Percy log calls).

What changed

File Change
packages/logger/src/logger.js Disk + buffer + lazy snapshot cache + memory-fallback path. Add snapshotLogs, evictSnapshot, reset. Keep messages as a back-compat getter.
packages/logger/src/index.js Re-export new methods on the singleton.
packages/core/src/discovery.js 2 lines: per-snapshot resource uses logger.snapshotLogs(meta) + logger.evictSnapshot(meta).
packages/core/src/api.js 1 line: /test/api/reset calls logger.instance.reset() instead of mutating the (now-derived) messages Set.
packages/logger/test/helpers.js Tests run in unbounded in-memory mode (PERCY_LOGS_IN_MEMORY=1) by default; reset properly tears down disk state.
packages/logger/test/logger.test.js New disk-mode tests (round-trip, buffer flush triggers, snapshot index + eviction, reset, circular meta, in-memory rollback).
packages/core/test/api.test.js 1 line: same messages.clear()reset() change.

Test plan

Local

  • @percy/logger tests: 45/45 passing (37 existing + 8 new disk-mode).
  • yarn lint clean.
  • @percy/core tests: running on CI (local environment lacks built @percy/dom bundle for many discovery tests).

End-to-end (parity)

Each scenario will be run twice — once against @percy/[email protected] (latest published) and once against this branch — with logs compared via percy-support-cli, and RSS/disk/runtime monitored during the run:

  1. Baseline web build (~5 snapshots, normal mode).
  2. Baseline + PERCY_DEBUG=* (chatty logs / hot path).
  3. Medium build (~50 snapshots, normal mode) — exercises per-snapshot eviction.
  4. percy upload defer mode (~20 image snapshots) — the deferUploads gotcha.
  5. Medium build + PERCY_DEBUG=* — stress combo.
  6. PERCY_LOGS_IN_MEMORY=1 rollback — confirms master parity.
  7. Build with deliberately failing URL — verifies error/error-stack round-trip.
  8. Two builds running concurrently — verifies unique-filename and best-effort folder clear.

For each: ✅ logs match / ❌ diff lines, RSS-master vs RSS-branch (peak), runtime-master vs runtime-branch, disk file present-during / absent-after.

Known follow-ups (not in this PR)

  • Streaming /logs HTTP upload (would eliminate the end-of-build memory spike).
  • Wire bench-logger.js into CI (perf gates).

🤖 Generated with Claude Opus 4.7 (1M context) via Claude Code

@pranavz28 pranavz28 requested a review from a team as a code owner April 28, 2026 22:57
pranavz28 added a commit that referenced this pull request Apr 29, 2026
P1 fixes:
- Track evicted snapshot keys in `evicted` Set so late entries that land
  on disk after evictSnapshot don't repopulate the cache (defeats the
  bounded-memory promise in defer mode).
- Per-pid subdir under percy-logs/ instead of clearing the shared
  percy-logs/ dir at init — concurrent percy processes (CI matrix,
  parallel workers, npx) no longer clobber each other's JSONLs.
- Remove unused `messages` Set-shaped getter; api.js + percy.test.js use
  query() directly. Also drops the helpers.js force-memory-mode flag so
  the existing 60-case suite now exercises the disk path end-to-end.

P2 fixes:
- Drop the write-only snapshotIndex Map (was a per-entry write with no
  reader).
- Consolidate query() and _readAllFromDisk() into one _readDiskFiltered()
  helper; query() = flush + filter, fallback path = filter all.
- Add SIGINT/SIGTERM/uncaughtException hooks (via _installSignalHooks)
  so Ctrl-C / runner kill don't orphan the JSONL or lose buffered logs.
- Bump filename randomness from randomBytes(2) to randomBytes(8).
- Preserve meta.snapshot when stringify falls back for circular meta so
  the entry still routes through snapshotLogs().

Coverage: 100% on logger.js / index.js / utils.js / timing.js. 62/62 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Replaces @percy/logger's unbounded `messages` Set with a JSONL-backed
hybrid store that keeps resident memory bounded across long builds
(10k snapshots, 8-hour deferUploads runs) while preserving byte-for-byte
parity with master's `/logs` upload payload and per-snapshot log resources.

Design
------
- writes go through a 500-entry / 100ms buffer flushed via
  fs.appendFileSync to ${tmpdir}/percy-logs/<pid>/<ts>-<rand>.jsonl
- snapshotLogs(meta) reads the disk delta into a bounded `cache` keyed by
  snapshot meta; evictSnapshot drops the cache entry once the snapshot's
  upload is complete; late entries are allowed to repopulate (retry-safe)
- query(filter) streams the JSONL once per call (chunked 64KB read);
  in-memory mode preserves master's identity-mutation contract that
  redactSecrets relies on
- disk-init failures, mid-build appendFileSync failures, and the
  PERCY_LOGS_IN_MEMORY=1 rollback all flip to an unbounded in-memory Set
  (master parity) — including replaying entries already on disk so the
  upload still includes them
- circular meta is sanitized via JSON.stringify try/catch, but
  meta.snapshot is preserved so the entry still routes via snapshotLogs
- exit cleanup uses a process-scoped Set on `process[Symbol.for(...)]`
  shared across module copies; supports multiple live instances and
  unlinks every active disk file on `exit` / `beforeExit`
- per-pid subdir prevents concurrent percy processes (CI matrix, parallel
  workers, npx invocations) from clobbering each other's files; cleanup
  best-effort rmdirs the subdir so long-lived runners don't accumulate

Wiring
------
- packages/core/src/discovery.js — uses snapshotLogs/evictSnapshot for
  per-snapshot log resources
- packages/core/src/api.js — /test/logs and the test-mode reset path now
  use logger.query() / logger.instance.reset()
- packages/core/test/helpers/index.js — defaults setupTest to memory mode
  (master parity) so downstream tests using mockfs don't flake against
  the disk path's live volume

Tests
-----
- 68 specs, 100% statements/branches/functions/lines on logger.js
- the existing 37-case logger suite runs under the disk path by default
  (no PERCY_LOGS_IN_MEMORY set in helpers.mock); 25+ new specs in
  describe('disk-backed storage') cover round-trip, snapshotLogs / evict
  retry semantics, fallback after appendFileSync / mkdirSync failures,
  the 100ms timer, the 500-entry size cap, per-pid subdir, the
  Symbol.for latch, multi-instance cleanup, rmdir best-effort, and
  circular-meta snapshot preservation
- cli-exec / cli-snapshot / cli-build / cli-doctor / cli-upload /
  cli-command / cli / core / config / client / env / monitoring /
  webdriver-utils all green locally on Node 14

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@pranavz28 pranavz28 force-pushed the feat/per-7809-disk-logs-minimal branch from 9f3539b to 3660a7a Compare April 29, 2026 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant