fix(metadata): replace ReadTimeout with ReadHeaderTimeout on metadata server #38

Closed
prosdev wants to merge 1 commit into antflydb:main from prosdev:fix/metadata-read-header-timeout

@prosdev commented Apr 2, 2026

Summary

Replace ReadTimeout: 10 * time.Second with ReadHeaderTimeout: 30 * time.Second on the metadata HTTP server. This fixes large Linear Merge payloads failing with JSON decode errors.

Fixes #37

Problem

The metadata server's ReadTimeout applies to the entire HTTP request (headers + body). Large Linear Merge payloads (~7MB for 6,000 records) can't be fully transmitted within 10 seconds, causing the server to close the connection mid-read:

decoding request: json: string unexpected end of JSON input

Fix

ReadHeaderTimeout limits only header reading (protection against slowloris attacks) while allowing request bodies to take as long as needed. This matches the pattern already used by:

  • Raft server: ReadHeaderTimeout: 30 * time.Second (src/raft/multiraft.go:268)
  • Health server: ReadHeaderTimeout: 40 * time.Second (pkg/libaf/healthserver/healthserver.go:63)

Linear Merge still enforces MaxRecordsPerRequest = 10,000 — no size protection is removed.
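
The difference between the two timeouts can be reproduced in isolation. The following is a minimal sketch, not code from this PR: the handler, the `slowReader` helper, the record shape, and the shortened durations (250ms instead of 10s) are all assumptions chosen so the example runs quickly. It streams a JSON body slowly to an `httptest` server and reports whether the upload succeeded under each setting. The exact error text the handler sees may differ from the message quoted above, since that depends on how the real handler reads the body.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowReader is a hypothetical helper that simulates a slow upload: the
// first chunk is returned immediately, every later chunk after a pause.
type slowReader struct {
	chunks  []string
	delay   time.Duration
	started bool
}

func (s *slowReader) Read(p []byte) (int, error) {
	if len(s.chunks) == 0 {
		return 0, io.EOF
	}
	if s.started {
		time.Sleep(s.delay)
	}
	s.started = true
	n := copy(p, s.chunks[0])
	if n < len(s.chunks[0]) {
		s.chunks[0] = s.chunks[0][n:]
	} else {
		s.chunks = s.chunks[1:]
	}
	return n, nil
}

// slowJSONBody streams a JSON array of n strings, one element per chunk.
func slowJSONBody(n int, delay time.Duration) io.Reader {
	chunks := []string{"["}
	for i := 0; i < n; i++ {
		sep := ","
		if i == n-1 {
			sep = "]"
		}
		chunks = append(chunks, fmt.Sprintf(`"rec-%d"%s`, i, sep))
	}
	return &slowReader{chunks: chunks, delay: delay}
}

// decodeHandler stands in for the real endpoint: it only decodes the body.
func decodeHandler(w http.ResponseWriter, r *http.Request) {
	var records []string
	if err := json.NewDecoder(r.Body).Decode(&records); err != nil {
		http.Error(w, "decoding request: "+err.Error(), http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// uploadSucceeds sends a body that takes roughly 800ms to transmit and
// reports whether the server answered 200 OK.
func uploadSucceeds(configure func(*http.Server)) bool {
	ts := httptest.NewUnstartedServer(http.HandlerFunc(decodeHandler))
	configure(ts.Config)
	ts.Start()
	defer ts.Close()

	client := ts.Client()
	client.Timeout = 10 * time.Second
	resp, err := client.Post(ts.URL, "application/json", slowJSONBody(8, 100*time.Millisecond))
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body)
	return resp.StatusCode == http.StatusOK
}

// uploadWithReadTimeout bounds headers and body together (the old setting, scaled down).
func uploadWithReadTimeout() bool {
	return uploadSucceeds(func(s *http.Server) { s.ReadTimeout = 250 * time.Millisecond })
}

// uploadWithReadHeaderTimeout bounds only the header read (the new setting).
func uploadWithReadHeaderTimeout() bool {
	return uploadSucceeds(func(s *http.Server) { s.ReadHeaderTimeout = 30 * time.Second })
}

func main() {
	fmt.Println("ReadTimeout upload ok:", uploadWithReadTimeout())
	fmt.Println("ReadHeaderTimeout upload ok:", uploadWithReadHeaderTimeout())
}
```

With `ReadTimeout` set, the read deadline covers the whole request, so the body read is cut off once the deadline passes. With only `ReadHeaderTimeout` set, the deadline is cleared after the headers arrive and the slow body is allowed to finish.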

Changes

| File | Change |
| --- | --- |
| src/metadata/runner.go:76 | ReadTimeout: 10s → ReadHeaderTimeout: 30s |
| e2e/swarm.go:289 | Same fix for E2E test server |
| src/metadata/api_linear_merge_test.go | 2 new tests: large payload encoding (5k records), key sorting at scale |
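
The bodies of the two new tests are not reproduced in this conversation. As a rough illustration of what a large-payload round trip with key sorting could look like, here is a hypothetical sketch: the record shape, the key format, and the function names `buildRecords` and `roundTripSortedKeys` are assumptions, not the project's API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// buildRecords returns n hypothetical records keyed by a zero-padded ID.
// The real Linear Merge record shape is not shown in this PR.
func buildRecords(n int) map[string]map[string]string {
	recs := make(map[string]map[string]string, n)
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("component-%05d", i)
		recs[key] = map[string]string{"name": key, "kind": "function"}
	}
	return recs
}

// roundTripSortedKeys encodes the records to JSON, decodes them again, and
// returns the decoded keys in sorted order along with the payload size.
func roundTripSortedKeys(recs map[string]map[string]string) ([]string, int, error) {
	payload, err := json.Marshal(recs)
	if err != nil {
		return nil, 0, err
	}
	var decoded map[string]map[string]string
	if err := json.Unmarshal(payload, &decoded); err != nil {
		return nil, 0, err
	}
	keys := make([]string, 0, len(decoded))
	for k := range decoded {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, len(payload), nil
}

func main() {
	keys, size, err := roundTripSortedKeys(buildRecords(5000))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d keys, %d bytes, first=%s last=%s\n",
		len(keys), size, keys[0], keys[len(keys)-1])
}
```

A test of this kind exercises encoding and sorting only. It does not touch the HTTP server's timeouts, so on its own it would not show whether the timeout change is what fixes the failure.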

Testing

  • All 17 Linear Merge tests pass (15 existing + 2 new)
  • Manual: indexed cli/cli (830 Go files, 5,933 components) using dev-agent — succeeded where it previously failed


// TestLinearMerge_KeySortingLargeScale verifies key extraction and sorting
// at scale (5,000 records), matching the handler's sort behavior.
func TestLinearMerge_KeySortingLargeScale(t *testing.T) {
Contributor

Similarly does this fail without the patch?

Author

Validation was E2E: cli/cli (5,933 records) fails with ReadTimeout=10s, works with ReadHeaderTimeout=30s.

… server

ReadTimeout=10s applied to the entire HTTP request (headers + body),
causing JSON decode failures on large Linear Merge payloads. With ~6k
records (~7MB), the server closed the connection mid-read, producing:

  decoding request: json: string unexpected end of JSON input

ReadHeaderTimeout=30s limits only header reading, allowing large request
bodies to complete. This matches the pattern used by the Raft server
(30s) and Health server (40s) in this codebase.

Tested: cli/cli (830 Go files, 5,933 components) now indexes
successfully via dev-agent where it previously failed.

Fixes antflydb#37
@prosdev force-pushed the fix/metadata-read-header-timeout branch from 3daa34a to bc04ecf Compare April 2, 2026 02:26
@ajroetker
Contributor

Thanks for the fix, @prosdev. I'll put this fix in the changelog with credit to you when I release, but I wanted to bump it up a little higher, abstract the server for clarity, and add gzip to handle slightly bigger payloads too: see #39.

@ajroetker closed this Apr 2, 2026

Development

Successfully merging this pull request may close these issues.

Linear Merge fails on large payloads (~6k+ records) — JSON decode error