Skip to content

feat(s3inbox): use postgres instead of in memory cache for fileIds#2382

Merged
KarlG-nbis merged 23 commits intomainfrom
feature/s3inbox_replace_memory_cache_with_postgres
Apr 16, 2026
Merged

feat(s3inbox): use postgres instead of in memory cache for fileIds#2382
KarlG-nbis merged 23 commits intomainfrom
feature/s3inbox_replace_memory_cache_with_postgres

Conversation

@KarlG-nbis
Copy link
Copy Markdown
Contributor

@KarlG-nbis KarlG-nbis commented Apr 10, 2026

Related issue(s) and PR(s)

This PR closes #2375.

Description

  • Add new DB func GetFileIDInInbox which returns the file id of a file if the latest file event deems it should exist in the inbox

  • Also update dedection and mapping of s3 actions

    • Only explicitly allow GetBucketLocation, ListObjects, ListObjectsV2, PutObject, UploadPart, CreateMultiPartUpload, CompleteMultiPartUpload, AbortMultiPartUpload, ListMultiPartUpload, ListParts
  • Remove unecessary logging which do not provide any useful information

  • Update handling of s3 action to hopefully be more readable

  • Reduce information exposed on errors, ie do not return errors, etc instead log them and return generic error message

  • Update AlwaysAllow authenticator to set "dummy" as the subject on the token

ADR-0003 states to specifically use GetFileIDByUserPathAndStatus, but instead new GetFileIDInInbox with different filter semantics was used to ensure that we find files based on all file event states that indicate that a file should be in the inbox.

How to test

  • Unit tests pass
  • Integration tests pass

@KarlG-nbis KarlG-nbis force-pushed the feature/s3inbox_replace_memory_cache_with_postgres branch from 6564cd9 to b60f69a Compare April 10, 2026 09:46
@github-advanced-security
Copy link
Copy Markdown

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

@KarlG-nbis KarlG-nbis requested a review from Copilot April 10, 2026 09:46
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 70.35176% with 59 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.45%. Comparing base (2283e71) to head (22d3ae7).
⚠️ Report is 44 commits behind head on main.

Files with missing lines Patch % Lines
sda/cmd/s3inbox/proxy.go 69.14% 36 Missing and 18 partials ⚠️
sda/internal/helper/helper.go 0.00% 4 Missing ⚠️
sda/internal/database/db_functions.go 95.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2382      +/-   ##
==========================================
- Coverage   42.61%   42.45%   -0.16%     
==========================================
  Files         117      117              
  Lines       11844    11820      -24     
==========================================
- Hits         5047     5018      -29     
  Misses       6134     6134              
- Partials      663      668       +5     
Flag Coverage Δ
unittests 42.45% <70.35%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sda/internal/database/db_functions.go 71.92% <95.00%> (+0.47%) ⬆️
sda/internal/helper/helper.go 27.27% <0.00%> (-0.38%) ⬇️
sda/cmd/s3inbox/proxy.go 72.24% <69.14%> (-4.06%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 10, 2026

🔍 Trivy Scan - SDA Services 🔍

Target ghcr.io/neicnordic/sensitive-data-archive:PR2382 (debian 13.4)

No Vulnerabilities found

Target usr/local/bin/sda-api

No Vulnerabilities found

Target usr/local/bin/sda-auth

Vulnerabilities (1)

Package ID Severity Installed Version Fixed Version Title
github.com/gomarkdown/markdown CVE-2026-40890 HIGH v0.0.0-20260217112301-37c66b85d6ab 0.0.0-20260411013819-759bbc3e3207 Go Markdown has an Out-of-bounds Read in SmartypantsRenderer

Target usr/local/bin/sda-download

No Vulnerabilities found

Target usr/local/bin/sda-finalize

No Vulnerabilities found

Target usr/local/bin/sda-ingest

No Vulnerabilities found

Target usr/local/bin/sda-intercept

No Vulnerabilities found

Target usr/local/bin/sda-mapper

No Vulnerabilities found

Target usr/local/bin/sda-notify

No Vulnerabilities found

Target usr/local/bin/sda-orchestrate

No Vulnerabilities found

Target usr/local/bin/sda-reencrypt

No Vulnerabilities found

Target usr/local/bin/sda-rotatekey

No Vulnerabilities found

Target usr/local/bin/sda-s3inbox

No Vulnerabilities found

Target usr/local/bin/sda-sync

No Vulnerabilities found

Target usr/local/bin/sda-syncapi

No Vulnerabilities found

Target usr/local/bin/sda-verify

No Vulnerabilities found

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the s3inbox service to stop relying on an in-memory fileID cache and instead consult PostgreSQL for existing file IDs, while tightening/clarifying which S3 actions are supported and reducing client-facing error detail.

Changes:

  • Add SDAdb.GetFileIDInInbox (+ unit test) to retrieve an existing file ID based on latest file event state.
  • Refactor s3inbox request handling: explicit S3 action detection, split forwarding vs upload handling, and adjust error/logging behavior.
  • Update integration tests to match new error text and revised expected event-log row counts.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
sda/internal/helper/helper.go AlwaysAllow authenticator now sets a dummy sub claim on generated tokens.
sda/internal/database/db_functions.go Adds GetFileIDInInbox query helper for inbox fileID lookup.
sda/internal/database/db_functions_test.go Adds a unit test covering GetFileIDInInbox.
sda/cmd/s3inbox/proxy.go Major refactor: S3 request type detection, DB-backed fileID handling, forwarding helpers, and generic error responses.
sda/cmd/s3inbox/proxy_test.go Updates tests to call ServeHTTP and to set sub on the test token.
.github/integration/tests/sda/15_s3cmd-errorcodes_test.sh Updates expected error strings (“Forbidden” / “Unauthorized”).
.github/integration/tests/sda/10_upload_test.sh Updates expected file_event_log row count after upload.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/internal/database/db_functions.go
@KarlG-nbis KarlG-nbis force-pushed the feature/s3inbox_replace_memory_cache_with_postgres branch from 2dabff8 to d7ed64a Compare April 10, 2026 10:05
@KarlG-nbis KarlG-nbis requested a review from Copilot April 10, 2026 10:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sda/internal/database/db_functions.go
Comment thread sda/internal/database/db_functions.go
Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/cmd/s3inbox/proxy.go Outdated
@KarlG-nbis KarlG-nbis marked this pull request as ready for review April 10, 2026 10:27
@KarlG-nbis KarlG-nbis requested a review from a team as a code owner April 10, 2026 10:27
Copilot AI review requested due to automatic review settings April 10, 2026 10:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/cmd/s3inbox/proxy_test.go
Copilot AI review requested due to automatic review settings April 10, 2026 10:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sda/cmd/s3inbox/proxy.go Outdated
Comment thread sda/cmd/s3inbox/proxy.go
Comment thread sda/cmd/s3inbox/proxy.go
Copilot AI review requested due to automatic review settings April 10, 2026 11:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sda/cmd/s3inbox/proxy.go Outdated
@KarlG-nbis KarlG-nbis force-pushed the feature/s3inbox_replace_memory_cache_with_postgres branch from 22e2baf to d92f958 Compare April 10, 2026 12:26
Copilot AI review requested due to automatic review settings April 10, 2026 12:32
@KarlG-nbis KarlG-nbis force-pushed the feature/s3inbox_replace_memory_cache_with_postgres branch from d92f958 to 4ecec26 Compare April 10, 2026 12:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sda/cmd/s3inbox/proxy.go
Comment thread sda/cmd/s3inbox/proxy.go
Comment thread sda/cmd/s3inbox/proxy.go
pahatz
pahatz previously approved these changes Apr 10, 2026
Copy link
Copy Markdown
Contributor

@pahatz pahatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Comment thread sda/cmd/s3inbox/proxy.go
Comment thread sda/cmd/s3inbox/proxy.go Outdated
Copy link
Copy Markdown
Collaborator

@jbygdell jbygdell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as is.

We might need to redo the reupload logic after testing in the FEGA scope. Since a file can be reuploaded and reset to uploaded state as long as it has not been mapped to a dataset according to the current working model there.

@KarlG-nbis KarlG-nbis enabled auto-merge April 16, 2026 10:18
KarlG-nbis and others added 23 commits April 16, 2026 12:19
- Add new DB func GetFileIDInInbox which returns the file id of a file if the latest file event deems it should exist in the inbox

- Also update dedection and mapping of s3 actions
  - Only explicitly allow GetBucketLocation, ListObjects, ListObjectsV2, PutObject, UploadPart, CreateMultiPartUpload, CompleteMultiPartUpload, AbortMultiPartUpload, ListMultiPartUpload

- Remove unecessary logging which do not provide any useful information

- Update handling of s3 action to hopefully be more readable

- Reduce about of information exposed on errors, ie do not return errors, etc instead log them and return generic error message

- Update AlwaysAllow authenticator to set "dummy" as the subject on the token
…d headers before writing respone code, update prepareForwardPathAndQuery list actions prefix logic, fix some linter issues
…thAndQuery if path does not follow expected structure
…(but check if it will be an overwrite before), and fix defer of s3Response to also ensure its closed incase error
…ing unsupported query params as unsupported no matter the http method
…errorcodes_test.sh fix to use list-objects-v2 with user expecting files

Co-authored-by: Jonas Hagberg <[email protected]>
…or to user, update integration test to expect generic error, and update aws s3api cmd to use --endpoint-url arg instead of --endpoint
… instead stable_id IS NULL to rely on unique_ingested constraint, and update TestGetFileIDInInbox to test additional latest file events
Companion to the race-reproducing test on main. Drives 50 parallel PUT
requests per round over 20 rounds through ServeHTTP, exercising the
same upload path that used to crash with "concurrent map writes"
(proxy.go:182/185/195) before this PR removed the shared fileIDs map.

Under `go test -race`:
- On main: fails with WARNING: DATA RACE + fatal error: concurrent map writes
- On this branch (DB-backed fileID lookup): passes cleanly

Acts as a regression guard against reintroducing unsynchronised shared
state on the upload path. Addresses the third confirmation criterion in
ADR-0003 (#2319).

Test setup notes:
- Uses real MinIO s3Client (not FakeServer, which has its own
  unsynchronised `pinged` field and would produce unrelated race reports).
- Uses helper.NewAlwaysAllow() which now sets sub="dummy" on this branch.
- Sets proxy.s3Conf.Endpoint=":" to make forwardRequestToBackend fail
  fast after the upload path does its DB lookup and bookkeeping, so the
  test exercises only the code path of interest.
@KarlG-nbis KarlG-nbis force-pushed the feature/s3inbox_replace_memory_cache_with_postgres branch from 6e67bfe to 22d3ae7 Compare April 16, 2026 10:19
@KarlG-nbis KarlG-nbis added this pull request to the merge queue Apr 16, 2026
Merged via the queue into main with commit 1b1dcb5 Apr 16, 2026
25 checks passed
@KarlG-nbis KarlG-nbis deleted the feature/s3inbox_replace_memory_cache_with_postgres branch April 16, 2026 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement ADR-0003: Replace s3inbox in-memory file ID cache with postgreSQL

8 participants