feat(s3inbox): use postgres instead of in memory cache for fileIds#2382
feat(s3inbox): use postgres instead of in memory cache for fileIds#2382KarlG-nbis merged 23 commits intomainfrom
Conversation
6564cd9 to
b60f69a
Compare
|
You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool. What Enabling Code Scanning Means:
For more information about GitHub Code Scanning, check out the documentation. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2382 +/- ##
==========================================
- Coverage 42.61% 42.45% -0.16%
==========================================
Files 117 117
Lines 11844 11820 -24
==========================================
- Hits 5047 5018 -29
Misses 6134 6134
- Partials 663 668 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
🔍 Trivy Scan - SDA Services 🔍Target
|
| Package | ID | Severity | Installed Version | Fixed Version | Title |
|---|---|---|---|---|---|
github.com/gomarkdown/markdown |
CVE-2026-40890 | HIGH | v0.0.0-20260217112301-37c66b85d6ab | 0.0.0-20260411013819-759bbc3e3207 | Go Markdown has an Out-of-bounds Read in SmartypantsRenderer |
Target usr/local/bin/sda-download
No Vulnerabilities found
Target usr/local/bin/sda-finalize
No Vulnerabilities found
Target usr/local/bin/sda-ingest
No Vulnerabilities found
Target usr/local/bin/sda-intercept
No Vulnerabilities found
Target usr/local/bin/sda-mapper
No Vulnerabilities found
Target usr/local/bin/sda-notify
No Vulnerabilities found
Target usr/local/bin/sda-orchestrate
No Vulnerabilities found
Target usr/local/bin/sda-reencrypt
No Vulnerabilities found
Target usr/local/bin/sda-rotatekey
No Vulnerabilities found
Target usr/local/bin/sda-s3inbox
No Vulnerabilities found
Target usr/local/bin/sda-sync
No Vulnerabilities found
Target usr/local/bin/sda-syncapi
No Vulnerabilities found
Target usr/local/bin/sda-verify
No Vulnerabilities found
There was a problem hiding this comment.
Pull request overview
This PR updates the s3inbox service to stop relying on an in-memory fileID cache and instead consult PostgreSQL for existing file IDs, while tightening/clarifying which S3 actions are supported and reducing client-facing error detail.
Changes:
- Add
SDAdb.GetFileIDInInbox(+ unit test) to retrieve an existing file ID based on latest file event state. - Refactor
s3inboxrequest handling: explicit S3 action detection, split forwarding vs upload handling, and adjust error/logging behavior. - Update integration tests to match new error text and revised expected event-log row counts.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| sda/internal/helper/helper.go | AlwaysAllow authenticator now sets a dummy sub claim on generated tokens. |
| sda/internal/database/db_functions.go | Adds GetFileIDInInbox query helper for inbox fileID lookup. |
| sda/internal/database/db_functions_test.go | Adds a unit test covering GetFileIDInInbox. |
| sda/cmd/s3inbox/proxy.go | Major refactor: S3 request type detection, DB-backed fileID handling, forwarding helpers, and generic error responses. |
| sda/cmd/s3inbox/proxy_test.go | Updates tests to call ServeHTTP and to set sub on the test token. |
| .github/integration/tests/sda/15_s3cmd-errorcodes_test.sh | Updates expected error strings (“Forbidden” / “Unauthorized”). |
| .github/integration/tests/sda/10_upload_test.sh | Updates expected file_event_log row count after upload. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2dabff8 to
d7ed64a
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
22e2baf to
d92f958
Compare
d92f958 to
4ecec26
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
jbygdell
left a comment
There was a problem hiding this comment.
Approving as is.
We might need to redo the reupload logic after testing in the FEGA scope. Since a file can be reuploaded and reset to uploaded state as long as it has not been mapped to a dataset according to the current working model there.
- Add new DB func GetFileIDInInbox which returns the file id of a file if the latest file event deems it should exist in the inbox - Also update dedection and mapping of s3 actions - Only explicitly allow GetBucketLocation, ListObjects, ListObjectsV2, PutObject, UploadPart, CreateMultiPartUpload, CompleteMultiPartUpload, AbortMultiPartUpload, ListMultiPartUpload - Remove unecessary logging which do not provide any useful information - Update handling of s3 action to hopefully be more readable - Reduce about of information exposed on errors, ie do not return errors, etc instead log them and return generic error message - Update AlwaysAllow authenticator to set "dummy" as the subject on the token
…d headers before writing respone code, update prepareForwardPathAndQuery list actions prefix logic, fix some linter issues
…it has prefix query arg
…haviour when file not found
…tParts s3 actions
…thAndQuery if path does not follow expected structure
…iPartUpload handling
…(but check if it will be an overwrite before), and fix defer of s3Response to also ensure its closed incase error
…ing unsupported query params as unsupported no matter the http method
… other users files with s3 list actions
…errorcodes_test.sh fix to use list-objects-v2 with user expecting files Co-authored-by: Jonas Hagberg <[email protected]>
…or to user, update integration test to expect generic error, and update aws s3api cmd to use --endpoint-url arg instead of --endpoint
…rors/warning of failed requests
… instead stable_id IS NULL to rely on unique_ingested constraint, and update TestGetFileIDInInbox to test additional latest file events
Companion to the race-reproducing test on main. Drives 50 parallel PUT requests per round over 20 rounds through ServeHTTP, exercising the same upload path that used to crash with "concurrent map writes" (proxy.go:182/185/195) before this PR removed the shared fileIDs map. Under `go test -race`: - On main: fails with WARNING: DATA RACE + fatal error: concurrent map writes - On this branch (DB-backed fileID lookup): passes cleanly Acts as a regression guard against reintroducing unsynchronised shared state on the upload path. Addresses the third confirmation criterion in ADR-0003 (#2319). Test setup notes: - Uses real MinIO s3Client (not FakeServer, which has its own unsynchronised `pinged` field and would produce unrelated race reports). - Uses helper.NewAlwaysAllow() which now sets sub="dummy" on this branch. - Sets proxy.s3Conf.Endpoint=":" to make forwardRequestToBackend fail fast after the upload path does its DB lookup and bookkeeping, so the test exercises only the code path of interest.
6e67bfe to
22d3ae7
Compare
Related issue(s) and PR(s)
This PR closes #2375.
Description
Add new DB func GetFileIDInInbox which returns the file id of a file if the latest file event deems it should exist in the inbox
Also update dedection and mapping of s3 actions
Remove unecessary logging which do not provide any useful information
Update handling of s3 action to hopefully be more readable
Reduce information exposed on errors, ie do not return errors, etc instead log them and return generic error message
Update AlwaysAllow authenticator to set "dummy" as the subject on the token
ADR-0003 states to specifically use GetFileIDByUserPathAndStatus, but instead new GetFileIDInInbox with different filter semantics was used to ensure that we find files based on all file event states that indicate that a file should be in the inbox.
How to test