
Allow uploading mags, attachments to existing datasets#9402

Open
fm3 wants to merge 44 commits into master from upload-mags-attachments

Conversation


@fm3 fm3 commented Mar 16, 2026

  • Introduced API version 14, with changes to the dataset upload routes
  • Allow uploading attachments and mags via reserve/chunk/finish resumableUpload routes, analogous to dataset upload. Used by the corresponding libs PR: "Upload mags+attachments to existing dataset, adapt to wk api 14" (webknossos-libs#1450)
  • reserveAttachmentUploadToPath now takes an additional parameter overwritePending (in the request body), analogous to reserveMagUploadToPath
  • Refactored the upload server code (extracted UploadMetadataStore, introduced an UploadDomain enum)
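
The reserve/chunk/finish flow described above can be sketched from the client side. The route paths, payload fields, and the PlannedRequest helper below are illustrative assumptions for this sketch, not the exact API introduced by this PR:

```typescript
// Sketch of the resumable reserve/chunk/finish protocol for uploading a
// mag to an existing dataset. All paths and field names are assumptions.
type PlannedRequest = { method: string; path: string; body?: unknown };

function planMagUpload(
  datasetId: string,
  layerName: string,
  mag: string,
  numChunks: number,
  overwritePending: boolean,
): PlannedRequest[] {
  const uploadIdPlaceholder = "<uploadId from reserve response>";
  const requests: PlannedRequest[] = [
    // 1. Reserve the upload; overwritePending discards a stale pending
    //    reservation instead of failing with a conflict.
    {
      method: "POST",
      path: "/data/datasets/upload/mag/reserve",
      body: { datasetId, layerName, mag, overwritePending },
    },
  ];
  // 2. Send the chunks, analogous to the dataset upload protocol.
  for (let i = 0; i < numChunks; i++) {
    requests.push({
      method: "POST",
      path: `/data/datasets/upload/mag/chunk?uploadId=${uploadIdPlaceholder}&chunkNumber=${i}`,
    });
  }
  // 3. Finish: the server moves the data into place and updates postgres.
  requests.push({
    method: "POST",
    path: `/data/datasets/upload/mag/finish?uploadId=${uploadIdPlaceholder}`,
  });
  return requests;
}
```

The same three-phase plan applies to attachments, with the attachment type and name in the reserve body instead of the mag.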

Steps to test:

TODOs:

  • Refactor upload data (json in redis?)
  • add routes for mags, attachments
    • reserve
      • reserve them in postgres (adapt queries, add evolution)
      • allow only for virtual?
      • introduce overwritePending param; if set, delete pending (also pending as uploadToPaths)
    • uploadChunk, testchunk
    • where to put mags, attachments during upload?
    • finish
      • where to put them in the end?
      • update in postgres
      • clean up
  • LegacyAPIController
    • bump version
    • adapt existing routes for dataset upload
    • libs use unversioned! need to provide backwards compat without version number
  • adapt frontend
  • adapt tests
  • adapt libs to test
  • test legacy dataset upload as well as uploadToPaths with old libs

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Added migration guide entry if applicable (edit the same file as for the changelog)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

@fm3 fm3 self-assigned this Mar 16, 2026

coderabbitai bot commented Mar 16, 2026

📝 Walkthrough

Implement upload endpoints for dataset mags and attachments, refactor DAO names and pending-upload tracking, introduce domain-partitioned upload metadata and services, bump API version to 14, and update frontend, routes, and database schema for the new upload domain model.

Changes

Cohort / File(s) Summary
DAO Layer & Models
app/models/dataset/Dataset.scala, app/models/dataset/DatasetService.scala, app/models/storage/UsedStorageService.scala
Renamed DatasetMagsDAO → DatasetMagDAO and DatasetLayerAttachmentsDAO → DatasetLayerAttachmentDAO. Added uploadIsPending column handling; split pending insert semantics into insertWithUploadToPathPending / insertWithUploadPending, added findOneWithPendingUpload* helpers, and unified finishUpload semantics. Updated constructor injections.
Dataset & Remote Controllers
app/controllers/DatasetController.scala, app/controllers/WKRemoteDataStoreController.scala, app/controllers/LegacyApiController.scala
Added overwritePending: Option[Boolean] to ReserveAttachmentUploadToPathRequest. DatasetController now checks findOneWithPendingUploadToPath(...) and fails early when no pending upload exists; finishUploadToPath argument order adjusted for attachments. WKRemoteDataStoreController and legacy controllers gained mag/attachment reserve/report endpoints and model types.
Upload Services & Metadata
app/models/dataset/UploadToPathsService.scala, webknossos-datastore/.../UploadService.scala, webknossos-datastore/.../UploadMetadataStore.scala, webknossos-datastore/.../UploadDomain.scala
Introduced domain-aware upload flow (dataset, mag, attachment). Added Redis-backed per-domain metadata stores and refactored UploadService to select per-domain stores and domain-specific reserve/finish/cancel APIs. UploadToPathsService now handles existing-pending collision/overwrite logic and exposes attachment/mag handlers.
Datastore Controller, Client & Helpers
webknossos-datastore/.../UploadController.scala, webknossos-datastore/.../DSLegacyApiController.scala, webknossos-datastore/.../DSRemoteWebknossosClient.scala, webknossos-datastore/.../DatasetDeleter.scala, webknossos-datastore/.../DataSourceService.scala
Added new UploadController exposing domain-aware endpoints; DSLegacyApiController delegates v13 flows to UploadController. DSRemoteWebknossosClient gained report/reserve methods for mags and attachments. DatasetDeleter and DataSourceService signatures adjusted (deleteOnDisk path param; ensureDataDirWritable uses DataSourceId).
Redis & Temporary Stores
webknossos-datastore/.../RedisTemporaryStore.scala
Switched to RedisClientPool with client-scoped handler; added JSON serialize/parse helpers; changed some find/findLong return types to non-Option variants.
Frontend: Upload API & Views
frontend/javascripts/admin/rest_api.ts, frontend/javascripts/admin/dataset/dataset_upload_view.tsx, frontend/javascripts/test/backend_snapshot_tests/datasets.e2e.ts
Restructured reservation payloads to ResumableUploadInfo + DatasetUploadInfo; updated endpoints to /data/datasets/upload/dataset/*. finish/cancel APIs simplified (finish accepts uploadId query and returns { datasetId }). Updated E2E tests and UI flow accordingly.
SQL, Schema & Migrations
schema/schema.sql, schema/evolutions/161-upload-mags-attachments.sql, schema/evolutions/reversions/161-upload-mags-attachments.sql
Added uploadIsPending BOOLEAN NOT NULL DEFAULT FALSE to dataset_mags and dataset_layer_attachments; bumped schema version 160→161 and added reversible migration.
Routes & API Versioning
conf/webknossos.latest.routes, conf/webknossos.versioned.routes, webknossos-datastore/conf/datastore.latest.routes, webknossos-datastore/conf/datastore.versioned.routes, webknossos-tracingstore/conf/tracingstore.versioned.routes, util/src/main/scala/.../ApiVersioning.scala
Added/renamed upload routes for dataset/mag/attachment reservations and reports; replaced unfinished-upload route names; added /v14/ mappings and bumped CURRENT_API_VERSION from 13→14.
Misc / Utilities
app/utils/sql/SimpleSQLDAO.scala, unreleased_changes/9402.md, other small model helpers
Expanded retryable SQL error list to include "cache lookup failed for type". Added release note and small helpers (e.g., MagLocator.withoutCredentials, LayerAttachment.withoutCredential).
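
The restructured frontend reservation payload summarized in the table above can be sketched as TypeScript shapes. Only the two wrapper names, the filePaths field, the /data/datasets/upload/dataset/* endpoints, and the { datasetId } finish result come from this PR; the remaining field names are illustrative assumptions:

```typescript
// Assumed shape of the split reservation payload (fields beyond filePaths
// are guesses for illustration).
type ResumableUploadInfo = {
  uploadId: string;
  filePaths: string[];
  totalFileSizeInBytes: number;
};

type DatasetUploadInfo = {
  name: string;
  organization: string;
  folderId?: string;
};

type ReserveDatasetUploadRequest = {
  resumableUploadInfo: ResumableUploadInfo;
  datasetUploadInfo: DatasetUploadInfo;
};

// finish accepts the uploadId as a query parameter and returns the new id.
type FinishUploadResult = { datasetId: string };

function buildFinishUrl(baseUrl: string, uploadId: string): string {
  return `${baseUrl}/data/datasets/upload/dataset/finish?uploadId=${encodeURIComponent(uploadId)}`;
}
```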

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

refactoring

Suggested reviewers

  • MichaelBuessemeyer

Poem

🐰 I hopped through routes and Redis stacks,

Pending mags and attachments in my pack.
v14 I scurried to proudly lend a paw,
DAOs renamed, migrations filed with awe.
A carrot-sized update — hop, deploy, hurrah!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Title check — ✅ Passed: The PR title "Allow uploading mags, attachments to existing datasets" directly and clearly describes the main change: enabling uploads of mags and attachments to existing datasets.
  • Linked Issues check — ✅ Passed: The PR partially addresses issue #9278 by implementing the resumable upload protocol for mags and attachments (add_mag_as_copy and add_attachment_as_copy with HTTP transfer mode), but does not implement reference operations or the attachment deletion logic mentioned in the issue.
  • Out of Scope Changes check — ✅ Passed: All code changes are scoped to uploading mags and attachments via resumable upload, API versioning, route changes, and related supporting infrastructure (DAO updates, metadata stores, services). Changes align with the PR objectives and linked issue requirements.
  • Description check — ✅ Passed: The description clearly relates to the changeset, covering the introduction of API version 14, uploads of attachments and mags, and the upload server refactoring.



@fm3 fm3 mentioned this pull request Mar 18, 2026
fm3 added a commit that referenced this pull request Mar 30, 2026
Allow more dataset mutation via API, as used by the python libs.

### Steps to test:
- Test CRUD via API
- Test that old compose via frontend still works (renamed some api
fields)
- Corresponding libs PR: #1438

### TODOs:
- [x] RemoteAttachments.delete_attachment
- [x] Backend needs to respect missing attachments in updatePartial call
- [x] ensure frontend or older libs versions don’t already skip the
attachments?
- ensure libs don’t drop attachments on updates when opened via zarr
streaming? →LIBS
- [x] bump API version to v13, ignore attachment changes on older
(otherwise we lose attachments unwanted)
- [x] RemoteAttachments.rename_attachment
- [x] RemoteDataset.add_layer_as_ref (linked layer)
  - Needs new protocol similar to compose (ensure permission checks)
- [x] RemoteLayer.add_mag_as_ref (linked mag)
  - Same as above
- [x] RemoteAttachments.add_attachments_as_ref
   - Same as above
- [x] Determine which updates are allowed for non-virtual datasets, make
sure those work, make sure others are rejected

### Follow-up

- RemoteLayer.add_mag_as_copy with transfer_mode=HTTP
  - Needs new upload protocol. Use resumable here too?
- RemoteAttachments.add_attachment_as_copy with transfer_mode=HTTP
  - Needs new upload protocol (analog to mag)

Both will follow in #9402

### Issues:
- contributes to #9278

------
- [x] Added changelog entry (create a `$PR_NUMBER.md` file in
`unreleased_changes` or use `./tools/create-changelog-entry.py`)
- [x] Removed dev-only changes like prints and application.conf edits
- [x] Considered [common edge
cases](../blob/master/.github/common_edge_cases.md)
- [x] Needs datastore update after deployment

---------

Co-authored-by: Norman Rzepka <code@normanrz.com>
@fm3 fm3 changed the title from "WIP Allow uploading mags, attachments to existing datasets" to "Allow uploading mags, attachments to existing datasets" Apr 9, 2026
@fm3 fm3 marked this pull request as ready for review April 9, 2026 11:55

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

🧹 Nitpick comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala (2)

85-89: Redundant implicit ExecutionContext parameter.

The implicit ec: ExecutionContext parameter on findParsed is redundant since the trait already declares implicit def ec: ExecutionContext at line 12.

♻️ Suggested fix
-  def findParsed[T: Reads](key: String)(implicit ec: ExecutionContext): Fox[T] =
+  def findParsed[T: Reads](key: String): Fox[T] =
     for {
       objectString <- find(key)
       parsed <- JsonHelper.parseAs[T](objectString).toFox
     } yield parsed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala`
around lines 85 - 89, The method findParsed currently declares an unnecessary
implicit parameter implicit ec: ExecutionContext even though the trait defines
implicit def ec: ExecutionContext; remove the implicit parameter from
findParsed's signature so it simply reads def findParsed[T: Reads](key: String):
Fox[T], and update any internal references (the for-comprehension using find and
JsonHelper.parseAs) to use the trait-level implicit ec; keep the method body
unchanged otherwise and adjust any call sites if they explicitly passed an
ExecutionContext.

18-22: Consider simplifying the Fox chaining.

The pattern withExceptionHandler(_.get(id)).map(_.toFox).flatten works but could be more idiomatic using flatMap:

♻️ Suggested simplification
 def find(id: String): Fox[String] =
-    withExceptionHandler(_.get(id)).map(_.toFox).flatten
+    withExceptionHandler(_.get(id)).flatMap(_.toFox)

 def findLong(id: String): Fox[Long] =
-    withExceptionHandler(_.get(id).map(s => s.toLong)).map(_.toFox).flatten
+    withExceptionHandler(_.get(id).map(_.toLong)).flatMap(_.toFox)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala`
around lines 18 - 22, The current implementations of find and findLong use
map(...).flatten on the Fox returned by withExceptionHandler; replace the
map(...).flatten pattern with flatMap for clarity and idiomatic Scala: change
withExceptionHandler(_.get(id)).map(_.toFox).flatten in find to
withExceptionHandler(_.get(id)).flatMap(_.toFox) (or equivalent) and similarly
replace withExceptionHandler(_.get(id).map(s => s.toLong)).map(_.toFox).flatten
in findLong with a flatMap-based version so both methods use flatMap instead of
map(...).flatten.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadDomain.scala (1)

3-3: Unnecessary semicolon in import statement.

♻️ Proposed fix
-import com.scalableminds.util.enumeration.ExtendedEnumeration;
+import com.scalableminds.util.enumeration.ExtendedEnumeration
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadDomain.scala`
at line 3, The import statement "import
com.scalableminds.util.enumeration.ExtendedEnumeration;" contains an unnecessary
trailing semicolon; open UploadDomain.scala and remove the semicolon so the
import reads "import com.scalableminds.util.enumeration.ExtendedEnumeration"
(ensure no other imports in the file are modified inadvertently and run a quick
compile to verify).
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

940-961: Consider: Orphan cleanup checks all domains but directories are domain-specific.

The cleanUpOrphanUploadsForOrgaAndDomain method builds a domain-specific directory path (line 941) but then calls isKnownUploadOfAnyDomain which checks all three metadata stores. Since upload directories are partitioned by domain, checking only the relevant domain's store would be more efficient. However, this defensive approach is safe if upload IDs could theoretically overlap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala`
around lines 940 - 961, The cleanup loop in cleanUpOrphanUploadsForOrgaAndDomain
builds a domain-specific path (uploadDomain) but calls isKnownUploadOfAnyDomain,
which redundantly checks all domains; change it to perform a domain-specific
existence check instead by using uploadDomain to pick the correct metadata-store
predicate (e.g., switch on uploadDomain and call the corresponding method like
isKnownUploadInTransferStore / isKnownUploadInDatasetStore /
isKnownUploadInVolumeStore or add a new helper isKnownUploadInDomain(uploadId,
uploadDomain)), and replace the
isKnownUploadOfAnyDomain(uploadDir.getFileName.toString) call with that
domain-scoped check so only the relevant store is consulted while preserving the
same delete logic.
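
The domain-scoped lookup suggested there can be illustrated with a small sketch (TypeScript standing in for the Scala service; the store and function names are assumptions, not code from this PR):

```typescript
// Sketch: consult only the metadata store for the domain the upload
// directory belongs to, instead of checking all three stores.
type UploadDomain = "dataset" | "mag" | "attachment";

type MetadataStore = { isKnownUpload(uploadId: string): boolean };

function makeStore(knownIds: Set<string>): MetadataStore {
  return { isKnownUpload: (id) => knownIds.has(id) };
}

function isKnownUploadInDomain(
  stores: Record<UploadDomain, MetadataStore>,
  domain: UploadDomain,
  uploadId: string,
): boolean {
  // One lookup in the relevant store replaces three cross-domain checks.
  return stores[domain].isKnownUpload(uploadId);
}
```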
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/controllers/DatasetController.scala`:
- Around line 730-731: The code clears uploadToPathIsPending before verifying
it, causing findOneWithPendingUploadToPath to fail; change the order so you
first call datasetMagsDAO.findOneWithPendingUploadToPath(datasetId,
request.body.layerName, request.body.mag) and only if that succeeds call
datasetMagsDAO.finishUploadToPath(datasetId, request.body.layerName,
request.body.mag) (or alternatively have finishUploadToPath return/validate the
previous pending state), ensuring uploadToPathIsPending is checked before being
cleared.

In `@app/controllers/LegacyApiController.scala`:
- Around line 49-53: Remove the dead model by deleting the
LegacyUploadInformation case class and its companion object (the implicit
jsonFormat) from the LegacyApiController file; ensure there are no remaining
references to LegacyUploadInformation in the codebase (if any exist, point them
to the datastore’s implementation with the needsConversion field), remove any
now-unused imports introduced only for this model, and run tests/compile to
confirm no other code depends on this obsolete definition.

In `@app/controllers/WKRemoteDataStoreController.scala`:
- Around line 218-255: Add the same ownership and storage-refresh steps used by
the reserve handlers and reportDatasetUpload to both reportMagUpload and
reportAttachmentUpload: after loading dataset in reportMagUpload and
reportAttachmentUpload, verify dataset._dataStore == name and return a
NOT_FOUND/forbidden-like error if it does not match; after calling finishUpload
on datasetMagDAO/datasetAttachmentDAO, call the same datasetService
refresh-used-storage method used by reportDatasetUpload (preserving
GlobalAccessContext) to update quota/used storage, then proceed to obtain
dataStoreClient and invalidate the DS cache. Reference: reportMagUpload,
reportAttachmentUpload, dataset._dataStore, finishUpload,
datasetService.clientFor, and the refresh-used-storage call used in
reportDatasetUpload.
- Around line 135-153: The current guard only rejects attachments with the same
type AND name, allowing duplicate singleton-type attachments (e.g.,
"segmentIndex", "cumsum") with different names; update the validation before
calling datasetAttachmentDAO.insertWithUploadPending to also reject when any
attachment of a singleton type already exists on the layer (ignore name) or when
a pending singleton already exists. Concretely, in the flow around
datasetService.getDataSourceAndLayerFor / existingAttachmentOpt /
uploadToPathsService.handleExistingPendingAttachment, add a check for singleton
types (e.g., a Set("segmentIndex","cumsum")) that verifies dataLayer.attachments
has no attachment.getByType(request.body.attachmentType) (or flatMap/.exists)
and fail with a clear message if one exists, so insertWithUploadPending cannot
create a second singleton entry.

In `@app/models/dataset/Dataset.scala`:
- Around line 956-978: The two insert helpers insertWithUploadToPathPending and
insertWithUploadPending must persist credentialId like the non-WKW branch in
updateMags; add a credentialId: Option[ObjectId] parameter to both methods and
include credentialId in the INSERT column list and VALUES expression (e.g. add
credentialId column and use ${credentialId} in the q"""...VALUES(...)""" for
each method) so pending uploads on credential-backed datasets retain the locator
credential; ensure the inserted column name matches the table schema and that
parameter is threaded through callers consistent with how updateMags handles
credentialId.

In `@app/models/dataset/UploadToPathsService.scala`:
- Around line 302-303: The second lookup incorrectly calls
datasetMagDAO.findOneWithPendingUploadToPath again, so withPendingUploadBox
never detects an already pending mag; change the second call to
datasetMagDAO.findOneWithPendingUpload (keeping the first call to
findOneWithPendingUploadToPath for withPendingUploadToPathsBox) so
withPendingUploadBox correctly reflects any existing uploadIsPending mag and
causes an immediate conflict instead of deferring to the DB insert/constraint.
- Around line 323-334: The overwritePending branch currently deletes the DB row
via datasetLayerAttachmentsDAO.deletePendingAttachment but doesn't remove the
previously reserved attachment path, causing orphaned storage; update the
overwritePending path in UploadToPathsService so that before or immediately
after calling datasetLayerAttachmentsDAO.deletePendingAttachment you also call
the service that deletes reserved storage for the previous attachment path (the
same cleanup used by the mag flow), passing the previously fetched
withPendingUploadToPathsBox/withPendingUploadBox path info to remove the file;
ensure you lookup the prior path from withPendingUploadToPathsBox or
withPendingUploadBox and perform the delete operation (same helper used
elsewhere) then proceed with database deletion to avoid leaking reserved files.

In `@conf/webknossos.versioned.routes`:
- Line 7: Fix the typo in the comment string "# chnaged in v14: Dataset upload
routes and parameters have been refactored, introduced upload domain" by
replacing "chnaged" with "changed" so the line reads "# changed in v14: Dataset
upload routes and parameters have been refactored, introduced upload domain".

In `@frontend/javascripts/test/backend_snapshot_tests/datasets.e2e.ts`:
- Around line 197-205: The reserved filename in resumableUploadInfo.filePaths
("test-dataset-upload.zip") doesn't match the actual filename used by the chunk
upload ("test-dataset.zip"), so update resumableUploadInfo.filePaths to use
"test-dataset.zip" to match the upload flow (ensure the resumableUploadInfo
object — specifically the filePaths array — and the later chunk upload filename
are identical).

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala`:
- Around line 17-20: The deleteOnDisk method uses the caller-provided path
(dataSourcePath) without validating it; normalize and canonicalize the incoming
path and enforce it is inside the expected base directory before performing any
move/delete. Specifically, for deleteOnDisk, compute a normalized/canonical Path
(e.g. path.getOrElse(...).toAbsolutePath.normalize or preferably
toRealPath(LinkOption.NOFOLLOW_LINKS)) and verify it startsWith the expected
root (dataBaseDir.resolve(organizationId).toRealPath(...)); if the check fails,
return a failed Fox with a clear error rather than proceeding with deletion;
apply the same normalization/check to any intermediate paths used in the method
to guard against .. segments or symlink escapes.

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadMetadataStore.scala`:
- Around line 179-199: MagUploadMetadataStore currently inserts Redis keys for
mag and layerName but doesn't override cleanUp, leaving orphaned keys; add an
override def cleanUp(uploadId: String)(implicit ec: ExecutionContext): Fox[Unit]
in the MagUploadMetadataStore class that deletes the two keys produced by
redisKeyForMag(uploadId) and redisKeyForLayerName(uploadId) (e.g. via
store.delete or store.deleteMany) and then delegates to the parent cleanUp (or
ensures other domain cleanup runs), returning the appropriate Fox[Unit].
- Around line 201-230: AttachmentUploadMetadataStore is missing an override of
cleanUp, so attachment-related Redis keys remain orphaned; add a
cleanUp(uploadId: String) override in class AttachmentUploadMetadataStore that
deletes the three keys generated by redisKeyForAttachment(uploadId),
redisKeyForAttachmentType(uploadId) and redisKeyForLayerName(uploadId) (use the
store.delete or equivalent method used elsewhere and return the appropriate
Fox[Unit] result), and ensure the method signature matches the base
UploadMetadataStore.cleanUp contract and composes/delegates to any superclass
cleanup if needed.

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala`:
- Around line 416-421: The call inside finishMagUpload to
moveUnpackedMagOrAttachmentToTarget is passing the wrong domain
(UploadDomain.attachment) which causes incorrect handling; update the call in
finishMagUpload to pass UploadDomain.mag instead so
moveUnpackedMagOrAttachmentToTarget, any logging and downstream behavior
correctly treat the item as a mag rather than an attachment (look for the
invocation with parameters unpackToDir, layerName, datasetId, dataSourceId,
s"${mag.mag.toMagLiteral(true)}__${ObjectId.generate}" and replace the final
parameter).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 463ebd30-db4f-4155-accc-446e1df0b642

📥 Commits

Reviewing files that changed from the base of the PR and between 4081fb3 and 3db3091.

⛔ Files ignored due to path filters (1)
  • test/db/dataset_mags.csv is excluded by !**/*.csv
📒 Files selected for processing (34)
  • app/controllers/DatasetController.scala
  • app/controllers/LegacyApiController.scala
  • app/controllers/WKRemoteDataStoreController.scala
  • app/models/dataset/Dataset.scala
  • app/models/dataset/DatasetService.scala
  • app/models/dataset/UploadToPathsService.scala
  • app/models/storage/UsedStorageService.scala
  • app/utils/sql/SimpleSQLDAO.scala
  • conf/webknossos.latest.routes
  • conf/webknossos.versioned.routes
  • frontend/javascripts/admin/dataset/dataset_upload_view.tsx
  • frontend/javascripts/admin/rest_api.ts
  • frontend/javascripts/test/backend_snapshot_tests/datasets.e2e.ts
  • schema/evolutions/161-upload-mags-attachments.sql
  • schema/evolutions/reversions/161-upload-mags-attachments.sql
  • schema/schema.sql
  • unreleased_changes/9402.md
  • util/src/main/scala/com/scalableminds/util/mvc/ApiVersioning.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/UploadController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/MagLocator.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayerAttachments.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadDomain.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadMetadataStore.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala
  • webknossos-datastore/conf/datastore.latest.routes
  • webknossos-datastore/conf/datastore.versioned.routes
  • webknossos-tracingstore/conf/tracingstore.versioned.routes


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
app/models/dataset/UploadToPathsService.scala (1)

329-330: ⚠️ Potential issue | 🟠 Major

Attachment overwrite does not delete the previously reserved path.

The mag flow (lines 306-308) properly deletes old pending paths via deletePathsForOldPending before removing the DB record. However, the attachment overwrite branch only calls deletePendingAttachment without first deleting the previously reserved filesystem path. This will leak storage when overwriting pending attachment reservations.

Suggested fix
       _ <- if (overwritePending) {
-        datasetLayerAttachmentsDAO.deletePendingAttachment(dataset._id, layerName, attachmentType, attachmentName)
+        for {
+          _ <- Fox.runOptional(withPendingUploadToPathsBox.toOption) { oldPending =>
+            deletePathsForOldPending(dataset, oldPending.path)
+          }
+          _ <- datasetLayerAttachmentsDAO.deletePendingAttachment(dataset._id, layerName, attachmentType, attachmentName)
+        } yield ()
       } else
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/dataset/UploadToPathsService.scala` around lines 329 - 330, The
overwritePending branch currently only calls
datasetLayerAttachmentsDAO.deletePendingAttachment(...) which removes the DB
record but doesn't free the previously reserved filesystem path; update the
branch to first call deletePathsForOldPending(dataset._id, layerName,
attachmentType, attachmentName) (the same helper used in the mag flow) to remove
the reserved path(s) and then call
datasetLayerAttachmentsDAO.deletePendingAttachment(...) so storage is not leaked
when overwriting pending attachments.
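The ordering the fix proposes can be illustrated with a minimal, self-contained sketch (the helper names `deleteReservedPath` and `deletePendingRecord` are hypothetical stand-ins, not the real DAO/service methods):

```scala
// Free the previously reserved filesystem path *before* dropping the DB record,
// so the pending record still points at the path until cleanup has succeeded.
def overwritePendingAttachment(oldPath: Option[String],
                               deleteReservedPath: String => Unit,
                               deletePendingRecord: () => Unit): Unit = {
  oldPath.foreach(deleteReservedPath) // 1) reclaim storage for the old reservation
  deletePendingRecord()               // 2) then remove the stale pending record
}
```

This mirrors the mag flow, which also deletes old pending paths first and only then removes the database entry.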
app/controllers/WKRemoteDataStoreController.scala (3)

135-153: ⚠️ Potential issue | 🟠 Major

Singleton attachment types can have duplicates created.

The check at lines 136-138 uses getByTypeAndName, which only rejects attachments with matching type AND name. For singleton attachment types like segmentIndex and cumsum, a second attachment with the same type but different name would pass validation and be inserted. Downstream parsing typically uses headOption, causing one upload to be silently ignored.

Suggested fix
          (dataSource, dataLayer) <- datasetService.getDataSourceAndLayerFor(dataset, request.body.layerName)
+         isSingletonAttachment = LayerAttachmentType.isSingletonAttachment(request.body.attachmentType)
          existingAttachmentOpt = dataLayer.attachments.flatMap(
-           _.getByTypeAndName(request.body.attachmentType, request.body.attachment.name))
-         _ <- Fox.fromBool(existingAttachmentOpt.isEmpty) ?~> s"Layer already has ${request.body.attachmentType} attachment named ${request.body.attachment.name}"
+           if (isSingletonAttachment)
+             _.getByType(request.body.attachmentType).headOption
+           else
+             _.getByTypeAndName(request.body.attachmentType, request.body.attachment.name))
+         existsError = if (isSingletonAttachment) s"Layer already has a ${request.body.attachmentType} attachment (singleton type)" 
+                       else s"Layer already has ${request.body.attachmentType} attachment named ${request.body.attachment.name}"
+         _ <- Fox.fromBool(existingAttachmentOpt.isEmpty) ?~> existsError
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/controllers/WKRemoteDataStoreController.scala` around lines 135 - 153,
The current validation uses dataLayer.attachments.getByTypeAndName (in the block
before datasetAttachmentDAO.insertWithUploadPending) which only prevents
duplicates with the same type+name but allows creating a second attachment with
the same singleton type (e.g., "segmentIndex", "cumsum") under a different name;
change the check to detect any existing attachment of that type for singleton
types: add a singletonTypes set (or use existing constant) and, when
request.body.attachmentType is in that set, validate by searching attachments
for matching type (not type+name) and fail if any exists (update the error text
as needed), otherwise keep the existing getByTypeAndName behavior; ensure this
check is applied before uploadToPathsService.handleExistingPendingAttachment and
before insertWithUploadPending so duplicates are prevented.

218-234: ⚠️ Potential issue | 🟠 Major

Missing datastore ownership check and storage refresh.

Unlike reserveMagUpload (line 113) which verifies dataset._dataStore == dataStore.name, reportMagUpload does not validate that the reporting datastore owns this dataset. Additionally, unlike reportDatasetUpload (line 213), storage usage is not refreshed after the mag upload completes.

Suggested fix
   def reportMagUpload(name: String, key: String): Action[ReportMagUploadParameters] =
     Action.async(validateJson[ReportMagUploadParameters]) { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           dataset <- datasetDAO.findOne(request.body.datasetId)(GlobalAccessContext) ?~> Messages(
             "dataset.notFound",
             request.body.datasetId) ~> NOT_FOUND
+          _ <- Fox.fromBool(dataset._dataStore == name) ?~> "Cannot report mag upload via different datastore." ~> NOT_FOUND
           _ <- datasetMagDAO.findOneWithPendingUpload(request.body.datasetId,
                                                       request.body.layerName,
                                                       request.body.mag.mag) ?~> "dataset.finishMagUpload.notPending"
           _ <- request.body.mag.path.toFox ?~> "dataset.finishMagUpload.pathNotSet"
           _ <- datasetMagDAO.finishUpload(request.body.datasetId, request.body.layerName, request.body.mag)
+          _ <- usedStorageService.refreshStorageReportForDataset(dataset)
           dataStoreClient <- datasetService.clientFor(dataset)(GlobalAccessContext)
           _ <- dataStoreClient.invalidateDatasetInDSCache(dataset._id)
         } yield Ok
       }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/controllers/WKRemoteDataStoreController.scala` around lines 218 - 234,
reportMagUpload is missing the same datastore ownership check done in
reserveMagUpload and the storage-usage refresh done in reportDatasetUpload; add
a guard that verifies dataset._dataStore == dataStore.name (same check pattern
as in reserveMagUpload) before proceeding to finish the mag upload, and after
datasetMagDAO.finishUpload and dataStoreClient.invalidateDatasetInDSCache invoke
the same storage refresh call used in reportDatasetUpload (e.g.
datasetService.refreshStorageUsage(dataset)(GlobalAccessContext) or the
equivalent method your codebase uses) so ownership is enforced and storage usage
is refreshed.
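The ownership guard being asked for reduces to a simple comparison; a self-contained sketch (with simplified stand-in types, not the real `Dataset` model or `Fox` error channel):

```scala
final case class Dataset(id: String, dataStore: String)

// Mirrors the `dataset._dataStore == dataStore.name` guard from reserveUpload:
// only the datastore that owns the dataset may report the finished mag upload.
def checkOwnership(dataset: Dataset, reportingStore: String): Either[String, Dataset] =
  if (dataset.dataStore == reportingStore) Right(dataset)
  else Left(s"Cannot report upload via datastore $reportingStore; dataset belongs to ${dataset.dataStore}")
```

In the controller this would be expressed as a `Fox.fromBool(...)` step inside the for-comprehension, as the suggested diff shows.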

236-256: ⚠️ Potential issue | 🟠 Major

Missing datastore ownership check and storage refresh.

Same concerns as reportMagUpload: no verification that the reporting datastore owns this dataset, and storage usage is not refreshed after attachment upload completes.

Suggested fix
   def reportAttachmentUpload(name: String, key: String): Action[ReportAttachmentUploadParameters] =
     Action.async(validateJson[ReportAttachmentUploadParameters]) { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           dataset <- datasetDAO.findOne(request.body.datasetId)(GlobalAccessContext) ?~> Messages(
             "dataset.notFound",
             request.body.datasetId) ~> NOT_FOUND
+          _ <- Fox.fromBool(dataset._dataStore == name) ?~> "Cannot report attachment upload via different datastore." ~> NOT_FOUND
           _ <- datasetAttachmentDAO.findOneWithPendingUpload(
             request.body.datasetId,
             request.body.layerName,
             request.body.attachmentType,
             request.body.attachment.name) ?~> "dataset.finishAttachmentUpload.notPending"
           _ <- datasetAttachmentDAO.finishUpload(request.body.datasetId,
                                                  request.body.layerName,
                                                  request.body.attachmentType,
                                                  request.body.attachment)
+          _ <- usedStorageService.refreshStorageReportForDataset(dataset)
           dataStoreClient <- datasetService.clientFor(dataset)(GlobalAccessContext)
           _ <- dataStoreClient.invalidateDatasetInDSCache(dataset._id)
         } yield Ok
       }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/controllers/WKRemoteDataStoreController.scala` around lines 236 - 256,
The reportAttachmentUpload flow is missing a datastore-ownership check and a
storage refresh: after finding the dataset (datasetDAO.findOne) add a
verification that the calling datastore owns the dataset (use dataStoreService
to perform an ownership check or compare the dataset ownership field to the
validated datastore name from dataStoreService.validateAccess) and return
FORBIDDEN if it doesn't own it; then, after datasetAttachmentDAO.finishUpload
and dataStoreClient.invalidateDatasetInDSCache, trigger a storage usage refresh
for the dataset by calling the appropriate datasetService storage refresh method
(e.g. datasetService.refreshStorageUsage or equivalent) with the dataset._id and
GlobalAccessContext so storage quotas are updated.
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

134-139: Guard the legacy response rewrite against a missing datasetId.

This shim currently uses .get, so any upstream 200 OK payload without datasetId becomes a datastore 500. Since this is only a compatibility rename, it should fall back to the upstream result instead of throwing.

Suggested hardening
         if (result.header.status == OK) {
           result.body match {
             case play.api.http.HttpEntity.Strict(data, _) =>
               val json = Json.parse(data.toArray).as[JsObject]
-              Ok((json - "datasetId") ++ Json.obj("newDatasetId" -> (json \ "datasetId").get))
+              (json \ "datasetId").toOption match {
+                case Some(datasetId) =>
+                  Ok((json - "datasetId") ++ Json.obj("newDatasetId" -> datasetId))
+                case None =>
+                  result
+              }
             case _ => result
           }
         } else result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala`
around lines 134 - 139, The legacy response rewrite in DSLegacyApiController
currently calls .get on (json \ "datasetId") which throws if missing; change the
logic to check for the presence of "datasetId" before rewriting: when matching
play.api.http.HttpEntity.Strict, parse the JSON into JsObject, use (json \
"datasetId").asOpt (or pattern-match on .toOption) and only return Ok((json -
"datasetId") ++ Json.obj("newDatasetId" -> datasetId)) when datasetId is
present; otherwise fall back to returning the original result unchanged. Ensure
you update the code paths where result.header.status == OK and the
HttpEntity.Strict branch to use this guarded check.
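The failure mode behind this nitpick is a property of play-json's `JsLookupResult`: `.get` throws on a missing key, while `.toOption` / `.asOpt` return `None`. A minimal illustration (assuming a play-json dependency on the classpath):

```scala
import play.api.libs.json._

val json: JsObject = Json.obj("foo" -> 1)

// (json \ "datasetId").get  // throws: key is absent, lookup is JsUndefined
val safe: Option[JsValue] = (json \ "datasetId").toOption // None, no exception
```

Guarding with `toOption` lets the compatibility shim fall back to the unmodified upstream result instead of turning a valid 200 into a 500.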
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala`:
- Around line 182-196: The adapter is dropping isVirtual and needsConversion
when constructing DatasetUploadInfo in the v11 path; update the
DatasetUploadInfo creation in reserveUploadV11 (where adaptedRequestBody is
built) to pass through request.body.isVirtual and request.body.needsConversion
instead of hardcoding None, so the values parsed by
LegacyReserveUploadInformationV11 are preserved when creating the
DatasetUploadInfo (references: DatasetUploadInfo, ResumableUploadInfo,
reserveUploadV11 / adaptedRequestBody).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e6f86d62-1436-43f3-a1c3-db0418f3799d

📥 Commits

Reviewing files that changed from the base of the PR and between 3db3091 and 9e0cd3d.

📒 Files selected for processing (5)
  • app/controllers/DatasetController.scala
  • app/controllers/WKRemoteDataStoreController.scala
  • app/models/dataset/UploadToPathsService.scala
  • conf/webknossos.versioned.routes
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • conf/webknossos.versioned.routes

@fm3 fm3 requested a review from normanrz April 9, 2026 12:58


Development

Successfully merging this pull request may close these issues.

Complete CRUD for RemoteDatasets

1 participant