Allow uploading mags, attachments to existing datasets#9402
📝 Walkthrough

Implement upload endpoints for dataset mags and attachments, refactor DAO names and pending-upload tracking, introduce domain-partitioned upload metadata and services, bump API version to 14, and update frontend, routes, and database schema for the new upload domain model.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: 4 passed, 1 failed (warning)
Allow more dataset mutation via API, as used by the python libs.

### Steps to test:
- Test CRUD via API
- Test that old compose via frontend still works (renamed some api fields)
- Corresponding libs PR: #1438

### TODOs:
- [x] RemoteAttachments.delete_attachment
- [x] Backend needs to respect missing attachments in updatePartial call
  - [x] ensure frontend or older libs versions don’t already skip the attachments?
  - ensure libs don’t drop attachments on updates when opened via zarr streaming? →LIBS
- [x] bump API version to v13, ignore attachment changes on older (otherwise we lose attachments unwanted)
- [x] RemoteAttachments.rename_attachment
- [x] RemoteDataset.add_layer_as_ref (linked layer)
  - Needs new protocol similar to compose (ensure permission checks)
- [x] RemoteLayer.add_mag_as_ref (linked mag)
  - Same as above
- [x] RemoteAttachments.add_attachments_as_ref
  - Same as above
- [x] Determine which updates are allowed for non-virtual datasets, make sure those work, make sure others are rejected

### Follow-up
- RemoteLayer.add_mag_as_copy with transfer_mode=HTTP
  - Needs new upload protocol. Use resumable here too?
- RemoteAttachments.add_attachment_as_copy with transfer_mode=HTTP
  - Needs new upload protocol (analog to mag)

Both will follow in #9402

### Issues:
- contributes to #9278

------

- [x] Added changelog entry (create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)
- [x] Removed dev-only changes like prints and application.conf edits
- [x] Considered [common edge cases](../blob/master/.github/common_edge_cases.md)
- [x] Needs datastore update after deployment

---------

Co-authored-by: Norman Rzepka <code@normanrz.com>
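The "bump API version to v13, ignore attachment changes on older" item is the crux of not losing attachments: pre-v13 clients never send the attachments field, so their partial updates must not be read as deletions. A minimal sketch of that gating idea — `LayerUpdate` and `mergedAttachments` are illustrative names, not the actual webknossos code:

```scala
// Sketch only: keep stored attachments when an older client (API < 13)
// sends a partial update that omits them entirely.
case class LayerUpdate(apiVersion: Int, attachments: Option[Seq[String]])

def mergedAttachments(existing: Seq[String], update: LayerUpdate): Seq[String] =
  if (update.apiVersion < 13)
    existing // pre-v13 clients never send attachments; keep what is stored
  else
    update.attachments.getOrElse(Seq.empty) // on v13+, omission means removal
```

The same version threshold then lets v13+ clients perform intentional attachment deletions, which older clients simply cannot express.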
… bump api version
Actionable comments posted: 13
🧹 Nitpick comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala (2)
85-89: Redundant implicit ExecutionContext parameter.

The `implicit ec: ExecutionContext` parameter on `findParsed` is redundant since the trait already declares `implicit def ec: ExecutionContext` at line 12.

♻️ Suggested fix

```diff
-  def findParsed[T: Reads](key: String)(implicit ec: ExecutionContext): Fox[T] =
+  def findParsed[T: Reads](key: String): Fox[T] =
     for {
       objectString <- find(key)
       parsed <- JsonHelper.parseAs[T](objectString).toFox
     } yield parsed
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala` around lines 85 - 89, The method findParsed currently declares an unnecessary implicit parameter implicit ec: ExecutionContext even though the trait defines implicit def ec: ExecutionContext; remove the implicit parameter from findParsed's signature so it simply reads def findParsed[T: Reads](key: String): Fox[T], and update any internal references (the for-comprehension using find and JsonHelper.parseAs) to use the trait-level implicit ec; keep the method body unchanged otherwise and adjust any call sites if they explicitly passed an ExecutionContext.
18-22: Consider simplifying the Fox chaining.

The pattern `withExceptionHandler(_.get(id)).map(_.toFox).flatten` works but could be more idiomatic using `flatMap`:

♻️ Suggested simplification

```diff
   def find(id: String): Fox[String] =
-    withExceptionHandler(_.get(id)).map(_.toFox).flatten
+    withExceptionHandler(_.get(id)).flatMap(_.toFox)

   def findLong(id: String): Fox[Long] =
-    withExceptionHandler(_.get(id).map(s => s.toLong)).map(_.toFox).flatten
+    withExceptionHandler(_.get(id).map(_.toLong)).flatMap(_.toFox)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala` around lines 18 - 22, The current implementations of find and findLong use map(...).flatten on the Fox returned by withExceptionHandler; replace the map(...).flatten pattern with flatMap for clarity and idiomatic Scala: change withExceptionHandler(_.get(id)).map(_.toFox).flatten in find to withExceptionHandler(_.get(id)).flatMap(_.toFox) (or equivalent) and similarly replace withExceptionHandler(_.get(id).map(s => s.toLong)).map(_.toFox).flatten in findLong with a flatMap-based version so both methods use flatMap instead of map(...).flatten.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadDomain.scala (1)
3-3: Unnecessary semicolon in import statement.

♻️ Proposed fix

```diff
-import com.scalableminds.util.enumeration.ExtendedEnumeration;
+import com.scalableminds.util.enumeration.ExtendedEnumeration
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadDomain.scala` at line 3, The import statement "import com.scalableminds.util.enumeration.ExtendedEnumeration;" contains an unnecessary trailing semicolon; open UploadDomain.scala and remove the semicolon so the import reads "import com.scalableminds.util.enumeration.ExtendedEnumeration" (ensure no other imports in the file are modified inadvertently and run a quick compile to verify).

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
940-961: Consider: Orphan cleanup checks all domains but directories are domain-specific.

The `cleanUpOrphanUploadsForOrgaAndDomain` method builds a domain-specific directory path (line 941) but then calls `isKnownUploadOfAnyDomain`, which checks all three metadata stores. Since upload directories are partitioned by domain, checking only the relevant domain's store would be more efficient. However, this defensive approach is safe if upload IDs could theoretically overlap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala` around lines 940 - 961, The cleanup loop in cleanUpOrphanUploadsForOrgaAndDomain builds a domain-specific path (uploadDomain) but calls isKnownUploadOfAnyDomain, which redundantly checks all domains; change it to perform a domain-specific existence check instead by using uploadDomain to pick the correct metadata-store predicate (e.g., switch on uploadDomain and call the corresponding method like isKnownUploadInTransferStore / isKnownUploadInDatasetStore / isKnownUploadInVolumeStore or add a new helper isKnownUploadInDomain(uploadId, uploadDomain)), and replace the isKnownUploadOfAnyDomain(uploadDir.getFileName.toString) call with that domain-scoped check so only the relevant store is consulted while preserving the same delete logic.
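The suggested `isKnownUploadInDomain(uploadId, uploadDomain)` helper would dispatch on the domain so only one metadata store is consulted. A rough, self-contained sketch of the shape — plain `scala.Enumeration` and in-memory maps stand in for `ExtendedEnumeration` and the Redis-backed stores:

```scala
// Sketch only: the three domain-partitioned metadata stores are modeled
// as sets of known upload IDs keyed by domain.
object UploadDomain extends Enumeration {
  val dataset, mag, attachment = Value
}

val knownUploads: Map[UploadDomain.Value, Set[String]] = Map(
  UploadDomain.dataset    -> Set("upload-a"),
  UploadDomain.mag        -> Set("upload-b"),
  UploadDomain.attachment -> Set.empty[String]
)

// Consult only the store matching the directory's domain, instead of all three.
def isKnownUploadInDomain(uploadId: String, domain: UploadDomain.Value): Boolean =
  knownUploads.getOrElse(domain, Set.empty).contains(uploadId)
```

With this, the orphan-cleanup loop can pass the `uploadDomain` it already used to build the directory path, keeping lookup and directory layout consistent.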
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/controllers/DatasetController.scala`:
- Around line 730-731: The code clears uploadToPathIsPending before verifying
it, causing findOneWithPendingUploadToPath to fail; change the order so you
first call datasetMagsDAO.findOneWithPendingUploadToPath(datasetId,
request.body.layerName, request.body.mag) and only if that succeeds call
datasetMagsDAO.finishUploadToPath(datasetId, request.body.layerName,
request.body.mag) (or alternatively have finishUploadToPath return/validate the
previous pending state), ensuring uploadToPathIsPending is checked before being
cleared.
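The ordering bug described above is the classic check-then-clear inversion: the pending flag must be verified before it is cleared, or the verification always fails. A toy sketch of the corrected order, with a mutable map standing in for `datasetMagsDAO` (all names illustrative):

```scala
import scala.collection.mutable

// Pending flags keyed by (datasetId, layerName, mag).
val pending = mutable.Map(("ds1", "color", "2-2-1") -> true)

def findOneWithPendingUploadToPath(key: (String, String, String)): Option[Unit] =
  if (pending.getOrElse(key, false)) Some(()) else None

def finishUploadToPath(key: (String, String, String)): Unit =
  pending(key) = false

// Correct order: verify the pending flag first, clear it only afterwards.
def finishIfPending(key: (String, String, String)): Either[String, Unit] =
  findOneWithPendingUploadToPath(key) match {
    case Some(_) => Right(finishUploadToPath(key))
    case None    => Left("no pending uploadToPath for this mag")
  }
```

A second call for the same key fails, which is exactly the behavior the review asks for: once cleared, the pending state can no longer be "finished" again.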
In `@app/controllers/LegacyApiController.scala`:
- Around line 49-53: Remove the dead model by deleting the
LegacyUploadInformation case class and its companion object (the implicit
jsonFormat) from the LegacyApiController file; ensure there are no remaining
references to LegacyUploadInformation in the codebase (if any exist, point them
to the datastore’s implementation with the needsConversion field), remove any
now-unused imports introduced only for this model, and run tests/compile to
confirm no other code depends on this obsolete definition.
In `@app/controllers/WKRemoteDataStoreController.scala`:
- Around line 218-255: Add the same ownership and storage-refresh steps used by
the reserve handlers and reportDatasetUpload to both reportMagUpload and
reportAttachmentUpload: after loading dataset in reportMagUpload and
reportAttachmentUpload, verify dataset._dataStore == name and return a
NOT_FOUND/forbidden-like error if it does not match; after calling finishUpload
on datasetMagDAO/datasetAttachmentDAO, call the same datasetService
refresh-used-storage method used by reportDatasetUpload (preserving
GlobalAccessContext) to update quota/used storage, then proceed to obtain
dataStoreClient and invalidate the DS cache. Reference: reportMagUpload,
reportAttachmentUpload, dataset._dataStore, finishUpload,
datasetService.clientFor, and the refresh-used-storage call used in
reportDatasetUpload.
- Around line 135-153: The current guard only rejects attachments with the same
type AND name, allowing duplicate singleton-type attachments (e.g.,
"segmentIndex", "cumsum") with different names; update the validation before
calling datasetAttachmentDAO.insertWithUploadPending to also reject when any
attachment of a singleton type already exists on the layer (ignore name) or when
a pending singleton already exists. Concretely, in the flow around
datasetService.getDataSourceAndLayerFor / existingAttachmentOpt /
uploadToPathsService.handleExistingPendingAttachment, add a check for singleton
types (e.g., a Set("segmentIndex","cumsum")) that verifies dataLayer.attachments
has no attachment.getByType(request.body.attachmentType) (or flatMap/.exists)
and fail with a clear message if one exists, so insertWithUploadPending cannot
create a second singleton entry.
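The singleton-type validation above can be expressed as a two-step check: reject any second attachment of a singleton type regardless of name, then fall back to the type-and-name check. A self-contained sketch — the `Attachment` case class and `singletonTypes` set are assumptions based on the comment, not the real model:

```scala
case class Attachment(attachmentType: String, name: String)

// Types of which a layer may hold at most one attachment.
val singletonTypes: Set[String] = Set("segmentIndex", "cumsum")

def validateNewAttachment(existing: Seq[Attachment], incoming: Attachment): Either[String, Unit] =
  if (singletonTypes.contains(incoming.attachmentType) &&
      existing.exists(_.attachmentType == incoming.attachmentType))
    Left(s"Layer already has a ${incoming.attachmentType} attachment (singleton type)")
  else if (existing.exists(a => a.attachmentType == incoming.attachmentType && a.name == incoming.name))
    Left(s"Layer already has ${incoming.attachmentType} attachment named ${incoming.name}")
  else
    Right(())
```

Running this check before the pending insert prevents a second singleton row from ever reaching the database, where downstream `headOption` parsing would silently drop one of the two.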
In `@app/models/dataset/Dataset.scala`:
- Around line 956-978: The two insert helpers insertWithUploadToPathPending and
insertWithUploadPending must persist credentialId like the non-WKW branch in
updateMags; add a credentialId: Option[ObjectId] parameter to both methods and
include credentialId in the INSERT column list and VALUES expression (e.g. add
credentialId column and use ${credentialId} in the q"""...VALUES(...)""" for
each method) so pending uploads on credential-backed datasets retain the locator
credential; ensure the inserted column name matches the table schema and that
parameter is threaded through callers consistent with how updateMags handles
credentialId.
In `@app/models/dataset/UploadToPathsService.scala`:
- Around line 302-303: The second lookup incorrectly calls
datasetMagDAO.findOneWithPendingUploadToPath again, so withPendingUploadBox
never detects an already pending mag; change the second call to
datasetMagDAO.findOneWithPendingUpload (keeping the first call to
findOneWithPendingUploadToPath for withPendingUploadToPathsBox) so
withPendingUploadBox correctly reflects any existing uploadIsPending mag and
causes an immediate conflict instead of deferring to the DB insert/constraint.
- Around line 323-334: The overwritePending branch currently deletes the DB row
via datasetLayerAttachmentsDAO.deletePendingAttachment but doesn't remove the
previously reserved attachment path, causing orphaned storage; update the
overwritePending path in UploadToPathsService so that before or immediately
after calling datasetLayerAttachmentsDAO.deletePendingAttachment you also call
the service that deletes reserved storage for the previous attachment path (the
same cleanup used by the mag flow), passing the previously fetched
withPendingUploadToPathsBox/withPendingUploadBox path info to remove the file;
ensure you lookup the prior path from withPendingUploadToPathsBox or
withPendingUploadBox and perform the delete operation (same helper used
elsewhere) then proceed with database deletion to avoid leaking reserved files.
In `@conf/webknossos.versioned.routes`:
- Line 7: Fix the typo in the comment string "# chnaged in v14: Dataset upload
routes and parameters have been refactored, introduced upload domain" by
replacing "chnaged" with "changed" so the line reads "# changed in v14: Dataset
upload routes and parameters have been refactored, introduced upload domain".
In `@frontend/javascripts/test/backend_snapshot_tests/datasets.e2e.ts`:
- Around line 197-205: The reserved filename in resumableUploadInfo.filePaths
("test-dataset-upload.zip") doesn't match the actual filename used by the chunk
upload ("test-dataset.zip"), so update resumableUploadInfo.filePaths to use
"test-dataset.zip" to match the upload flow (ensure the resumableUploadInfo
object — specifically the filePaths array — and the later chunk upload filename
are identical).
In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala`:
- Around line 17-20: The deleteOnDisk method uses the caller-provided path
(dataSourcePath) without validating it; normalize and canonicalize the incoming
path and enforce it is inside the expected base directory before performing any
move/delete. Specifically, for deleteOnDisk, compute a normalized/canonical Path
(e.g. path.getOrElse(...).toAbsolutePath.normalize or preferably
toRealPath(LinkOption.NOFOLLOW_LINKS)) and verify it startsWith the expected
root (dataBaseDir.resolve(organizationId).toRealPath(...)); if the check fails,
return a failed Fox with a clear error rather than proceeding with deletion;
apply the same normalization/check to any intermediate paths used in the method
to guard against .. segments or symlink escapes.
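The normalize-then-containment check described above can be done with `java.nio.file` alone. A minimal sketch — for paths that exist on disk, `toRealPath(LinkOption.NOFOLLOW_LINKS)` would additionally defend against symlink escapes, as the comment notes:

```scala
import java.nio.file.{Path, Paths}

// Reject any candidate that does not normalize to a location inside the
// expected base directory. Path.startsWith compares whole path components,
// so /data/org10 does not count as being inside /data/org1.
def resolveInside(baseDir: Path, candidate: Path): Either[String, Path] = {
  val base       = baseDir.toAbsolutePath.normalize
  val normalized = candidate.toAbsolutePath.normalize
  if (normalized.startsWith(base)) Right(normalized)
  else Left(s"Path $candidate escapes base directory $baseDir")
}
```

Returning `Left` instead of proceeding mirrors the "failed Fox with a clear error" the prompt asks for, just without the `Fox` dependency.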
In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadMetadataStore.scala`:
- Around line 179-199: MagUploadMetadataStore currently inserts Redis keys for
mag and layerName but doesn't override cleanUp, leaving orphaned keys; add an
override def cleanUp(uploadId: String)(implicit ec: ExecutionContext): Fox[Unit]
in the MagUploadMetadataStore class that deletes the two keys produced by
redisKeyForMag(uploadId) and redisKeyForLayerName(uploadId) (e.g. via
store.delete or store.deleteMany) and then delegates to the parent cleanUp (or
ensures other domain cleanup runs), returning the appropriate Fox[Unit].
- Around line 201-230: AttachmentUploadMetadataStore is missing an override of
cleanUp, so attachment-related Redis keys remain orphaned; add a
cleanUp(uploadId: String) override in class AttachmentUploadMetadataStore that
deletes the three keys generated by redisKeyForAttachment(uploadId),
redisKeyForAttachmentType(uploadId) and redisKeyForLayerName(uploadId) (use the
store.delete or equivalent method used elsewhere and return the appropriate
Fox[Unit] result), and ensure the method signature matches the base
UploadMetadataStore.cleanUp contract and composes/delegates to any superclass
cleanup if needed.
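The missing `cleanUp` overrides for both stores follow the same pattern: delete the domain-specific keys, then delegate to the shared cleanup. A stripped-down sketch with a `Map` standing in for Redis — the key-naming scheme is assumed and the real `Fox[Unit]` return type is omitted:

```scala
import scala.collection.mutable

// Sketch: a mutable Map stands in for the Redis store.
val store = mutable.Map[String, String]()

def redisKeyForMag(uploadId: String): String       = s"upload___${uploadId}___mag"
def redisKeyForLayerName(uploadId: String): String = s"upload___${uploadId}___layerName"

trait UploadMetadataStore {
  // Base implementation would clean up the keys shared by all domains.
  def cleanUp(uploadId: String): Unit = ()
}

class MagUploadMetadataStore extends UploadMetadataStore {
  override def cleanUp(uploadId: String): Unit = {
    store.remove(redisKeyForMag(uploadId))       // domain-specific key
    store.remove(redisKeyForLayerName(uploadId)) // domain-specific key
    super.cleanUp(uploadId)                      // still run the shared cleanup
  }
}
```

An `AttachmentUploadMetadataStore` override would look the same with its three attachment keys; the essential point is that every key an `insert` writes has a matching delete in `cleanUp`.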
In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala`:
- Around line 416-421: The call inside finishMagUpload to
moveUnpackedMagOrAttachmentToTarget is passing the wrong domain
(UploadDomain.attachment) which causes incorrect handling; update the call in
finishMagUpload to pass UploadDomain.mag instead so
moveUnpackedMagOrAttachmentToTarget, any logging and downstream behavior
correctly treat the item as a mag rather than an attachment (look for the
invocation with parameters unpackToDir, layerName, datasetId, dataSourceId,
s"${mag.mag.toMagLiteral(true)}__${ObjectId.generate}" and replace the final
parameter).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 463ebd30-db4f-4155-accc-446e1df0b642
⛔ Files ignored due to path filters (1)
`test/db/dataset_mags.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (34)
- app/controllers/DatasetController.scala
- app/controllers/LegacyApiController.scala
- app/controllers/WKRemoteDataStoreController.scala
- app/models/dataset/Dataset.scala
- app/models/dataset/DatasetService.scala
- app/models/dataset/UploadToPathsService.scala
- app/models/storage/UsedStorageService.scala
- app/utils/sql/SimpleSQLDAO.scala
- conf/webknossos.latest.routes
- conf/webknossos.versioned.routes
- frontend/javascripts/admin/dataset/dataset_upload_view.tsx
- frontend/javascripts/admin/rest_api.ts
- frontend/javascripts/test/backend_snapshot_tests/datasets.e2e.ts
- schema/evolutions/161-upload-mags-attachments.sql
- schema/evolutions/reversions/161-upload-mags-attachments.sql
- schema/schema.sql
- unreleased_changes/9402.md
- util/src/main/scala/com/scalableminds/util/mvc/ApiVersioning.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/UploadController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/MagLocator.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayerAttachments.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadDomain.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadMetadataStore.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala
- webknossos-datastore/conf/datastore.latest.routes
- webknossos-datastore/conf/datastore.versioned.routes
- webknossos-tracingstore/conf/tracingstore.versioned.routes
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala
...tore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadMetadataStore.scala
...tore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadMetadataStore.scala
...-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
Actionable comments posted: 1
♻️ Duplicate comments (4)
app/models/dataset/UploadToPathsService.scala (1)
329-330: ⚠️ Potential issue | 🟠 Major

Attachment overwrite does not delete the previously reserved path.

The mag flow (lines 306-308) properly deletes old pending paths via `deletePathsForOldPending` before removing the DB record. However, the attachment overwrite branch only calls `deletePendingAttachment` without first deleting the previously reserved filesystem path. This will leak storage when overwriting pending attachment reservations.

Suggested fix

```diff
       _ <- if (overwritePending) {
-        datasetLayerAttachmentsDAO.deletePendingAttachment(dataset._id, layerName, attachmentType, attachmentName)
+        for {
+          _ <- Fox.runOptional(withPendingUploadToPathsBox.toOption) { oldPending =>
+            deletePathsForOldPending(dataset, oldPending.path)
+          }
+          _ <- datasetLayerAttachmentsDAO.deletePendingAttachment(dataset._id, layerName, attachmentType, attachmentName)
+        } yield ()
       } else
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/dataset/UploadToPathsService.scala` around lines 329 - 330, The overwritePending branch currently only calls datasetLayerAttachmentsDAO.deletePendingAttachment(...) which removes the DB record but doesn't free the previously reserved filesystem path; update the branch to first call deletePathsForOldPending(dataset._id, layerName, attachmentType, attachmentName) (the same helper used in the mag flow) to remove the reserved path(s) and then call datasetLayerAttachmentsDAO.deletePendingAttachment(...) so storage is not leaked when overwriting pending attachments.

app/controllers/WKRemoteDataStoreController.scala (3)
135-153: ⚠️ Potential issue | 🟠 Major

Singleton attachment types can have duplicates created.

The check at lines 136-138 uses `getByTypeAndName`, which only rejects attachments with matching type AND name. For singleton attachment types like `segmentIndex` and `cumsum`, a second attachment with the same type but different name would pass validation and be inserted. Downstream parsing typically uses `headOption`, causing one upload to be silently ignored.

Suggested fix

```diff
       (dataSource, dataLayer) <- datasetService.getDataSourceAndLayerFor(dataset, request.body.layerName)
+      isSingletonAttachment = LayerAttachmentType.isSingletonAttachment(request.body.attachmentType)
       existingAttachmentOpt = dataLayer.attachments.flatMap(
-        _.getByTypeAndName(request.body.attachmentType, request.body.attachment.name))
-      _ <- Fox.fromBool(existingAttachmentOpt.isEmpty) ?~> s"Layer already has ${request.body.attachmentType} attachment named ${request.body.attachment.name}"
+        if (isSingletonAttachment)
+          _.getByType(request.body.attachmentType).headOption
+        else
+          _.getByTypeAndName(request.body.attachmentType, request.body.attachment.name))
+      existsError = if (isSingletonAttachment) s"Layer already has a ${request.body.attachmentType} attachment (singleton type)"
+      else s"Layer already has ${request.body.attachmentType} attachment named ${request.body.attachment.name}"
+      _ <- Fox.fromBool(existingAttachmentOpt.isEmpty) ?~> existsError
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/controllers/WKRemoteDataStoreController.scala` around lines 135 - 153, The current validation uses dataLayer.attachments.getByTypeAndName (in the block before datasetAttachmentDAO.insertWithUploadPending) which only prevents duplicates with the same type+name but allows creating a second attachment with the same singleton type (e.g., "segmentIndex", "cumsum") under a different name; change the check to detect any existing attachment of that type for singleton types: add a singletonTypes set (or use existing constant) and, when request.body.attachmentType is in that set, validate by searching attachments for matching type (not type+name) and fail if any exists (update the error text as needed), otherwise keep the existing getByTypeAndName behavior; ensure this check is applied before uploadToPathsService.handleExistingPendingAttachment and before insertWithUploadPending so duplicates are prevented.
218-234: ⚠️ Potential issue | 🟠 Major

Missing datastore ownership check and storage refresh.

Unlike `reserveMagUpload` (line 113) which verifies `dataset._dataStore == dataStore.name`, `reportMagUpload` does not validate that the reporting datastore owns this dataset. Additionally, unlike `reportDatasetUpload` (line 213), storage usage is not refreshed after the mag upload completes.

Suggested fix

```diff
   def reportMagUpload(name: String, key: String): Action[ReportMagUploadParameters] =
     Action.async(validateJson[ReportMagUploadParameters]) { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           dataset <- datasetDAO.findOne(request.body.datasetId)(GlobalAccessContext) ?~> Messages(
             "dataset.notFound",
             request.body.datasetId) ~> NOT_FOUND
+          _ <- Fox.fromBool(dataset._dataStore == name) ?~> "Cannot report mag upload via different datastore." ~> NOT_FOUND
           _ <- datasetMagDAO.findOneWithPendingUpload(request.body.datasetId, request.body.layerName, request.body.mag.mag) ?~> "dataset.finishMagUpload.notPending"
           _ <- request.body.mag.path.toFox ?~> "dataset.finishMagUpload.pathNotSet"
           _ <- datasetMagDAO.finishUpload(request.body.datasetId, request.body.layerName, request.body.mag)
+          _ <- usedStorageService.refreshStorageReportForDataset(dataset)
           dataStoreClient <- datasetService.clientFor(dataset)(GlobalAccessContext)
           _ <- dataStoreClient.invalidateDatasetInDSCache(dataset._id)
         } yield Ok
       }
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/controllers/WKRemoteDataStoreController.scala` around lines 218 - 234, reportMagUpload is missing the same datastore ownership check done in reserveMagUpload and the storage-usage refresh done in reportDatasetUpload; add a guard that verifies dataset._dataStore == dataStore.name (same check pattern as in reserveMagUpload) before proceeding to finish the mag upload, and after datasetMagDAO.finishUpload and dataStoreClient.invalidateDatasetInDSCache invoke the same storage refresh call used in reportDatasetUpload (e.g. datasetService.refreshStorageUsage(dataset)(GlobalAccessContext) or the equivalent method your codebase uses) so ownership is enforced and storage usage is refreshed.
236-256: ⚠️ Potential issue | 🟠 Major

Missing datastore ownership check and storage refresh.

Same concerns as `reportMagUpload`: no verification that the reporting datastore owns this dataset, and storage usage is not refreshed after attachment upload completes.

Suggested fix

```diff
   def reportAttachmentUpload(name: String, key: String): Action[ReportAttachmentUploadParameters] =
     Action.async(validateJson[ReportAttachmentUploadParameters]) { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           dataset <- datasetDAO.findOne(request.body.datasetId)(GlobalAccessContext) ?~> Messages(
             "dataset.notFound",
             request.body.datasetId) ~> NOT_FOUND
+          _ <- Fox.fromBool(dataset._dataStore == name) ?~> "Cannot report attachment upload via different datastore." ~> NOT_FOUND
           _ <- datasetAttachmentDAO.findOneWithPendingUpload(
             request.body.datasetId,
             request.body.layerName,
             request.body.attachmentType,
             request.body.attachment.name) ?~> "dataset.finishAttachmentUpload.notPending"
           _ <- datasetAttachmentDAO.finishUpload(request.body.datasetId,
                                                  request.body.layerName,
                                                  request.body.attachmentType,
                                                  request.body.attachment)
+          _ <- usedStorageService.refreshStorageReportForDataset(dataset)
           dataStoreClient <- datasetService.clientFor(dataset)(GlobalAccessContext)
           _ <- dataStoreClient.invalidateDatasetInDSCache(dataset._id)
         } yield Ok
       }
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/controllers/WKRemoteDataStoreController.scala` around lines 236 - 256, The reportAttachmentUpload flow is missing a datastore-ownership check and a storage refresh: after finding the dataset (datasetDAO.findOne) add a verification that the calling datastore owns the dataset (use dataStoreService to perform an ownership check or compare the dataset ownership field to the validated datastore name from dataStoreService.validateAccess) and return FORBIDDEN if it doesn't own it; then, after datasetAttachmentDAO.finishUpload and dataStoreClient.invalidateDatasetInDSCache, trigger a storage usage refresh for the dataset by calling the appropriate datasetService storage refresh method (e.g. datasetService.refreshStorageUsage or equivalent) with the dataset._id and GlobalAccessContext so storage quotas are updated.
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)
134-139: Guard the legacy response rewrite against a missing `datasetId`.

This shim currently uses `.get`, so any upstream `200 OK` payload without `datasetId` becomes a datastore `500`. Since this is only a compatibility rename, it should fall back to the upstream result instead of throwing.

Suggested hardening

```diff
         if (result.header.status == OK) {
           result.body match {
             case play.api.http.HttpEntity.Strict(data, _) =>
               val json = Json.parse(data.toArray).as[JsObject]
-              Ok((json - "datasetId") ++ Json.obj("newDatasetId" -> (json \ "datasetId").get))
+              (json \ "datasetId").toOption match {
+                case Some(datasetId) =>
+                  Ok((json - "datasetId") ++ Json.obj("newDatasetId" -> datasetId))
+                case None =>
+                  result
+              }
             case _ => result
           }
         } else result
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala` around lines 134 - 139, The legacy response rewrite in DSLegacyApiController currently calls .get on (json \ "datasetId") which throws if missing; change the logic to check for the presence of "datasetId" before rewriting: when matching play.api.http.HttpEntity.Strict, parse the JSON into JsObject, use (json \ "datasetId").asOpt (or pattern-match on .toOption) and only return Ok((json - "datasetId") ++ Json.obj("newDatasetId" -> datasetId)) when datasetId is present; otherwise fall back to returning the original result unchanged. Ensure you update the code paths where result.header.status == OK and the HttpEntity.Strict branch to use this guarded check.
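The guarded rename suggested above generalizes to: look the key up first and fall back to the original payload when it is absent. A tiny sketch with a plain `Map` standing in for Play's `JsObject` (the real code would use `(json \ "datasetId").toOption`):

```scala
// Rename "datasetId" to "newDatasetId" only when present; otherwise return
// the payload unchanged instead of throwing.
def renameDatasetIdKey(json: Map[String, String]): Map[String, String] =
  json.get("datasetId") match {
    case Some(id) => (json - "datasetId") + ("newDatasetId" -> id)
    case None     => json // fall back to the unchanged payload
  }
```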
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala`:
- Around line 182-196: The adapter is dropping isVirtual and needsConversion
when constructing DatasetUploadInfo in the v11 path; update the
DatasetUploadInfo creation in reserveUploadV11 (where adaptedRequestBody is
built) to pass through request.body.isVirtual and request.body.needsConversion
instead of hardcoding None, so the values parsed by
LegacyReserveUploadInformationV11 are preserved when creating the
DatasetUploadInfo (references: DatasetUploadInfo, ResumableUploadInfo,
reserveUploadV11 / adaptedRequestBody).
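The pass-through fix can be reduced to a minimal sketch; the case classes below carry only the fields under discussion and are simplified stand-ins, not the real definitions:

```scala
// Sketch: the v11 adapter must forward isVirtual and needsConversion from the
// parsed legacy body instead of hardcoding None when building DatasetUploadInfo.
object V11Adapter {
  case class LegacyReserveUploadInformationV11(name: String,
                                               isVirtual: Option[Boolean],
                                               needsConversion: Option[Boolean])
  case class DatasetUploadInfo(name: String,
                               isVirtual: Option[Boolean],
                               needsConversion: Option[Boolean])

  def adapt(body: LegacyReserveUploadInformationV11): DatasetUploadInfo =
    DatasetUploadInfo(body.name, body.isVirtual, body.needsConversion) // not None, None
}
```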
---
Duplicate comments:
In `@app/controllers/WKRemoteDataStoreController.scala`:
- Around line 135-153: The current validation uses
dataLayer.attachments.getByTypeAndName (in the block before
datasetAttachmentDAO.insertWithUploadPending) which only prevents duplicates
with the same type+name but allows creating a second attachment with the same
singleton type (e.g., "segmentIndex", "cumsum") under a different name; change
the check to detect any existing attachment of that type for singleton types:
add a singletonTypes set (or use existing constant) and, when
request.body.attachmentType is in that set, validate by searching attachments
for matching type (not type+name) and fail if any exists (update the error text
as needed), otherwise keep the existing getByTypeAndName behavior; ensure this
check is applied before uploadToPathsService.handleExistingPendingAttachment and
before insertWithUploadPending so duplicates are prevented.
- Around line 218-234: reportMagUpload is missing the same datastore ownership
check done in reserveMagUpload and the storage-usage refresh done in
reportDatasetUpload; add a guard that verifies dataset._dataStore ==
dataStore.name (same check pattern as in reserveMagUpload) before proceeding to
finish the mag upload, and after datasetMagDAO.finishUpload and
dataStoreClient.invalidateDatasetInDSCache invoke the same storage refresh call
used in reportDatasetUpload (e.g.
datasetService.refreshStorageUsage(dataset)(GlobalAccessContext) or the
equivalent method your codebase uses) so ownership is enforced and storage usage
is refreshed.
- Around line 236-256: The reportAttachmentUpload flow is missing a
datastore-ownership check and a storage refresh: after finding the dataset
(datasetDAO.findOne) add a verification that the calling datastore owns the
dataset (use dataStoreService to perform an ownership check or compare the
dataset ownership field to the validated datastore name from
dataStoreService.validateAccess) and return FORBIDDEN if it doesn't own it;
then, after datasetAttachmentDAO.finishUpload and
dataStoreClient.invalidateDatasetInDSCache, trigger a storage usage refresh for
the dataset by calling the appropriate datasetService storage refresh method
(e.g. datasetService.refreshStorageUsage or equivalent) with the dataset._id and
GlobalAccessContext so storage quotas are updated.
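The singleton-type validation from the first of these duplicate comments can be sketched standalone; `Attachment` and the `singletonTypes` set are simplified assumptions rather than the actual layer model:

```scala
// Sketch of the stricter duplicate check: for singleton attachment types,
// any existing attachment of that type conflicts regardless of name;
// for other types, only an exact type+name match conflicts.
object AttachmentValidation {
  case class Attachment(attachmentType: String, name: String)

  val singletonTypes: Set[String] = Set("segmentIndex", "cumsum")

  def conflicts(existing: Seq[Attachment], newType: String, newName: String): Boolean =
    if (singletonTypes.contains(newType))
      existing.exists(_.attachmentType == newType)
    else
      existing.exists(a => a.attachmentType == newType && a.name == newName)
}
```

As the prompt notes, this check would run before the pending-upload handling and the insert, so duplicates are rejected up front.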
In `@app/models/dataset/UploadToPathsService.scala`:
- Around line 329-330: The overwritePending branch currently only calls
datasetLayerAttachmentsDAO.deletePendingAttachment(...) which removes the DB
record but doesn't free the previously reserved filesystem path; update the
branch to first call deletePathsForOldPending(dataset._id, layerName,
attachmentType, attachmentName) (the same helper used in the mag flow) to remove
the reserved path(s) and then call
datasetLayerAttachmentsDAO.deletePendingAttachment(...) so storage is not leaked
when overwriting pending attachments.
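The required ordering (free the reserved filesystem path first, then drop the pending record) can be made explicit with callback stand-ins for `deletePathsForOldPending` and `deletePendingAttachment` (hypothetical signatures):

```scala
// Sketch of the corrected overwritePending branch: release the reserved path
// before deleting the DB row, so overwriting a pending attachment cannot leak storage.
object OverwritePendingFlow {
  def overwrite(freeReservedPath: () => Unit, deletePendingRecord: () => Unit): Unit = {
    freeReservedPath()    // first: free the previously reserved filesystem path
    deletePendingRecord() // then: remove the pending attachment record
  }
}
```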
---
Nitpick comments:
In
`@webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala`:
- Around line 134-139: The legacy response rewrite in DSLegacyApiController
currently calls .get on (json \ "datasetId") which throws if missing; change the
logic to check for the presence of "datasetId" before rewriting: when matching
play.api.http.HttpEntity.Strict, parse the JSON into JsObject, use (json \
"datasetId").asOpt (or pattern-match on .toOption) and only return Ok((json -
"datasetId") ++ Json.obj("newDatasetId" -> datasetId)) when datasetId is
present; otherwise fall back to returning the original result unchanged. Ensure
you update the code paths where result.header.status == OK and the
HttpEntity.Strict branch to use this guarded check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e6f86d62-1436-43f3-a1c3-db0418f3799d
📒 Files selected for processing (5)
- app/controllers/DatasetController.scala
- app/controllers/WKRemoteDataStoreController.scala
- app/models/dataset/UploadToPathsService.scala
- conf/webknossos.versioned.routes
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala
🚧 Files skipped from review as they are similar to previous changes (1)
- conf/webknossos.versioned.routes