Conversation

@GregorShear (Contributor) commented Jan 9, 2026

Description: Adds an updateStorageMapping GraphQL mutation for modifying existing storage mappings, and refactors shared validation and health-check logic into reusable helper functions. The update response includes a republish field indicating whether the primary storage bucket changed, which signals that affected specs will need republishing.

Workflow steps: Call updateStorageMapping with the same input shape as createStorageMapping:

mutation {
  updateStorageMapping(
    catalogPrefix: "gregCo/"
    dryRun: false
    storage: {
      stores: [
        {provider: "GCS", bucket: "your-bucket", prefix: "collection-data/"}
      ]
      data_planes: ["ops/dp/public/local-cluster"]
    }
  ) {
    updated
    catalogPrefix
    republish
  }
}

Use dryRun: true to validate input and check if a republish would be required without persisting changes.

Notes for reviewers:

  • The republish field compares the incoming list of stores with the existing stores (see the sketch below)
  • The TODO comment about actually triggering the republish remains; this PR only reports whether it's needed
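
For illustration, a minimal sketch of the primary-store comparison described above; the function name and generic signature are assumptions, not the PR's code:

// Hypothetical sketch: the primary (first) store is what determines whether
// specs under the prefix need republishing; other stores don't force it.
fn needs_republish<S: PartialEq>(existing_stores: &[S], incoming_stores: &[S]) -> bool {
    existing_stores.first() != incoming_stores.first()
}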

@GregorShear requested a review from psFried on January 9, 2026 21:43

@psFried (Member) left a comment

This looks great so far. I left one minor comment, but apart from that this should be good to go once it's rebased on top of the changes to the other PR

@GregorShear force-pushed the greg/mutation/updateMapping branch from ca9af4c to ac89dcd on January 13, 2026 19:28
@GregorShear marked this pull request as ready for review on January 13, 2026 19:55

@GregorShear (Contributor Author) commented:

republish logic updated and branch rebased on top of the other PR

@GregorShear requested a review from psFried on January 13, 2026 19:59

@psFried (Member) left a comment

I left a bit more feedback inline, but should be good after that

"updated storage mapping"
);

// TODO: if the primary (first) fragmentStore changed, republish the entire prefix.

@psFried (Member):

nit: seems like the plan is for the client to handle the republication, so this can be removed?

@GregorShear (Contributor Author):

does this gql operation need to be aware of the republishing process, or is that just handled elsewhere when changes are detected?

Base automatically changed from greg/mapping/mutation to master January 15, 2026 16:24
@GregorShear force-pushed the greg/mutation/updateMapping branch from 8323a70 to 50cd8ec on January 15, 2026 16:45
…mutations

Split the single upsertStorageMapping GraphQL mutation into separate createStorageMapping and updateStorageMapping mutations with distinct behavior:

- createStorageMapping: fails if a mapping already exists, checks for existing specs that would be affected
- updateStorageMapping: fails if no mapping exists, returns whether a republish is needed due to store changes
@GregorShear force-pushed the greg/mutation/updateMapping branch from 46431bd to cbb12a9 on January 20, 2026 15:40
Comment on lines +202 to +213
let sampled_specs = sqlx::query_scalar!(
    r#"
    SELECT catalog_name
    FROM live_specs
    WHERE starts_with(catalog_name, $1)
        AND spec IS NOT NULL
    LIMIT 5
    "#,
    &catalog_prefix,
)
.fetch_all(&mut *txn)
.await?;

@GregorShear (Contributor Author):

@psFried
okay to leave these inline or would you rather keep all queries in crates/control-plane-api/src/directives/storage_mappings.rs?

@psFried (Member):

I think inline is generally fine, as long as it's not something we're trying to re-use elsewhere. For this particular query, inline seems fine. But we have some existing functions in crates/control-plane-api/src/directives/storage_mappings.rs that seem like they'd work for inserts and updates to storage mappings. I think it's worth using those, unless there's a reason not to.

sqlx::query!(
    r#"
    UPDATE storage_mappings
    SET spec = $2, detail = $3, updated_at = now()

@GregorShear (Contributor Author):

should we be manually updating the updated_at value? i would expect this to be handled with an UPDATE trigger...

@psFried (Member) commented Jan 20, 2026:

Yeah, we just handle those manually instead of with triggers. It's not something I feel especially strongly about, and we do still use triggers here and there for other things. Though I think my general position is to try to avoid triggers when we can, to make things more explicit.

That said, I think the existing upsert function is probably what we should use here instead of inline sql, and that already takes care of setting updated_at.
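
For reference, a sketch of that suggested reuse, following the upsert_storage_mapping call shape that appears later in this conversation; collection_storage is a placeholder for the collection-side storage def, and the helper is assumed to set updated_at itself:

// Sketch only: replace the inline UPDATE with the shared helper, which
// already takes care of updated_at.
upsert_storage_mapping(
    detail.as_deref(),
    &catalog_prefix,
    &collection_storage,
    &mut txn,
)
.await?;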

Comment on lines 425 to 436
sqlx::query!(
    r#"
    UPDATE storage_mappings
    SET spec = $2, detail = $3, updated_at = now()
    WHERE catalog_prefix = 'recovery/' || $1
    "#,
    &catalog_prefix as &str,
    crate::TextJson(&recovery_storage) as crate::TextJson<&models::StorageDef>,
    detail,
)
.execute(&mut *txn)
.await?;

@GregorShear (Contributor Author):

I think this recovery/ logic is right but would love some extra scrutiny here

@psFried (Member):

See my other comments about factoring out a function that returns both the storage defs from the original input, and about reusing the existing functions for updating the database. I'm thinking we should use upsert_storage_mapping, and then this inline query would not be necessary.

@GregorShear requested a review from psFried on January 20, 2026 15:44

@psFried (Member) left a comment

This looks like a good start. I left some comments inline, mostly about trying to re-use the existing logic from directives::storage_mappings. LMK if you'd like to talk through any of that

.cloned()
.map(|mut store| {
    let prefix = store.prefix_mut();
    *prefix = models::Prefix::new(format!("{prefix}collection-data/"));

@psFried (Member):

I foresee potential issues with this logic in update scenarios. The UI will fetch the existing storage mappings, which already have the collection-data/ prefix, and then pass those storage mappings to update, which would add the prefix again. So I'm thinking at a minimum we'd want to change the corresponding logic that's used during the update to check whether the prefix is there already. But also, this feels like it ought to be in a function that's shared between insert and update operations. Ideally, we'd have a pure function that accepts a single models::StorageDef, and returns a tuple of both the collection and recovery storage defs. Then we could continue to use that same function, even after we no longer persist the recovery mappings in postgres.

@GregorShear force-pushed the greg/mutation/updateMapping branch from b5f7b81 to 1c67122 on January 22, 2026 21:30
Comment on lines +67 to +73
pub async fn update_storage_mapping<'e, T, E>(
    detail: Option<&str>,
    catalog_prefix: &str,
    spec: T,
    executor: E,
) -> sqlx::Result<bool>
where

@GregorShear (Contributor Author):

Same rationale as the GraphQL mutations - separate create / update functions will generally reduce complexity. Keeping upsert for now to minimize the diff and it's convenient for the recovery/ mappings

@psFried (Member):

👍 yep, sounds good to me
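
To illustrate the separate create/update helpers being argued for here, a hypothetical create-side counterpart, written under the assumption that storage_mappings has a unique constraint on catalog_prefix; it mirrors the update_storage_mapping shape above but is not the PR's code:

// Hypothetical sketch: insert-only counterpart that reports, via the returned
// bool, whether a new row was created. A false result lets the caller fail
// the create because a mapping for the prefix already exists.
pub async fn create_storage_mapping<'e, E>(
    detail: Option<&str>,
    catalog_prefix: &str,
    spec: &models::StorageDef,
    executor: E,
) -> sqlx::Result<bool>
where
    E: sqlx::Executor<'e, Database = sqlx::Postgres>,
{
    let result = sqlx::query!(
        r#"
        INSERT INTO storage_mappings (catalog_prefix, spec, detail)
        VALUES ($1, $2, $3)
        ON CONFLICT (catalog_prefix) DO NOTHING
        "#,
        catalog_prefix,
        crate::TextJson(spec) as crate::TextJson<&models::StorageDef>,
        detail,
    )
    .execute(executor)
    .await?;
    Ok(result.rows_affected() == 1)
}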

Comment on lines +126 to +128
pub fn split_collection_and_recovery_storage(
    storage: models::StorageDef,
) -> (models::StorageDef, models::StorageDef) {

@GregorShear (Contributor Author):

Is this a good place for this util function to live? and related tests below?

@psFried (Member):

Yeah, I think so. We'll probably eventually move this whole file out from directives, but I think it's the best place for this logic.

// Begin a transaction to fetch existing mapping and update.
let mut txn = env.pg_pool.begin().await?;

// Fetch existing storage mapping to compare stores and verify it exists.

@GregorShear (Contributor Author):

the query just below here looks similar to the existing fetch function in the shared file, except here we're locking the row with FOR UPDATE - i'm inclined to keep this query defined here until we find another place we need the row lock

@psFried (Member):

Yeah, this seems good to me
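
For context, a sketch of what the row-locking fetch described here could look like, reusing the table and column names from the snippets above; the PR's actual query may differ:

// Sketch only: fetch the existing mapping and hold a row lock for the rest of
// the transaction, so concurrent updates to the same prefix serialize.
let existing = sqlx::query!(
    r#"
    SELECT catalog_prefix, spec
    FROM storage_mappings
    WHERE catalog_prefix = $1
    FOR UPDATE
    "#,
    &catalog_prefix,
)
.fetch_optional(&mut *txn)
.await?;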

Comment on lines +254 to +260
upsert_storage_mapping(
    detail.as_deref(),
    &format!("recovery/{catalog_prefix}"),
    &recovery_storage,
    &mut txn,
)
.await?;

@GregorShear (Contributor Author):

i mention this elsewhere but i'm keeping the upsert function here because it does feel convenient for the recovery case. I'd make a case for removing upsert entirely when the recovery/ mappings are gone

@psFried (Member):

Sounds good to me 👍

Comment on lines +141 to +143
if !prefix.as_str().ends_with(COLLECTION_DATA_SUFFIX) {
    *prefix = models::Prefix::new(format!("{prefix}{COLLECTION_DATA_SUFFIX}"));
}

@GregorShear (Contributor Author):

Just to talk this through - the collection-data/ part may have already been added in a previous operation, so we want to check if it's there first

@psFried (Member):

I'm realizing that maybe it's not actually great for us to return the storage mappings with the /collection-data/ added... we'd presumably display /user-prefix/collection-data/ on the storage mappings UI, which seems like it'd be confusing to users if we then went and also wrote data to /user-prefix/recovery/.

We started out with the idea that we'd want to only expose a single representation of the storage mapping to users, and hide the split behind our API. I still think that's a good idea, but I think I was wrong about my previous suggestion to only conditionally add the collection-data/. Technically, it works, but it doesn't feel right.

I think it might be better if we instead strip the collection-data/ when we return the storage mappings in the API response? Then the UI would only ever see /user-prefix/, and we'd automatically add the collection-data/ when we persist changes. Does that make sense to you?
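
A rough sketch of the response-side stripping suggested above, reusing the COLLECTION_DATA_SUFFIX constant and prefix accessors from the PR; the function name and the stores field access are assumptions:

// Illustrative only: normalize stores before returning them in the API
// response, so clients only ever see the bare user prefix.
fn strip_collection_data_suffix(mut storage: models::StorageDef) -> models::StorageDef {
    for store in storage.stores.iter_mut() {
        let prefix = store.prefix_mut();
        if let Some(base) = prefix.as_str().strip_suffix(COLLECTION_DATA_SUFFIX) {
            *prefix = models::Prefix::new(base.to_string());
        }
    }
    storage
}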

.into_iter()
.map(|mut store| {
    let prefix = store.prefix_mut();
    if let Some(base) = prefix.as_str().strip_suffix(COLLECTION_DATA_SUFFIX) {

@GregorShear (Contributor Author) commented Jan 22, 2026:

similar to above - the collection-data/ part may exist from a previous operation, so we remove it for the recovery/ mapping

@GregorShear requested a review from psFried on January 22, 2026 22:04

@GregorShear (Contributor Author) commented:

@psFried took another pass here - i believe i've incorporated all of your feedback. I'm making a case to (eventually) deprecate the existing upsert function and move toward separate create/update functions in directives/storage_mappings.rs for the same reason we gave for the gql mutations. Happy to chat through this if you disagree

@GregorShear force-pushed the greg/mutation/updateMapping branch from 36eee1d to eed89cb on January 23, 2026 16:27
@GregorShear force-pushed the greg/mutation/updateMapping branch from eed89cb to dd29c4d on January 23, 2026 16:48

@psFried (Member) left a comment

The changes all look good, except I realized that we might want to change how we handle the collection-data/ being added. LMK if you want to talk that over on a VC
