Update storage mapping mutation #2605
base: master
Conversation
psFried left a comment:
This looks great so far. I left one minor comment, but apart from that this should be good to go once it's rebased on top of the changes to the other PR
Force-pushed from ca9af4c to ac89dcd
republish logic updated and branch rebased on top of the other PR
psFried left a comment:
I left a bit more feedback inline, but should be good after that
| "updated storage mapping" | ||
| ); | ||
|
|
||
| // TODO: if the primary (first) fragmentStore changed, republish the entire prefix. |
nit: seems like the plan is for the client to handle the republication, so this can be removed?
does this gql operation need to be aware of the republishing process, or is that just handled elsewhere when changes are detected?
Force-pushed from 8323a70 to 50cd8ec
…mutations

Split the single upsertStorageMapping GraphQL mutation into separate createStorageMapping and updateStorageMapping mutations with distinct behavior:

- createStorageMapping: fails if a mapping already exists, checks for existing specs that would be affected
- updateStorageMapping: fails if no mapping exists, returns whether a republish is needed due to store changes
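To make the described split concrete, here is a minimal, hypothetical sketch of the two preconditions; the `Mapping` type, function names, and error strings are stand-ins for illustration only, not the actual resolvers.

```rust
// Sketch only: the differing preconditions of the two mutations.
// `Mapping` and the error messages are illustrative stand-ins.
struct Mapping;

fn check_create(existing: Option<Mapping>) -> Result<(), String> {
    match existing {
        // createStorageMapping fails if a mapping already exists for the prefix.
        Some(_) => Err("a storage mapping already exists for this prefix".to_string()),
        None => Ok(()),
    }
}

fn check_update(existing: Option<Mapping>) -> Result<Mapping, String> {
    // updateStorageMapping fails if no mapping exists yet.
    existing.ok_or_else(|| "no storage mapping exists for this prefix".to_string())
}
```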
Force-pushed from 46431bd to cbb12a9
```rust
let sampled_specs = sqlx::query_scalar!(
    r#"
    SELECT catalog_name
    FROM live_specs
    WHERE starts_with(catalog_name, $1)
    AND spec IS NOT NULL
    LIMIT 5
    "#,
    &catalog_prefix,
)
.fetch_all(&mut *txn)
.await?;
```
@psFried
okay to leave these inline or would you rather keep all queries in crates/control-plane-api/src/directives/storage_mappings.rs?
I think inline is generally fine, as long as it's not something we're trying to re-use elsewhere. For this particular query, inline seems fine. But we have some existing functions in crates/control-plane-api/src/directives/storage_mappings.rs that seem like they'd work for inserts and updates to storage mappings. I think it's worth using those, unless there's a reason not to.
```rust
sqlx::query!(
    r#"
    UPDATE storage_mappings
    SET spec = $2, detail = $3, updated_at = now()
```
should we be manually updating the updated_at value? i would expect this to be handled with an UPDATE trigger...
Yeah, we just handle those manually instead of with triggers. It's not something I feel especially strongly about, and we do still use triggers here and there for other things. Though I think my general position is to try to avoid triggers when we can, to make things more explicit.
That said, I think the existing upsert function is probably what we should use here instead of inline sql, and that already takes care of setting updated_at.
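For illustration, the upsert shape being suggested might look roughly like the sketch below; this assumes `catalog_prefix` carries a unique constraint, and the real `upsert_storage_mapping` helper may differ in its bindings and columns.

```rust
// Sketch only: an upsert that sets updated_at explicitly in the statement,
// so no UPDATE trigger is needed. Assumes a unique constraint on catalog_prefix;
// the real upsert_storage_mapping helper may differ.
const UPSERT_STORAGE_MAPPING_SQL: &str = r#"
    INSERT INTO storage_mappings (catalog_prefix, spec, detail, updated_at)
    VALUES ($1, $2, $3, now())
    ON CONFLICT (catalog_prefix) DO UPDATE
    SET spec = excluded.spec, detail = excluded.detail, updated_at = now()
"#;
```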
```rust
sqlx::query!(
    r#"
    UPDATE storage_mappings
    SET spec = $2, detail = $3, updated_at = now()
    WHERE catalog_prefix = 'recovery/' || $1
    "#,
    &catalog_prefix as &str,
    crate::TextJson(&recovery_storage) as crate::TextJson<&models::StorageDef>,
    detail,
)
.execute(&mut *txn)
.await?;
```
I think this recovery/ logic is right but would love some extra scrutiny here
See my other comments about factoring out a function that returns both the storage defs from the original input, and about reusing the existing functions for updating the database. I'm thinking we should use upsert_storage_mapping, and then this inline query would not be necessary.
psFried left a comment:
This looks like a good start. I left some comments inline, mostly about trying to re-use the existing logic from directives::storage_mappings. LMK if you'd like to talk through any of that
```rust
.cloned()
.map(|mut store| {
    let prefix = store.prefix_mut();
    *prefix = models::Prefix::new(format!("{prefix}collection-data/"));
```
I foresee potential issues with this logic in update scenarios. The UI will fetch the existing storage mappings, which already have the collection-data/ prefix, and then pass those storage mappings to update, which would add the prefix again. So I'm thinking at a minimum we'd want to change the corresponding logic that's used during the update to check whether the prefix is there already. But also, this feels like it ought to be in a function that's shared between insert and update operations. Ideally, we'd have a pure function that accepts a single models::StorageDef, and returns a tuple of both the collection and recovery storage defs. Then we could continue to use that same function, even after we no longer persist the recovery mappings in postgres.
Force-pushed from b5f7b81 to 1c67122
```rust
pub async fn update_storage_mapping<'e, T, E>(
    detail: Option<&str>,
    catalog_prefix: &str,
    spec: T,
    executor: E,
) -> sqlx::Result<bool>
where
```
Same rationale as the GraphQL mutations - separate create / update functions will generally reduce complexity. Keeping upsert for now to minimize the diff and it's convenient for the recovery/ mappings
👍 yep, sounds good to me
```rust
pub fn split_collection_and_recovery_storage(
    storage: models::StorageDef,
) -> (models::StorageDef, models::StorageDef) {
```
Is this a good place for this util function to live? and related tests below?
Yeah, I think so. We'll probably eventually move this whole file out from directives, but I think it's the best place for this logic.
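For readers following along, one possible body for this helper, sketched with simplified stand-in types rather than the real `models` definitions; only the overall shape is intended to match the discussion above.

```rust
// Sketch only: `Store` and `StorageDef` are simplified stand-ins for the real
// models types, used to show the shape of a pure split shared by create/update.
const COLLECTION_DATA_SUFFIX: &str = "collection-data/";

#[derive(Clone)]
pub struct Store {
    pub prefix: String,
}

#[derive(Clone)]
pub struct StorageDef {
    pub stores: Vec<Store>,
}

pub fn split_collection_and_recovery_storage(storage: StorageDef) -> (StorageDef, StorageDef) {
    // Collection-data side: ensure every store prefix ends with collection-data/.
    let collection_stores = storage
        .stores
        .iter()
        .cloned()
        .map(|mut store| {
            if !store.prefix.ends_with(COLLECTION_DATA_SUFFIX) {
                store.prefix.push_str(COLLECTION_DATA_SUFFIX);
            }
            store
        })
        .collect();
    // Recovery side: strip the suffix if present, leaving the bare user prefix.
    let recovery_stores = storage
        .stores
        .into_iter()
        .map(|mut store| {
            if let Some(base) = store.prefix.strip_suffix(COLLECTION_DATA_SUFFIX) {
                store.prefix = base.to_string();
            }
            store
        })
        .collect();
    (
        StorageDef { stores: collection_stores },
        StorageDef { stores: recovery_stores },
    )
}
```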
```rust
// Begin a transaction to fetch existing mapping and update.
let mut txn = env.pg_pool.begin().await?;

// Fetch existing storage mapping to compare stores and verify it exists.
```
the query just below here looks similar to the existing fetch function in the shared file, except here we're locking the row with FOR UPDATE - i'm inclined to keep this query defined here until we find another place we need the row lock
Yeah, this seems good to me
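The general shape of such a row-locking fetch might look like the sketch below; the column names follow the hunks above, but this is not the exact query used in the PR.

```rust
// Sketch only: lock the mapping row for the duration of the transaction so a
// concurrent update can't interleave between the read and the write. The real
// query selects additional columns and uses its own bindings.
let existing = sqlx::query!(
    r#"
    SELECT catalog_prefix, spec
    FROM storage_mappings
    WHERE catalog_prefix = $1
    FOR UPDATE
    "#,
    &catalog_prefix,
)
.fetch_optional(&mut *txn)
.await?;
```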
```rust
upsert_storage_mapping(
    detail.as_deref(),
    &format!("recovery/{catalog_prefix}"),
    &recovery_storage,
    &mut txn,
)
.await?;
```
i mention this elsewhere but i'm keeping the upsert function here because it does feel convenient for the recovery case. I'd make a case for removing upsert entirely when the recovery/ mappings are gone
Sounds good to me 👍
```rust
if !prefix.as_str().ends_with(COLLECTION_DATA_SUFFIX) {
    *prefix = models::Prefix::new(format!("{prefix}{COLLECTION_DATA_SUFFIX}"));
}
```
Just to talk this through - the collection-data/ part may have already been added in a previous operation, so we want to check if it's there first
I'm realizing that maybe it's not actually great for us to return the storage mappings with the /collection-data/ added... we'd presumably display /user-prefix/collection-data/ on the storage mappings UI, which seems like it'd be confusing to users if we then went and also wrote data to /user-prefix/recovery/.
We started out with the idea that we'd want to only expose a single representation of the storage mapping to users, and hide the split behind our API. I still think that's a good idea, but I think I was wrong about my previous suggestion to only conditionally add the collection-data/. Technically, it works, but it doesn't feel right.
I think it might be better if we instead strip the collection-data/ when we return the storage mappings in the API response? Then the UI would only ever see /user-prefix/, and we'd automatically add the collection-data/ when we persist changes. Does that make sense to you?
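As a concrete illustration of that read-path suggestion, a minimal sketch (the function name and exact call site are hypothetical):

```rust
// Sketch only: strip the collection-data/ suffix from a stored prefix before
// returning it in API responses, so the UI only ever sees the bare user prefix.
fn display_prefix(stored: &str) -> &str {
    stored.strip_suffix("collection-data/").unwrap_or(stored)
}

// e.g. display_prefix("acmeCo/collection-data/") == "acmeCo/"
//      display_prefix("acmeCo/") == "acmeCo/"
```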
```rust
.into_iter()
.map(|mut store| {
    let prefix = store.prefix_mut();
    if let Some(base) = prefix.as_str().strip_suffix(COLLECTION_DATA_SUFFIX) {
```
similar to above - the collection-data/ part may exist from a previous operation, so we remove it for the recovery/ mapping
@psFried took another pass here - I believe I've incorporated all of your feedback. I'm making a case to (eventually) deprecate the existing upsert function and move toward separate create/update functions in directives/storage_mappings.rs for the same reason we gave for the gql mutations. Happy to chat through this if you disagree
Force-pushed from 36eee1d to eed89cb
Force-pushed from eed89cb to dd29c4d
psFried left a comment:
The changes all look good, except I realized that we might want to change how we handle the collection-data/ being added. LMK if you want to talk that over on a VC
Description: Adds an `updateStorageMapping` GraphQL mutation for modifying existing storage mappings. Refactors shared validation and health check logic into reusable helper functions. The update response includes a `republish` field indicating whether the primary storage bucket changed, which signals that affected specs will need republishing.

Workflow steps:
- Call `updateStorageMapping` with the same input shape as `createStorageMapping`.
- Use `dryRun: true` to validate input and check if a republish would be required without persisting changes.

Notes for reviewers:
- The `republish` field compares the incoming list of stores with the existing