chore: improve dataset cache#226

Draft
Fedir-Yatsenko wants to merge 3 commits into development from chore/improve-dataset-cache

Conversation

@Fedir-Yatsenko (Collaborator)

Applicable issues

N/A

Description of changes

Problem: When a dataset is retrieved from the in-memory cache, the system doesn't check whether its configuration has changed since it was cached. If an admin updates a dataset's config (e.g., dimensions, URN, citation, include_attributes), the stale cached dataset continues to be served until TTL expiry.

Solution: On cache hit in QuanthubSdmx21DataSourceHandler._get_dataset(), compare the cached dataset's config against the incoming config using Pydantic model equality. If they differ, the stale entry is removed and the dataset is reloaded from the SDMX source.
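
The check described above can be sketched as follows. This is a simplified stand-in for `QuanthubSdmx21DataSourceHandler._get_dataset`, not the PR's actual code; the PR compares Pydantic models with `==`, which a frozen dataclass's field-wise equality mimics here:

```python
# Sketch only: class and method names are simplified stand-ins for the
# real handler and its in-memory dataset cache.
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetConfig:
    # Field-wise equality here mirrors the Pydantic model equality the
    # PR relies on (the real config also has dimensions, URN, etc.).
    urn: str
    citation: str = ""
    include_attributes: bool = False


class DatasetHandler:
    def __init__(self) -> None:
        # key -> (config the dataset was loaded with, loaded dataset)
        self._dataset_cache: dict[str, tuple[DatasetConfig, dict]] = {}

    def _load_from_source(self, entity_id: str, config: DatasetConfig) -> dict:
        # Stand-in for the real SDMX load.
        return {"id": entity_id, "urn": config.urn}

    def get_dataset(self, entity_id: str, config: DatasetConfig) -> dict:
        entry = self._dataset_cache.get(entity_id)
        if entry is not None and entry[0] != config:
            # Config changed since caching: evict the stale entry and
            # fall through to a fresh load.
            del self._dataset_cache[entity_id]
            entry = None
        if entry is None:
            dataset = self._load_from_source(entity_id, config)
            entry = (config, dataset)
            self._dataset_cache[entity_id] = entry
        return entry[1]
```

On an unchanged config the cached dataset object is returned as before; only a differing config forces the reload.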

Also improved logging in _propagate_config_to_channel_datasets to trace per-channel-dataset decisions (skipped, auto-updated, needs reindex).

Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

Fedir-Yatsenko and others added 2 commits March 23, 2026 12:33
Check cached dataset config against incoming config on cache hit.
If they differ, remove the stale entry and reload from source.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@Fedir-Yatsenko Fedir-Yatsenko self-assigned this Mar 23, 2026
@Fedir-Yatsenko Fedir-Yatsenko requested a review from ypldan as a code owner March 23, 2026 10:43
@Fedir-Yatsenko Fedir-Yatsenko added the python Pull requests that update python code label Mar 23, 2026

Fedir-Yatsenko commented Mar 23, 2026

/deploy-review

GitHub actions run: 23433379190
Environment URL: review-environment | pipeline

Replace TTL-based dataset cache with AsyncLoadingCache that accepts
a loader and validator per get() call. Concurrent loads for the same
key are deduplicated by storing futures directly. Move cache logic
from _get_dataset to get_dataset for cleaner separation.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
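
The commit message above can be sketched as follows. `AsyncLoadingCache` here is a hypothetical reconstruction of the idea (a per-call loader and validator, with futures stored directly so concurrent loads of the same key are deduplicated), not the PR's actual class:

```python
# Hypothetical reconstruction: the cache stores futures, not values.
import asyncio
from typing import Any, Awaitable, Callable


class AsyncLoadingCache:
    def __init__(self) -> None:
        self._entries: dict[str, asyncio.Future] = {}

    async def get(
        self,
        key: str,
        loader: Callable[[], Awaitable[Any]],
        validator: Callable[[Any], bool] = lambda value: True,
    ) -> Any:
        fut = self._entries.get(key)
        if fut is not None and fut.done():
            # Re-check a completed entry on every hit; a failed load or
            # a validator rejection (e.g. changed config) evicts it.
            if fut.exception() is not None or not validator(fut.result()):
                del self._entries[key]
                fut = None
        if fut is None:
            # Store the future *before* awaiting it, so concurrent
            # get() calls for the same key await one in-flight load.
            fut = asyncio.ensure_future(loader())
            self._entries[key] = fut
        try:
            return await fut
        except Exception:
            # Never keep a failed load cached.
            if self._entries.get(key) is fut:
                del self._entries[key]
            raise
```

Storing the future rather than the value is what deduplicates concurrent loads: the second caller finds a not-yet-done future and simply awaits it.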

Fedir-Yatsenko commented Mar 23, 2026

/deploy-review

GitHub actions run: 23445780504
Environment URL: review-environment | pipeline

@Fedir-Yatsenko Fedir-Yatsenko marked this pull request as draft March 23, 2026 15:41
dataset_config = self.parse_data_set_config(config)
return await self._dataset_cache.get(
    key=str(entity_id),
    loader=lambda: self._get_dataset(entity_id, title, config, auth_context),
Fedir-Yatsenko (Collaborator Author) commented:
This behavior is not correct: if allow_offline=True and any problem occurs, get_dataset should return an OfflineDataset without caching it.
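
A sketch of the behavior the comment asks for. `OfflineDataset` and `allow_offline` come from the discussion, but the handler shape and the failure mode below are hypothetical:

```python
class OfflineDataset:
    """Minimal placeholder for the project's offline fallback dataset."""

    def __init__(self, entity_id: str) -> None:
        self.entity_id = entity_id


class Handler:
    def __init__(self) -> None:
        self._dataset_cache: dict[str, object] = {}

    def _load_dataset(self, entity_id: str):
        # Hypothetical failing load (e.g. the SDMX source is down).
        raise ConnectionError("SDMX source unreachable")

    def get_dataset(self, entity_id: str, allow_offline: bool = False):
        if entity_id in self._dataset_cache:
            return self._dataset_cache[entity_id]
        try:
            dataset = self._load_dataset(entity_id)
        except Exception:
            if allow_offline:
                # Return the fallback directly and do NOT cache it, so
                # the next call retries the real source.
                return OfflineDataset(entity_id)
            raise
        self._dataset_cache[entity_id] = dataset
        return dataset
```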

