chore: improve dataset cache#226

Draft
Fedir-Yatsenko wants to merge 3 commits into development from chore/improve-dataset-cache

Conversation

@Fedir-Yatsenko (Collaborator)

Applicable issues

N/A

Description of changes

Problem: When a dataset is retrieved from the in-memory cache, the system doesn't check whether its configuration has changed since it was cached. If an admin updates a dataset's config (e.g., dimensions, URN, citation, include_attributes), the stale cached dataset continues to be served until TTL expiry.

Solution: On cache hit in QuanthubSdmx21DataSourceHandler._get_dataset(), compare the cached dataset's config against the incoming config using Pydantic model equality. If they differ, the stale entry is removed and the dataset is reloaded from the SDMX source.
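
The check described above can be sketched as follows. This is a simplified stand-in for `QuanthubSdmx21DataSourceHandler._get_dataset`, not the PR's actual code; the PR compares Pydantic models with `==`, which a frozen dataclass's field-wise equality mimics here:

```python
# Sketch only: class and method names are simplified stand-ins for the
# real handler and its in-memory dataset cache.
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetConfig:
    # Field-wise equality here mirrors the Pydantic model equality the
    # PR relies on (the real config also has dimensions, URN, etc.).
    urn: str
    citation: str = ""
    include_attributes: bool = False


class DatasetHandler:
    def __init__(self) -> None:
        # key -> (config the dataset was loaded with, loaded dataset)
        self._dataset_cache: dict[str, tuple[DatasetConfig, dict]] = {}

    def _load_from_source(self, entity_id: str, config: DatasetConfig) -> dict:
        # Stand-in for the real SDMX load.
        return {"id": entity_id, "urn": config.urn}

    def get_dataset(self, entity_id: str, config: DatasetConfig) -> dict:
        entry = self._dataset_cache.get(entity_id)
        if entry is not None and entry[0] != config:
            # Config changed since caching: evict the stale entry and
            # fall through to a fresh load.
            del self._dataset_cache[entity_id]
            entry = None
        if entry is None:
            dataset = self._load_from_source(entity_id, config)
            entry = (config, dataset)
            self._dataset_cache[entity_id] = entry
        return entry[1]
```

On an unchanged config the cached dataset object is returned as before; only a differing config forces the reload.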

Also improved logging in _propagate_config_to_channel_datasets to trace per-channel-dataset decisions (skipped, auto-updated, needs reindex).

Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

Fedir-Yatsenko and others added 2 commits March 23, 2026 12:33
Check cached dataset config against incoming config on cache hit.
If they differ, remove the stale entry and reload from source.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@Fedir-Yatsenko Fedir-Yatsenko self-assigned this Mar 23, 2026
@Fedir-Yatsenko Fedir-Yatsenko requested a review from ypldan as a code owner March 23, 2026 10:43
@Fedir-Yatsenko Fedir-Yatsenko added the python Pull requests that update python code label Mar 23, 2026

Fedir-Yatsenko commented Mar 23, 2026

/deploy-review

GitHub actions run: 23433379190
Environment URL: review-environment | pipeline

Replace TTL-based dataset cache with AsyncLoadingCache that accepts
a loader and validator per get() call. Concurrent loads for the same
key are deduplicated by storing futures directly. Move cache logic
from _get_dataset to get_dataset for cleaner separation.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
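
The commit message above can be sketched as follows. `AsyncLoadingCache` here is a hypothetical reconstruction of the idea (a per-call loader and validator, with futures stored directly so concurrent loads of the same key are deduplicated), not the PR's actual class:

```python
# Hypothetical reconstruction: the cache stores futures, not values.
import asyncio
from typing import Any, Awaitable, Callable


class AsyncLoadingCache:
    def __init__(self) -> None:
        self._entries: dict[str, asyncio.Future] = {}

    async def get(
        self,
        key: str,
        loader: Callable[[], Awaitable[Any]],
        validator: Callable[[Any], bool] = lambda value: True,
    ) -> Any:
        fut = self._entries.get(key)
        if fut is not None and fut.done():
            # Re-check a completed entry on every hit; a failed load or
            # a validator rejection (e.g. changed config) evicts it.
            if fut.exception() is not None or not validator(fut.result()):
                del self._entries[key]
                fut = None
        if fut is None:
            # Store the future *before* awaiting it, so concurrent
            # get() calls for the same key await one in-flight load.
            fut = asyncio.ensure_future(loader())
            self._entries[key] = fut
        try:
            return await fut
        except Exception:
            # Never keep a failed load cached.
            if self._entries.get(key) is fut:
                del self._entries[key]
            raise
```

Storing the future rather than the value is what deduplicates concurrent loads: the second caller finds a not-yet-done future and simply awaits it.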

Fedir-Yatsenko commented Mar 23, 2026

/deploy-review

GitHub actions run: 23445780504
Environment URL: review-environment | pipeline

@Fedir-Yatsenko Fedir-Yatsenko marked this pull request as draft March 23, 2026 15:41
dataset_config = self.parse_data_set_config(config)
return await self._dataset_cache.get(
    key=str(entity_id),
    loader=lambda: self._get_dataset(entity_id, title, config, auth_context),
Fedir-Yatsenko (Collaborator Author) commented:
This behavior is not correct: if allow_offline=True and any problem occurs, get_dataset should return an OfflineDataset without caching it.
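
A sketch of the behavior the comment asks for. `OfflineDataset` and `allow_offline` come from the discussion, but the handler shape and the failure mode below are hypothetical:

```python
class OfflineDataset:
    """Minimal placeholder for the project's offline fallback dataset."""

    def __init__(self, entity_id: str) -> None:
        self.entity_id = entity_id


class Handler:
    def __init__(self) -> None:
        self._dataset_cache: dict[str, object] = {}

    def _load_dataset(self, entity_id: str):
        # Hypothetical failing load (e.g. the SDMX source is down).
        raise ConnectionError("SDMX source unreachable")

    def get_dataset(self, entity_id: str, allow_offline: bool = False):
        if entity_id in self._dataset_cache:
            return self._dataset_cache[entity_id]
        try:
            dataset = self._load_dataset(entity_id)
        except Exception:
            if allow_offline:
                # Return the fallback directly and do NOT cache it, so
                # the next call retries the real source.
                return OfflineDataset(entity_id)
            raise
        self._dataset_cache[entity_id] = dataset
        return dataset
```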

