Tests(py): speed up local restore flows#10499
Tests(py): speed up local restore flows#10499nmanovic wants to merge 1 commit intov3-pr4a-runtime-foundationfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b8f310d4c9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "SELECT pg_terminate_backend(pg_stat_activity.pid) " | ||
| "FROM pg_stat_activity " | ||
| "WHERE pg_stat_activity.datname = %s " | ||
| "AND pid <> pg_backend_pid()", | ||
| (target,), |
There was a problem hiding this comment.
Terminate source DB sessions before templated restore
CREATE DATABASE ... WITH TEMPLATE requires the template database to have no active sessions, but this code only terminates connections to target_db. During local startup we call restore_from_template(source_db="cvat", target_db="test_db"), and cvat can still have live connections from server/worker processes, which makes restore fail with “source database is being accessed by other users.” The legacy restore.sql explicitly terminated :from when cloning from cvat, so this is a regression in restore reliability.
Useful? React with 👍 / 👎.
| if len(keys_to_delete) >= 1000: | ||
| self._inmem_db0.delete(*keys_to_delete) | ||
| keys_to_delete.clear() |
There was a problem hiding this comment.
Avoid deleting keys while SCAN iteration is in progress
This loop mutates DB0 (delete) before SCAN has completed, but Redis only guarantees full key coverage for SCAN when the keyspace is not changing during iteration. With large keysets (where the >= 1000 branch is hit), some keys can be skipped and survive the restore, leaking Redis state between tests. The previous local restore path used a single key listing + delete pass, so this introduces a completeness risk in reset behavior.
Useful? React with 👍 / 👎.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
3bc10fb to
f2c8e67
Compare
b8f310d to
fec0185
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fec018501a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| queue.scheduled_job_registry.remove(job, delete_job=False) | ||
| elif status == JobStatus.DEFERRED: | ||
| queue.deferred_job_registry.remove(job, delete_job=False) | ||
|
|
||
| job.cancel(enqueue_dependents=False) |
There was a problem hiding this comment.
Preserve periodic scheduled jobs during queue drain
The drain logic unconditionally removes and cancels every scheduled/deferred job in the selected queues, and local restores call it for _BACKGROUND_JOB_QUEUES (which includes cleaning). In this repo, cleaning is also where PERIODIC_RQ_JOBS are scheduled, so the first DB reset cancels cron jobs like clean_up_sessions and they are not re-created until containers restart. That changes runtime behavior for the rest of the test session and can hide bugs in flows that rely on periodic RQ tasks.
Useful? React with 👍 / 👎.
fec0185 to
6719c1e
Compare
f2c8e67 to
0a878d3
Compare
6719c1e to
943f5b3
Compare
0a878d3 to
ce7a4fe
Compare
943f5b3 to
86c7dba
Compare
ce7a4fe to
29cef23
Compare
86c7dba to
753f499
Compare
29cef23 to
360c5ac
Compare
Summary
This is the next stacked PR after
v3-pr4a-runtime-foundation.Scope is intentionally limited to faster and safer local test restore/reset behavior:
Out of scope:
Why
The foundation PR introduced explicit local runtime ownership, but restore/reset was still paying shell/process startup cost on every operation and still relied on legacy wildcard fixture exports.
This PR keeps semantics the same while making repeated local reset paths materially cheaper and less flaky.
It also includes the local CVAT data restore optimization so repeated data resets do not repack/copy the baseline archive every time.
What changed
tests/python/infra/db_restore.pyCREATE DATABASE ... WITH TEMPLATE ...tests/python/infra/redis_restore.pytests/python/infra/rq_cleanup.pytests/python/infra/fixtures.pytests/python/conftest.pyinfra.fixturesexplicitly instead of re-exporting restore fixtures from the legacy moduletests/python/infra/instances/local_instance.pytests/python/requirements.txtMeasured speedup
Measured on the same local stack, same machine, 120 iterations each.
PostgreSQL restore
79.25 ms22.13 ms3.6xRedis in-memory restore
55.94 ms0.35 ms161xRedis on-disk restore
54.11 ms0.19 ms289xCVAT data restore
102.35 ms68.51 ms1.5xValidation
Static checks:
py_compileblack --checkisort --check --diff --resolve-all-configspylinttyposremarkTargeted restore-heavy tests:
pytest tests/python/rest_api/test_auth.py tests/python/rest_api/test_access_tokens.py --platform=local --infra=reuse55 passedFull Python suite:
pytest tests/python --platform=local --infra=reuse2178 passed, 20 skipped, 1 xfailedCI-like Django unit tests:
docker compose -f docker-compose.yml -f docker-compose.dev.yml -f docker-compose.ci.yml run ... cvat_ci python manage.py test cvat/apps -v 2412 tests,OK (skipped=58)Kube compatibility smoke:
pytest tests/python/rest_api/test_auth.py::TestBasicAuth::test_can_use_basic_auth --platform=kube -q1 passedReview notes