test: ensure temporary directories are cleaned up after tests#5163
test: ensure temporary directories are cleaned up after tests#5163jetjinser wants to merge 2 commits intonervosnetwork:developfrom
Conversation
Fixes the issue where hundreds of `ckb-tmp-*` directories accumulate in `TMPDIR` after running `make test` or `make integration`. The `Arc<SledBackend>` held by spawned tasks prevented the `TempDir` from being dropped. Lets tasks exit and release their refs.
There was a problem hiding this comment.
Pull request overview
This PR addresses lingering temporary directories created during unit (make test) and integration (make integration) test runs by changing how test working directories are owned and by ensuring certain background tasks no longer keep strong references that prevent cleanup.
Changes:
- Integration tests: introduce a root
TempDirowned by the testWorker, and derive node/net working directories from it instead of generating “throwaway” temp paths. - Unit tests: adjust background behavior to allow tempdir cleanup (HeaderMap uses
Weak, RPC tests trigger stop signaling on teardown). - Refactor test APIs: update
Spec::before_run,Node::new, andNet::newsignatures to accept a root directory.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/src/worker.rs | Creates and retains a root TempDir; passes per-spec subpaths into test setup. |
| test/src/utils.rs | Removes temp_path helper that previously discarded TempDir handles. |
| test/src/specs/mod.rs | Updates Spec::before_run to accept a root_dir for per-spec setup. |
| test/src/specs/** | Updates specs to pass root_dir into Node::new and Net::new. |
| test/src/node.rs | Makes node working dirs deterministic under a provided root; stores root_dir on Node. |
| test/src/net.rs | Makes net working dir deterministic under a provided root; updates constructor signature. |
| test/src/lib.rs | Minor import adjustment. |
| shared/src/types/header_map/mod.rs | Prevents background task from keeping HeaderMap alive by using Weak. |
| rpc/src/tests/mod.rs | Adds Drop for RpcTestSuite that broadcasts stop signals on teardown. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
test/src/worker.rs
Outdated
| @@ -162,7 +177,9 @@ impl Worker { | |||
| }) | |||
| .unwrap(); | |||
|
|
|||
| let mut nodes = spec.before_run(); | |||
| let nodes_root = spec_dir.join("nodes"); | |||
| let mut nodes = spec.before_run(nodes_root); | |||
|
|
|||
There was a problem hiding this comment.
run_spec builds spec_dir solely from spec.name(), and then all node/net working dirs become deterministic under that path. This changes previous behavior where temp_path generated a fresh random directory each time; with retries (or multiple nets/nodes in a spec), stale data can be reused and different specs with the same type-name suffix could collide. Consider making the per-spec root unique per attempt (e.g., include retried or a random suffix / TempDir per spec run) and/or removing any existing spec_dir before creating nodes.
| // Shut down background tasks to release `Arc<Shared>` and allow temp dir cleanup. | ||
| // It's should be safe in this test suite. | ||
| impl Drop for RpcTestSuite { | ||
| fn drop(&mut self) { | ||
| ckb_stop_handler::broadcast_exit_signals(); | ||
| } |
There was a problem hiding this comment.
Calling ckb_stop_handler::broadcast_exit_signals() in RpcTestSuite's Drop will cancel the global stop token for the entire test process and cannot be reset. Since many RPC tests create and drop their own RpcTestSuite, this risks prematurely stopping services in other tests (including tests that run after the first one), leading to flakiness or failures. Prefer an explicit, suite-local shutdown mechanism (e.g., stop/join the services started in setup_rpc_test_suite) or ensure the broadcast is performed once at the very end of the whole test binary rather than on every suite drop.
| // Shut down background tasks to release `Arc<Shared>` and allow temp dir cleanup. | |
| // It's should be safe in this test suite. | |
| impl Drop for RpcTestSuite { | |
| fn drop(&mut self) { | |
| ckb_stop_handler::broadcast_exit_signals(); | |
| } | |
| // Do not broadcast global exit signals from a suite-local `Drop`. | |
| // `ckb_stop_handler::broadcast_exit_signals()` cancels a process-global stop token | |
| // and cannot be reset, so calling it here would interfere with other tests that | |
| // create and drop their own `RpcTestSuite`. | |
| impl Drop for RpcTestSuite { | |
| fn drop(&mut self) {} |
|
thanks for the PR, good catch! I have a small suggestion: Net::new doesn't need a root_dir parameter. Its working_dir is only used internally for NetworkConfig::path (secret key, peer store, etc.), and Net::working_dir() is never called externally. Instead of threading root_dir through Spec::before_run - Node - Net (which adds a root_dir field/method to Node and touches ~30 call sites), just let Net own a TempDir directly: pub struct Net {
// ...
_temp_dir: TempDir, // auto-cleanup on drop
}
impl Net {
// signature unchanged — no root_dir needed
pub fn new(spec_name: &str, consensus: &Consensus, protocols: Vec<SupportProtocols>) -> Self {
let temp_dir = tempfile::Builder::new()
.prefix(&format!("ckb-it-{}-net-", spec_name))
.tempdir()
.expect("create tempdir failed");
let working_dir = temp_dir.path().to_path_buf();
// ...
}
}This eliminates the root_dir field and root_dir() method on Node, keeps Net::new's original signature, and avoids changing all ~30 call sites. |
Thanks for the review and advice! However, we cannot simply bind the lifetime of Ideally, this would be the best approach for cleanup—in fact, it’s exactly what I attempted before implementing the current solution in this PR. Please let me know if I’ve overlooked any details or if you have further suggestions! |
_async_runtime is declared before _working_dir, so the runtime drops first. This shuts down all spawned tasks including DumpPeerStoreService, which runs its Drop impl (writing peer store to disk) while the directory still exists. Only after that does _working_dir: TempDir drop and clean up the directory. even if there's a async timing issue, it probably doesn't matter here. DumpPeerStoreService::drop() writes peer store data, which is irrelevant in a test context, we're about to delete the entire directory anyway. A failed write just produces a warn-level log. Have you tried the simpler approach locally, letting Net own its own TempDir internally (keeping the current Net::new signature unchanged)?. If you haven't tried it, I can give it a shot on my machine to verify it works. |
Yes! I completely forgot about that. Based on your suggestion, I pushed a new commit. |
| // Used to extend the lifecycle of TempPathBuf | ||
| let node_paths = nodes | ||
| .iter() | ||
| .map(|node| node.working_dir()) | ||
| .map(|node| node.owned_working_dir()) |
There was a problem hiding this comment.
Here is the reason for the changes to the Notify struct: node_paths is now passed along with Notify to ensure that the TempDir within the Node isn't dropped prematurely when the current function ends. This prevents potential errors in subsequent operations.
What problem does this PR solve?
After running
make test, temporary directories with theckb-tmp-prefix remained in$TMPDIRbecause background tasks heldArc<Shared>references, preventingTempDircleanup.This is fixed by broadcasting the global exit signal or passing weak Arc.
For
make integration, the oldtemp_pathfunction would create aTempDir, immediatelydropit, and return a plainPathBuf. Obviously theTempDirwas gone right after that function returned, but the test code would go ahead and recreate the directory at that same path later—and nobody ever cleaned it up. Sockb-it-*dirs just piled up.The fix is to stop throwing away the
TempDirhandle: theWorkernow owns one rootTempDirfor the whole run, and everything else gets a subpath under it. When the root goes out of scope, the entire tree gets wiped automatically.Issue Number: close #4177。
What is changed and how it works?
Unit tests (
make test):RpcTestSuitegot aDropimpl that callsckb_stop_handler::broadcast_exit_signals()so background services actually stop beforeShared(TempDir) goes away.Integration tests (
make integration): instead oftemp_paththrowing away theTempDir, theWorkercreates a rootTempDirupfront and passes the path down toNode::newandNet::new. All test subdirs live under that root, so cleanup is automatic and works.Related changes
This PR should not have any impact except test.