Add maxfailures option to limit test failures before stopping#61560
Add maxfailures option to limit test failures before stopping#61560VanitasCodes wants to merge 1 commit intoJuliaLang:masterfrom
Conversation
|
@oscardssmith Is this along the lines of what you had in mind? |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a maxfailures mechanism to Julia’s Test stdlib to stop a test run after a configurable number of failures/errors, without changing default @testset behavior.
Changes:
- Introduces global atomic failure counter + limit, plus exported setters/getters/reset helpers.
- Adds
MaxFailuresErrorand integrates it with existing failfast propagation/printing paths. - Adds stdlib tests validating default behavior, stopping at N failures, counting errors, and invalid input.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| stdlib/Test/src/Test.jl | Implements global max-failures tracking, exports API, adds MaxFailuresError, and wires it into record and top-level printing. |
| stdlib/Test/test/runtests.jl | Adds integration-style tests that spawn a Julia process to verify max-failures behavior and messaging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| set_max_failures(n::Integer) | ||
|
|
||
| Set the maximum number of test failures (fails + errors) allowed before stopping. | ||
| Default is `typemax(Int)` (no limit). Set to `0` to stop on first failure. |
There was a problem hiding this comment.
The docstring states “Set to 0 to stop on first failure”, but with the current record logic (count >= limit) set_max_failures(1) also stops on the first failure (and the added tests use 1 for “stop after 1 failure”). Please clarify the semantics in the docstring to match the implemented/tested behavior (e.g., document that n=1 stops after the first failure, or adjust the comparison logic if n is intended to mean “failures tolerated before stopping”).
| Default is `typemax(Int)` (no limit). Set to `0` to stop on first failure. | |
| Default is `typemax(Int)` (no limit). Set to `1` to stop after the first failure. |
| !!! compat "Julia 1.14" | ||
| This function requires at least Julia 1.14. | ||
| """ | ||
| get_max_failures() = Threads.atomic_add!(global_failure_limit, 0) |
There was a problem hiding this comment.
Using atomic_add!(x, 0) to read an atomic performs a read-modify-write (RMW) operation, which is heavier than an atomic load and can increase contention. Prefer an atomic load (e.g., global_failure_limit[] / global_failure_count[], or a dedicated atomic_load if used elsewhere in the file) for pure reads.
| !!! compat "Julia 1.14" | ||
| This function requires at least Julia 1.14. | ||
| """ | ||
| get_failure_count() = Threads.atomic_add!(global_failure_count, 0) |
There was a problem hiding this comment.
Using atomic_add!(x, 0) to read an atomic performs a read-modify-write (RMW) operation, which is heavier than an atomic load and can increase contention. Prefer an atomic load (e.g., global_failure_limit[] / global_failure_count[], or a dedicated atomic_load if used elsewhere in the file) for pure reads.
| Threads.atomic_xchg!(global_failure_limit, Int(n)) | ||
| return n |
There was a problem hiding this comment.
set_max_failures stores Int(n) but returns n (which may be a different type, e.g. BigInt). Consider returning the stored value (Int(n)) or nothing to avoid surprising type/behavior mismatches for callers.
| Threads.atomic_xchg!(global_failure_limit, Int(n)) | |
| return n | |
| limit = Int(n) | |
| Threads.atomic_xchg!(global_failure_limit, limit) | |
| return limit |
| ts.failfast && throw(FailFastError()) | ||
| # check maxfailures limit; +1 because atomic_add! returns the old value | ||
| count = Threads.atomic_add!(global_failure_count, 1) + 1 | ||
| count >= global_failure_limit[] && throw(MaxFailuresError(global_failure_limit[], count)) |
There was a problem hiding this comment.
global_failure_limit[] is read twice; if another task changes the limit between the check and constructing MaxFailuresError, the thrown error can report a different limit than the one that triggered the stop. Load the limit once into a local limit and use it for both the comparison and the exception payload (this also reduces atomic-load traffic).
| count >= global_failure_limit[] && throw(MaxFailuresError(global_failure_limit[], count)) | |
| limit = global_failure_limit[] | |
| count >= limit && throw(MaxFailuresError(limit, count)) |
| result = read(pipeline(ignorestatus(cmd), stderr=devnull), String) | ||
| @test occursin("Max failures reached: 1", result) | ||
| @test occursin("First", result) | ||
| @test !occursin(r"Test Summary:.*\n.*Second", result) |
There was a problem hiding this comment.
These regexes assume \n line endings. To make the tests more robust across platforms/environments, consider matching \r?\n (or otherwise avoiding hard-coding newline style) so the assertion doesn’t become Windows-sensitive.
| result = read(pipeline(ignorestatus(cmd), stderr=devnull), String) | ||
| @test occursin("Max failures reached: 2", result) | ||
| @test occursin("First", result) | ||
| @test !occursin(r"Test Summary:.*\n.*Second", result) |
There was a problem hiding this comment.
These regexes assume \n line endings. To make the tests more robust across platforms/environments, consider matching \r?\n (or otherwise avoiding hard-coding newline style) so the assertion doesn’t become Windows-sensitive.
| result = read(pipeline(ignorestatus(cmd), stderr=devnull), String) | ||
| @test occursin("Max failures reached: 1", result) | ||
| @test occursin("First", result) | ||
| @test !occursin(r"Test Summary:.*\n.*Second", result) |
There was a problem hiding this comment.
These regexes assume \n line endings. To make the tests more robust across platforms/environments, consider matching \r?\n (or otherwise avoiding hard-coding newline style) so the assertion doesn’t become Windows-sensitive.
| export GenericString, GenericSet, GenericDict, GenericArray, GenericOrder | ||
| export TestSetException | ||
| export TestLogger, LogRecord | ||
| export set_max_failures, get_max_failures, get_failure_count, reset_failure_count |
There was a problem hiding this comment.
these probably shouldn't be exported
|
|
||
| # Global state for tracking test failures across testsets | ||
| const global_failure_count = Threads.Atomic{Int}(0) | ||
| const global_failure_limit = Threads.Atomic{Int}(typemax(Int)) |
There was a problem hiding this comment.
this probably doesn't need to be atomic.
There was a problem hiding this comment.
it also should be 0 by default if we want to be backward compatible.
| """ | ||
| set_max_failures(n::Integer) | ||
|
|
||
| Set the maximum number of test failures (fails + errors) allowed before stopping. | ||
| Default is `typemax(Int)` (no limit). Set to `0` to stop on first failure. | ||
|
|
||
| !!! compat "Julia 1.14" | ||
| This function requires at least Julia 1.14. | ||
| """ | ||
| function set_max_failures(n::Integer) | ||
| n >= 0 || throw(ArgumentError("maxfailures must be non-negative, got $n")) | ||
| Threads.atomic_xchg!(global_failure_limit, Int(n)) | ||
| return n | ||
| end | ||
|
|
||
| """ | ||
| get_max_failures() | ||
|
|
||
| Get the current failure limit. Returns `typemax(Int)` if no limit is set. | ||
|
|
||
| !!! compat "Julia 1.14" | ||
| This function requires at least Julia 1.14. | ||
| """ | ||
| get_max_failures() = Threads.atomic_add!(global_failure_limit, 0) |
There was a problem hiding this comment.
by making the failure limit nonatomic you could delete both of these functions.
| """ | ||
| get_failure_count() | ||
|
|
||
| Get the current count of test failures (fails + errors). | ||
|
|
||
| !!! compat "Julia 1.14" | ||
| This function requires at least Julia 1.14. | ||
| """ | ||
| get_failure_count() = Threads.atomic_add!(global_failure_count, 0) |
There was a problem hiding this comment.
This can just be global_failure_count[] (and thus we likely don't need a function for it)
| """ | ||
| reset_failure_count() | ||
|
|
||
| Reset the failure counter to zero. Called at the start of test runs. | ||
|
|
||
| !!! compat "Julia 1.14" | ||
| This function requires at least Julia 1.14. | ||
| """ | ||
| function reset_failure_count() | ||
| Threads.atomic_xchg!(global_failure_count, 0) | ||
| return nothing | ||
| end |
There was a problem hiding this comment.
This can just be global_failure_count[] = 0 (and thus we likely don't need a function for it)
|
Overall, I think this is the right direction (although seeing the changes in Pkg would be useful to see this in use). |
|
Thanks for putting this together!
Do those functions need to be exported (or public)? |
|
@oscardssmith I have a few questions before I revise. You mentioned the default should be 0 for backward compatibility. I originally went with You're completely right about the atomics. I was overthinking the concurrency case, but tests run sequentially in a single process anyway. I'll switch to a simple For the API, if we make the limit nonatomic and just use direct access like Should I draft a companion PR for Pkg showing how |
|
@DilumAluthge I originally exported them thinking they might be useful for custom test runners, but @oscardssmith pointed out they probably shouldn't be. I think you're right that they should either be |
Keeping them internal (non-public) sounds good to me! |
Fixes #21594
Fixes #23375
This came out of the triage discussion on #61483, where @oscardssmith suggested that instead of changing the default behavior of
@testset, we should add amaxfailuresoption that test runners can use to control how many failures are tolerated before stopping execution. This PR implements the Test stdlib side of that.There's a global atomic counter that tracks failures and errors across testsets, and a configurable limit. When the count hits the limit, a
MaxFailuresErroris thrown and execution stops. The default limit istypemax(Int), so existing behavior is completely unchanged unless you explicitly callset_max_failures.Four new functions are exported from Test:
set_max_failures(n)- set the limitget_max_failures()- read the current limitget_failure_count()- read the current failure countreset_failure_count()- reset the counter to zeroThe
MaxFailuresErrorintegrates with the existingis_failfast_errormachinery, so it propagates correctly through nested testsets the same wayFailFastErrordoes. If bothfailfastandmaxfailuresare set,failfasttakes precedence since it's checked first inrecord.The intended usage is for something like
Pkg.test(; maxfailures=10)to callTest.set_max_failures(10)before running tests, which would be a follow-up PR.