
Add maxfailures option to limit test failures before stopping #61560

Open
VanitasCodes wants to merge 1 commit into JuliaLang:master from VanitasCodes:add-maxfailures-option

Conversation

@VanitasCodes

Fixes #21594
Fixes #23375

This came out of the triage discussion on #61483, where @oscardssmith suggested that instead of changing the default behavior of @testset, we should add a maxfailures option that test runners can use to control how many failures are tolerated before stopping execution. This PR implements the Test stdlib side of that.

There's a global atomic counter that tracks failures and errors across testsets, and a configurable limit. When the count hits the limit, a MaxFailuresError is thrown and execution stops. The default limit is typemax(Int), so existing behavior is completely unchanged unless you explicitly call set_max_failures.
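The mechanism described above can be sketched as follows (illustrative only; the names mirror the PR description, and `note_failure!` is a hypothetical helper, not the actual diff):

```julia
# Illustrative sketch of the PR's mechanism: a global atomic failure counter
# plus a configurable limit, defaulting to typemax(Int) ("no limit").
const global_failure_count = Threads.Atomic{Int}(0)
const global_failure_limit = Threads.Atomic{Int}(typemax(Int))

struct MaxFailuresError <: Exception
    limit::Int
    count::Int
end

# Hypothetical helper: called whenever a Fail or Error result is recorded.
function note_failure!()
    # atomic_add! returns the old value, so +1 gives the new count
    count = Threads.atomic_add!(global_failure_count, 1) + 1
    limit = global_failure_limit[]
    count >= limit && throw(MaxFailuresError(limit, count))
    return count
end
```

With the default limit the counter just increments; once the limit is lowered, the first failure that reaches it throws.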

Four new functions are exported from Test:

  • set_max_failures(n) - set the limit
  • get_max_failures() - read the current limit
  • get_failure_count() - read the current failure count
  • reset_failure_count() - reset the counter to zero

The MaxFailuresError integrates with the existing is_failfast_error machinery, so it propagates correctly through nested testsets the same way FailFastError does. If both failfast and maxfailures are set, failfast takes precedence since it's checked first in record.

The intended usage is for something like Pkg.test(; maxfailures=10) to call Test.set_max_failures(10) before running tests, which would be a follow-up PR.

Copilot AI review requested due to automatic review settings April 11, 2026 02:07
@VanitasCodes
Author

@oscardssmith Is this along the lines of what you had in mind?


Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a maxfailures mechanism to Julia’s Test stdlib to stop a test run after a configurable number of failures/errors, without changing default @testset behavior.

Changes:

  • Introduces global atomic failure counter + limit, plus exported setters/getters/reset helpers.
  • Adds MaxFailuresError and integrates it with existing failfast propagation/printing paths.
  • Adds stdlib tests validating default behavior, stopping at N failures, counting errors, and invalid input.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

  • stdlib/Test/src/Test.jl: Implements global max-failures tracking, exports the API, adds MaxFailuresError, and wires it into record and top-level printing.
  • stdlib/Test/test/runtests.jl: Adds integration-style tests that spawn a Julia process to verify max-failures behavior and messaging.


set_max_failures(n::Integer)

Set the maximum number of test failures (fails + errors) allowed before stopping.
Default is `typemax(Int)` (no limit). Set to `0` to stop on first failure.

Copilot AI Apr 11, 2026

The docstring states “Set to 0 to stop on first failure”, but with the current record logic (count >= limit) set_max_failures(1) also stops on the first failure (and the added tests use 1 for “stop after 1 failure”). Please clarify the semantics in the docstring to match the implemented/tested behavior (e.g., document that n=1 stops after the first failure, or adjust the comparison logic if n is intended to mean “failures tolerated before stopping”).

Suggested change
Default is `typemax(Int)` (no limit). Set to `0` to stop on first failure.
Default is `typemax(Int)` (no limit). Set to `1` to stop after the first failure.

!!! compat "Julia 1.14"
This function requires at least Julia 1.14.
"""
get_max_failures() = Threads.atomic_add!(global_failure_limit, 0)

Copilot AI Apr 11, 2026

Using atomic_add!(x, 0) to read an atomic performs a read-modify-write (RMW) operation, which is heavier than an atomic load and can increase contention. Prefer an atomic load (e.g., global_failure_limit[] / global_failure_count[], or a dedicated atomic_load if used elsewhere in the file) for pure reads.
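The difference the reviewer points at can be demonstrated in isolation: indexing an `Threads.Atomic` performs a plain atomic load, while `atomic_add!(x, 0)` is a read-modify-write that also issues a store.

```julia
# Reading an atomic: x[] is an atomic load; atomic_add!(x, 0) is an RMW
# that returns the old value but writes it back, contending on the cache line.
x = Threads.Atomic{Int}(42)
v_load = x[]                        # pure atomic load
v_rmw  = Threads.atomic_add!(x, 0)  # RMW read: heavier, same result here
@assert v_load == 42 && v_rmw == 42
@assert x[] == 42                   # value unchanged either way
```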

!!! compat "Julia 1.14"
This function requires at least Julia 1.14.
"""
get_failure_count() = Threads.atomic_add!(global_failure_count, 0)

Copilot AI Apr 11, 2026

Using atomic_add!(x, 0) to read an atomic performs a read-modify-write (RMW) operation, which is heavier than an atomic load and can increase contention. Prefer an atomic load (e.g., global_failure_limit[] / global_failure_count[], or a dedicated atomic_load if used elsewhere in the file) for pure reads.

Comment on lines +61 to +62
Threads.atomic_xchg!(global_failure_limit, Int(n))
return n

Copilot AI Apr 11, 2026

set_max_failures stores Int(n) but returns n (which may be a different type, e.g. BigInt). Consider returning the stored value (Int(n)) or nothing to avoid surprising type/behavior mismatches for callers.

Suggested change
Threads.atomic_xchg!(global_failure_limit, Int(n))
return n
limit = Int(n)
Threads.atomic_xchg!(global_failure_limit, limit)
return limit
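The type mismatch the reviewer describes is easy to reproduce in isolation:

```julia
# set_max_failures stores Int(n) but returns n: for a BigInt argument the
# caller gets a BigInt back even though an Int was stored internally.
n = big(5)
stored = Int(n)
@assert stored isa Int
@assert n isa BigInt
@assert typeof(stored) !== typeof(n)   # returned vs stored types differ
```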

ts.failfast && throw(FailFastError())
# check maxfailures limit; +1 because atomic_add! returns the old value
count = Threads.atomic_add!(global_failure_count, 1) + 1
count >= global_failure_limit[] && throw(MaxFailuresError(global_failure_limit[], count))

Copilot AI Apr 11, 2026

global_failure_limit[] is read twice; if another task changes the limit between the check and constructing MaxFailuresError, the thrown error can report a different limit than the one that triggered the stop. Load the limit once into a local limit and use it for both the comparison and the exception payload (this also reduces atomic-load traffic).

Suggested change
count >= global_failure_limit[] && throw(MaxFailuresError(global_failure_limit[], count))
limit = global_failure_limit[]
count >= limit && throw(MaxFailuresError(limit, count))

result = read(pipeline(ignorestatus(cmd), stderr=devnull), String)
@test occursin("Max failures reached: 1", result)
@test occursin("First", result)
@test !occursin(r"Test Summary:.*\n.*Second", result)

Copilot AI Apr 11, 2026

These regexes assume \n line endings. To make the tests more robust across platforms/environments, consider matching \r?\n (or otherwise avoiding hard-coding newline style) so the assertion doesn’t become Windows-sensitive.
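A simplified version of the reviewer's point (the pattern below is illustrative, not the one in the test file):

```julia
# A pattern with a hard-coded \n fails to match CRLF output; \r?\n matches both.
unix    = "Max failures reached: 1\nFirst"
windows = "Max failures reached: 1\r\nFirst"
strict = r"reached: 1\nFirst"
robust = r"reached: 1\r?\nFirst"
@assert occursin(strict, unix)
@assert !occursin(strict, windows)   # brittle on Windows-style line endings
@assert occursin(robust, unix) && occursin(robust, windows)
```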

result = read(pipeline(ignorestatus(cmd), stderr=devnull), String)
@test occursin("Max failures reached: 2", result)
@test occursin("First", result)
@test !occursin(r"Test Summary:.*\n.*Second", result)

Copilot AI Apr 11, 2026

These regexes assume \n line endings. To make the tests more robust across platforms/environments, consider matching \r?\n (or otherwise avoiding hard-coding newline style) so the assertion doesn’t become Windows-sensitive.

result = read(pipeline(ignorestatus(cmd), stderr=devnull), String)
@test occursin("Max failures reached: 1", result)
@test occursin("First", result)
@test !occursin(r"Test Summary:.*\n.*Second", result)

Copilot AI Apr 11, 2026

These regexes assume \n line endings. To make the tests more robust across platforms/environments, consider matching \r?\n (or otherwise avoiding hard-coding newline style) so the assertion doesn’t become Windows-sensitive.

export GenericString, GenericSet, GenericDict, GenericArray, GenericOrder
export TestSetException
export TestLogger, LogRecord
export set_max_failures, get_max_failures, get_failure_count, reset_failure_count
Member

these probably shouldn't be exported


# Global state for tracking test failures across testsets
const global_failure_count = Threads.Atomic{Int}(0)
const global_failure_limit = Threads.Atomic{Int}(typemax(Int))
Member

this probably doesn't need to be atomic.

Member

it also should be 0 by default if we want to be backward compatible.

Comment on lines +50 to +73
"""
set_max_failures(n::Integer)

Set the maximum number of test failures (fails + errors) allowed before stopping.
Default is `typemax(Int)` (no limit). Set to `0` to stop on first failure.

!!! compat "Julia 1.14"
This function requires at least Julia 1.14.
"""
function set_max_failures(n::Integer)
n >= 0 || throw(ArgumentError("maxfailures must be non-negative, got $n"))
Threads.atomic_xchg!(global_failure_limit, Int(n))
return n
end

"""
get_max_failures()

Get the current failure limit. Returns `typemax(Int)` if no limit is set.

!!! compat "Julia 1.14"
This function requires at least Julia 1.14.
"""
get_max_failures() = Threads.atomic_add!(global_failure_limit, 0)
Member

by making the failure limit nonatomic you could delete both of these functions.

Comment on lines +75 to +83
"""
get_failure_count()

Get the current count of test failures (fails + errors).

!!! compat "Julia 1.14"
This function requires at least Julia 1.14.
"""
get_failure_count() = Threads.atomic_add!(global_failure_count, 0)
Member

This can just be global_failure_count[] (and thus we likely don't need a function for it)

Comment on lines +85 to +96
"""
reset_failure_count()

Reset the failure counter to zero. Called at the start of test runs.

!!! compat "Julia 1.14"
This function requires at least Julia 1.14.
"""
function reset_failure_count()
Threads.atomic_xchg!(global_failure_count, 0)
return nothing
end
Member

This can just be global_failure_count[] = 0 (and thus we likely don't need a function for it)

@oscardssmith
Member

Overall, I think this is the right direction (although seeing the changes in Pkg would be useful to see this in use).

@DilumAluthge
Member

Thanks for putting this together!

Four new functions are exported from Test:

Do those functions need to be exported (or public)?

@VanitasCodes
Author

@oscardssmith I have a few questions before I revise.

You mentioned the default should be 0 for backward compatibility. I originally went with typemax(Int) to mean "unlimited" so existing behavior stays the same, but I think you're suggesting 0 should mean "disabled/not counting" instead of "stop immediately"? That makes more sense. So the semantics would be 0 means the feature is off, and you'd call set_max_failures(10) to enable it with a limit of 10. Is that what you had in mind?

You're completely right about the atomics. I was overthinking the concurrency case, but tests run sequentially in a single process anyway. I'll switch to a simple Ref{Int} or just a module global.
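Combining those two points, the revised design could look roughly like this (a hypothetical sketch with nonatomic `Ref`s and 0-means-off semantics, not code from the PR):

```julia
# 0 means the feature is off (backward-compatible default);
# a positive limit stops at the Nth failure.
const failure_limit = Ref(0)
const failure_count = Ref(0)

# Hypothetical helper standing in for the check inside record.
function record_failure!()
    failure_count[] += 1
    limit = failure_limit[]
    limit > 0 && failure_count[] >= limit && error("max failures reached: $limit")
    return failure_count[]
end
```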

For the API, if we make the limit nonatomic and just use direct access like global_failure_limit = n, do we still want helper functions at all, or should Pkg just set Test.global_failure_limit directly? I can see it going either way. The functions feel a bit over-engineered for what they do.

Should I draft a companion PR for Pkg showing how Pkg.test(; maxfailures=10) would use this, or wait until this settles?

@VanitasCodes
Author

@DilumAluthge I originally exported them thinking they might be useful for custom test runners, but @oscardssmith pointed out they probably shouldn't be. I think you're right that they should either be public (documented but not exported) or just kept internal and accessed as Test.set_max_failures(). What do you think makes more sense? I'm leaning toward not exporting them and letting Pkg access them via the Test. prefix.

@DilumAluthge
Member

just kept internal and accessed as Test.set_max_failures()

Keeping them internal (non-public) sounds good to me!



Development

Successfully merging this pull request may close these issues.

  • Top-level @testsets stop on errors
  • testset with for loop stops at first failed test
