Add interactive integration test for debugging intermittent test failures #320
base: main
Conversation
Implements a comprehensive test runner to detect and debug non-reproducible test failures reported in issue #138. The runner executes tests multiple times with configurable parameters and provides detailed failure analysis.

Features:
- Run tests multiple times with configurable iterations
- Configurable timeout per test (default: 120s)
- --known-flaky flag to run all problematic tests from issue #138
- Automatic failure logging with timestamps
- JSON reports for machine-readable analysis
- Success rate statistics and duration metrics
- Stop-on-failure mode for immediate debugging
- Verbose mode for detailed pytest output

Usage examples:
- Run all known flaky tests: make test-known-flaky ITERATIONS=10
- Run specific test: python3 FprimeZephyrReference/test/int/run_flaky_tests.py --test antenna_deployer_test --iterations 20
- Custom timeout: --timeout 60

Related: #138
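For context, here is a rough, hypothetical sketch of the repeat-and-report loop such a runner implements. The function name and JSON field names are illustrative, not the actual contents of run_flaky_tests.py; only the test path convention and the 120s default timeout come from the description above:

```python
import json
import subprocess
import time
from pathlib import Path


def run_repeatedly(test_name: str, iterations: int = 10, timeout: int = 120) -> dict:
    """Run one integration test repeatedly, collecting pass/fail and timing stats."""
    results = []
    for i in range(iterations):
        start = time.monotonic()
        try:
            proc = subprocess.run(
                ["pytest", f"FprimeZephyrReference/test/int/{test_name}.py"],
                capture_output=True,
                text=True,
                timeout=timeout,  # kill runs that hang past the per-test timeout
            )
            passed = proc.returncode == 0
        except subprocess.TimeoutExpired:
            passed = False  # a timed-out run counts as a failure
        results.append({
            "iteration": i + 1,
            "passed": passed,
            "duration_s": round(time.monotonic() - start, 2),
        })

    passes = sum(r["passed"] for r in results)
    report = {
        "test": test_name,
        "iterations": iterations,
        "success_rate": passes / iterations,
        "runs": results,
    }
    # Machine-readable JSON report for later failure analysis
    Path(f"{test_name}_report.json").write_text(json.dumps(report, indent=2))
    return report
```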
- Remove unused Dict import
- Format strings with double quotes for consistency
- Break long lines for readability
Just a heads up, I'm working on redoing some of the integration tests in #321 to hopefully fix their flakiness. I don't think these will conflict, though; I just wanted to make sure you're aware. Also, I'm pretty sure the linked issue is somewhat outdated relative to the tests on main that are flaky now.
Can you flag the new flaky tests in #138 so they can eventually be re-integrated into the CI? That way we won't duplicate work, and this can remain the runner only for the flaky ones.
As of right now, the only test I can't get to work is the veml6031; otherwise, I have every test working consistently on that branch on our engineering cube.
hrfarmer left a comment:
I left a few comments on some things I noticed over a quick glance.
Overall though, I think that all of the "known flaky tests" functionality and references should be removed, because once #321 is in, there shouldn't be any more flaky tests. Additionally, if more flaky tests do appear, this script and the README would have to keep getting updated to account for them, when instead the tests should probably just be fixed. In those cases, this script's ability to manually pass in specific tests would be helpful.
Changed the requirement for the flaky tests since you've already fixed most of them. The new one will be dynamic.
- Reset .github/workflows/ci.yaml and Makefile to main
- Remove patches/ directory and flaky test runner files
- Keep README.md changes for reference

TODO: Update README.md "Testing for Flaky Tests" section once the flaky test runner implementation is complete, as the referenced files (run_flaky_tests.py, FLAKY_TEST_RUNNER.md) do not currently exist.
… test detection

Replaces the hardcoded test approach with a flexible test runner that:
- Auto-discovers all integration tests from the test/int directory
- Provides interactive cursor-based selection (arrow keys + spacebar)
- Supports CLI mode for automation and scripts
- Runs tests for configurable cycles to detect flaky behavior
- Automatically identifies and reports flaky tests
- Shows detailed statistics including pass rates and timing
- Gracefully handles errors with clear messages

Usage:
- Interactive: make test-interactive
- CLI: make test-interactive ARGS="--tests watchdog_test --cycles 10"
- All tests: make test-interactive ARGS="--all --cycles 20"
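The auto-discovery step can be illustrated with a small hedged sketch: assuming the integration tests follow the *_test.py naming seen in the examples above, discovery is just a directory glob (the function name is hypothetical):

```python
from pathlib import Path


def discover_tests(test_dir: str = "FprimeZephyrReference/test/int") -> list[str]:
    """Return the module names of all integration tests found in test_dir."""
    return sorted(p.stem for p in Path(test_dir).glob("*_test.py"))
```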
Running tests 3 times by default provides better flaky test detection while still being quick enough for regular use.
Summary
Adds an interactive integration test runner to help debug non-reproducible test failures reported in #138.
Problem
Several integration tests fail randomly in ways that are difficult to track.
These failures could be due to race conditions, hardware timing issues, or edge cases.
These tests have been fixed in #321. For future tests, this interactive test runner can help debug any potential flaky tests across multiple runs.
Solution
This PR introduces an interactive test runner that executes tests multiple times to detect and analyze flaky behavior.
Features
Add interactive integration test runner with auto-discovery and flaky test detection
Replaces the hardcoded test approach with a flexible test runner that:
- Auto-discovers all integration tests from the test/int directory
- Provides interactive cursor-based selection (arrow keys + spacebar)
- Supports CLI mode for automation and scripts
- Runs tests for configurable cycles to detect flaky behavior
- Automatically identifies and reports flaky tests (a sketch of this classification follows the list)
- Shows detailed statistics including pass rates and timing
- Gracefully handles errors with clear messages
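A hedged sketch of how the flaky classification could work (names hypothetical): a test that both passes and fails across the configured cycles is flagged as flaky, while uniform outcomes are plain passes or failures:

```python
def classify(results: dict[str, list[bool]]) -> dict[str, str]:
    """Label each test 'pass', 'fail', or 'flaky' from its per-cycle outcomes."""
    labels = {}
    for test, outcomes in results.items():
        if all(outcomes):
            labels[test] = "pass"
        elif not any(outcomes):
            labels[test] = "fail"
        else:
            # Mixed outcomes across cycles -> intermittent/flaky behavior
            labels[test] = "flaky"
    return labels
```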
Usage
- Interactive mode: make test-interactive
- CLI mode: make test-interactive ARGS="--tests watchdog_test --cycles 10"
- All tests: make test-interactive ARGS="--all --cycles 20"
Output Example
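A purely hypothetical illustration of what an end-of-run summary might look like; the exact format is defined by the runner itself, and the numbers here are made up:

```text
Cycle results (3 cycles):
  watchdog_test          3/3 passed  (100%)  avg 4.2s
  antenna_deployer_test  2/3 passed  ( 67%)  avg 5.8s  <- flaky
Flaky tests detected: antenna_deployer_test
```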
Testing
Tested locally with:
Documentation
Related
Closes #138