
Conversation

@yudataguy
Collaborator

@yudataguy yudataguy commented Feb 2, 2026

Summary

Adds an interactive integration test runner to help debug the non-reproducible test failures reported in #138.

Problem

Several integration tests fail randomly in ways that are difficult to track:

  • Antenna Deployer Test 5
  • Power Monitor Tests 1 and 2
  • RTC Test 3
  • IMU Test 2

These failures could be due to race conditions, hardware timing issues, or edge cases.
These tests have since been fixed in #321. For future tests, this interactive integration test runner can help debug potentially flaky tests by exercising them across multiple runs.

Solution

This PR introduces an interactive test runner that executes tests multiple times to detect and analyze flaky behavior.

Features

Add interactive integration test runner with auto-discovery and flaky test detection

Replaces the hardcoded test approach with a flexible test runner that (see the sketch after this list):

  • Auto-discovers all integration tests from test/int directory
  • Provides interactive cursor-based selection (arrow keys + spacebar)
  • Supports CLI mode for automation and scripts
  • Runs tests for configurable cycles to detect flaky behavior
  • Automatically identifies and reports flaky tests
  • Shows detailed statistics including pass rates and timing
  • Gracefully handles errors with clear messages
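
A minimal sketch of the core loop, assuming pytest-style tests under test/int; the names (discover_tests, run_cycles) are illustrative, not the actual implementation:

import subprocess
from pathlib import Path

TEST_DIR = Path("FprimeZephyrReference/test/int")  # assumed location, per the usage examples below

def discover_tests() -> list[str]:
    # Auto-discover integration tests by filename convention.
    return sorted(p.stem for p in TEST_DIR.glob("*_test.py"))

def run_cycles(test: str, cycles: int = 3, timeout: int = 120) -> dict:
    # Run one test for N cycles, recording which iterations failed.
    failed = []
    for i in range(1, cycles + 1):
        result = subprocess.run(
            ["pytest", str(TEST_DIR / f"{test}.py")],
            capture_output=True, timeout=timeout,
        )
        if result.returncode != 0:
            failed.append(i)
    # Flaky = some, but not all, iterations failed.
    return {"test": test, "cycles": cycles, "failed": failed,
            "flaky": 0 < len(failed) < cycles}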

Usage

  • Interactive: make test-interactive (default cycle: 3)
  • CLI: make test-interactive ARGS="--tests watchdog_test --cycles 10" (flags sketched below)
  • All tests: make test-interactive ARGS="--all --cycles 20"
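
A plausible shape for the CLI parsing behind these flags; the argument names mirror the documented usage, but the actual script may differ:

import argparse

parser = argparse.ArgumentParser(description="Interactive integration test runner")
parser.add_argument("--tests", nargs="+", help="specific tests to run, e.g. watchdog_test")
parser.add_argument("--all", action="store_true", help="run every discovered test")
parser.add_argument("--cycles", type=int, default=3, help="iterations per test (default: 3)")
args = parser.parse_args()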

Output Example

================================================================================
TEST SUMMARY
================================================================================

Test: antenna_deployer_test
Total iterations: 20
Passed: 18
Failed: 2
Success rate: 90.0%

Duration statistics:
  Average: 2.45s
  Min: 2.12s
  Max: 4.23s

Failed iterations: [7, 14]
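
For reference, the summary figures above reduce to straightforward arithmetic; a sketch with illustrative values:

durations = [2.12, 2.45, 4.23]  # per-iteration durations in seconds (example values)
failed_iterations = [7, 14]
total = 20

passed = total - len(failed_iterations)
print(f"Success rate: {passed / total:.1%}")               # 90.0%
print(f"Average: {sum(durations) / len(durations):.2f}s")  # mean duration
print(f"Min: {min(durations):.2f}s, Max: {max(durations):.2f}s")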

Testing

Tested locally with:

  • ✅ Single test multiple iterations
  • ✅ Known flaky tests mode
  • ✅ Timeout configuration
  • ✅ Failure logging

Documentation

  • Added section to main README.md
  • Created comprehensive FLAKY_TEST_RUNNER.md
  • Added Makefile targets with help text

Related

Closes #138

Implements a comprehensive test runner to detect and debug non-reproducible
test failures reported in issue #138. The runner executes tests multiple
times with configurable parameters and provides detailed failure analysis.

Features:
- Run tests multiple times with configurable iterations
- Configurable timeout per test (default: 120s)
- --known-flaky flag to run all problematic tests from issue #138
- Automatic failure logging with timestamps
- JSON reports for machine-readable analysis (both sketched after this list)
- Success rate statistics and duration metrics
- Stop-on-failure mode for immediate debugging
- Verbose mode for detailed pytest output
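
As a rough illustration of the failure logging and JSON reporting above (file names and schema are assumptions, not the script's actual format):

import json
from datetime import datetime, timezone

def log_failure(test: str, iteration: int, output: str) -> None:
    # Append a timestamped record of the failing iteration.
    stamp = datetime.now(timezone.utc).isoformat()
    with open(f"{test}_failures.log", "a") as f:
        f.write(f"[{stamp}] iteration {iteration} failed\n{output}\n")

def write_report(results: dict, path: str = "flaky_report.json") -> None:
    # Dump per-test results for machine-readable analysis.
    with open(path, "w") as f:
        json.dump(results, f, indent=2)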

Usage examples:
- Run all known flaky tests: make test-known-flaky ITERATIONS=10
- Run specific test: python3 FprimeZephyrReference/test/int/run_flaky_tests.py --test antenna_deployer_test --iterations 20
- Custom timeout: --timeout 60

Related: #138
- Remove unused Dict import
- Format strings with double quotes for consistency
- Break long lines for readability
@hrfarmer hrfarmer self-requested a review February 2, 2026 21:37
@hrfarmer
Collaborator

hrfarmer commented Feb 3, 2026

Just a heads up, I'm working on redoing some of the integration tests in #321 to hopefully fix their flakiness. I don't think these will conflict, though; just wanted to make sure you're aware.

Also, I'm pretty sure the linked issue is somewhat outdated relative to the tests on main that are flaky now.

@yudataguy
Collaborator Author

Just a heads up, I'm working on redoing some of the integration tests in #321 to hopefully fix their flakiness. I don't think these will conflict, though; just wanted to make sure you're aware.

Also, I'm pretty sure the linked issue is somewhat outdated relative to the tests on main that are flaky now.

Can you flag the new flaky tests in #138 so they can be re-integrated into the CI eventually? That way we also won't duplicate work, and this can remain the runner for flaky tests only.

@hrfarmer
Collaborator

hrfarmer commented Feb 3, 2026

Just a heads up, I'm working on redoing some of the integration tests in #321 to hopefully fix their flakiness. I don't think these will conflict, though; just wanted to make sure you're aware.
Also, I'm pretty sure the linked issue is somewhat outdated relative to the tests on main that are flaky now.

Can you flag the new flaky tests in #138 so they can be re-integrated into the CI eventually? That way we also won't duplicate work, and this can remain the runner for flaky tests only.

As of right now, the only test I can't get to work is the veml6031; otherwise, I have every test working consistently on that branch on our engineering cube.

Collaborator

@hrfarmer hrfarmer left a comment


I left a few comments on some things I noticed during a quick glance.

Overall, though, I think that all of the "known flaky tests" functionality and references should be removed, because once #321 is in, there shouldn't be any more flaky tests. Likewise, if any more flaky tests do appear, this script and the README would have to keep getting updated to account for them, when instead the tests should probably just be fixed. In those cases, this script with the functionality to manually pass in specific tests would be helpful.

@yudataguy
Collaborator Author

I left a few comments on some things I noticed during a quick glance.

Overall, though, I think that all of the "known flaky tests" functionality and references should be removed, because once #321 is in, there shouldn't be any more flaky tests. Likewise, if any more flaky tests do appear, this script and the README would have to keep getting updated to account for them, when instead the tests should probably just be fixed. In those cases, this script with the functionality to manually pass in specific tests would be helpful.

#138 (comment)

Changed the requirement for the flaky tests since you've already fixed most of them. New ones will be detected dynamically.

- Reset .github/workflows/ci.yaml and Makefile to main
- Remove patches/ directory and flaky test runner files
- Keep README.md changes for reference

TODO: Update README.md "Testing for Flaky Tests" section once
the flaky test runner implementation is complete, as the
referenced files (run_flaky_tests.py, FLAKY_TEST_RUNNER.md)
do not currently exist.
… test detection

Replaces the hardcoded test approach with a flexible test runner that:
- Auto-discovers all integration tests from test/int directory
- Provides interactive cursor-based selection (arrow keys + spacebar)
- Supports CLI mode for automation and scripts
- Runs tests for configurable cycles to detect flaky behavior
- Automatically identifies and reports flaky tests
- Shows detailed statistics including pass rates and timing
- Gracefully handles errors with clear messages

Usage:
- Interactive: make test-interactive
- CLI: make test-interactive ARGS="--tests watchdog_test --cycles 10"
- All tests: make test-interactive ARGS="--all --cycles 20"
Running tests 3 times by default provides better flaky test detection
while still being quick enough for regular use.
@yudataguy yudataguy requested a review from hrfarmer February 3, 2026 23:55
@yudataguy yudataguy changed the title from "Add flaky test runner for debugging intermittent test failures" to "Add integration interactive test for debugging intermittent test failures" Feb 4, 2026