
Conversation

@yudataguy
Collaborator

@yudataguy yudataguy commented Feb 2, 2026

Summary

Adds an interactive integration test runner to help debug the non-reproducible test failures reported in #138.

Problem

Several integration tests fail randomly in ways that are difficult to track:

  • Antenna Deployer Test 5
  • Power Monitor Tests 1 and 2
  • RTC Test 3
  • IMU Test 2

These failures could be due to race conditions, hardware timing issues, or edge cases.
These tests have since been fixed in #321. For future tests, this interactive integration test runner can help debug potentially flaky tests by exercising them across multiple runs.

Solution

This PR introduces an interactive test runner that executes tests multiple times to detect and analyze flaky behavior.

Features

Add interactive integration test runner with auto-discovery and flaky test detection

Replaces the hardcoded test approach with a flexible test runner that (see the sketch after this list):

  • Auto-discovers all integration tests from test/int directory
  • Provides interactive cursor-based selection (arrow keys + spacebar)
  • Supports CLI mode for automation and scripts
  • Runs tests for configurable cycles to detect flaky behavior
  • Automatically identifies and reports flaky tests
  • Shows detailed statistics including pass rates and timing
  • Gracefully handles errors with clear messages
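
A minimal sketch of the core loop, assuming pytest-style tests under test/int; the names (discover_tests, run_cycles) are illustrative, not the actual implementation:

import subprocess
from pathlib import Path

TEST_DIR = Path("FprimeZephyrReference/test/int")  # assumed location, per the usage examples below

def discover_tests() -> list[str]:
    # Auto-discover integration tests by filename convention.
    return sorted(p.stem for p in TEST_DIR.glob("*_test.py"))

def run_cycles(test: str, cycles: int = 3, timeout: int = 120) -> dict:
    # Run one test for N cycles, recording which iterations failed.
    failed = []
    for i in range(1, cycles + 1):
        result = subprocess.run(
            ["pytest", str(TEST_DIR / f"{test}.py")],
            capture_output=True, timeout=timeout,
        )
        if result.returncode != 0:
            failed.append(i)
    # Flaky = some, but not all, iterations failed.
    return {"test": test, "cycles": cycles, "failed": failed,
            "flaky": 0 < len(failed) < cycles}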

Usage

  • Interactive: make test-interactive (default cycle: 3)
  • CLI: make test-interactive ARGS="--tests watchdog_test --cycles 10" (flags sketched below)
  • All tests: make test-interactive ARGS="--all --cycles 20"
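
A plausible shape for the CLI parsing behind these flags; the argument names mirror the documented usage, but the actual script may differ:

import argparse

parser = argparse.ArgumentParser(description="Interactive integration test runner")
parser.add_argument("--tests", nargs="+", help="specific tests to run, e.g. watchdog_test")
parser.add_argument("--all", action="store_true", help="run every discovered test")
parser.add_argument("--cycles", type=int, default=3, help="iterations per test (default: 3)")
args = parser.parse_args()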

Output Example

================================================================================
TEST SUMMARY
================================================================================

Test: antenna_deployer_test
Total iterations: 20
Passed: 18
Failed: 2
Success rate: 90.0%

Duration statistics:
  Average: 2.45s
  Min: 2.12s
  Max: 4.23s

Failed iterations: [7, 14]
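
For reference, the summary figures above reduce to straightforward arithmetic; a sketch with illustrative values:

durations = [2.12, 2.45, 4.23]  # per-iteration durations in seconds (example values)
failed_iterations = [7, 14]
total = 20

passed = total - len(failed_iterations)
print(f"Success rate: {passed / total:.1%}")               # 90.0%
print(f"Average: {sum(durations) / len(durations):.2f}s")  # mean duration
print(f"Min: {min(durations):.2f}s, Max: {max(durations):.2f}s")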

Testing

Tested locally with:

  • ✅ Single test multiple iterations
  • ✅ Known flaky tests mode
  • ✅ Timeout configuration
  • ✅ Failure logging

Documentation

  • Added section to main README.md
  • Created comprehensive FLAKY_TEST_RUNNER.md
  • Added Makefile targets with help text

Related

Closes #138

Implements a comprehensive test runner to detect and debug non-reproducible
test failures reported in issue #138. The runner executes tests multiple
times with configurable parameters and provides detailed failure analysis.

Features:
- Run tests multiple times with configurable iterations
- Configurable timeout per test (default: 120s)
- --known-flaky flag to run all problematic tests from issue #138
- Automatic failure logging with timestamps
- JSON reports for machine-readable analysis (both sketched after this list)
- Success rate statistics and duration metrics
- Stop-on-failure mode for immediate debugging
- Verbose mode for detailed pytest output
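
As a rough illustration of the failure logging and JSON reporting above (file names and schema are assumptions, not the script's actual format):

import json
from datetime import datetime, timezone

def log_failure(test: str, iteration: int, output: str) -> None:
    # Append a timestamped record of the failing iteration.
    stamp = datetime.now(timezone.utc).isoformat()
    with open(f"{test}_failures.log", "a") as f:
        f.write(f"[{stamp}] iteration {iteration} failed\n{output}\n")

def write_report(results: dict, path: str = "flaky_report.json") -> None:
    # Dump per-test results for machine-readable analysis.
    with open(path, "w") as f:
        json.dump(results, f, indent=2)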

Usage examples:
- Run all known flaky tests: make test-known-flaky ITERATIONS=10
- Run specific test: python3 FprimeZephyrReference/test/int/run_flaky_tests.py --test antenna_deployer_test --iterations 20
- Custom timeout: --timeout 60

Related: #138
- Remove unused Dict import
- Format strings with double quotes for consistency
- Break long lines for readability
@hrfarmer hrfarmer self-requested a review February 2, 2026 21:37
@hrfarmer
Collaborator

hrfarmer commented Feb 3, 2026

Just a heads up, I'm working on redoing some of the integration tests in #321 to hopefully fix their flakiness. I don't think these will conflict, though; just wanted to make sure you're aware.

Also, I'm pretty sure the linked issue is somewhat outdated relative to the tests on main that are flaky now.

@yudataguy
Collaborator Author

Just a heads up, I'm working on redoing some of the integration tests in #321 to hopefully fix their flakiness. I don't think these will conflict, though; just wanted to make sure you're aware.

Also, I'm pretty sure the linked issue is somewhat outdated relative to the tests on main that are flaky now.

Can you flag the new flaky tests in #138 so they can be re-integrated into the CI eventually? That way we also won't duplicate work, and this can remain the runner for flaky tests only.

@hrfarmer
Collaborator

hrfarmer commented Feb 3, 2026

Just a heads up, I'm working on redoing some of the integration tests in #321 to hopefully fix their flakiness. I don't think these will conflict, though; just wanted to make sure you're aware.
Also, I'm pretty sure the linked issue is somewhat outdated relative to the tests on main that are flaky now.

Can you flag the new flaky tests in #138 so they can be re-integrated into the CI eventually? That way we also won't duplicate work, and this can remain the runner for flaky tests only.

As of right now, the only test I can't get to work is the veml6031; otherwise, I have every test working consistently on that branch on our engineering cube.

Collaborator

@hrfarmer hrfarmer left a comment


I left a few comments on some things I noticed during a quick glance.

Overall, though, I think that all of the "known flaky tests" functionality and references should be removed, because once #321 is in, there shouldn't be any more flaky tests. Likewise, if any more flaky tests do appear, this script and the README would have to keep getting updated to account for them, when instead the tests should probably just be fixed. In those cases, this script with the functionality to manually pass in specific tests would be helpful.

@yudataguy
Collaborator Author

I left a few comments on some things I noticed during a quick glance.

Overall, though, I think that all of the "known flaky tests" functionality and references should be removed, because once #321 is in, there shouldn't be any more flaky tests. Likewise, if any more flaky tests do appear, this script and the README would have to keep getting updated to account for them, when instead the tests should probably just be fixed. In those cases, this script with the functionality to manually pass in specific tests would be helpful.

#138 (comment)

Changed the requirement for the flaky tests since you've already fixed most of them. New ones will be detected dynamically.

- Reset .github/workflows/ci.yaml and Makefile to main
- Remove patches/ directory and flaky test runner files
- Keep README.md changes for reference

TODO: Update README.md "Testing for Flaky Tests" section once
the flaky test runner implementation is complete, as the
referenced files (run_flaky_tests.py, FLAKY_TEST_RUNNER.md)
do not currently exist.
… test detection

Replaces the hardcoded test approach with a flexible test runner that:
- Auto-discovers all integration tests from test/int directory
- Provides interactive cursor-based selection (arrow keys + spacebar)
- Supports CLI mode for automation and scripts
- Runs tests for configurable cycles to detect flaky behavior
- Automatically identifies and reports flaky tests
- Shows detailed statistics including pass rates and timing
- Gracefully handles errors with clear messages

Usage:
- Interactive: make test-interactive
- CLI: make test-interactive ARGS="--tests watchdog_test --cycles 10"
- All tests: make test-interactive ARGS="--all --cycles 20"
Running tests 3 times by default provides better flaky test detection
while still being quick enough for regular use.
@yudataguy yudataguy requested a review from hrfarmer February 3, 2026 23:55
@yudataguy yudataguy changed the title from "Add flaky test runner for debugging intermittent test failures" to "Add integration interactive test for debugging intermittent test failures" Feb 4, 2026