Skip to content

test: add tart VM lifecycle integration tests#55

Open
STRRL wants to merge 12 commits intomainfrom
feat/tart-integration-tests
Open

test: add tart VM lifecycle integration tests#55
STRRL wants to merge 12 commits intomainfrom
feat/tart-integration-tests

Conversation

@STRRL
Copy link
Copy Markdown
Member

@STRRL STRRL commented Apr 7, 2026

Summary

  • Add BDD integration tests for tart VM lifecycle (pull, clone, start, IP, exec, stop, delete, cleanup)
  • Uses ghcr.io/cirruslabs/macos-tahoe-base:latest as smaller test image (no Xcode)
  • Tart steps auto-skip when tart binary is not available
  • CI matrix adds "Tart" job on self-hosted macOS arm64 runner with nested virtualization

Test plan

  • Verify tart tests pass on self-hosted macOS arm64 runner
  • Verify non-tart integration tests still pass
  • Verify tart tests skip gracefully on runners without tart installed

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Apr 7, 2026

Deploying elastic-fruit-runner with  Cloudflare Pages  Cloudflare Pages

Latest commit: b82f4a2
Status: ✅  Deploy successful!
Preview URL: https://d912673e.elastic-fruit-runner.pages.dev
Branch Preview URL: https://feat-tart-integration-tests.elastic-fruit-runner.pages.dev

View logs

@STRRL STRRL force-pushed the feat/tart-integration-tests branch from 3a115bb to 660b772 Compare April 7, 2026 05:52
@chatgpt-codex-connector
Copy link
Copy Markdown

💡 Codex Review


P1 Badge Run Tart integration matrix on a macOS runner

The integration matrix now runs entirely on [self-hosted, linux, arm64], including the new Tart case, so the Tart lifecycle scenario is not actually exercised on the intended platform. In this repo Tart is the macOS VM path (README), and the new step a tart manager explicitly marks the scenario pending when tart is unavailable (steps_test.go), which means Linux runners can report a passing/green job without validating Tart behavior. Use a macOS label for the Tart matrix entry (or matrix-specific runs-on) so this test provides real coverage.


sc.Step(`^I stop and delete the VM$`, func() error {
if err := state.tartMgr.Stop(context.Background(), state.tartVMName); err != nil {
return fmt.Errorf("stop VM: %w", err)
}
return state.tartMgr.Delete(context.Background(), state.tartVMName)

P2 Badge Ensure Tart VMs are torn down on step failure

VM cleanup is only performed in this explicit scenario step, so if any earlier Tart step fails (pull/clone/start/ip/exec), the remaining steps are skipped and stop/delete never runs. Because these tests run on persistent self-hosted machines, leaked VMs can accumulate and interfere with subsequent jobs. Add unconditional teardown in sc.After (or equivalent) that attempts to stop/delete state.tartVMName whenever it was created.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/api/server.go 25.00% 3 Missing ⚠️
Flag Coverage Δ
integration-GitHub App 43.20% <27.27%> (+5.77%) ⬆️
integration-PAT 41.92% <27.27%> (-0.01%) ⬇️
integration-Tart 7.71% <45.45%> (+7.71%) ⬆️
unittests 20.35% <0.00%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
internal/management/jobs.go 74.28% <100.00%> (+0.37%) ⬆️
internal/tart/manager.go 39.28% <100.00%> (+39.28%) ⬆️
internal/api/server.go 77.69% <25.00%> (-1.22%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45b7b78937

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread features/tart/vm_lifecycle.feature Outdated

Scenario: list and cleanup orphaned VMs
Given a tart manager
When I clone a VM with a random name
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Pull image before cloning in orphan-cleanup scenario

The list and cleanup orphaned VMs scenario clones immediately without first ensuring the test image is present locally, so on a clean runner (or when this scenario is run in isolation) Clone can fail before any cleanup assertions run. This makes the scenario order-dependent on pull, clone, start, exec, and cleanup a VM instead of being self-contained.

Useful? React with 👍 / 👎.

"failed": {},
"Succeeded": {},
"Failed": {},
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😮‍💨 slop

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 90f9e155d2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread test/integration/steps_test.go Outdated
Comment on lines +762 to +763
_ = state.tartMgr.Stop(context.Background(), name)
_ = state.tartMgr.Delete(context.Background(), name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Propagate tart cleanup failures instead of discarding them

This step ignores both Stop and Delete errors, so the scenario can still pass even when cleanup fails for prefixed VMs (especially stale orphans from prior runs). Because the next assertion only verifies the current cloned VM name, failures on other matched VMs are silently masked, which can leave orphaned Tart VMs behind and hide regressions in the cleanup path.

Useful? React with 👍 / 👎.

STRRL added 4 commits April 13, 2026 06:18
Add BDD scenarios for tart manager operations with nested virtualization:
- Pull, clone, start, IP discovery, SSH exec, stop, delete
- List and cleanup orphaned VMs with prefix matching

Uses ghcr.io/cirruslabs/macos-tahoe-base:latest as a smaller test image
(configurable via EFR_TEST_TART_IMAGE). Tart steps auto-skip via
godog.ErrPending when tart binary is not available.

CI matrix gains a "Tart" job running on self-hosted macOS arm64 runner
with nested virtualization enabled.
Use per-matrix runs-on so PAT/GitHub App tests run on Linux arm64
while Tart tests run on macOS arm64 (required for Apple
Virtualization.framework and nested virtualization).
… API

GitHub Actions API sends job results in lowercase ("succeeded", "failed")
but the mapping code expected Title Case ("Succeeded", "Failed"), causing
all completed jobs to show as FAIL in the dashboard (issue #68).
@STRRL STRRL force-pushed the feat/tart-integration-tests branch from 90f9e15 to 6953359 Compare April 13, 2026 13:19
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6953359dfb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread test/integration/steps_test.go Outdated
})

sc.Step(`^I pull the VM image$`, func() error {
image := envOrDefault("EFR_TEST_TART_IMAGE", "ghcr.io/cirruslabs/macos-tahoe-base:latest")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Pin Tart test image to a fixed tag or digest

Defaulting EFR_TEST_TART_IMAGE to ghcr.io/cirruslabs/macos-tahoe-base:latest makes this integration suite non-deterministic, and the workflow does not override that variable for the Tart job. When the upstream latest tag moves, lifecycle behavior (boot timing, SSH defaults, image contents) can change and break CI without any repository change, so this should use a pinned image tag or digest.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e741951162

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/api/server.go
Comment on lines +158 to +160
case "succeeded":
return controlplanev1.JobResult_JOB_RESULT_SUCCESS
case "Failed":
case "failed":
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Accept legacy capitalized job result strings

This switch now only maps lowercase values, but existing jobs.db records written by prior versions used "Succeeded"/"Failed". After upgrade, those rows resolve to JOB_RESULT_UNSPECIFIED, and the dashboard path treats completed unknown results as failures, so historical successful jobs are misreported until old rows roll off. Please keep backward-compatible mappings (or migrate stored rows) in toProtoJobResult to avoid data corruption in the UI during upgrades.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 453b2a90da

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +664 to +665
if binpath.Lookup("tart") == "tart" {
return ctx, fmt.Errorf("tart binary not found in PATH; install via `brew install cirruslabs/cli/tart`")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip Tart scenarios when binary is unavailable

The a tart manager step hard-fails with a regular error when tart is missing, which causes the scenario (and often the full integration run) to fail on environments that do not have Tart installed. Since the integration harness defaults to running all feature files when EFR_TEST_FEATURES is unset, this makes non-Tart environments unable to run the rest of the suite instead of cleanly skipping Tart-only scenarios.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ccced30ccb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return ctx, fmt.Errorf("tart binary not found in PATH; install via `brew install cirruslabs/cli/tart`")
}
state.tartMgr = tart.NewManager()
state.tartPrefix = "efr-tart-test"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Scope Tart cleanup to this test run

Setting state.tartPrefix to the fixed value "efr-tart-test" makes both cleanup paths delete any VM on the host that matches that shared prefix, not just VMs created by the current scenario. On shared/self-hosted machines (or overlapping local runs), one run can remove another run’s VM and cause nondeterministic failures or unintended data loss. Use a per-run/per-scenario unique prefix (while keeping it stable within that run) so cleanup only targets resources created by the current execution.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ead520d3b0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +79 to +81
security list-keychains -d user -s "$tart_keychain"
security default-keychain -s "$tart_keychain"
security login-keychain -s "$tart_keychain"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore keychain defaults after Tart setup

This step mutates persistent user keychain state on a self-hosted macOS runner (list-keychains -s, default-keychain -s, and login-keychain -s) but never restores it, so later jobs can inherit a broken search/default/login keychain configuration (including a temp keychain under RUNNER_TEMP that may be removed after the job). On shared runners this can cause unrelated workflows to fail keychain-backed auth/signing operations.

Useful? React with 👍 / 👎.

Comment on lines +84 to +85
sudo defaults write /Library/Preferences/SystemConfiguration/com.apple.InternetSharing.default.plist bootpd -dict DHCPLeaseTimeSecs -int 600
sudo rm -f /var/db/dhcpd_leases
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Revert host DHCP tweaks after Tart job

The workflow writes /Library/Preferences/SystemConfiguration/com.apple.InternetSharing.default.plist and deletes /var/db/dhcpd_leases with sudo on a self-hosted machine, but there is no cleanup step to restore prior host networking state. Because this is a persistent host-level change, subsequent jobs on the same runner can observe altered DHCP behavior and flaky VM networking unrelated to this test run.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline comment for the current diff.

Comment thread internal/tart/manager.go
cmd := exec.CommandContext(ctx, binpath.Lookup("tart"), "ip", name, "--wait", "60")
out, err := cmd.Output()
cmd := exec.CommandContext(ctx, binpath.Lookup("tart"), "ip", name, "--wait", strconv.Itoa(ipAddressWaitSeconds))
out, err := cmd.CombinedOutput()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] CombinedOutput() mixes stderr into the parsed IP

Switching from cmd.Output() to cmd.CombinedOutput() while still parsing the result with strings.TrimSpace(string(out)) at line 146 means any stderr output from tart ip --wait (progress, warnings, future log lines) gets concatenated with the IP and breaks net.ParseIP. The intent here was clearly to surface stderr in the error message, but doing so via CombinedOutput regresses the success path.

Suggested fix: capture stderr separately and keep Output() for the parsed value:

var stderr bytes.Buffer
cmd.Stderr = &stderr
out, err := cmd.Output()
if err != nil {
    err = fmt.Errorf("tart ip %s: %w\n%s", name, err, stderr.Bytes())
    ...
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant