-
Notifications
You must be signed in to change notification settings - Fork 496
llm/skills: add skill for debugging buildkite/ci failures #35248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+118
−0
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| --- | ||
| name: debug-ci | ||
| description: Investigate CI failures on a PR using gh and bk CLI tools. Triggers when asked about failing checks, Buildkite failures, or CI issues on a PR. | ||
| argument-hint: <PR number or GitHub PR URL> | ||
| --- | ||
|
|
||
| Investigate CI failures for a Materialize PR. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| This skill requires both `gh` (GitHub CLI) and `bk` (Buildkite CLI) to be installed and authenticated. Before doing anything else, verify both are available by running `which gh` and `which bk`. If either tool is missing, **stop immediately** and tell the user which tool(s) need to be installed and configured. Do not attempt to use the REST API directly or any other workaround — this workflow only works with these CLI tools. | ||
|
|
||
| Both `gh` and `bk` make network requests that are blocked by the default sandbox. All Bash commands in this workflow must use `dangerouslyDisableSandbox: true`. | ||
|
|
||
| ## Step 1: Extract PR number | ||
|
|
||
| Parse `$ARGUMENTS` to get the PR number. Handle both formats: | ||
| - Plain number: `35192` | ||
| - Full URL: `https://github.com/MaterializeInc/materialize/pull/35192` | ||
|
|
||
| ## Step 2: Find the build | ||
|
|
||
| Use `gh` to get the PR's branch name and then find the Buildkite build: | ||
|
|
||
| ```bash | ||
| # Get the branch name for the PR | ||
| gh pr view <PR_NUMBER> --json headRefName --jq .headRefName | ||
| ``` | ||
|
|
||
| Alternatively, list failing checks directly: | ||
| ```bash | ||
| gh pr checks <PR_NUMBER> 2>&1 | ||
| ``` | ||
|
|
||
| Lines containing `fail` have tab-separated fields: | ||
| ``` | ||
| name fail 0 https://buildkite.com/materialize/<PIPELINE>/builds/<BUILD>#<JOB_ID> description | ||
| ``` | ||
|
|
||
| Extract from the URL: | ||
| - **Pipeline**: path segment after `materialize/` (usually `test`) | ||
| - **Build number**: the number after `builds/` | ||
| - **Job ID**: the UUID after `#` | ||
|
|
||
| ## Step 3: Check annotations first | ||
|
|
||
| **Before diving into logs**, fetch the build annotations. They contain pre-extracted error messages, stack traces, and links to known flaky test issues — this saves significant time compared to grepping through raw logs. | ||
|
|
||
| ```bash | ||
| bk api /pipelines/<PIPELINE>/builds/<BUILD_NUMBER>/annotations --no-pager 2>&1 | ||
| ``` | ||
|
|
||
| The response is JSON. Each annotation has: | ||
| - `style`: `"error"` for failures | ||
| - `body_html`: HTML containing the error summary, including: | ||
| - The specific test/job that failed | ||
| - The actual error message or stack trace in `<pre><code>` blocks | ||
| - Links to known flaky test issues (look for GitHub issue links like `database-issues/#NNNN`) | ||
| - Main branch history showing if this test passes on main (flaky test indicator) | ||
|
|
||
| Parse the error annotations to get a quick overview of all failures before fetching any logs. | ||
|
|
||
| ## Step 4: Fetch logs when needed | ||
|
|
||
| Only fetch full logs when annotations don't provide enough detail. Triage in this order: | ||
|
|
||
| 1. **clippy** — compilation/lint errors that often explain everything | ||
| 2. **lint-and-rustfmt** — formatting and lint-check failures | ||
| 3. **cargo-test** — unit/integration test failures | ||
| 4. **fast-sql-logic-tests** — SLT failures | ||
| 5. **testdrive** — integration test failures (often cascading) | ||
| 6. **Everything else** (checks-parallel, cluster-tests, dbt, etc.) | ||
|
|
||
| To fetch a job's log: | ||
| ```bash | ||
| bk job log <JOB_ID> -p <PIPELINE> -b <BUILD_NUMBER> --no-timestamps --no-pager 2>&1 | tail -100 | ||
| ``` | ||
|
|
||
| For large logs, first grep for errors to find the relevant section: | ||
| ```bash | ||
| bk job log <JOB_ID> -p <PIPELINE> -b <BUILD_NUMBER> --no-timestamps --no-pager 2>&1 | grep -B2 -A5 'error\|FAIL\|panicked' | ||
| ``` | ||
|
|
||
| Fetch multiple job logs in parallel when they are independent (e.g., clippy + lint at the same time). | ||
|
|
||
| ## Step 5: Categorize failures | ||
|
|
||
| Use these Materialize-specific patterns to diagnose: | ||
|
|
||
| ### Clippy errors | ||
| Code lint issues in changed files. Common ones: `as_conversions`, `needless_borrow`, `clone_on_ref_ptr`. Fix the code, not the lint config. | ||
|
|
||
| ### `check-test-flags` lint failure | ||
| A new configuration flag was introduced but not registered in the required places: | ||
| - `misc/python/materialize/parallel_workload/action.py` (FlipFlagsAction) | ||
| - `misc/python/materialize/mzcompose/__init__.py` (get_variable_system_parameters / get_minimal_system_parameters / UNINTERESTING_SYSTEM_PARAMETERS) | ||
|
|
||
| ### Cargo test failures | ||
| Read the panic message or assertion diff. Common patterns: | ||
| - `unwrap_err() on Ok` → test expected an error but the code now succeeds | ||
| - `assertion left == right failed` → behavioral change in output | ||
|
|
||
| ### Testdrive cascades | ||
| After one test crashes environmentd, all subsequent tests in that shard fail with `Name or service not known` or `connection closed`. **Only the first failure in a shard matters** — everything after it is a cascade. Look for the first `error:` or `FAIL` in the log. | ||
|
|
||
| Testdrive shards with the same number (e.g., `testdrive-10` and `testdrive-with-alloydb-10`) run the same tests — if both fail, it's likely to be the same root cause. | ||
|
|
||
| ### SLT failures | ||
| Check whether it's wrong output (behavioral change) vs. connection error (crash/timeout). Wrong output means the query semantics changed. | ||
|
|
||
| ## Step 6: Summarize | ||
|
|
||
| Group failures by **root cause**, not by job name. Typically many failing jobs share just 1-2 root causes. Present the summary as: | ||
|
|
||
| 1. **Root cause A** — description, which jobs it affects, what to fix | ||
| 2. **Root cause B** — description, which jobs it affects, what to fix | ||
|
|
||
| Distinguish between issues that are clearly caused by the PR's changes vs. pre-existing flaky tests. The annotations often link to known flaky test issues (GitHub `database-issues` links) — use these to identify pre-existing flakes vs. regressions introduced by the PR. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be easier to just do
where def- is the username of my fork and pr-fix-secret-cli the branch name, instead of these github api roundtrips.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Claude said it wouldn't want to add this, because it's to complicated to know the username of the fork, but we did incorporate the other suggestions. 🙇♂️