Commit ee7b07a

llm/skills: add skill for debugging buildkite/ci failures (#35248)

1 file changed: `.claude/skills/debug-ci/SKILL.md` (+118, -0)
---
name: debug-ci
description: Investigate CI failures on a PR using gh and bk CLI tools. Triggers when asked about failing checks, Buildkite failures, or CI issues on a PR.
argument-hint: <PR number or GitHub PR URL>
---

Investigate CI failures for a Materialize PR.

## Prerequisites

This skill requires both `gh` (GitHub CLI) and `bk` (Buildkite CLI) to be installed and authenticated. Before doing anything else, verify both are available by running `which gh` and `which bk`. If either tool is missing, **stop immediately** and tell the user which tool(s) need to be installed and configured. Do not attempt to use the REST API directly or any other workaround — this workflow only works with these CLI tools.

Both `gh` and `bk` make network requests that are blocked by the default sandbox. All Bash commands in this workflow must use `dangerouslyDisableSandbox: true`.

## Step 1: Extract PR number

Parse `$ARGUMENTS` to get the PR number. Handle both formats:

- Plain number: `35192`
- Full URL: `https://github.com/MaterializeInc/materialize/pull/35192`

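As a sketch, both formats can be normalized with a small helper (`extract_pr_number` is a hypothetical name for illustration, since the PR number is always the final path segment of the URL):

```bash
# Hypothetical helper: normalize the argument to a bare PR number.
# Works for both "35192" and a full GitHub PR URL.
extract_pr_number() {
  # Strip everything up to and including the last '/'; a plain number
  # contains no '/' and passes through unchanged.
  echo "${1##*/}"
}

extract_pr_number "35192"
extract_pr_number "https://github.com/MaterializeInc/materialize/pull/35192"
```

Both invocations print `35192`.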
## Step 2: Find the build

Use `gh` to get the PR's branch name and then find the Buildkite build:

```bash
# Get the branch name for the PR
gh pr view <PR_NUMBER> --json headRefName --jq .headRefName
```

Alternatively, list failing checks directly:

```bash
gh pr checks <PR_NUMBER> 2>&1
```

Lines containing `fail` have tab-separated fields:

```
name	fail	0	https://buildkite.com/materialize/<PIPELINE>/builds/<BUILD>#<JOB_ID>	description
```

Extract from the URL:

- **Pipeline**: the path segment after `materialize/` (usually `test`)
- **Build number**: the number after `builds/`
- **Job ID**: the UUID after `#`

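A hedged sketch of that extraction with standard shell tools (the URL, build number, and job ID below are made up for illustration):

```bash
# Fabricated example URL in the shape shown above.
url="https://buildkite.com/materialize/test/builds/12345#0192a1b2-c3d4-4e8f-9a0b-1c2d3e4f5a6b"

# Pipeline: the path segment after materialize/
pipeline="$(echo "$url" | sed -E 's#.*buildkite\.com/materialize/([^/]+)/.*#\1#')"
# Build number: the digits after builds/
build="$(echo "$url" | sed -E 's#.*/builds/([0-9]+).*#\1#')"
# Job ID: everything after the '#'
job_id="${url##*#}"

echo "$pipeline $build $job_id"
```
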
## Step 3: Check annotations first

**Before diving into logs**, fetch the build annotations. They contain pre-extracted error messages, stack traces, and links to known flaky test issues — this saves significant time compared to grepping through raw logs.

```bash
bk api /pipelines/<PIPELINE>/builds/<BUILD_NUMBER>/annotations --no-pager 2>&1
```

The response is JSON. Each annotation has:

- `style`: `"error"` for failures
- `body_html`: HTML containing the error summary, including:
  - The specific test/job that failed
  - The actual error message or stack trace in `<pre><code>` blocks
  - Links to known flaky test issues (look for GitHub issue links like `database-issues/#NNNN`)
  - Main branch history showing whether this test passes on main (a flaky-test indicator)

Parse the error annotations to get a quick overview of all failures before fetching any logs.

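As an illustration, error annotations can be filtered with `jq`. The sample JSON below is fabricated to match the shape described above; in practice you would pipe the `bk api` output instead (this assumes `jq` is installed):

```bash
# Fabricated sample matching the fields described above.
cat <<'EOF' > /tmp/annotations.json
[
  {"style": "error", "body_html": "<pre><code>assertion failed in test_foo</code></pre>"},
  {"style": "info", "body_html": "<p>coverage report</p>"}
]
EOF

# Keep only the error annotations' bodies.
jq -r '.[] | select(.style == "error") | .body_html' /tmp/annotations.json
```
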
## Step 4: Fetch logs when needed

Only fetch full logs when annotations don't provide enough detail. Triage in this order:

1. **clippy** — compilation/lint errors that often explain everything
2. **lint-and-rustfmt** — formatting and lint-check failures
3. **cargo-test** — unit/integration test failures
4. **fast-sql-logic-tests** — SLT failures
5. **testdrive** — integration test failures (often cascading)
6. **Everything else** (checks-parallel, cluster-tests, dbt, etc.)

To fetch a job's log:

```bash
bk job log <JOB_ID> -p <PIPELINE> -b <BUILD_NUMBER> --no-timestamps --no-pager 2>&1 | tail -100
```

For large logs, first grep for errors to find the relevant section:

```bash
bk job log <JOB_ID> -p <PIPELINE> -b <BUILD_NUMBER> --no-timestamps --no-pager 2>&1 | grep -B2 -A5 'error\|FAIL\|panicked'
```

Fetch multiple job logs in parallel when they are independent (e.g., clippy + lint at the same time).

## Step 5: Categorize failures

Use these Materialize-specific patterns to diagnose:

### Clippy errors

Code lint issues in changed files. Common ones: `as_conversions`, `needless_borrow`, `clone_on_ref_ptr`. Fix the code, not the lint config.

### `check-test-flags` lint failure

A new configuration flag was introduced but not registered in the required places:

- `misc/python/materialize/parallel_workload/action.py` (FlipFlagsAction)
- `misc/python/materialize/mzcompose/__init__.py` (get_variable_system_parameters / get_minimal_system_parameters / UNINTERESTING_SYSTEM_PARAMETERS)

### Cargo test failures

Read the panic message or assertion diff. Common patterns:

- `unwrap_err() on Ok` → the test expected an error but the code now succeeds
- `assertion left == right failed` → behavioral change in output

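These patterns can be surfaced with a simple grep for the panic line and its context (the log fragment below is fabricated; in practice pipe the `bk job log` output from Step 4):

```bash
# Fabricated cargo-test log fragment.
cat <<'EOF' > /tmp/cargo-test.log
test storage::tests::roundtrip ... FAILED
thread 'storage::tests::roundtrip' panicked at src/storage.rs:42:9:
assertion `left == right` failed
EOF

# Show each panic line and the two lines that follow it (the message/diff).
grep -A2 "panicked at" /tmp/cargo-test.log
```
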
### Testdrive cascades

After one test crashes environmentd, all subsequent tests in that shard fail with `Name or service not known` or `connection closed`. **Only the first failure in a shard matters** — everything after it is a cascade. Look for the first `error:` or `FAIL` in the log.

Testdrive shards with the same number (e.g., `testdrive-10` and `testdrive-with-alloydb-10`) run the same tests — if both fail, it's likely the same root cause.

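A sketch of isolating that first failure with `grep -m1` (the shard log below is fabricated; in practice pipe the `bk job log` output from Step 4):

```bash
# Fabricated shard log: the first error is the root cause, the rest cascade.
cat <<'EOF' > /tmp/testdrive.log
test-a ... ok
error: environmentd exited unexpectedly
test-b ... FAIL: Name or service not known
test-c ... FAIL: connection closed
EOF

# -m1 stops after the first match; -n prefixes the line number.
grep -n -m1 -E 'error:|FAIL' /tmp/testdrive.log
```

This prints only `2:error: environmentd exited unexpectedly`, skipping the cascading failures that follow.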
### SLT failures

Check whether it's wrong output (behavioral change) vs. a connection error (crash/timeout). Wrong output means the query semantics changed.

## Step 6: Summarize

Group failures by **root cause**, not by job name. Typically many failing jobs share just 1-2 root causes. Present the summary as:

1. **Root cause A** — description, which jobs it affects, what to fix
2. **Root cause B** — description, which jobs it affects, what to fix

Distinguish between issues that are clearly caused by the PR's changes vs. pre-existing flaky tests. The annotations often link to known flaky test issues (GitHub `database-issues` links) — use these to identify pre-existing flakes vs. regressions introduced by the PR.
