rhivos-tools: Add code snippet inlining and e2e integration tests#93
rhivos-tools: Add code snippet inlining and e2e integration tests#93bburt-rh wants to merge 4 commits intoredhat-documentation:mainfrom
Conversation
md2adoc.py now runs as a pre-processor on raw Markdown before pandoc,
rather than as a post-processor on pandoc's AsciiDoc output. This fixes
a fundamental pipeline issue where pandoc mangled MkDocs-specific syntax
(converting -- to em-dashes, collapsing block structure, stripping
frontmatter) before md2adoc could process it.
Key changes to md2adoc.py:
- All conversion functions wrap injected AsciiDoc in pandoc raw blocks
(```{=asciidoc}) so pandoc passes it through verbatim
- convert_snippets now classifies by file extension: .md files remain
as include:: directives, code files (.yml, .container, .sh, etc.)
are read from disk and inlined as [source,lang] blocks
- Supports MkDocs line-range syntax (file:start:end)
- Missing files fall back to include:: with a // WARNING comment
- New --base-path CLI argument for resolving snippet source paths
SKILL.md updated to reflect the corrected pipeline order
(md2adoc before pandoc) and document --base-path usage.
22 end-to-end integration tests verify the full pipeline across
4 fixture files with known content markers, validating snippet
inlining, prose include resolution, asciidoctor rendering, and
absence of residual MkDocs syntax.
Made-with: Cursor
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
Walkthroughmd2adoc.py is repurposed from a pandoc post-processor to a pre-processor: it now optionally inlines code snippets from a provided base path, wraps generated AsciiDoc fragments in pandoc raw blocks, converts prose snippet references to Changes
Sequence Diagram(s)sequenceDiagram
autonumber
rect rgba(240,248,255,0.5)
participant User as User/CI
end
rect rgba(220,240,220,0.5)
participant MD2 as md2adoc.py
end
rect rgba(255,228,196,0.5)
participant PAN as pandoc
end
rect rgba(255,240,245,0.5)
participant ASCI as asciidoctor
end
rect rgba(245,245,245,0.5)
participant FS as Filesystem
end
User->>MD2: md2adoc.py --base-path=<sig-docs> input.md -> working.md (+ prose `.adoc` files)
MD2->>FS: read input.md and referenced snippet/prose files under base_path
MD2-->>FS: write preprocessed working.md and produce prose `.adoc` include files
MD2->>PAN: hand off working.md for Markdown→AsciiDoc conversion
PAN->>FS: write output.adoc
PAN->>ASCI: provide AsciiDoc output for validation
ASCI->>FS: read output.adoc and included `.adoc` snippets
ASCI-->>User: return warnings/errors (validation)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 inconclusive)
✅ Passed checks (7 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py (1)
308-319: Assert that pandoc raw fences do not survive the pipeline.This suite checks for leftover MkDocs syntax, but not for literal
```{=asciidoc}markers in the generated.adoc. Adding that assertion would catch nested-raw-block regressions in the converters.Suggested assertion
assert "/// figure-caption" not in content assert re.search(r'```\w+\s+title=', content) is None assert re.search(r'\]\([^)]+\.md\)', content) is None + assert "```{=asciidoc}" not in content🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py` around lines 308 - 319, Add an assertion to test_no_mkdocs_syntax_remains in class TestNoMkDocsSyntaxRemains to ensure Pandoc raw fence markers don't survive the conversion: after reading content from adoc_path (variable content), assert that the literal string "```{=asciidoc}" is not present (e.g., assert "```{=asciidoc}" not in content) to catch nested raw-block regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.py`:
- Around line 399-416: The issue is that convert_admonitions() and
convert_tabbed_content() wrap content in raw `{=asciidoc}` fences before
convert_snippets(), convert_code_block_titles(), and convert_figure_captions()
run, causing nested/raw fence leakage; fix it by reordering process_file so
convert_snippets(), convert_code_block_titles(), and convert_figure_captions()
are called before convert_admonitions() and convert_tabbed_content() (or
alternatively make those three transforms raw-block-aware), i.e. move the calls
to convert_snippets, convert_code_block_titles, and convert_figure_captions
earlier in process_file so they emit raw blocks first and then wrap results with
convert_admonitions and convert_tabbed_content.
- Around line 202-210: The code currently trusts file_ref when computing
resolved = base_path / file_ref which allows paths like "../..." or absolute
paths to escape base_path; change this to first resolve both base_path and the
candidate (e.g., resolved_candidate = (base_path / file_ref).resolve()) and
ensure the candidate is contained within base_path.resolve() (using
Path.is_relative_to or comparing commonpath/parents); if the candidate is
outside base_path, treat it as missing and emit the same warning/include lines
and continue instead of calling _read_snippet_lines(resolved, start, end). Use
the existing symbols resolved/base_path/file_ref/_read_snippet_lines to locate
and update the check.
In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/SKILL.md`:
- Around line 65-73: The doc omits that prose snippet files rewritten by
md2adoc.py to include::...adoc[] must themselves be converted from .md to .adoc
(or recreated) so includes resolve; update SKILL.md to add a step after running
pandoc (or before modular cleanup) that runs md2adoc.py (or a per-snippet pandoc
pass) against the snippet files referenced by the main document (e.g., files
referenced by full_complexity.md) so their .md -> .adoc conversion is performed
(mirror what
plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py
does) and mention the --base-path usage for locating those snippet files.
- Around line 61-63: The docs use two different sig-docs path tokens
(`<sig-docs-path>` vs `${sig_docs_path}`) which is inconsistent and may leave
--base-path empty; update the snippet so it uses a single consistent token and,
if you prefer the shell variable form, add an explicit assignment from the skill
ARGUMENTS (or show how `$ARGUMENTS` populates `sig_docs_path`) before the
cp/python commands; ensure the copy command and the md2adoc call both reference
the same identifier (either `<sig-docs-path>` everywhere or `${sig_docs_path}`
everywhere) so --base-path is never empty.
---
Nitpick comments:
In
`@plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py`:
- Around line 308-319: Add an assertion to test_no_mkdocs_syntax_remains in
class TestNoMkDocsSyntaxRemains to ensure Pandoc raw fence markers don't survive
the conversion: after reading content from adoc_path (variable content), assert
that the literal string "```{=asciidoc}" is not present (e.g., assert
"```{=asciidoc}" not in content) to catch nested raw-block regressions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: af3bebb4-0270-4667-a020-adcab09c3461
📒 Files selected for processing (12)
plugins/rhivos-tools/skills/rhivos-fetch-convert/SKILL.mdplugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.pyplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/fixtures/admonitions_and_snippets.mdplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/fixtures/code_snippets/long_file.ymlplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/fixtures/code_snippets/sample_config.ymlplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/fixtures/code_snippets/sample_script.shplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/fixtures/code_snippets/sample_service.containerplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/fixtures/figure_and_tabs.mdplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/fixtures/full_complexity.mdplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/fixtures/prose_snippets/disclaimer.mdplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/fixtures/simple_frontmatter.mdplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py
…sted raw block bug
Address CI check failures and CodeRabbit review findings:
- Use relative path scripts/md2adoc.py in SKILL.md instead of
${CLAUDE_SKILL_DIR} (skill-and-script-conventions error)
- Unify sig-docs path tokens to <sig-docs-path> throughout SKILL.md
- Add docstrings to all test classes and methods (41% -> 80%+ coverage)
- Add path traversal guard rejecting snippet refs that escape base_path
- Reorder process_file: leaf transforms (snippets, code blocks, figures)
run before container transforms (admonitions, tabs) to prevent nested
pandoc raw blocks that leak ```{=asciidoc} markers into final output
- Make convert_snippets and convert_code_block_titles indent-aware so
content inside admonitions/tabs preserves indentation for collectors
- Add _unwrap_raw_blocks helper stripped by container transforms
- Add raw fence assertion to catch regression in TestNoMkDocsSyntaxRemains
- Document prose snippet conversion step (b1) in SKILL.md
Made-with: Cursor
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
plugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.py (3)
160-165: Consider validating line range bounds.The function trusts that
startandendare valid 1-indexed positive integers withstart <= end. Invalid values (e.g.,start=0,start > end, or values exceeding file length) could produce unexpected results:
start=0would include an extra line from the end due to Python's slice behavior with negative indicesstart > endwould return an empty list silentlyIf upstream input is always well-formed, this may be acceptable. Otherwise, consider adding bounds validation.
♻️ Defensive bounds check
def _read_snippet_lines(file_path: Path, start: int | None, end: int | None) -> list[str]: """Read lines from a file, optionally extracting a 1-indexed inclusive range.""" all_lines = file_path.read_text(encoding="utf-8").splitlines() if start is not None and end is not None: + if start < 1 or end < start: + return all_lines # fallback to full file on invalid range return all_lines[start - 1 : end] return all_lines🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.py` around lines 160 - 165, The _read_snippet_lines function should validate the 1-indexed start/end bounds before slicing: ensure that if start or end are provided they are positive integers, that start <= end, and that both are within 1..len(all_lines); for out-of-range values either clamp them to the file bounds or raise a ValueError (be explicit which policy you choose) and handle None as "no bound", using file_path.read_text(...).splitlines() to compute len(all_lines) for validation; update the function to perform these checks and return a clear error message or a clamped slice rather than relying on Python's negative-index/silent-empty-slice behavior.
377-391: Links inside raw block content may be incorrectly converted.The current check only skips the raw block delimiter lines (
RAW_OPEN/RAW_CLOSE), but content lines between them are still processed. If a raw block contains AsciiDoc that happens to have Markdown-like link syntax (e.g., in a code example), it could be incorrectly transformed.Consider tracking whether you're inside a raw block and skipping all content within:
♻️ Track raw block state
def convert_markdown_links(lines: list[str]) -> list[str]: ... result = [] link_pattern = re.compile(r"\[([^\]]+)\]\(([^)]+)\)") + in_raw_block = False for line in lines: - if line.startswith(RAW_OPEN) or line == RAW_CLOSE: + if line == RAW_OPEN: + in_raw_block = True + result.append(line) + continue + if line == RAW_CLOSE: + in_raw_block = False result.append(line) continue + if in_raw_block: + result.append(line) + continue ...🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.py` around lines 377 - 391, The loop currently only ignores the RAW_OPEN/RAW_CLOSE delimiter lines but still processes lines between them; modify the logic in the for loop to maintain a boolean "in_raw" state (flip it when encountering RAW_OPEN or RAW_CLOSE) and when in_raw is True append lines unchanged (skip running link_pattern.sub via replace_link). Update references in this block (the loop over lines, the RAW_OPEN/RAW_CLOSE checks, and the use of link_pattern.sub/replace_link) so raw block content is preserved verbatim.
40-42: Consider using spread syntax for list construction.Per static analysis (RUF005), spread syntax is preferred over list concatenation.
♻️ Suggested refactor
def _raw_block(asciidoc_lines: list[str]) -> list[str]: """Wrap AsciiDoc lines in a pandoc raw block for pass-through.""" - return [RAW_OPEN] + asciidoc_lines + ["", RAW_CLOSE] + return [RAW_OPEN, *asciidoc_lines, "", RAW_CLOSE]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.py` around lines 40 - 42, The return currently concatenates lists in _raw_block; replace that with a list literal using unpacking to construct the result in one expression: create and return a single list combining RAW_OPEN, the unpacked asciidoc_lines, the empty-string separator, and RAW_CLOSE (referencing function _raw_block and symbols RAW_OPEN/RAW_CLOSE).plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py (1)
248-249: Consider documenting the expected----count.The assertion
content.count("----") >= 8relies on knowing how many source blocks the fixture produces. A brief comment explaining this expectation (e.g., "4 code snippets × 2 delimiters each") would help maintainers understand what fixture changes might affect this threshold.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py` around lines 248 - 249, The test asserts content.count("----") >= 8 without explanation; add a brief inline comment above the two assertions in tests/test_e2e_conversion.py describing how the threshold is computed (for example "expected at least 8 delimiters: 4 code snippets × 2 delimiters each"), and optionally replace the magic number with a named local (e.g., expected_delimiters) so the assertion reads assert content.count("----") >= expected_delimiters to make the intent clear and easier to update when the fixture changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.py`:
- Around line 27-34: EXTENSION_LANGUAGE_MAP is missing mappings for .container
and .conf so those code blocks render without a language; update the
EXTENSION_LANGUAGE_MAP dict to include entries for ".container": "ini" and
".conf": "ini" (add these keys alongside existing mappings like ".yml" and
".json") so the md2adoc.py conversion emits [source,ini] for those extensions.
---
Nitpick comments:
In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.py`:
- Around line 160-165: The _read_snippet_lines function should validate the
1-indexed start/end bounds before slicing: ensure that if start or end are
provided they are positive integers, that start <= end, and that both are within
1..len(all_lines); for out-of-range values either clamp them to the file bounds
or raise a ValueError (be explicit which policy you choose) and handle None as
"no bound", using file_path.read_text(...).splitlines() to compute
len(all_lines) for validation; update the function to perform these checks and
return a clear error message or a clamped slice rather than relying on Python's
negative-index/silent-empty-slice behavior.
- Around line 377-391: The loop currently only ignores the RAW_OPEN/RAW_CLOSE
delimiter lines but still processes lines between them; modify the logic in the
for loop to maintain a boolean "in_raw" state (flip it when encountering
RAW_OPEN or RAW_CLOSE) and when in_raw is True append lines unchanged (skip
running link_pattern.sub via replace_link). Update references in this block (the
loop over lines, the RAW_OPEN/RAW_CLOSE checks, and the use of
link_pattern.sub/replace_link) so raw block content is preserved verbatim.
- Around line 40-42: The return currently concatenates lists in _raw_block;
replace that with a list literal using unpacking to construct the result in one
expression: create and return a single list combining RAW_OPEN, the unpacked
asciidoc_lines, the empty-string separator, and RAW_CLOSE (referencing function
_raw_block and symbols RAW_OPEN/RAW_CLOSE).
In
`@plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py`:
- Around line 248-249: The test asserts content.count("----") >= 8 without
explanation; add a brief inline comment above the two assertions in
tests/test_e2e_conversion.py describing how the threshold is computed (for
example "expected at least 8 delimiters: 4 code snippets × 2 delimiters each"),
and optionally replace the magic number with a named local (e.g.,
expected_delimiters) so the assertion reads assert content.count("----") >=
expected_delimiters to make the intent clear and easier to update when the
fixture changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 0bc0c2c0-9d12-4c4a-b9fc-f5b916021017
📒 Files selected for processing (3)
plugins/rhivos-tools/skills/rhivos-fetch-convert/SKILL.mdplugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.pyplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/rhivos-tools/skills/rhivos-fetch-convert/SKILL.md
- Add .container and .conf to EXTENSION_LANGUAGE_MAP as ini - Add defensive bounds validation in _read_snippet_lines for invalid line ranges (start < 1 or end < start) - Track raw block state in convert_markdown_links so content inside raw blocks is not incorrectly link-converted - Use spread syntax in _raw_block (RUF005) - Update test to assert [source,ini] and document delimiter count Made-with: Cursor
Made-with: Cursor
There was a problem hiding this comment.
🧹 Nitpick comments (1)
plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py (1)
18-19: Consider a cleaner import approach.The
sys.path.insertpattern works but is fragile if directory structure changes. Consider usingpytest-pythonpathor configuring the path inpyproject.toml/conftest.pyfor a more maintainable solution.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py` around lines 18 - 19, Replace the fragile sys.path manipulation in the test (which currently inserts the scripts directory to import md2adoc and use process_file) with a maintainable test import strategy: remove the sys.path.insert(...) line and instead configure test discovery to find the scripts package by adding the scripts directory to the test path via pytest configuration (use pytest-pythonpath plugin or add pythonpath in pytest.ini/pyproject.toml) or set up a conftest.py that adjusts PYTHONPATH once for the test session; then import md2adoc and call process_file normally in tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@plugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py`:
- Around line 18-19: Replace the fragile sys.path manipulation in the test
(which currently inserts the scripts directory to import md2adoc and use
process_file) with a maintainable test import strategy: remove the
sys.path.insert(...) line and instead configure test discovery to find the
scripts package by adding the scripts directory to the test path via pytest
configuration (use pytest-pythonpath plugin or add pythonpath in
pytest.ini/pyproject.toml) or set up a conftest.py that adjusts PYTHONPATH once
for the test session; then import md2adoc and call process_file normally in
tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5f9e6872-70e8-4c08-8b8d-f803c862363b
📒 Files selected for processing (2)
plugins/rhivos-tools/skills/rhivos-fetch-convert/scripts/md2adoc.pyplugins/rhivos-tools/skills/rhivos-fetch-convert/tests/test_e2e_conversion.py
md2adoc.pynow runs as a pre-processor on raw Markdown before pandoc, rather than as a post-processor on pandoc's AsciiDoc output. This fixes a fundamental pipeline issue where pandoc mangled MkDocs-specific syntax (converting -- to em-dashes, collapsing block structure, stripping frontmatter) before md2adoc could process it.Key changes to
md2adoc.py:SKILL.md updated to reflect the corrected pipeline order (md2adoc before pandoc) and document --base-path usage.
end-to-end integration tests verify the full pipeline across 4 fixture files with known content markers, validating snippet inlining, prose include resolution, asciidoctor rendering, and absence of residual MkDocs syntax.
Summary by CodeRabbit
New Features
Improvements
Tests