Add an option to produce fine-grained HTML renderers based on CommonMark's grammar#635
Add an option to produce fine-grained HTML renderers based on CommonMark's grammar#635herley-shaori wants to merge 29 commits intoWitiko:mainfrom
Conversation
…TML types Implement the first task of issue Witiko#606: expose CommonMark's HTML block type differentiation through individual renderers. When the new `parseHtmlBlocks` option is enabled (default: false), the parser produces type-specific renderers instead of the generic `inputBlockHtmlElement` and `inlineHtmlTag` renderers: Block HTML renderers (by CommonMark type): - inputBlockHtmlCommentElement (type 2: HTML comments) - inputBlockHtmlInstructionElement (type 3: processing instructions) - inputBlockHtmlDeclarationElement (type 4: declarations) - inputBlockHtmlCdataElement (type 5: CDATA sections) - inputBlockHtmlSpecialElement (type 1: script/pre/style/textarea) - inputBlockHtmlRegularElement (type 6: div/table/form etc.) - inputBlockHtmlAnyElement (type 7: any other complete tag) Inline HTML renderers: - inlineHtmlInstruction (processing instructions) - inlineHtmlDeclaration (declarations) - inlineHtmlCdataSection (CDATA sections) - inlineHtmlOpenTag (opening tags) - inlineHtmlCloseTag (closing tags) - inlineHtmlEmptyTag (self-closing tags) The existing inlineHtmlComment renderer remains unchanged. When parseHtmlBlocks is false (default), behavior is fully backward compatible. Closes Witiko#606 (task 1)
|
Hi @herley-shaori, this looks great, especially for a first-time contribution. Thanks for putting in the effort! Below, I reviewed the code. If you'd like to make the necessary changes yourself, that would be great; otherwise, I'm happy to take over the PR from here. |
tests/testfiles/regression/github/issue-606-block-html-types.test
Outdated
Show resolved
Hide resolved
…option Apply all changes requested in the code review: - Rename `parseHtmlBlocks` (boolean) option to `htmlOutput` (string) with values `basic` (default) and `commonmark`, making the design more future-proof for potential additional values like `nodes`. - Simplify option documentation to not list individual renderers (consistent with other option descriptions in the codebase). - Fix documentation markup: replace `\Mdef` with `\mref` for cross-references to renderers defined elsewhere. - Add `[raw-html]` link reference for inline HTML construct types. - Fix Lua code indentation in InlineHtml and DisplayHtml parser sections. - Remove unnecessary `if: format != 'context'` condition from test file. - Update .gitignore entry from `venv/` to `tests/test-virtualenv`.
|
Tasks for myself in addition to the comments from the code review:
|
|
Hi @Witiko, thanks so much for taking the time to review this! Your feedback is really valuable. I realize I still have a lot to learn — I'll step back and leave this PR to you. If there's anything I can do to help in the future, happy to contribute! 😊 |
- Add \mref{markdownRendererInlineHtmlComment} to the `basic` option
description for completeness.
- Add [html-blocks] and [raw-html] link references to the option
documentation fragment (each \begin{markdown} fragment needs its own
link references).
- Split renderer documentation into separate sections:
- Rename "HTML Tag and Element Renderers" to "Basic HTML Tag and
Element Renderers" for the generic renderers.
- Add "CommonMark Block HTML Element Renderers" section for
type-specific block renderers.
- Add "CommonMark Inline HTML Renderers" section for type-specific
inline renderers.
- Fix Lua code style: add space after Cs( in else branches of
InlineHtml and DisplayHtml parsers for consistency.
tests/testfiles/unit/lunamark-markdown/html-output-commonmark.test
Outdated
Show resolved
Hide resolved
Move issue-606-block-html-types.test from regression/github/ to unit/lunamark-markdown/html-output-commonmark.test as requested in review feedback.
|
Everything seems OK. I am waiting with the merge until after the v3.14.1 bugfix release, likely at the end of this week, so that we don't ship new features with it. |
Co-authored-by: Andrej Genčur <xgencur@fi.muni.cz>
Co-authored-by: Andrej Genčur <xgencur@fi.muni.cz>
Witiko
left a comment
There was a problem hiding this comment.
@herley-shaori: I found a few other issues that still need to be addressed. If you have the time and would like to, please feel free to take care of them; otherwise, I can finish them myself.
| containing the corresponding HTML text. Their prototypes fall back on | ||
| \mref{markdownRendererInputBlockHtmlElementPrototype}. |
There was a problem hiding this comment.
I don't think this is currently implemented, i.e. the prototypes produce an empty expansion at the moment.
Default definitions that fall back onto \markdownRendererInputBlockHtmlElementPrototype should be added to the section ### Token Renderer Prototypes {#tex-token-renderer-prototypes}.
Here is an easy way to verify that this was implemented correctly: after this change, the test file no-html-output.test should pass even when htmlOutput = commonmark is added to the \markdownSetup command at the top of the file and all newly added definitions are removed from keyval-setup.tex.
There was a problem hiding this comment.
Furthermore, blockHtmlStandaloneTag can't really fall back onto the inputBlockHtmlElement due to the argument mismatch. Instead, it may need to fall back onto e.g. the inlineHtmlTag prototype.
| Each of these macros receives a single argument with the HTML text. Their | ||
| prototypes fall back on \mref{markdownRendererInlineHtmlTagPrototype}. |
There was a problem hiding this comment.
I don't think this is currently implemented, i.e. the prototypes produce an empty expansion at the moment.
Default definitions that fall back onto \markdownRendererInlineHtmlTagPrototype should be added to the section ### Token Renderer Prototypes {#tex-token-renderer-prototypes}.
Here is an easy way to verify that this was implemented correctly: after this change, the test file no-html-output.test should pass even when htmlOutput = commonmark is added to the \markdownSetup command at the top of the file and all newly added definitions are removed from keyval-setup.tex.
| \markdownSetup{ | ||
| htmlOutput = commonmark, | ||
| renderers = { | ||
| inputBlockHtmlComment = {\marginnote{\input{#1}}}, |
There was a problem hiding this comment.
This still won't work, since we don't strip the leading <!-- and the trailing -->. For consistency with the inlineHtmlComment renderer, we should likely strip these, parse the content and provide it directly, not through a separate file, falling back on the inlineHtmlComment renderer prototype instead of inputBlockHtmlElement.
Then, this example should be updated as follows:
| inputBlockHtmlComment = {\marginnote{\input{#1}}}, | |
| blockHtmlComment = \marginnote{#1}, |
…:herley-shaori/markdown into feature/issue-606-parse-html-block-types
Summary
Implements the first task of #606 by exposing CommonMark's HTML block type differentiation through individual renderers.
A new boolean option
parseHtmlBlocks(default:false) is added. When enabled alongside thehtmloption, the parser produces type-specific renderers instead of the genericinputBlockHtmlElementandinlineHtmlTagrenderers:Block HTML renderers (each receives a filename of a file containing the HTML block contents):
inputBlockHtmlCommentElement<!-- ... -->inputBlockHtmlInstructionElement<? ... ?>inputBlockHtmlDeclarationElement<! ... >inputBlockHtmlCdataElement<![CDATA[ ... ]]>inputBlockHtmlSpecialElement<script>,<pre>,<style>,<textarea>inputBlockHtmlRegularElement<div>,<table>,<form>, etc.inputBlockHtmlAnyElementInline HTML renderers (each receives the tag contents as a string):
inlineHtmlInstruction<? ... ?>inlineHtmlDeclaration<! ... >inlineHtmlCdataSection<![CDATA[ ... ]]>inlineHtmlOpenTag<tag>inlineHtmlCloseTag</tag>inlineHtmlEmptyTag<tag/>The existing
inlineHtmlCommentrenderer remains unchanged. WhenparseHtmlBlocksisfalse(default), behavior is fully backward compatible.Changes
markdown.dtx: AddedparseHtmlBlocksoption definition, 13 new renderer registrations, 13 new Lua writer functions, conditional parser routing inDisplayHtmlandInlineHtml, and documentation for all new renderers.tests/support/keyval-setup.tex: Added test renderer prototypes for all 13 new renderers.tests/testfiles/regression/github/issue-606-block-html-types.test: New regression test verifying type-specific renderers are produced whenparseHtmlBlocksis enabled.Test plan
CommonMark_0.30/html_blockstests pass (backward compatibility)CommonMark_0.31.2/raw_htmltests pass (backward compatibility)renameutility — CI should cover this)Note
This PR addresses Task 1 of #606. Task 2 (renderers corresponding to HTML nodes) would require a more substantial HTML parser and is left for a follow-up.
Continues #606.