Floki.HTMLParser.FastHtml fails to parse <noscript> properly when present in head. #580

@arun-muthuraman

Description

When a noscript tag with content appears in the head, FastHtml leaves an empty noscript node in the head. It moves the children of that noscript, along with every tag that follows it in the head (here, title), into the body node. There is no issue when the noscript tag appears inside the body.

To Reproduce

Steps to reproduce the behavior:

  • Using Floki v0.35.2
  • Using Elixir v1.15.7
  • Using Erlang OTP 26
  • With this code:
 test_html =
  """
  <html>
   <head>
     <noscript><a href='/test'>link</a></noscript>
     <title>test</title>
   </head>
   <body>
     <noscript><a href='/test'>link</a></noscript>
     <p>test p</p>
   </body>
  </html>
  """

Floki.parse_document!(test_html, html_parser: Floki.HTMLParser.FastHtml)

Actual result:

[
  {"html", [],
   [
     {"head", [], ["\n   ", {"noscript", [], []}]},
     {"body", [],
      [
        {"a", [{"href", "/test"}], ["link"]},
        "\n   ",
        {"title", [], ["test"]},
        "\n \n \n   ",
        {"noscript", [], [{"a", [{"href", "/test"}], ["link"]}]},
        "\n   ",
        {"p", [], ["test p"]},
        "\n \n\n"
      ]}
   ]}
]

Expected behavior

[
  {"html", [],
   [
     {"head", [],
      [
        "\n",
        {"noscript", [],
         [{"a", [{"href", "/test"}], ["link"]}]},
        {"title", [], ["test"]},
        "\n"
      ]},
     "\n",
     {"body", [],
      [
        "\n",
        {"noscript", [],
         [{"a", [{"href", "/test"}], ["link"]}]},
        "\n",
        {"p", [], ["test p"]},
        "\n\n\n"
      ]}
   ]}
]
