raw_html removes content within self closing tags #132

navinpeiris · 2017-08-03T07:31:25Z

When raw_html is given an element that is normally self-closing but contains text, that text is lost in the output. This is a problem when trying to get the XML content of RSS items, for example.

For example, using raw_html after parsing the following XML:

<item>
  <link>www.example.com</link>
</item>

results in:

<item>
  <link/></link>
</item>

navinpeiris · 2017-08-03T07:31:52Z

Pull request submitted for this issue: #131

mischov · 2017-08-03T19:08:47Z

In HTML5, void elements (which <link> is) cannot have closing tags and consequently cannot have contents: https://html.spec.whatwg.org/multipage/syntax.html#elements-2

I believe void elements should get closed automatically by HTML5 parsers, meaning that any contents that follow are parsed as children of the void element's parent.

In your case, this means that the correct HTML5 parsing of

<item>
  <link>www.example.com</link>
</item>

would be

<item>
  <link /> www.example.com
</item>

As you say, this causes problems when trying to parse XML, but at the moment Floki only supports HTML.
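The void-element rule described above can be sketched with Python's standard `html.parser`. This is a simplified illustration, not a conforming HTML5 parser and not how Floki or mochiweb_html work internally; the `TreeBuilder` class, the `parse` helper, and the `(tag, attrs, children)` tuple shape are assumptions made for the example.

```python
from html.parser import HTMLParser

# Void elements as listed in the HTML specification.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class TreeBuilder(HTMLParser):
    """Builds (tag, attrs, children) tuples; void elements never get children."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [], [])  # hypothetical container node
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, attrs, [])
        self.stack[-1][2].append(node)
        # A void element is closed immediately, so it is never pushed
        # onto the stack of open elements.
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # A closing tag for a void element is simply ignored.
        if tag in VOID_ELEMENTS:
            return
        # Pop back to the nearest matching open element, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Skip whitespace-only text to keep the example tree small.
        if data.strip():
            self.stack[-1][2].append(data)


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root[2]
```

With this sketch, parsing `<item><link>www.example.com</link></item>` yields an `item` node whose children are an empty `link` node followed by the text, which matches the sibling placement shown above.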

mischov · 2017-08-03T19:23:30Z

That said, your fix suggests that the data is being incorrectly parsed (or at least, not parsed according to HTML5) by mochiweb_html, then just dropped by raw_html, so your fix is a good one in this particular case (it is the same approach I took in Meeseeks).
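To illustrate the idea of keeping content that ended up inside a void element in the parsed tree, here is a minimal serializer sketch. It is not the code from the pull request or from Floki; the `serialize` function and the `(tag, attrs, children)` tuple shape are assumptions for the example. A void element is written as self-closing only when it has no children; otherwise its children are emitted between an opening and a closing tag.

```python
from html import escape

# Void elements as listed in the HTML specification.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def serialize(node):
    """Serialize a text node or a (tag, attrs, children) tuple to markup."""
    if isinstance(node, str):
        return escape(node, quote=False)

    tag, attrs, children = node
    attr_text = "".join(f' {name}="{escape(value)}"' for name, value in attrs)

    # Self-close a void element only when the tree holds nothing inside it.
    if tag in VOID_ELEMENTS and not children:
        return f"<{tag}{attr_text}/>"

    # Otherwise keep whatever the parser put there, even if the result
    # is not valid HTML5.
    inner = "".join(serialize(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"
```

Under these assumptions, a tree in which `link` holds the text `www.example.com` serializes back to `<item><link>www.example.com</link></item>` rather than losing the text.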

philss · 2017-08-04T01:54:55Z

Yes, I'm assuming his fix is meant to represent HTML that was wrongly parsed. I think it's OK to support this scenario, even if the output is "invalid" HTML5, since it reflects the tree that was actually parsed.

Thank you @navinpeiris and @mischov! 😃

philss closed this as completed Aug 4, 2017
