raw_html removes content within self closing tags #132
Comments
Pull request submitted for this issue: #131
In HTML5, void elements (which I believe <link> is) should get closed automatically by HTML5 parsers, meaning that if they have contents, those contents are parsed as the contents of the void element's parent. In your case, this means that the correct HTML5 parsing of

    <item>
    <link>www.example.com</link>
    </item>

would be

    <item>
    <link /> www.example.com
    </item>

As you say, this causes problems when trying to parse XML, but at the moment Floki only supports HTML.
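To make the void-element rule above concrete, here is a minimal sketch in Python (not Elixir, and not Floki's parser) built on the standard library's `html.parser`. The `VoidAwareTreeBuilder` class and `parse` function are hypothetical names invented for this illustration, and the `(tag, attrs, children)` node shape is only assumed to resemble the tree discussed in this thread. It skips most of the real HTML5 tree-construction algorithm and only shows why the text ends up as a sibling of `link`.

```python
from html.parser import HTMLParser

# HTML5 void elements; "link" is the one that matters for this issue.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class VoidAwareTreeBuilder(HTMLParser):
    """Hypothetical helper: builds (tag, attrs, children) nodes."""

    def __init__(self):
        super().__init__()
        self.root = ("root", [], [])
        self.stack = [self.root]  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = (tag, attrs, [])
        self.stack[-1][2].append(node)
        # A void element is never left open, so it cannot receive children.
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # a stray end tag such as </link> is ignored
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Text goes to the innermost open element, i.e. the parent.
            self.stack[-1][2].append(text)


def parse(markup):
    builder = VoidAwareTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root[2]


tree = parse("<item>\n<link>www.example.com</link>\n</item>")
print(tree)
```

Under these assumptions, the text node is attached to `item` after an empty `link` node, which is the HTML5 reading described above and not the XML one.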
That said, your fix suggests that the data is being incorrectly parsed (or at least, not parsed according to HTML5) by
Yes, I'm assuming that his fix is meant to represent HTML that was wrongly parsed. I think it's OK to support this scenario, even if it represents "invalid" HTML5, since that is the tree that was actually parsed. Thank you @navinpeiris and @mischov! 😃
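The idea of serializing whatever tree was parsed, even when a void tag carries children, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Floki's `raw_html` implementation: `serialize` is a hypothetical function, nodes are assumed to be `(tag, attrs, children)` tuples or plain strings, and text is emitted without escaping.

```python
from html import escape

# Tags that may be written in the short "<tag/>" form when they are empty.
SELF_CLOSING = {"area", "base", "br", "col", "embed", "hr", "img",
                "input", "link", "meta", "source", "track", "wbr"}


def serialize(node):
    """Hypothetical serializer for (tag, attrs, children) nodes."""
    if isinstance(node, str):
        return node  # text nodes are emitted as-is in this sketch

    tag, attrs, children = node
    attr_text = ""
    for name, value in attrs:
        attr_text += ' {}="{}"'.format(name, escape(value or ""))

    # Use the short form only when there is nothing to lose.
    if tag in SELF_CLOSING and not children:
        return "<{}{}/>".format(tag, attr_text)

    # Otherwise keep the children, even for a tag that is normally void.
    inner = "".join(serialize(child) for child in children)
    return "<{}{}>{}</{}>".format(tag, attr_text, inner, tag)


tree = ("item", [], [("link", [], ["www.example.com"])])
print(serialize(tree))
```

The design choice is that the serializer never decides on its own to drop content: the short form is used only for a node that has no children.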
When using raw_html with self-closing tags that contain some text, that text is lost in the output. This becomes problematic when trying to get the XML content for RSS items, etc.
For example, using raw_html after parsing the following XML:
results in: