Prevent NPE in HtmlAttributes.getIndex #91

sideshowbarker
Contributor

Apparently it’s possible for a mode value to be such that AttributeName.getQName(mode) returns null. So this change adds a null check to catch that.

Otherwise, without this change, some documents cause that code to throw a NullPointerException.
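
As a rough illustration of the kind of null check described above, here is a minimal, self-contained sketch. It is not the actual HtmlAttributes code: the `Name` class, the array layout, and both `getIndex...` method names are hypothetical stand-ins, assuming only that a per-mode qualified-name lookup can return null.

```java
public class NullSafeIndexSketch {

    /** Hypothetical stand-in for an attribute name with one qualified name per mode. */
    static final class Name {
        private final String[] qNames; // an entry may be null for some modes

        Name(String... qNames) {
            this.qNames = qNames;
        }

        String getQName(int mode) {
            return qNames[mode];
        }
    }

    /** Unchecked lookup: throws NullPointerException if getQName(mode) returns null. */
    static int getIndexUnchecked(Name[] names, int length, String qName, int mode) {
        for (int i = 0; i < length; i++) {
            if (names[i].getQName(mode).equals(qName)) {
                return i;
            }
        }
        return -1;
    }

    /** Checked lookup: skips entries whose qualified name is null for this mode. */
    static int getIndexChecked(Name[] names, int length, String qName, int mode) {
        for (int i = 0; i < length; i++) {
            String candidate = names[i].getQName(mode);
            if (candidate != null && candidate.equals(qName)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The first name has no qualified name in mode 1 (assumed for illustration).
        Name[] names = { new Name("id", null), new Name("class", "class") };
        System.out.println(getIndexChecked(names, names.length, "class", 1)); // prints 1
    }
}
```

A check like this only stops the exception. It does not explain why the lookup returns null in the first place, so it is best read as a workaround rather than a root-cause fix.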

The NullPointerException is reproducible by parsing the document at https://usbliss.com/?attachment_id=2232 — or by giving that document to the https://validator.w3.org/nu/ checker to check:

https://validator.w3.org/nu/?doc=https://usbliss.com/?attachment_id=2232

That causes the checker to emit the following message:

Internal Error: Oops. That was not supposed to happen. A bug manifested itself in the application internals. Unable to continue. Sorry. The admin was notified.

(The admin who gets notified is me, and the notification I get is a stack trace pointing to the NPE in HtmlAttributes.getIndex.)

@hsivonen
Member

That's really weird. I guess I will need to see in a debugger what's going on.

sideshowbarker added a commit to validator/validator that referenced this pull request Oct 17, 2024
The “vnu” branch of the htmlparser repo contains a workaround for this:
validator/htmlparser#91
…and until we get the actual root cause of the problem figured out, that
workaround is necessary in order to prevent the code from throwing
a massive number of exceptions in production every day all day long.
@sideshowbarker
Contributor Author

The HtmlAttributes.getIndex NPE is getting thrown in production by the W3C service every 5–10 minutes. And I get a notification every time. So I’ve created a https://github.com/validator/htmlparser/tree/vnu branch for the htmlparser repo and pushed the workaround from here to that branch, and in validator/validator@7d62ce1 I’ve switched the checker code over for now to using that htmlparser branch.

@sideshowbarker
Contributor Author

OK, looking back through my notifications and logs for the documents that are causing the NPE to be thrown, and then manually running those documents back through the checker myself, I find that one thing they all appear to have in common so far is that in every case they’re also causing a “Stray start tag html” error to be reported.

So, I can imagine that what may be happening is that the parser is trying to check the attributes on that stray html start tag, but the state is such that AttributeName.getQName(mode) is unexpectedly null in that case.

@sideshowbarker
Contributor Author

OK, the document <html><html id> is a reduced case with which the NPE/regression can be reproduced. It can also be reproduced with almost any other attribute name in that second html start tag.

Some bisecting seems to indicate the change in 32c2256 introduced the regression for that case. If I build instead from 43aa7fc — the parent of that change — the NPE doesn’t get thrown.

Looking briefly through the 32c2256 change, nothing obvious jumps out to me as being something that’d lead to the behavior breaking for the <html><html id> case. But I’ll make some time to try digging into it further.

cc @emilio

@emilio
Contributor

emilio commented Oct 18, 2024

@sideshowbarker can you paste the full stack trace?

emilio added a commit to emilio/htmlparser that referenced this pull request Oct 18, 2024
@emilio
Contributor

emilio commented Oct 18, 2024

@sideshowbarker I think the right fix is #92, but if you could confirm it'd be appreciated.

@sideshowbarker
Contributor Author

> @sideshowbarker can you paste the full stack trace?

Yup, here it is in the context of the validator service:

nu.validator.servlet.VerifierServletTransaction - RuntimeException, doc: data:text/html;charset=utf-8,<html><html id> schema:  lax: false
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because the return value of "nu.validator.htmlparser.impl.AttributeName.getQName(int)" is null
	at nu.validator.htmlparser.impl.HtmlAttributes.getIndex(HtmlAttributes.java:187)
	at nu.validator.htmlparser.sax.SAXStreamer.addAttributesToElement(SAXStreamer.java:49)
	at nu.validator.htmlparser.sax.SAXStreamer.addAttributesToElement(SAXStreamer.java:35)
	at nu.validator.htmlparser.impl.TreeBuilder.addAttributesToHtml(TreeBuilder.java:4778)
	at nu.validator.htmlparser.impl.TreeBuilder.startTag(TreeBuilder.java:2696)
	at nu.validator.htmlparser.impl.Tokenizer.emitCurrentTagToken(Tokenizer.java:1287)
	at nu.validator.htmlparser.impl.Tokenizer.stateLoop(Tokenizer.java:2027)
	at nu.validator.htmlparser.impl.Tokenizer.tokenizeBuffer(Tokenizer.java:1488)
	at nu.validator.htmlparser.io.Driver.runStates(Driver.java:319)
	at nu.validator.htmlparser.io.Driver.tokenize(Driver.java:236)
	at nu.validator.htmlparser.io.Driver.tokenize(Driver.java:177)
	at nu.validator.htmlparser.sax.HtmlParser.tokenize(HtmlParser.java:536)
	at nu.validator.htmlparser.sax.HtmlParser.parse(HtmlParser.java:415)
	at nu.validator.htmlparser.sax.HtmlParser.parse(HtmlParser.java:405)
	at nu.validator.xml.WiretapXMLReaderWrapper.parse(WiretapXMLReaderWrapper.java:158)
	at nu.validator.xml.AttributesPermutingXMLReaderWrapper.parse(AttributesPermutingXMLReaderWrapper.java:303)
	at nu.validator.servlet.VerifierServletTransaction.validate(VerifierServletTransaction.java:1177)
	at nu.validator.servlet.PageEmitter.emit(PageEmitter.java:60)
	at nu.validator.servlet.VerifierServletTransaction.service(VerifierServletTransaction.java:956)
	at nu.validator.servlet.VerifierServlet.doPost(VerifierServlet.java:304)
	at nu.validator.servlet.VerifierServlet.doGet(VerifierServlet.java:212)

@sideshowbarker sideshowbarker deleted the sideshowbarker/HtmlAttributes.getIndex-prevent-NPE branch October 18, 2024 13:17
@sideshowbarker
Contributor Author

Superseded by #92
