Fix error handling for non-UTF-8 string in Lexer #93
It looks like a duplicate of #88, but this fix is more accurate.
* Regardless source of the bug, try to report about this exception to the library maintainers.
* Even if bug is yours, this exception must not happen.
*/
final class InternalError extends LogicException
final is not required here. I think you can remove it.
All exceptions must extend Hoa\Exception; see the sibling exception in the same directory.
I would also name the class RegularExpression or PCRE, something less generic than InternalError.
final is not required here. I think you can remove it.
Let's try to see it from another point of view: there is no final keyword, there is an open one. You suggest opening the class for inheritance. Why? It's a well-known idea that composition is preferable to inheritance for objective reasons, so making classes open to inheritance by default could simply be a language design mistake. Some modern languages, for example, have fixed it. So what's the real reason to open the class for inheritance?
I would also name the class RegularExpression or PCRE, something less generic than InternalError.
I don't get it either: RegularExpression is a special language or a pattern, depending on context, but not an error. It's kind of a weird name for an exception class, IMO. You probably rely on namespaces, but I
I understand and respect your coding style, but I'm curious how you explain it.
We don't use final in the existing code base. I'd prefer to address that in another PR, if you don't mind :-). You are pointing to an interesting approach, and I like it, but yeah, let's keep things separate a little bit :-).
About the class/exception name, maybe RegularExpression is not appropriate. But I don't find InternalError more appropriate; it's too abstract.
The exception represents an error in the lexer. We can (i) reuse the Hoa\Compiler\Exception\Lexer exception class (https://github.com/hoaproject/Compiler/blob/master/Exception/Lexer.php), or (ii) create another one that reflects the origin of the exception more precisely, hence a better name than InternalError.
If you decide to reuse Exception\Lexer, I'm fine with that. I would even suggest doing that.
Hoa\Compiler\Exception\Lexer doesn't look like an internal error of the Lexer. In other words, an internal error is the library's failure; the user did nothing wrong. (At least that's how I interpret it; that's why I reused Exception\Lexer for "Text is not valid utf-8 string, you probably need to switch "lexer.unicode" setting off.", an obvious misconfiguration by the user.) It's like comparing 4xx HTTP status codes to 5xx.
I don't like the idea of mixing semantically different errors into one exception.
InternalError is definitely a generic name, but I consider this exception an unchecked one; users shouldn't be aware of it at all. Thus, they shouldn't catch this type of exception directly (only as \Exception or \Throwable in specific parts of the application where the failure of a subprogram can be handled).
The thing either works or it is broken. Classifying hypothetical errors is a waste of time for me, as most of them are really hypothetical: you don't expect them, otherwise you'd just have fixed them; you just make a "save point" for yourself to make sure that it works internally as you expected, without surprises. It is similar to Assertion::blahBlah(): you don't care about the exception class. The assertion just must not fail. If it does, your program is bugged, as if you had passed an object as the argument to an integer parameter.
And regarding names: practice says that verbs are way more important in programming than nouns. The stack trace says Lexer is broken and Lexer throws a Lexer exception. I want to know what went wrong: LexerEncounteredInvalidUtf8Sequence, TokenIsEmptyString, etc.
I was also confused when I found a class Lexer in the unit tests which tried to mimic a test of the real Lexer. :)
But I can reuse Hoa\Compiler\Exception\Lexer.
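The 4xx/5xx distinction argued above can be sketched roughly as follows. This is only an illustration: the class names come from this thread, but the SPL base classes and the describeFailure() helper are assumptions made to keep the snippet self-contained, and it does not follow the "all exceptions must extend Hoa\Exception" rule requested in the review.

```php
<?php

// "4xx": the caller supplied bad input or a bad configuration.
// The name is taken from the comment above; the base class is an assumption.
class LexerEncounteredInvalidUtf8Sequence extends RuntimeException
{
}

// "5xx": an internal invariant of the library was violated.
// Mirrors the declaration in the diff under review.
final class InternalError extends LogicException
{
}

// Hypothetical caller-side helper showing how the two kinds differ in practice.
function describeFailure(Exception $e)
{
    if ($e instanceof LexerEncounteredInvalidUtf8Sequence) {
        return 'fix the input or the lexer settings';
    }

    // Anything else is treated as a bug that the user cannot fix.
    return 'report a bug to the maintainers';
}
```

With this split, user code would only ever catch the first kind explicitly; the second kind propagates to a top-level handler.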
Forgot to add: that's why there is no unit test for InternalError: it's impossible to trigger via the public API of the class unless the code is actually bugged.
By default, PhpStorm also doesn't inspect the @throws tag if the exception inherits from \LogicException or \RuntimeException; that's why it was inherited from \LogicException.
But at the moment PhpStorm just goes nuts with exceptions.
composer.json
Outdated
@@ -28,6 +28,8 @@
},
"require": {
"php" : ">=5.5.0",
"ext-ctype" : "*",
Why ext/ctype?
PhpStorm thinks that it is an implicit dependency: you use some function from that extension but didn't declare the dependency. I don't remember the exact function name, unfortunately.
Thanks for the PR! It shows the error and the patch is well-written. I've noted a few things to update here and there, but they are all minor. Once they are fixed, it's ready to be merged.
4745cd5 to 05740f1
Co-Authored-By: unkind <[email protected]>
Sorry, I didn't read the condition properly - it will need to be
What's preventing this PR from getting merged now?
I don't know what's going on with this one, sorry. I still can't make
And I still don't understand this mess with
Closing this one.
So far, Lexer falls into infinite recursion with unicode.mode enabled if the input data is not properly encoded.