[Feature] replace zzBuffer[x] with a zzChar(x) [sf#18] #153
Comments
Commented by briansmith on 2006-10-29 06:30 UTC
It looks like the original suggestion has become more difficult now that Emitter calls Character methods that assume a char array (i.e., Character.offsetByCodePoints, Character.codePointAt). An alternative approach is to modify JFlex's Emitter to require a CharSequence, and update the default skeletons to wrap an incoming char array in a lightweight CharSequence implementation backed by the array. A quick benchmark shows minimal difference in performance between array[n] and sequence.charAt(n).
import java.util.Random;
public class Test {
    static Random random = new Random();

    // Builds a 100,000-character array cycling through 'a'..'z'.
    private static char[] createCharacters() {
        char[] characters = new char[100000];
        for (int i = 0; i < characters.length; i++) {
            characters[i] = (char) ('a' + (i % 26));
        }
        return characters;
    }

    // Wraps the whole array in a CharSequence view (no copying).
    private static CharSequence createCharSequence(final char[] array) {
        return createCharSequence(array, 0, array.length);
    }

    // Wraps array[min, max) in a lightweight CharSequence backed by the array.
    private static CharSequence createCharSequence(final char[] array, final int min, final int max) {
        return new CharSequence() {
            @Override
            public int length() {
                return max - min;
            }

            @Override
            public char charAt(int index) {
                return array[min + index];
            }

            @Override
            public CharSequence subSequence(int start, int end) {
                return createCharSequence(array, min + start, min + end);
            }
        };
    }

    // 10 million random reads directly from the array.
    private static char testCharArray(char[] array) {
        char c = '\0';
        for (int i = 0; i < 10000000; i++) {
            c = array[random.nextInt(array.length)];
        }
        return c;
    }

    // 10 million random reads through CharSequence.charAt.
    private static char testCharSequence(CharSequence sequence) {
        char c = '\0';
        for (int i = 0; i < 10000000; i++) {
            c = sequence.charAt(random.nextInt(sequence.length()));
        }
        return c;
    }

    // Times both variants once and prints the elapsed wall-clock time.
    private static void runTest(char[] array) {
        CharSequence sequence = createCharSequence(array);
        long start = System.currentTimeMillis();
        testCharArray(array);
        System.out.printf("ARRAY took %d ms\n", System.currentTimeMillis() - start);
        start = System.currentTimeMillis();
        testCharSequence(sequence);
        System.out.printf("SEQUENCE took %d ms\n", System.currentTimeMillis() - start);
    }

    // Repeats the measurement 100 times so later runs reflect JIT-compiled code.
    public static void main(String[] args) {
        char[] array = createCharacters();
        for (int i = 0; i < 100; i++) {
            runTest(array);
        }
    }
}
Output snippet: ARRAY took 98 ms
The numbers are encouraging. I think we still need to run it in the JFlex context (i.e. a full scanner loop) to make sure, because it might influence cache locality and the like, which can have surprising effects, but it looks much better than I feared it might.
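To get a step closer to a full scanner loop, something like the following could be tried. It is only a rough sketch under stated assumptions: `ScanLoopSketch`, its three-state table, and the sample input are all hypothetical and not taken from JFlex or its benchmark suite. It drives a small table-based state machine over the same data once through a `char[]` and once through a `CharSequence` (here `java.nio.CharBuffer.wrap`, which is array-backed), and adds the token counts into a sink so the loops cannot be optimized away.

```java
import java.nio.CharBuffer;

// Hypothetical micro-benchmark sketch; not part of JFlex or its benchmark suite.
class ScanLoopSketch {

    // Character classes: 0 = other, 1 = letter, 2 = digit.
    static int classify(char c) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 1;
        if (c >= '0' && c <= '9') return 2;
        return 0;
    }

    // NEXT[state][class]; states: 0 = between tokens, 1 = identifier, 2 = number.
    static final int[][] NEXT = {
        {0, 1, 2},
        {0, 1, 1},  // letters and digits both continue an identifier
        {0, 1, 2},  // a letter right after a number starts an identifier
    };

    // Counts tokens, reading characters directly from the array.
    static int scanArray(char[] buf) {
        int state = 0, tokens = 0;
        for (int i = 0; i < buf.length; i++) {
            int next = NEXT[state][classify(buf[i])];
            if (next != 0 && next != state) tokens++;  // entered a new token
            state = next;
        }
        return tokens;
    }

    // Same loop, but every read goes through CharSequence.charAt.
    static int scanSequence(CharSequence seq) {
        int state = 0, tokens = 0;
        for (int i = 0; i < seq.length(); i++) {
            int next = NEXT[state][classify(seq.charAt(i))];
            if (next != 0 && next != state) tokens++;
            state = next;
        }
        return tokens;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) sb.append("foo 42 bar7 9x ");
        char[] buf = sb.toString().toCharArray();
        CharSequence seq = CharBuffer.wrap(buf);  // array-backed CharSequence

        long sink = 0;  // consume results so the JIT cannot drop the loops
        for (int round = 0; round < 10; round++) {  // early rounds act as warm-up
            long t0 = System.nanoTime();
            sink += scanArray(buf);
            long t1 = System.nanoTime();
            sink += scanSequence(seq);
            long t2 = System.nanoTime();
            System.out.printf("ARRAY %d us, SEQUENCE %d us%n",
                    (t1 - t0) / 1000, (t2 - t1) / 1000);
        }
        System.out.println("tokens counted: " + sink);
    }
}
```

This is still far simpler than a generated scanner (no buffer refill, no actions, no backtracking), so any numbers from it would only be a hint; a proper harness such as JMH would give more trustworthy results.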
It seems @jvolkman's benchmark spends almost all of its time in
In ms: 4.09 vs 4.59. Will try to investigate more.
I think we should prioritize this, because most of the lexers I see are "generated by JFlex 1.7.0 tweaked for IntelliJ platform" because they are IntelliJ plugins.
Happy to put this at the top of the list, although I don't think it makes sense as a standard setting. Going from 4.0 to 4.6 is a significant slowdown for one specific application. We could provide it as an option, though, for when flexibility matters more than performance (like here). I do not think that IntelliJ lexers are anywhere close to the majority of applications; they are just prominent, which is fine. Very happy to support them in any case; it shouldn't be necessary to tweak the generator to use the lexers.
I'd suggest not relying on my benchmark: a lot has changed since then, new JDKs have been released and may behave differently, and moreover it only tests a simple use case that is completely removed from the real world. Exactly
Good point, we should definitely add this to the benchmark suite and see what it does in context. If it's provided as an option, I'm less worried about the performance impact -- developers can then make the trade-off themselves. It's just if it's an interface change for everyone that we should be more careful.
Reported by briansmith on 2006-10-29 06:30 UTC
My suggestion is simple:
Everywhere that the code generator currently generates zzBuffer[xxxx], it should instead generate zzCharAt(xxxx).
The standard skeletons define zzCharAt to be:
The benefit is that, once this is done, user-defined skeletons can totally replace zzBuffer with any data structure they choose. In particular, they can replace it with a CharSequence.
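A minimal sketch of that idea, assuming an array-backed default accessor and a CharSequence-backed replacement. Apart from `zzBuffer` and `zzCharAt`, every name here (`ZzCharAtSketch`, `ArrayScanner`, `SequenceScanner`, `zzLength`, `countLetters`) is hypothetical, and none of this is taken from JFlex's skeletons or from the attached patch:

```java
// Illustrative sketch only: names other than zzBuffer and zzCharAt are
// hypothetical and do not come from JFlex or the attached patch.
class ZzCharAtSketch {

    /** Stand-in for a scanner built from a standard, array-backed skeleton. */
    static class ArrayScanner {
        protected char[] zzBuffer;

        ArrayScanner(char[] input) {
            zzBuffer = input;
        }

        // Assumed default accessor: delegate straight to the array.
        protected char zzCharAt(int index) {
            return zzBuffer[index];
        }

        protected int zzLength() {
            return zzBuffer.length;
        }

        // Generated code would call zzCharAt(i) wherever it used zzBuffer[i].
        int countLetters() {
            int n = 0;
            for (int i = 0; i < zzLength(); i++) {
                if (Character.isLetter(zzCharAt(i))) n++;
            }
            return n;
        }
    }

    /** Stand-in for a user-defined skeleton that reads from a CharSequence. */
    static class SequenceScanner extends ArrayScanner {
        private final CharSequence text;

        SequenceScanner(CharSequence text) {
            super(new char[0]);  // the array is unused in this variant
            this.text = text;
        }

        @Override
        protected char zzCharAt(int index) {
            return text.charAt(index);
        }

        @Override
        protected int zzLength() {
            return text.length();
        }
    }

    public static void main(String[] args) {
        String src = "int x = 42;";
        System.out.println(new ArrayScanner(src.toCharArray()).countLetters());
        System.out.println(new SequenceScanner(src).countLetters());
    }
}
```

Both scanners run the same `countLetters` loop; only the accessor differs, which is the point of routing every read through `zzCharAt`.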
The JetBrains team has already created such a modified version of JFlex, which they recommend people use to implement on-the-fly lexical analysis for syntax highlighting and other in-editor uses. Please see the "Implementing a Lexer" section of http://www.jetbrains.com/idea/documentation/idea_5.0.html for information.
My suggested variation allows for the same functionality as theirs, while remaining compatible with older JDK versions (without CharSequence). I believe, but haven't verified, that modern JVMs should have no problems inlining zzCharAt(), resulting in minimal performance impact. At least in my application, performance wasn't impacted.
I will attach a patch in diff -u format.