Allow reading / writing binary stored fields as DataInput #12581
This looks great, thanks @iverase
public void binaryField(FieldInfo fieldInfo, DataInput value, int length) throws IOException {
  writeField(remap(fieldInfo), value, length);
}

@Override
public void binaryField(FieldInfo fieldInfo, byte[] value) throws IOException {
  // TODO: can we avoid new BR here?
Does this ever get called now?
Nope, it should not be called. Maybe we should assert that it doesn't get called?
Actually, the SimpleTextCodec is using it.
This commit adds the possibility to read / write binary stored values using a DataInput and the number of bytes. By default the implementations will allocate those bytes in a newly created byte array and call the already existing method.
Currently, the only way to handle binary data in stored fields is via byte arrays (wrapped as BytesRef). This means we allocate a new byte array every time we read a value, which is wasteful and can be problematic: those arrays can be very large and add pressure on the GC.
This PR proposes adding the possibility to read / write binary stored values using a DataInput and the number of bytes. By default, implementations will copy those bytes into a newly created byte array and call the already existing method.
This should speed up the merging of stored fields, as we no longer use an intermediate data structure, and it allows implementors to read binary fields without having to allocate a byte array.
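To illustrate the default behaviour and the opt-in described above, here is a minimal, self-contained sketch. It is not the code in this PR: `Visitor`, `CollectingVisitor` and `ChecksumVisitor` are hypothetical names, the field is identified by a plain `String` rather than a `FieldInfo`, and `java.io.DataInput` stands in for Lucene's `DataInput` so the example runs without Lucene on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

public class BinaryFieldSketch {

  /** Hypothetical stand-in for a stored-fields visitor (not Lucene's actual class). */
  abstract static class Visitor {
    /** Existing style: the value arrives as a fully materialized byte array. */
    abstract void binaryField(String field, byte[] value) throws IOException;

    /**
     * New style: the value arrives as a stream plus its length.
     * The default copies the bytes into a fresh array and delegates to the
     * existing method, so visitors that do not override it keep working.
     */
    void binaryField(String field, DataInput value, int length) throws IOException {
      byte[] bytes = new byte[length];
      value.readFully(bytes);
      binaryField(field, bytes);
    }
  }

  /** Relies on the default: still receives a byte array. */
  static class CollectingVisitor extends Visitor {
    byte[] last;

    @Override
    void binaryField(String field, byte[] value) {
      last = value;
    }
  }

  /** Overrides the streaming method and consumes the value without a value-sized array. */
  static class ChecksumVisitor extends Visitor {
    long sum;

    @Override
    void binaryField(String field, byte[] value) throws IOException {
      // Route the array variant through the streaming variant.
      binaryField(field, new DataInputStream(new ByteArrayInputStream(value)), value.length);
    }

    @Override
    void binaryField(String field, DataInput value, int length) throws IOException {
      for (int i = 0; i < length; i++) {
        sum += value.readByte() & 0xFF; // treat each byte as unsigned
      }
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] stored = {1, 2, 3, 4};

    CollectingVisitor collecting = new CollectingVisitor();
    collecting.binaryField(
        "payload", new DataInputStream(new ByteArrayInputStream(stored)), stored.length);

    ChecksumVisitor checksum = new ChecksumVisitor();
    checksum.binaryField(
        "payload", new DataInputStream(new ByteArrayInputStream(stored)), stored.length);

    System.out.println(collecting.last.length + " bytes collected, checksum " + checksum.sum);
  }
}
```

The point of the design is backwards compatibility: existing visitors pay the same allocation as before, while a visitor that only needs to scan or forward the bytes can override the streaming method and skip the copy.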
EDIT: merges may be faster if the merge strategy is MergeStrategy.VISITOR.
closes #12556