Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow reading / writing binary stored fields as DataInput #12581

Merged
merged 7 commits into from
Sep 25, 2023

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Sep 21, 2023

Currently, the only way to handle binary data on stored fields is via byte arrays (wrapped as BytesRef). THis means we are allocating a new byte array everytime we read the value which is wasteful and it can be problematic as those arrays can be humongous and add pressure to the GC.

In this PR we proposed to add the possibility to read / write binary stored values using a DataInput and the number of bytes. By default the implementations will allocate those bytes in a newly created byte array and call the already existing method.

This should speed up the merging of stored fields as we are not using an intermediate data structure any longer and allow implementoirs to read the binary fields without having to allocate a byte array.

EDIT: merges may be faster if merge strategy is MergeStrategy.VISITOR

closes #12556

@iverase iverase changed the title Allow reading / writing binary stored values as DataInput Allow reading / writing binary stored fields as DataInput Sep 21, 2023
Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks @iverase

public void binaryField(FieldInfo fieldInfo, DataInput value, int length) throws IOException {
writeField(remap(fieldInfo), value, length);
}

@Override
public void binaryField(FieldInfo fieldInfo, byte[] value) throws IOException {
// TODO: can we avoid new BR here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ever get called now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it should not be called, maybe we should assert that it doesn't get called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the SimpleTextCodec codec is using it.

@iverase iverase merged commit d48913a into apache:main Sep 25, 2023
4 checks passed
@iverase iverase deleted the binaryStoredFields branch September 25, 2023 09:09
iverase added a commit that referenced this pull request Sep 25, 2023
This commit adds the possibility to read / write binary stored values using a DataInput and the number of bytes. By default the implementations will allocate those bytes in a newly created byte array and call the already existing method.
@iverase iverase added this to the 9.9.0 milestone Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow reading binary stored values as DataInput
2 participants