Introduce raw fwd index version V5 containing implicit num doc length, improving space efficiency #14105

Merged
merged 104 commits into from
Oct 17, 2024
Changes from 54 commits
84987ed
Initial implementation of toggling explicit MV entry size for MVFixed…
jackluo923 Sep 27, 2024
d654fd9
Fixed uncovered code paths exposed via unit test
jackluo923 Oct 1, 2024
3d4b99b
Fix style issue
jackluo923 Oct 1, 2024
8c967b5
Refactored code to use new class versions.
jackluo923 Oct 2, 2024
c2359ec
Fixed style.
jackluo923 Oct 2, 2024
dd3410f
Refactored MultiValueFixedByteRawIndexCreatorTest.java
jackluo923 Oct 2, 2024
0c0df84
Fix style.
jackluo923 Oct 2, 2024
e7e091b
Modified existing unit test and extended it for MultiValueFixedByteRa…
jackluo923 Oct 2, 2024
153be16
Improved unit test for MultiValueFixedByteRawIndexCreatorTest and Mul…
jackluo923 Oct 2, 2024
0233905
Remove redundant blank line
jackluo923 Oct 2, 2024
69defe1
Adjusted comments content
jackluo923 Oct 2, 2024
e1173c0
Removed redundant constructor missed during refactoring.
jackluo923 Oct 3, 2024
34ac786
Upgrade MVFixedByteRawIndex reader and writer from V4 to V5, retain f…
jackluo923 Oct 8, 2024
b090676
Minor changes in MultiValueFixedByteRawIndexCreator
jackluo923 Oct 8, 2024
2ff1914
Fix minor style issue.
jackluo923 Oct 8, 2024
54b2709
Refactored FixByteChunkMVForwardIndexReader
jackluo923 Oct 8, 2024
318b826
Deleted FixByteChunkMVForwardIndexReaderV2
jackluo923 Oct 9, 2024
a9170b7
Deleted FixByteChunkMVForwardIndexReaderV2Test
jackluo923 Oct 9, 2024
d699c2a
Add VarByteChunkV5Test unit test
jackluo923 Oct 9, 2024
7137792
Add license to VarByteChunkV5Test unit test
jackluo923 Oct 9, 2024
6452c79
Improved unit test
jackluo923 Oct 9, 2024
9bfbd22
Refactored unit test
jackluo923 Oct 9, 2024
fac46c5
Add blank line
jackluo923 Oct 9, 2024
29a9fdb
Remove blank line
jackluo923 Oct 9, 2024
8abf7fe
Add blank line
jackluo923 Oct 10, 2024
1c877c6
Remove blank line
jackluo923 Oct 10, 2024
ce6870b
Rebase with latest master
jackluo923 Sep 27, 2024
0d71f91
Fixed uncovered code paths exposed via unit test
jackluo923 Oct 1, 2024
736f23f
Fix style issue
jackluo923 Oct 1, 2024
f2faece
Refactored code to use new class versions.
jackluo923 Oct 2, 2024
01cbf56
Fixed style.
jackluo923 Oct 2, 2024
f8c2f24
Refactored MultiValueFixedByteRawIndexCreatorTest.java
jackluo923 Oct 2, 2024
a6ca351
Fix style.
jackluo923 Oct 2, 2024
61fee18
Modified existing unit test and extended it for MultiValueFixedByteRa…
jackluo923 Oct 2, 2024
500dae6
Improved unit test for MultiValueFixedByteRawIndexCreatorTest and Mul…
jackluo923 Oct 2, 2024
427bdb5
Remove redundant blank line
jackluo923 Oct 2, 2024
0cb705d
Adjusted comments content
jackluo923 Oct 2, 2024
1617ebd
Removed redundant constructor missed during refactoring.
jackluo923 Oct 3, 2024
813d360
Upgrade MVFixedByteRawIndex reader and writer from V4 to V5, retain f…
jackluo923 Oct 8, 2024
7a565e4
Fix minor style issue.
jackluo923 Oct 8, 2024
296959d
Deleted FixByteChunkMVForwardIndexReaderV2
jackluo923 Oct 9, 2024
0d7073b
Deleted FixByteChunkMVForwardIndexReaderV2Test
jackluo923 Oct 9, 2024
427af58
Add VarByteChunkV5Test unit test
jackluo923 Oct 9, 2024
6e8c3ae
Add license to VarByteChunkV5Test unit test
jackluo923 Oct 9, 2024
52976a7
Improved unit test
jackluo923 Oct 9, 2024
9bb453a
Refactored unit test
jackluo923 Oct 9, 2024
93d3100
Add blank line
jackluo923 Oct 9, 2024
79e91e9
Remove blank line
jackluo923 Oct 9, 2024
9a7676f
Add blank line
jackluo923 Oct 10, 2024
aa9eb74
Remove blank line
jackluo923 Oct 10, 2024
4ce5280
Refactored code to utilize changes from Extract common MV ser/de logi…
jackluo923 Oct 14, 2024
0d88f2f
Merge remote-tracking branch 'origin/master-improved-MV-fixed-byte-in…
jackluo923 Oct 14, 2024
ec1b628
Removed redundant RuntimeException from method signature
jackluo923 Oct 14, 2024
87dd327
Merge branch 'apache:master' into master-improved-MV-fixed-byte-index
jackluo923 Oct 15, 2024
e7f645c
Rebase with latest master
jackluo923 Sep 27, 2024
5cb4575
Fixed uncovered code paths exposed via unit test
jackluo923 Oct 1, 2024
1ece331
Fix style issue
jackluo923 Oct 1, 2024
4c35683
Refactored code to use new class versions.
jackluo923 Oct 2, 2024
bd1da13
Fixed style.
jackluo923 Oct 2, 2024
2637e2d
Refactored MultiValueFixedByteRawIndexCreatorTest.java
jackluo923 Oct 2, 2024
731906e
Fix style.
jackluo923 Oct 2, 2024
171aaf4
Modified existing unit test and extended it for MultiValueFixedByteRa…
jackluo923 Oct 2, 2024
bca5eda
Improved unit test for MultiValueFixedByteRawIndexCreatorTest and Mul…
jackluo923 Oct 2, 2024
ce5eb7b
Remove redundant blank line
jackluo923 Oct 2, 2024
7274f4c
Adjusted comments content
jackluo923 Oct 2, 2024
ea29a13
Removed redundant constructor missed during refactoring.
jackluo923 Oct 3, 2024
256d774
Upgrade MVFixedByteRawIndex reader and writer from V4 to V5, retain f…
jackluo923 Oct 8, 2024
acfe864
Fix minor style issue.
jackluo923 Oct 8, 2024
a4751b6
Deleted FixByteChunkMVForwardIndexReaderV2
jackluo923 Oct 9, 2024
32062a1
Deleted FixByteChunkMVForwardIndexReaderV2Test
jackluo923 Oct 9, 2024
ef3f663
Add VarByteChunkV5Test unit test
jackluo923 Oct 9, 2024
38a8cb6
Add license to VarByteChunkV5Test unit test
jackluo923 Oct 9, 2024
b8dfacd
Improved unit test
jackluo923 Oct 9, 2024
bd9bdee
Refactored unit test
jackluo923 Oct 9, 2024
5b6e29e
Add blank line
jackluo923 Oct 9, 2024
597762f
Remove blank line
jackluo923 Oct 9, 2024
ff21345
Add blank line
jackluo923 Oct 10, 2024
cde2a6d
Remove blank line
jackluo923 Oct 10, 2024
d430d61
Refactored code to utilize changes from Extract common MV ser/de logi…
jackluo923 Oct 14, 2024
3f68f75
Rebased to latest
jackluo923 Oct 1, 2024
27328bf
Rebase to latest
jackluo923 Oct 1, 2024
79c6f66
Refactored code to use new class versions.
jackluo923 Oct 2, 2024
e9778d3
Fixed style.
jackluo923 Oct 2, 2024
b43f676
Refactored MultiValueFixedByteRawIndexCreatorTest.java
jackluo923 Oct 2, 2024
c6033b9
Fix style.
jackluo923 Oct 2, 2024
3f654e4
Modified existing unit test and extended it for MultiValueFixedByteRa…
jackluo923 Oct 2, 2024
12af1ce
Improved unit test for MultiValueFixedByteRawIndexCreatorTest and Mul…
jackluo923 Oct 2, 2024
063c5b4
Adjusted comments content
jackluo923 Oct 2, 2024
b8794f2
Upgrade MVFixedByteRawIndex reader and writer from V4 to V5, retain f…
jackluo923 Oct 8, 2024
9812e3e
Deleted FixByteChunkMVForwardIndexReaderV2
jackluo923 Oct 9, 2024
085aed6
Deleted FixByteChunkMVForwardIndexReaderV2Test
jackluo923 Oct 9, 2024
7e1d10c
Improved unit test
jackluo923 Oct 9, 2024
d06dda4
Refactored unit test
jackluo923 Oct 9, 2024
e9835c5
Add blank line
jackluo923 Oct 9, 2024
06c0b95
Remove blank line
jackluo923 Oct 9, 2024
07d6f75
Add blank line
jackluo923 Oct 10, 2024
680fc24
Remove blank line
jackluo923 Oct 10, 2024
1b22234
Removed redundant RuntimeException from method signature
jackluo923 Oct 14, 2024
cfbc9ee
Merge remote-tracking branch 'origin/master-improved-MV-fixed-byte-in…
jackluo923 Oct 15, 2024
0da0ca7
Updated javadoc for VarByteChunkForwardIndexWriterV5
jackluo923 Oct 16, 2024
89ec8af
Addressed code review comments to use `getVersion()` in forward index…
jackluo923 Oct 16, 2024
592967a
Addressed final minor code review suggestion.
jackluo923 Oct 16, 2024
44b0df8
Change getConcreteClassVersion back to getVersion
jackluo923 Oct 17, 2024
6fe4517
Adjusted member variable scope in VarByteChunkForwardIndexWriterV4
jackluo923 Oct 17, 2024
@@ -76,11 +76,13 @@
public class VarByteChunkForwardIndexWriterV4 implements VarByteChunkWriter {
public static final int VERSION = 4;

private static final Logger LOGGER = LoggerFactory.getLogger(VarByteChunkForwardIndexWriterV4.class);
// Use the run-time concrete class to retrieve the logger
protected final Logger _logger = LoggerFactory.getLogger(this.getClass());
Contributor

Suggested change
protected final Logger _logger = LoggerFactory.getLogger(this.getClass());
protected final Logger _logger = LoggerFactory.getLogger(getClass());


private static final String DATA_BUFFER_SUFFIX = ".buf";

private final File _dataBuffer;
private final RandomAccessFile _output;
protected final RandomAccessFile _output;
private final FileChannel _dataChannel;
private final ByteBuffer _chunkBuffer;
private final ByteBuffer _compressionBuffer;
@@ -105,11 +107,15 @@ public VarByteChunkForwardIndexWriterV4(File file, ChunkCompressionType compress
writeHeader(_chunkCompressor.compressionType(), chunkSize);
}

public int getVersion() {
return VERSION;
}

private void writeHeader(ChunkCompressionType compressionType, int targetDecompressedChunkSize)
throws IOException {
// keep metadata BE for backwards compatibility
// (e.g. the version needs to be read by a factory which assumes BE)
_output.writeInt(VERSION);
_output.writeInt(getVersion());
_output.writeInt(targetDecompressedChunkSize);
_output.writeInt(compressionType.getValue());
// reserve a slot to write the data offset into
@@ -270,7 +276,7 @@ private void write(ByteBuffer buffer, boolean huge) {
_chunkOffset += compressedSize;
_docIdOffset = _nextDocId;
} catch (IOException e) {
LOGGER.error("Exception caught while compressing/writing data chunk", e);
_logger.error("Exception caught while compressing/writing data chunk", e);
throw new RuntimeException(e);
} finally {
if (mapped != null) {
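The change from `_output.writeInt(VERSION)` to `_output.writeInt(getVersion())` matters because Java static fields are resolved against the declaring class, while instance methods are dispatched on the runtime class. The following is a minimal sketch of that difference; `BaseWriter`, `ChildWriter`, and both `headerVersion...` methods are hypothetical stand-ins, not Pinot classes.

```java
// Hypothetical stand-ins for the V4/V5 writers; only the version logic is modelled.
class BaseWriter {
  public static final int VERSION = 4;

  // Virtual accessor: resolved against the runtime class of the instance.
  public int getVersion() {
    return VERSION;
  }

  // Reads the static field directly: always bound to BaseWriter.VERSION.
  public int headerVersionFromStaticField() {
    return VERSION;
  }

  // Goes through the accessor: a subclass override is honoured.
  public int headerVersionFromAccessor() {
    return getVersion();
  }
}

class ChildWriter extends BaseWriter {
  // Hides BaseWriter.VERSION; it does not override it.
  public static final int VERSION = 5;

  @Override
  public int getVersion() {
    return VERSION;
  }
}

class VersionHidingDemo {
  public static void main(String[] args) {
    BaseWriter writer = new ChildWriter();
    System.out.println(writer.headerVersionFromStaticField()); // 4
    System.out.println(writer.headerVersionFromAccessor());    // 5
  }
}
```

In the diff, `writeHeader` runs from the parent constructor. Calling an overridable method there is usually risky, but it looks safe in this case because the override only returns a compile-time constant and reads no subclass instance state.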
@@ -0,0 +1,76 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.pinot.segment.local.io.writer.impl;

import java.io.File;
import java.io.IOException;
import javax.annotation.concurrent.NotThreadSafe;
import org.apache.pinot.segment.local.utils.ArraySerDeUtils;
import org.apache.pinot.segment.spi.compression.ChunkCompressionType;


/**
* Forward index writer that extends {@link VarByteChunkForwardIndexWriterV4} with the only difference being the
Contributor


This is not the only difference. Let's also document the value format difference

Contributor Author


done

* version tag is now bumped from 4 to 5.
*
* <p>The {@code VERSION} tag is a {@code static final} class variable set to {@code 5}. Because a static field
* declared in a child class hides the parent's field rather than overriding it, each field stays bound to the
* class that declares it, so care must be taken that the parent class observes the child class's {@code VERSION}
* value at runtime.</p>
*
* <p>To achieve this, the {@code getVersion()} method is overridden to return the concrete subclass's
* {@code VERSION} value, ensuring that the correct version number is returned even when using a reference
* to the parent class.</p>
*
* @see VarByteChunkForwardIndexWriterV4
* @see VarByteChunkForwardIndexWriterV5#getVersion()
*/
@NotThreadSafe
public class VarByteChunkForwardIndexWriterV5 extends VarByteChunkForwardIndexWriterV4 {
public static final int VERSION = 5;

public VarByteChunkForwardIndexWriterV5(File file, ChunkCompressionType compressionType, int chunkSize)
throws IOException {
super(file, compressionType, chunkSize);
}

@Override
public int getVersion() {
return VERSION;
}

@Override
public void putIntMV(int[] values) {
putBytes(ArraySerDeUtils.serializeIntArrayWithoutLength(values));
}

@Override
public void putLongMV(long[] values) {
putBytes(ArraySerDeUtils.serializeLongArrayWithoutLength(values));
}

@Override
public void putFloatMV(float[] values) {
putBytes(ArraySerDeUtils.serializeFloatArrayWithoutLength(values));
}

@Override
public void putDoubleMV(double[] values) {
putBytes(ArraySerDeUtils.serializeDoubleArrayWithoutLength(values));
}
}
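To illustrate what the `...WithoutLength` overrides above change on disk, here is a rough sketch of the two per-document layouts for an `int[]`. It does not use Pinot's `ArraySerDeUtils`; the class `MvLayoutSketch`, its method names, and the big-endian byte order (the `ByteBuffer` default) are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the two multi-value layouts; not Pinot's actual serializer.
class MvLayoutSketch {
  // V4-style layout (assumed): 4-byte element count, then the values.
  static byte[] serializeWithLength(int[] values) {
    ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + values.length * Integer.BYTES);
    buffer.putInt(values.length);
    for (int value : values) {
      buffer.putInt(value);
    }
    return buffer.array();
  }

  // V5-style layout (assumed): values only; the count is implied by the byte length.
  static byte[] serializeWithoutLength(int[] values) {
    ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES);
    for (int value : values) {
      buffer.putInt(value);
    }
    return buffer.array();
  }

  public static void main(String[] args) {
    int[] values = {7, 8, 9};
    System.out.println(serializeWithLength(values).length);    // 16
    System.out.println(serializeWithoutLength(values).length); // 12
  }
}
```

Dropping the prefix is only possible for fixed-width element types, where the element count can be recovered by dividing the entry's byte length by the element size.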
@@ -22,6 +22,7 @@
import java.io.IOException;
import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkForwardIndexWriter;
import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkForwardIndexWriterV4;
import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkForwardIndexWriterV5;
import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkWriter;
import org.apache.pinot.segment.spi.V1Constants.Indexes;
import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
@@ -73,18 +74,28 @@ public MultiValueFixedByteRawIndexCreator(File indexFile, ChunkCompressionType c
DataType valueType, int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk, int writerVersion,
int targetMaxChunkSizeBytes, int targetDocsPerChunk)
throws IOException {
// Store the length followed by the values
int totalMaxLength = Integer.BYTES + (maxNumberOfMultiValueElements * valueType.getStoredType().size());
if (writerVersion < VarByteChunkForwardIndexWriterV4.VERSION) {
// Store the length followed by the values
int totalMaxLength = Integer.BYTES + (maxNumberOfMultiValueElements * valueType.getStoredType().size());
int numDocsPerChunk = deriveNumDocsPerChunk ? Math.max(targetMaxChunkSizeBytes / (totalMaxLength
+ VarByteChunkForwardIndexWriter.CHUNK_HEADER_ENTRY_ROW_OFFSET_SIZE), 1) : targetDocsPerChunk;
_indexWriter =
new VarByteChunkForwardIndexWriter(indexFile, compressionType, totalDocs, numDocsPerChunk, totalMaxLength,
writerVersion);
} else {
int chunkSize =
ForwardIndexUtils.getDynamicTargetChunkSize(totalMaxLength, targetDocsPerChunk, targetMaxChunkSizeBytes);
_indexWriter = new VarByteChunkForwardIndexWriterV4(indexFile, compressionType, chunkSize);
if (writerVersion == VarByteChunkForwardIndexWriterV5.VERSION) {
// Store only the values
int totalMaxLength = maxNumberOfMultiValueElements * valueType.getStoredType().size();
int chunkSize =
ForwardIndexUtils.getDynamicTargetChunkSize(totalMaxLength, targetDocsPerChunk, targetMaxChunkSizeBytes);
_indexWriter = new VarByteChunkForwardIndexWriterV5(indexFile, compressionType, chunkSize);
} else {
// Store the length followed by the values
int totalMaxLength = Integer.BYTES + (maxNumberOfMultiValueElements * valueType.getStoredType().size());
int chunkSize =
ForwardIndexUtils.getDynamicTargetChunkSize(totalMaxLength, targetDocsPerChunk, targetMaxChunkSizeBytes);
_indexWriter = new VarByteChunkForwardIndexWriterV4(indexFile, compressionType, chunkSize);
}
}
_valueType = valueType;
}
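The two `totalMaxLength` computations in the creator differ only by the `Integer.BYTES` prefix. A small worked example of that arithmetic follows; `MaxEntrySizeSketch` and its methods are hypothetical, and the savings figure refers to uncompressed bytes only, since the effect after chunk compression is not stated in this diff.

```java
// Hypothetical helper mirroring the two totalMaxLength branches in the creator.
class MaxEntrySizeSketch {
  // Upper bound on one document's serialized size.
  static int maxEntryBytes(int maxElements, int storedTypeSizeBytes, boolean explicitLength) {
    int valueBytes = maxElements * storedTypeSizeBytes;
    return explicitLength ? Integer.BYTES + valueBytes : valueBytes;
  }

  // Uncompressed bytes saved across a segment by dropping the per-document prefix.
  static long uncompressedBytesSaved(long numDocs) {
    return numDocs * Integer.BYTES;
  }

  public static void main(String[] args) {
    // Up to 3 LONG values per document.
    System.out.println(maxEntryBytes(3, Long.BYTES, true));   // 28
    System.out.println(maxEntryBytes(3, Long.BYTES, false));  // 24
    System.out.println(uncompressedBytesSaved(1_000_000L));   // 4000000
  }
}
```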
@@ -21,6 +21,7 @@

import java.util.Arrays;
import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkForwardIndexWriterV4;
import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkForwardIndexWriterV5;
import org.apache.pinot.segment.local.segment.creator.impl.fwd.CLPForwardIndexCreatorV1;
import org.apache.pinot.segment.local.segment.index.readers.forward.CLPForwardIndexReaderV1;
import org.apache.pinot.segment.local.segment.index.readers.forward.FixedBitMVEntryDictForwardIndexReader;
@@ -30,6 +31,7 @@
import org.apache.pinot.segment.local.segment.index.readers.forward.FixedByteChunkSVForwardIndexReader;
import org.apache.pinot.segment.local.segment.index.readers.forward.FixedBytePower2ChunkSVForwardIndexReader;
import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV4;
import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkForwardIndexReaderV5;
import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkMVForwardIndexReader;
import org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader;
import org.apache.pinot.segment.local.segment.index.readers.sorted.SortedIndexReaderImpl;
@@ -106,7 +108,10 @@ public static ForwardIndexReader createRawIndexReader(PinotDataBuffer dataBuffer
: new FixedByteChunkSVForwardIndexReader(dataBuffer, storedType);
}

if (version == VarByteChunkForwardIndexWriterV4.VERSION) {
if (version == VarByteChunkForwardIndexWriterV5.VERSION) {
// V5 is the same as V4 except the multi-value docs have implicit value count rather than explicit
return new VarByteChunkForwardIndexReaderV5(dataBuffer, storedType, isSingleValue);
} else if (version == VarByteChunkForwardIndexWriterV4.VERSION) {
// V4 reader is common for sv var byte, mv fixed byte and mv var byte
return new VarByteChunkForwardIndexReaderV4(dataBuffer, storedType, isSingleValue);
} else {
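The factory branch above picks a reader from the version stored in the index header. As a sketch of that dispatch, assuming the version is the first big-endian `int` in the buffer (consistent with the writer's "keep metadata BE" comment and the `dataBuffer.getInt(0)` call in the reader), the hypothetical `ReaderDispatchSketch` below returns a label instead of constructing a real reader.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical dispatch on the header version; returns labels rather than real readers.
class ReaderDispatchSketch {
  static String readerFor(ByteBuffer header) {
    // Absolute read at offset 0 on a duplicate, so the caller's position and order are untouched.
    int version = header.duplicate().order(ByteOrder.BIG_ENDIAN).getInt(0);
    switch (version) {
      case 5:
        return "V5 reader"; // multi-value entries carry no explicit element count
      case 4:
        return "V4 reader"; // multi-value entries start with an element count
      default:
        return "legacy reader";
    }
  }

  public static void main(String[] args) {
    ByteBuffer header = ByteBuffer.allocate(Integer.BYTES).putInt(0, 5);
    System.out.println(readerFor(header)); // V5 reader
  }
}
```

Checking for version 5 explicitly matters here: a V4 reader applied to V5 data would misinterpret the first value of each multi-value entry as its element count.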
@@ -67,8 +67,7 @@ public class VarByteChunkForwardIndexReaderV4

public VarByteChunkForwardIndexReaderV4(PinotDataBuffer dataBuffer, FieldSpec.DataType storedType,
boolean isSingleValue) {
int version = dataBuffer.getInt(0);
Preconditions.checkState(version == VarByteChunkForwardIndexWriterV4.VERSION, "Illegal index version: %s", version);
validateIndexVersion(dataBuffer);
_storedType = storedType;
_targetDecompressedChunkSize = dataBuffer.getInt(4);
_chunkCompressionType = ChunkCompressionType.valueOf(dataBuffer.getInt(8));
@@ -81,6 +80,11 @@ public VarByteChunkForwardIndexReaderV4(PinotDataBuffer dataBuffer, FieldSpec.Da
_isSingleValue = isSingleValue;
}

public void validateIndexVersion(PinotDataBuffer dataBuffer) {
Contributor

I meant we can add a getVersion() method into this class, and override it in v5 reader

int version = dataBuffer.getInt(0);
Preconditions.checkState(version == VarByteChunkForwardIndexWriterV4.VERSION, "Illegal index version: %s", version);
}

@Override
public boolean isDictionaryEncoded() {
return false;
@@ -0,0 +1,85 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.pinot.segment.local.segment.index.readers.forward;

import com.google.common.base.Preconditions;
import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkForwardIndexWriterV4;
import org.apache.pinot.segment.local.io.writer.impl.VarByteChunkForwardIndexWriterV5;
import org.apache.pinot.segment.local.utils.ArraySerDeUtils;
import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
import org.apache.pinot.spi.data.FieldSpec;


/**
* Chunk-based raw (non-dictionary-encoded) forward index reader for values of SV variable length data types
* (BIG_DECIMAL, STRING, BYTES), MV fixed length and MV variable length data types.
* <p>For data layout, please refer to the documentation for {@link VarByteChunkForwardIndexWriterV4}
*/
public class VarByteChunkForwardIndexReaderV5 extends VarByteChunkForwardIndexReaderV4 {
public VarByteChunkForwardIndexReaderV5(PinotDataBuffer dataBuffer, FieldSpec.DataType storedType,
boolean isSingleValue) {
super(dataBuffer, storedType, isSingleValue);
}

@Override
public void validateIndexVersion(PinotDataBuffer dataBuffer) {
int version = dataBuffer.getInt(0);
Preconditions.checkState(version == VarByteChunkForwardIndexWriterV5.VERSION, "Illegal index version: %s", version);
}

@Override
public int getIntMV(int docId, int[] valueBuffer, VarByteChunkForwardIndexReaderV4.ReaderContext context) {
return ArraySerDeUtils.deserializeIntArrayWithoutLength(context.getValue(docId), valueBuffer);
}

@Override
public int[] getIntMV(int docId, VarByteChunkForwardIndexReaderV4.ReaderContext context) {
return ArraySerDeUtils.deserializeIntArrayWithoutLength(context.getValue(docId));
}

@Override
public int getLongMV(int docId, long[] valueBuffer, VarByteChunkForwardIndexReaderV4.ReaderContext context) {
return ArraySerDeUtils.deserializeLongArrayWithoutLength(context.getValue(docId), valueBuffer);
}

@Override
public long[] getLongMV(int docId, VarByteChunkForwardIndexReaderV4.ReaderContext context) {
return ArraySerDeUtils.deserializeLongArrayWithoutLength(context.getValue(docId));
}

@Override
public int getFloatMV(int docId, float[] valueBuffer, VarByteChunkForwardIndexReaderV4.ReaderContext context) {
return ArraySerDeUtils.deserializeFloatArrayWithoutLength(context.getValue(docId), valueBuffer);
}

@Override
public float[] getFloatMV(int docId, VarByteChunkForwardIndexReaderV4.ReaderContext context) {
return ArraySerDeUtils.deserializeFloatArrayWithoutLength(context.getValue(docId));
}

@Override
public int getDoubleMV(int docId, double[] valueBuffer, VarByteChunkForwardIndexReaderV4.ReaderContext context) {
return ArraySerDeUtils.deserializeDoubleArrayWithoutLength(context.getValue(docId), valueBuffer);
}

@Override
public double[] getDoubleMV(int docId, VarByteChunkForwardIndexReaderV4.ReaderContext context) {
return ArraySerDeUtils.deserializeDoubleArrayWithoutLength(context.getValue(docId));
}
}
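On the read side, the V5 overrides rely on the element count being recoverable from the entry's byte length. The sketch below shows that idea for `int` values with a caller-supplied buffer, similar in shape to the `getIntMV(docId, valueBuffer, context)` overload; `ImplicitLengthReadSketch` is hypothetical, and the big-endian encoding is an assumption rather than something this diff shows.

```java
import java.nio.ByteBuffer;

// Hypothetical reader-side sketch; not Pinot's ArraySerDeUtils.
class ImplicitLengthReadSketch {
  // Decodes a length-free int array into valueBuffer and returns the element count.
  static int deserializeInts(byte[] bytes, int[] valueBuffer) {
    // The count is implied: total bytes divided by the fixed element width.
    int numValues = bytes.length / Integer.BYTES;
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    for (int i = 0; i < numValues; i++) {
      valueBuffer[i] = buffer.getInt();
    }
    return numValues;
  }

  public static void main(String[] args) {
    byte[] bytes = ByteBuffer.allocate(2 * Integer.BYTES).putInt(10).putInt(20).array();
    int[] valueBuffer = new int[4]; // sized for the column's maximum element count
    int numValues = deserializeInts(bytes, valueBuffer);
    System.out.println(numValues + " " + valueBuffer[0] + " " + valueBuffer[1]); // 2 10 20
  }
}
```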
@@ -49,10 +49,9 @@

public class MultiValueFixedByteRawIndexCreatorTest {

private static final String OUTPUT_DIR =
System.getProperty("java.io.tmpdir") + File.separator + "mvFixedRawTest";
protected static String _outputDir;

private static final Random RANDOM = new Random();
protected static final Random RANDOM = new Random();

@DataProvider(name = "compressionTypes")
public Object[][] compressionTypes() {
@@ -64,15 +63,16 @@ public Object[][] compressionTypes() {
@BeforeClass
public void setup()
throws Exception {
FileUtils.forceMkdir(new File(OUTPUT_DIR));
_outputDir = System.getProperty("java.io.tmpdir") + File.separator + "mvFixedRawTest";
FileUtils.forceMkdir(new File(_outputDir));
}

/**
* Clean up after test
*/
@AfterClass
public void cleanup() {
FileUtils.deleteQuietly(new File(OUTPUT_DIR));
FileUtils.deleteQuietly(new File(_outputDir));
}

@Test(dataProvider = "compressionTypes")
@@ -147,25 +147,34 @@ public void testMVDouble(ChunkCompressionType compressionType, int writerVersion
}, compressionType, writerVersion);
}

public MultiValueFixedByteRawIndexCreator getMultiValueFixedByteRawIndexCreator(ChunkCompressionType compressionType,
String column, int numDocs, DataType dataType, int maxElements, int writerVersion)
throws IOException {
return new MultiValueFixedByteRawIndexCreator(new File(_outputDir), compressionType, column, numDocs, dataType,
maxElements, false, writerVersion, 1024 * 1024, 1000);
}

public ForwardIndexReader getForwardIndexReader(PinotDataBuffer buffer, DataType dataType, int writerVersion) {
return writerVersion == VarByteChunkForwardIndexWriterV4.VERSION ? new VarByteChunkForwardIndexReaderV4(buffer,
dataType.getStoredType(), false) : new FixedByteChunkMVForwardIndexReader(buffer, dataType.getStoredType());
}

public <T> void testMV(DataType dataType, List<T> inputs, ToIntFunction<T> sizeof, IntFunction<T> constructor,
Injector<T> injector, Extractor<T> extractor, ChunkCompressionType compressionType, int writerVersion)
throws IOException {
String column = "testCol_" + dataType;
int numDocs = inputs.size();
int maxElements = inputs.stream().mapToInt(sizeof).max().orElseThrow(RuntimeException::new);
File file = new File(OUTPUT_DIR, column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
File file = new File(_outputDir, column + Indexes.RAW_MV_FORWARD_INDEX_FILE_EXTENSION);
file.delete();
MultiValueFixedByteRawIndexCreator creator =
new MultiValueFixedByteRawIndexCreator(new File(OUTPUT_DIR), compressionType, column, numDocs, dataType,
maxElements, false, writerVersion, 1024 * 1024, 1000);
getMultiValueFixedByteRawIndexCreator(compressionType, column, numDocs, dataType, maxElements, writerVersion);
inputs.forEach(input -> injector.inject(creator, input));
creator.close();

//read
final PinotDataBuffer buffer = PinotDataBuffer.mapFile(file, true, 0, file.length(), ByteOrder.BIG_ENDIAN, "");
ForwardIndexReader reader =
writerVersion == VarByteChunkForwardIndexWriterV4.VERSION ? new VarByteChunkForwardIndexReaderV4(buffer,
dataType.getStoredType(), false) : new FixedByteChunkMVForwardIndexReader(buffer, dataType.getStoredType());
ForwardIndexReader reader = getForwardIndexReader(buffer, dataType, writerVersion);

final ForwardIndexReaderContext context = reader.createContext();
T valueBuffer = constructor.apply(maxElements);
@@ -193,7 +193,7 @@ private <T> void testWriteRead(File file, ChunkCompressionType compressionType,
}
}

private Stream<String> randomStrings(int count, int lengthOfLongestEntry) {
protected Stream<String> randomStrings(int count, int lengthOfLongestEntry) {
return IntStream.range(0, count)
.mapToObj(i -> {
int length = ThreadLocalRandom.current().nextInt(lengthOfLongestEntry);