Enable auto-flush in ion binary writer. #651

linlin-s · 2023-11-20T20:14:18Z

Issue #, if available:
N/A
Description of changes:
Motivation:
This PR introduces an option for users to enable auto-flush while writing a long stream of binary data. Periodic flushing relieves memory pressure: once a block has been flushed it can be reused, so the writer does not have to keep allocating new blocks, which can degrade performance.

The ion-java library currently offers methods for flushing data from buffers, such as writer.flush() and writer.finish(). However, many users are either unaware of these methods or do not know when to call them for optimal performance. Typical real-world usage involves writing the entire data stream and then calling close()/finish() without any intermediate flushing. This pattern misses out on the potential performance benefits of periodic flushing. To let users get these benefits automatically, we have added a configuration option to IonBinaryWriterBuilder that enables auto-flushing in the binary writer.

Implementation details:
With auto-flush enabled, the flush operation is executed only between top-level values. The flush is triggered during the block boundary check while writing data into the buffer. When the incoming data exceeds the remaining block size and there are no reusable blocks available in the current buffer, an additional block will still be allocated. However, the flush operation will occur after completing the current top-level value. This approach enables the reuse of allocated blocks, thereby eliminating the need for continuous new block allocation.
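The flow described above can be sketched in a few lines. This is a simplified, hypothetical model and not the ion-java implementation: the class `AutoFlushSketch`, its method names, and the single growing buffer standing in for a list of blocks are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical, simplified model of the auto-flush flow; not ion-java code. */
final class AutoFlushSketch {
    private final int blockSize;
    private final ByteArrayOutputStream sink;   // in-memory destination
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean flushAfterCurrentValue;
    private int depth;                          // 0 means top level
    int flushCount;                             // exposed only for illustration

    AutoFlushSketch(int blockSize, ByteArrayOutputStream sink) {
        this.blockSize = blockSize;
        this.sink = sink;
    }

    void stepIn() {
        depth++;
    }

    void stepOut() {
        depth--;
        finishValue();
    }

    /** Writes the bytes of one scalar value. */
    void writeValue(byte[] data) {
        // Block boundary check: the incoming data does not fit in the space left.
        if (data.length > blockSize - buffer.size()) {
            flushAfterCurrentValue = true;      // remember the request, do not flush yet
        }
        buffer.write(data, 0, data.length);     // the buffer still grows for now
        finishValue();
    }

    /** Flushes only when a flush was requested and a top-level value just ended. */
    private void finishValue() {
        if (flushAfterCurrentValue && depth == 0) {
            byte[] pending = buffer.toByteArray();
            sink.write(pending, 0, pending.length);
            buffer.reset();                     // the space can now be reused
            flushAfterCurrentValue = false;
            flushCount++;
        }
    }
}
```

In this sketch, a value that crosses the boundary inside a container only sets the flag; the flush itself happens when `stepOut()` returns the writer to depth 0.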

Test:
For testing the auto-flush feature, we write predefined structures and calculate the number of values that will exceed the block size. We then compare the outputs between the writer with auto-flush enabled and the writer where flushing is done manually after writing the specific number of values. If the auto-flush is executed correctly, the outputs in both scenarios should be identical.

Benchmark results:
Benchmark a write of data equivalent to a stream of 59155 nested values using IonWriter(binary). The output is written to an in-memory buffer. (3 forks, 2 warmups, 2 iterations, preallocation 1)

Benchmark	No Flush	Enable Auto-Flush	Improvements	Units
Bench.run	502.608	492.289	2.05%	ms/op
Bench.run:Heap usage	347.461	302.414	12.96%	MB
Bench.run:Serialized size	21.271	21.272	Neutral	MB

Benchmark a write of data equivalent to a stream of 194627 nested values using IonWriter(binary). The output is written to an in-memory buffer. (3 forks, 2 warmups, 2 iterations, preallocation 1)

Benchmark	No Flush	Enable Auto-Flush	Improvements	Units
Bench.run	3980.508	3895.676	2.13%	ms/op
Bench.run:Heap usage	2782.781	2530.251	9.07%	MB
Bench.run:Serialized size	201.663	201.665	Neutral	MB

Benchmark a write of data equivalent to a stream of 50000 nested values using IonWriter(binary). The output is written to an in-memory buffer. (3 forks, 2 warmups, 2 iterations, preallocation 1)

Benchmark	No Flush	Enable Auto-Flush	Improvements	Units
Bench.run	764.474	575.393	24.73%	ms/op
Bench.run:Heap usage	1907.824	660.699	65.36%	MB
Bench.run:Serialized size	219	219	Neutral	MB
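The "Improvements" column appears to be the relative reduction of the auto-flush figure against the no-flush baseline. That reading is an assumption; the PR does not state the formula. A minimal sketch of the calculation, with a hypothetical helper name:

```java
/** Hypothetical helper; assumes "improvement" means relative reduction versus the baseline. */
final class Improvement {
    private Improvement() {
    }

    /** Returns the percentage by which candidate is lower than baseline. */
    static double percent(double baseline, double candidate) {
        return (baseline - candidate) / baseline * 100.0;
    }
}
```

For example, `Improvement.percent(502.608, 492.289)` corresponds to the ms/op row of the first table.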

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

tgregg

This is a good feature. I've added some comments.

Please add tests that write values that are:

Exactly the size of a block
More than twice the size of a block

tgregg · 2023-11-28T01:38:55Z

src/com/amazon/ion/impl/bin/IonRawBinaryWriter.java

+    public boolean autoFlushEnabled;
+    public boolean flushAfterCurrentValue = false;


Do these need to be public? We should prefer default visibility, or private if possible.

tgregg · 2023-11-28T01:39:23Z

src/com/amazon/ion/impl/bin/IonRawBinaryWriter.java

@@ -369,7 +373,8 @@ public PatchPoint clear() {
                                   final StreamCloseMode streamCloseMode,
                                   final StreamFlushMode streamFlushMode,
                                   final PreallocationMode preallocationMode,
-                                   final boolean isFloatBinary32Enabled)
+                                   final boolean isFloatBinary32Enabled,
+                                   final boolean isAutoFlushEnabled, final IonManagedBinaryWriter managedBinaryWriter)


Add a newline before the final parameter for consistency with the existing style.

tgregg · 2023-11-28T01:44:42Z

src/com/amazon/ion/system/IonBinaryWriterBuilder.java

@@ -139,6 +139,7 @@ public IvmMinimizing getIvmMinimizing()
     */
    public abstract SymbolTable getInitialSymbolTable();

+    public abstract _Private_IonBinaryWriterBuilder withAutoFlushEnbaled(boolean autoFlushEnbaled);


This should return IonBinaryWriterBuilder, and it needs a JavaDoc comment like the others in this class.

tgregg · 2023-11-28T01:53:40Z

src/com/amazon/ion/impl/bin/WriteBuffer.java

+                    if (rawBinaryWriter.autoFlushEnabled){
+                        rawBinaryWriter.flushAfterCurrentValue = true;
+                    }


Rather than storing a reference to an IonRawBinaryWriter, we should keep the concerns of these two classes separate by storing a reference to a callback method in this class. The IonRawBinaryWriter would provide this callback to the WriteBuffer upon construction, using something like:

private boolean flushAfterCurrentValue;
private WriteBuffer buffer = new WriteBuffer(allocator, this::endOfBlockReached);
private void endOfBlockReached() { flushAfterCurrentValue = autoFlushEnabled; }

And here you'd just invoke the callback method.

This will also help you clean up WriteBufferTest, because you can provide a test callback without having to instantiate a raw writer.
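A minimal, self-contained sketch of the wiring suggested in this comment. `BufferSketch` and `RawWriterWiringSketch` are hypothetical stand-ins for WriteBuffer and IonRawBinaryWriter; the buffer only tracks how much space is left in the current block, which is an assumption made to keep the example short.

```java
/** Hypothetical stand-in for WriteBuffer; tracks only the space left in the current block. */
final class BufferSketch {
    private final int blockSize;
    private final Runnable endOfBlockCallback;
    private int remaining;

    BufferSketch(int blockSize, Runnable endOfBlockCallback) {
        this.blockSize = blockSize;
        this.endOfBlockCallback = endOfBlockCallback;
        this.remaining = blockSize;
    }

    void writeBytes(int length) {
        while (length > remaining) {
            length -= remaining;
            remaining = blockSize;      // pretend a new block was allocated
            endOfBlockCallback.run();   // the buffer knows nothing about the writer
        }
        remaining -= length;
    }
}

/** Hypothetical stand-in for IonRawBinaryWriter. */
final class RawWriterWiringSketch {
    private final boolean autoFlushEnabled;
    private final BufferSketch buffer;
    private boolean flushAfterCurrentValue;

    RawWriterWiringSketch(int blockSize, boolean autoFlushEnabled) {
        this.autoFlushEnabled = autoFlushEnabled;
        // The writer hands the buffer a callback instead of a reference to itself.
        this.buffer = new BufferSketch(blockSize, this::endOfBlockReached);
    }

    private void endOfBlockReached() {
        flushAfterCurrentValue = autoFlushEnabled;
    }

    void write(int length) {
        buffer.writeBytes(length);
    }

    boolean isFlushPending() {
        return flushAfterCurrentValue;
    }
}
```

Because the buffer depends only on `Runnable`, a test can pass any callback (for example one that flips an `AtomicBoolean`) without constructing a writer.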

tgregg · 2023-11-28T01:58:41Z

src/com/amazon/ion/impl/bin/IonRawBinaryWriter.java

@@ -730,6 +741,10 @@ private void finishValue()
        }
        hasWrittenValuesSinceFinished = true;
        hasWrittenValuesSinceConstructed = true;
+        if (this.flushAfterCurrentValue && depth == 0) {
+            managedBinaryWriter.flush();


Rather than storing a reference to an IonManagedBinaryWriter, we should keep the concerns of these two classes separate by storing a reference to a callback method in this class. The IonManagedBinaryWriter would provide this callback to the IonRawBinaryWriter upon construction, using something like:

this.user = new IonRawBinaryWriter(..., this::flush);

And here you'd invoke that callback instead of calling the managed writer's flush() method directly.

tgregg · 2023-11-28T02:02:24Z

src/com/amazon/ion/impl/bin/IonRawBinaryWriter.java

@@ -420,6 +427,10 @@ public void setFieldNameSymbol(final SymbolToken name)
        setFieldNameSymbol(name.getSid());
    }

+    public WriteBuffer getCurrentBuffer() {


It looks like this can have default visibility, instead of public.

Thanks for catching this, will set to default in the next commit.

tgregg · 2023-11-28T02:07:47Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+        while (reader.next() != null) {
+            actualWriter.writeValue(reader);
+        }
+        actualWriter.finish();


Should this be actualWriter.close();?

tgregg · 2023-11-28T02:09:15Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+        if (lstAppendMode.isEnabled() && autoFlushMode.isEnabled()) {
+            assertArrayEquals(actual.toByteArray(), expectedOut.toByteArray());
+        }


Let's add an assertion after this branch to make sure that in all cases actual and expectedOut are data model equivalent.

tgregg · 2023-11-30T19:52:52Z

src/com/amazon/ion/impl/bin/WriteBuffer.java

@@ -30,20 +30,20 @@
    private final List<Block> blocks;
    private Block current;
    private int index;
-    private IonRawBinaryWriter rawBinaryWriter;
+    private Runnable action;


This needs a more descriptive name. When is the action invoked? (When the end of a block is reached.) Consider something like onEndOfBlock or endOfBlockCallback.

Sounds good, will update in the next commit.

tgregg · 2023-11-30T19:58:14Z

src/com/amazon/ion/system/IonBinaryWriterBuilder.java

+    /**
+     * Enables the automatic execution of flush operations. This functionality disabled by default.
+     * @param autoFlushEnabled A boolean parameter indicating whether this functionality is enabled or not.
+     */


This needs to describe when the auto-flush occurs, going into detail about the relationship to block size, and linking to the option that allows the user to configure that block size.

tgregg · 2023-11-30T20:08:04Z

src/com/amazon/ion/impl/bin/IonManagedBinaryWriter.java

+            try {
+                unsafeFlush();
+            } catch (IOException e) {
+                throw new RuntimeException(e);
+            }


We should not change the exception throwing behavior here. We may need to define our own Functional Interface, like

@FunctionalInterface interface ThrowingRunnable { void run() throws IOException; }

this::flush should be able to conform to that without changing its signature.
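A sketch of how such an interface lets the IOException propagate unchanged instead of being wrapped in a RuntimeException. `ManagedWriterSketch` and `AutoFlushingRawWriterSketch` are hypothetical stand-ins for the two writer classes, and the `failOnFlush` switch exists only to simulate a failing stream.

```java
import java.io.IOException;

@FunctionalInterface
interface ThrowingRunnable {
    void run() throws IOException;
}

/** Hypothetical stand-in for IonRawBinaryWriter. */
final class AutoFlushingRawWriterSketch {
    private final ThrowingRunnable onAutoFlush;

    AutoFlushingRawWriterSketch(ThrowingRunnable onAutoFlush) {
        this.onAutoFlush = onAutoFlush;
    }

    /** Called when a top-level value ends and a flush was requested. */
    void finishTopLevelValue() throws IOException {
        onAutoFlush.run();   // the checked exception passes through untouched
    }
}

/** Hypothetical stand-in for IonManagedBinaryWriter. */
final class ManagedWriterSketch {
    private final boolean failOnFlush;   // simulates a failing output stream
    final AutoFlushingRawWriterSketch raw;
    int flushCount;

    ManagedWriterSketch(boolean failOnFlush) {
        this.failOnFlush = failOnFlush;
        // flush() keeps its "throws IOException" signature and still fits the callback type.
        this.raw = new AutoFlushingRawWriterSketch(this::flush);
    }

    void flush() throws IOException {
        if (failOnFlush) {
            throw new IOException("simulated stream failure");
        }
        flushCount++;
    }
}
```

A plain `Runnable` would not compile here, because `Runnable.run()` declares no checked exceptions; that is what forces the try/catch wrapper shown in the diff.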

tgregg · 2023-11-30T20:09:05Z

src/com/amazon/ion/impl/bin/IonRawBinaryWriter.java

-    public boolean flushAfterCurrentValue = false;
+    boolean autoFlushEnabled;
+    boolean flushAfterCurrentValue;
+    Runnable action;


This needs a more descriptive name, like onAutoFlush.

tgregg · 2023-11-30T20:12:32Z

test/com/amazon/ion/impl/bin/WriteBufferTest.java

+    @Test
+    public void testEndOfBufferReachedInvoked() throws UnsupportedEncodingException {
+        buf.writeBytes("taco".getBytes("UTF-8"));
+        buf.writeBytes("_burrito".getBytes("UTF-8"));
+        assertTrue(endOfBufferReached.get());
+    }
+
+    @Test
+    public void testEndOfBufferReachedNotInvoked() throws UnsupportedEncodingException {
+        buf.writeBytes("taco".getBytes("UTF-8"));
+        buf.writeBytes("burrito".getBytes("UTF-8"));
+        assertFalse(endOfBufferReached.get());
+    }


Please add some comments to these tests. It's not immediately clear why adding the underscore changes the behavior. Consider noting the block size, and add assertions before and after the boundary to verify the transition happens exactly when expected.

Sure, I will add the comments in the next commit. If we add assertions before and after the boundary, we might not need two unit tests.
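A sketch of what a single boundary test could look like. It assumes a block size of 11 bytes, inferred from "taco" plus "burrito" fitting exactly; the real test fixture may use a different size. `TinyBuffer` is a hypothetical stand-in for WriteBuffer and the checks are plain Java rather than JUnit assertions.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for WriteBuffer; tracks only the space left in the current block. */
final class TinyBuffer {
    private final int blockSize;
    private final Runnable endOfBlockCallback;
    private int remaining;

    TinyBuffer(int blockSize, Runnable endOfBlockCallback) {
        this.blockSize = blockSize;
        this.endOfBlockCallback = endOfBlockCallback;
        this.remaining = blockSize;
    }

    void writeBytes(byte[] bytes) {
        int length = bytes.length;
        while (length > remaining) {
            length -= remaining;
            remaining = blockSize;
            endOfBlockCallback.run();
        }
        remaining -= length;
    }
}

final class BoundaryCheckSketch {
    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    private static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the number of callback invocations after crossing the boundary once. */
    static int endOfBlockBoundary() {
        AtomicInteger calls = new AtomicInteger();
        TinyBuffer buf = new TinyBuffer(11, calls::incrementAndGet); // assumed block size

        buf.writeBytes(utf8("taco"));      // 4 of 11 bytes used
        check(calls.get() == 0, "no callback before the block is full");

        buf.writeBytes(utf8("burrito"));   // 11 of 11 bytes used: exact fit
        check(calls.get() == 0, "no callback on an exact fit");

        buf.writeBytes(utf8("_"));         // first byte past the boundary
        check(calls.get() == 1, "callback fires once the boundary is crossed");

        return calls.get();
    }
}
```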

tgregg · 2023-11-30T20:17:41Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

        if (lstAppendMode.isEnabled() && autoFlushMode.isEnabled()) {
            assertArrayEquals(actual.toByteArray(), expectedOut.toByteArray());
        }
+        assertArrayEquals(actual.toByteArray(), expectedOut.toByteArray());


This is the same assertion from line 222. This should fail if auto flush is disabled, but I see that we're not actually running this test with auto-flush disabled. Consider using @ParameterizedTest to pass in both true and false for the value of withAutoFlushEnabled, then change this to an assertion of data model equality (i.e. using Equivalence.ionEquals). That way we can assert different behavior based on whether auto-flush is enabled, but verify in both cases that the streams are data model equivalent, which is what ultimately matters to the user.

tgregg · 2023-11-30T20:18:08Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+        if (lstAppendMode.isEnabled() && autoFlushMode.isEnabled()) {
+            assertArrayEquals(actual.toByteArray(), expectedOut_32K.toByteArray());
+        }
+        assertArrayEquals(actual.toByteArray(), expectedOut_32K.toByteArray());


Same comment here regarding test parameterization and data model equality.

tgregg · 2023-11-30T20:18:18Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+        if (lstAppendMode.isEnabled() && autoFlushMode.isEnabled()) {
+            assertArrayEquals(actual.toByteArray(), expectedOut_67K.toByteArray());
+        }
+        assertArrayEquals(actual.toByteArray(), expectedOut_67K.toByteArray());


Same comment here regarding test parameterization and data model equality.

tgregg · 2023-11-30T20:21:28Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+        // Writing test data with continuous extending symbol table. The total data written in user's block is 32K.
+        IonWriter defaultWriter = IonBinaryWriterBuilder.standard().build(source_32K);
+        int i = 0;
+        while (i < 2990) {


How was 2990 chosen? A comment explaining the math behind it might help.

tgregg · 2023-11-30T20:22:33Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+            defaultWriter.stepOut();
+            i++;
+        }
+        while (i >= 3200 && i < 6400) {


How were 3200 and 6400 chosen?

tgregg · 2023-12-05T00:30:29Z

src/com/amazon/ion/system/IonBinaryWriterBuilder.java

+     * Additionally, setting a larger block size can further tune performance when auto-flush is enabled.
+     * A larger block size leads to fewer block allocations and reduces the frequency of flush operations when auto-flush is enabled. {@link #withBlockSize(int) Here} is where you can set up the
+     * block size of write buffer.
+     * Auto-flush disabled by default and the default block size is 32K.


Suggested change

* Auto-flush disabled by default and the default block size is 32K.

* Auto-flush is disabled by default and the default block size is 32K.

tgregg · 2023-12-05T00:31:01Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+
+    /**
+     * In test data, the field names are generated by appending a continuously increasing integer to the string "taco", resulting in names like "taco0", "taco1", and so on.
+     * These filed names are paired with symbol IDs and then stored in then symbol table during the writing process.


Suggested change

* These filed names are paired with symbol IDs and then stored in then symbol table during the writing process.

* These field names are paired with symbol IDs and then stored in then symbol table during the writing process.

tgregg · 2023-12-05T00:50:25Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+        assertEquals(actualDatagram.size(), expectedDatagram.size());
+        for (int i = 0; i < actualDatagram.size(); i++) {
+            assertTrue(Equivalence.ionEquals(actualDatagram.get(i), expectedDatagram.get(i)));
+        }


Can you replace this with assertEquals(expectedDatagram, actualDatagram);?

Thanks for the suggestion. I will replace it in the next commit.

tgregg · 2023-12-05T00:51:03Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+        }
+        actualWriter.close();
+        if (lstAppendMode.isEnabled() && autoFlushMode.isEnabled()) {
+            assertEquivalentDataModel(actual, expectedOut_32K);


Here you actually do want to compare the bytes, right? To ensure the auto-flush occurred.

tgregg · 2023-12-05T00:51:40Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+        }
+        actualWriter.close();
+        if (lstAppendMode.isEnabled() && autoFlushMode.isEnabled()) {
+            assertEquivalentDataModel(actual, expectedOut_67K);


Here you actually do want to compare the bytes, right? To ensure the auto-flush occurred.

tgregg · 2023-12-05T00:51:48Z

test/com/amazon/ion/impl/bin/IonManagedBinaryWriterTest.java

+        }
+        actualWriter.close();
+        if (lstAppendMode.isEnabled() && autoFlushMode.isEnabled()) {
+            assertEquivalentDataModel(actual, expectedOut);


Here you actually do want to compare the bytes, right? To ensure the auto-flush occurred.

Yes, we should include both a byte comparison and a data model comparison in the test. I will add the byte comparison in the next commit.

tgregg reviewed Nov 28, 2023

tgregg reviewed Nov 30, 2023

linlin-s added 2 commits December 1, 2023 22:48

Enable auto-flush in ion binary writer.

7071298

Updates based on the comments.

4d37ef0

linlin-s force-pushed the auto-flush branch from 0432863 to f953f5c Compare December 2, 2023 20:41

Updates based on comments.

bb4b42c

linlin-s force-pushed the auto-flush branch from f953f5c to bb4b42c Compare December 2, 2023 22:30

tgregg approved these changes Dec 5, 2023

linlin-s marked this pull request as ready for review December 5, 2023 18:35

linlin-s added 2 commits December 5, 2023 11:41

Adds comparison of encoding to autoflush tests.

f204db7

Adds tests for single value in one block size and more than the size …

7a22f15

…of a block.

tgregg approved these changes Dec 5, 2023

linlin-s merged commit 3db3866 into set-block-size Dec 5, 2023
15 of 27 checks passed

linlin-s added a commit that referenced this pull request Dec 5, 2023

Enable auto-flush in ion binary writer. (#651)

d3760a5

linlin-s deleted the auto-flush branch January 16, 2024 19:24

tgregg mentioned this pull request Jan 18, 2024

Bumps version to 1.11.2-SNAPSHOT #701

Merged

tgregg mentioned this pull request Feb 9, 2024

Bumps version to 1.11.3-SNAPSHOT #720

Merged

tgregg mentioned this pull request Feb 22, 2024

Bumps version to 1.11.4-SNAPSHOT #753

Merged

tgregg mentioned this pull request Mar 1, 2024

Bumps version to 1.11.5-SNAPSHOT #760

Merged

This was referenced Apr 23, 2024

Bumps version to 1.11.6-SNAPSHOT #810

Closed

Bumps version to 1.11.8-SNAPSHOT. #815

Merged
