Allows values to be macro-aware transcoded value-by-value, adds support for creating macro-aware readers from InputStream, and improves testing. #1005

tgregg · 2024-12-03T22:27:10Z

Description of changes:

Value-by-value transcoding lets the user stop transcoding after a certain number of user values have been transcoded. ion-java-benchmark-cli uses this when --limit is specified. It also uses it for --ion-flush-period, which requires the writer to be flushed every N values.
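
As a rough illustration of how a caller might combine a value limit with a flush period, here is a minimal sketch. `ValueSource`, `FlushableSink`, and `LimitedTranscode` are hypothetical stand-ins, not ion-java types, and the sketch assumes that each successful `transcodeNext()` call corresponds to exactly one user value and that it returns `false` at end of input.

```java
/** Hypothetical stand-in for a reader that transcodes one user value per call. */
interface ValueSource {
    /** Returns false once the input is exhausted (assumed return convention). */
    boolean transcodeNext();
}

/** Hypothetical stand-in for the writer; only flushing matters in this sketch. */
interface FlushableSink {
    void flush();
}

final class LimitedTranscode {
    /**
     * Transcodes at most `limit` values, flushing after every `flushPeriod`
     * values (a non-positive period disables periodic flushing).
     * Returns the number of values transcoded.
     */
    static int run(ValueSource source, FlushableSink sink, int limit, int flushPeriod) {
        int count = 0;
        // The limit is checked first, so no value beyond the limit is consumed.
        while (count < limit && source.transcodeNext()) {
            count++;
            if (flushPeriod > 0 && count % flushPeriod == 0) {
                sink.flush();
            }
        }
        // Flush whatever was written since the last periodic flush.
        boolean flushedOnLastValue = flushPeriod > 0 && count > 0 && count % flushPeriod == 0;
        if (!flushedOnLastValue) {
            sink.flush();
        }
        return count;
    }
}
```

With a 10-value input, a limit of 7, and a flush period of 3, this loop stops after 7 values and flushes three times: after values 3 and 6, and once more at the end.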
The reader builder has been generalized to create MacroAwareIonReaders using the same compression and format detection logic as the regular readers. This allows macro-aware transcoding to be performed on both byte[] and InputStream. We now also parameterize the macro-aware transcoding tests in EncodingDirectiveCompilationTest to cover all combinations of input type (byte[] or InputStream) and output format (binary or text). Once support for macro-aware transcoding from text is added, an additional dimension of parameterization will be added for that.
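
To illustrate why one detection routine can serve both input types, here is a minimal sketch that classifies a header taken either from a byte[] or from the start of a stream. `FormatSniffer` and `DetectedFormat` are hypothetical names, and this is not the ion-java reader builder's actual logic. The only facts it relies on are the GZIP magic bytes (0x1F 0x8B) and the binary Ion version marker layout (0xE0, major, minor, 0xEA).

```java
import java.io.BufferedInputStream;
import java.io.IOException;

enum DetectedFormat { GZIP, ION_BINARY, TEXT_OR_UNKNOWN }

/** Hypothetical helper: one classifier shared by byte[] and InputStream inputs. */
final class FormatSniffer {
    private static final int HEADER_LENGTH = 4;

    static DetectedFormat detect(byte[] data) {
        return classify(data, data.length);
    }

    /** Peeks at the first bytes, then rewinds so the reader sees the whole stream. */
    static DetectedFormat detect(BufferedInputStream in) throws IOException {
        in.mark(HEADER_LENGTH);
        byte[] header = new byte[HEADER_LENGTH];
        int total = 0;
        while (total < HEADER_LENGTH) {
            int n = in.read(header, total, HEADER_LENGTH - total);
            if (n < 0) {
                break; // End of stream before a full header was available.
            }
            total += n;
        }
        in.reset();
        return classify(header, total);
    }

    private static DetectedFormat classify(byte[] b, int length) {
        if (length >= 2 && (b[0] & 0xFF) == 0x1F && (b[1] & 0xFF) == 0x8B) {
            return DetectedFormat.GZIP;
        }
        // Binary Ion version marker: 0xE0 <major> <minor> 0xEA; major is 1 here.
        if (length >= 4 && (b[0] & 0xFF) == 0xE0 && b[1] == 0x01 && (b[3] & 0xFF) == 0xEA) {
            return DetectedFormat.ION_BINARY;
        }
        return DetectedFormat.TEXT_OR_UNKNOWN;
    }
}
```

A real builder would presumably unwrap GZIP input and repeat the check on the decompressed bytes; that step is omitted here.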
I added some tests for transcoding containers with nested literals and macro invocations. The new transcodeValueLiteral method in the reader makes these tests pass by recursively calling transcodeNext() for containers, which ensures that any nested e-expressions are transcoded as e-expressions rather than as their expanded values. I've left a TODO to find a factoring that avoids method recursion, which we've been trying to move away from. I'm not making that a priority because this is intended as an internal API.
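
The recursive approach, and one possible non-recursive factoring, can be sketched on a toy value model. Everything here (`Node`, `Scalar`, `EExpression`, `ListNode`, `ToyTranscoder`) is hypothetical and unrelated to the real reader; the explicit-stack variant is only one assumption about what a recursion-free version could look like. The output uses the Ion 1.1 text form `(:name args...)` for e-expressions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

abstract class Node { }

final class Scalar extends Node {
    final String text;
    Scalar(String text) { this.text = text; }
}

final class EExpression extends Node {
    final String name;
    final List<Node> args;
    EExpression(String name, List<Node> args) { this.name = name; this.args = args; }
}

final class ListNode extends Node {
    final List<Node> children;
    ListNode(List<Node> children) { this.children = children; }
}

final class ToyTranscoder {
    /** Recursive: containers step in and transcode each child in turn. */
    static void transcode(Node node, StringBuilder out) {
        if (node instanceof Scalar) {
            out.append(((Scalar) node).text);
        } else if (node instanceof EExpression) {
            // The invocation itself is written, never its expansion.
            EExpression e = (EExpression) node;
            out.append("(:").append(e.name);
            for (Node arg : e.args) {
                out.append(' ');
                transcode(arg, out);
            }
            out.append(')');
        } else {
            List<Node> children = ((ListNode) node).children;
            out.append('[');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) out.append(", ");
                transcode(children.get(i), out);
            }
            out.append(']');
        }
    }

    /** Same output using an explicit work stack of nodes and literal tokens. */
    static String transcodeIteratively(Node root) {
        StringBuilder out = new StringBuilder();
        Deque<Object> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Object item = work.pop();
            if (item instanceof String) {
                out.append((String) item);
            } else if (item instanceof Scalar) {
                out.append(((Scalar) item).text);
            } else if (item instanceof EExpression) {
                EExpression e = (EExpression) item;
                // Push in reverse so items pop in document order.
                work.push(")");
                for (int i = e.args.size() - 1; i >= 0; i--) {
                    work.push(e.args.get(i));
                    work.push(" ");
                }
                work.push("(:" + e.name);
            } else {
                List<Node> children = ((ListNode) item).children;
                work.push("]");
                for (int i = children.size() - 1; i >= 0; i--) {
                    work.push(children.get(i));
                    if (i > 0) work.push(", ");
                }
                work.push("[");
            }
        }
        return out.toString();
    }
}
```

Both methods walk the same tree and emit nested invocations as invocations; the iterative one trades call-stack depth for a heap-allocated work stack.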

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jobarr-amzn · 2024-12-09T22:50:40Z

src/test/java/com/amazon/ion/impl/EncodingDirectiveCompilationTest.java

+    @ParameterizedTest(name = "{0},{1}")
+    @MethodSource("allCombinations")
+    public void multiValuePartialMacroAwareTranscode(InputType inputType, StreamType outputFormat) throws Exception {
+        byte[] data = macroInvocationsProduceEncodingDirectivesThatModifyMacroTable(StreamType.BINARY);


Depending on macroInvocationsProduceEncodingDirectivesThatModifyMacroTable here adds a lot of specificity to this test but at a remove, which makes the test harder to read/understand. It isn't directly obvious why any of the substringCount values are the number that they are, and determining the correct value from the data generator method is tedious.

If possible it would be great to use smaller test data defined more locally.
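
As a sketch of what smaller, local test data could look like, the example below keeps the transcoded text and the expected count side by side, so the expected number can be verified by reading the literal. `SubstringCount`, `countOccurrences`, and the `greet` macro name are hypothetical; this is not the helper or data used in EncodingDirectiveCompilationTest.

```java
/** Hypothetical helper for counting non-overlapping occurrences of a substring. */
final class SubstringCount {
    static int countOccurrences(String text, String needle) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }
}

final class LocalDataExample {
    // Inline data: two invocations of a hypothetical `greet` macro and one plain string.
    static final String TRANSCODED = "(:greet \"a\") (:greet \"b\") \"plain\"";

    /** The expected value (2) is evident from the literal directly above. */
    static int greetInvocations() {
        return SubstringCount.countOccurrences(TRANSCODED, "(:greet");
    }
}
```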

jobarr-amzn · 2024-12-09T23:17:49Z

src/main/java/com/amazon/ion/MacroAwareIonReader.kt

+    /**
+     * Prepares the reader to perform a macro-aware transcode to the given
+     * writer. This must be called before calling [transcodeNext], but is not
+     * necessary if calling [transcodeAllTo].
+     * @param writer the writer to which the reader's stream will be transcoded.
+     */
+    fun prepareTranscodeTo(writer: MacroAwareIonWriter)


What makes prepareTranscodeTo(writer)/transcodeNext() preferable to transcodeTo(writer)? At a glance, it looks like either would work, so I likely missed something.

tgregg · 2024-12-09

It's not a huge difference, but the current factoring allows for the setup (registering the IVM callback) to happen once, rather than before every value that is transcoded.

jobarr-amzn · 2024-12-10

That makes sense, I see it now. It's really more of a question of whether you know that the callback has been registered, right? So could that state also be carried implicitly, by virtue of a private method that assumes the callback has already been registered and a public method which does the registration then calls the private method? Or am I misunderstanding how this works?

tgregg · 2024-12-12

Something like that may be possible. We can follow up in #1015. My main goal with this factoring was to avoid repetitive re-setting of the callback. It's not as easy as "is a callback already registered", because a callback is always registered so that the higher-level reader gets notified when IVMs are encountered. Here, we augment that functionality.
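
The public/private split discussed in this thread can be sketched as follows. `CallbackSketchReader` is a hypothetical toy, not the MacroAwareIonReader implementation: the input is a list of strings in which `"IVM"` stands in for an Ion version marker, and the writer is a plain list. The sketch assumes a callback is always present and that transcoding wraps it rather than replacing it.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical reader; "IVM" items in the input stand in for Ion version markers. */
final class CallbackSketchReader {
    static final String IVM = "IVM";

    private final Iterator<String> input;
    int ivmsSeenByReader = 0;      // Maintained by the always-registered callback.
    int callbackRegistrations = 0; // Counts how often transcode setup ran.

    // A callback is always present so the reader itself learns about IVMs.
    private Runnable ivmCallback = () -> ivmsSeenByReader++;

    CallbackSketchReader(List<String> items) {
        this.input = items.iterator();
    }

    /** Public entry point: performs the one-time setup, then delegates. */
    void transcodeAllTo(List<String> writer) {
        Runnable original = ivmCallback;
        // Augment, rather than replace, the existing callback.
        ivmCallback = () -> {
            original.run();
            writer.add(IVM);
        };
        callbackRegistrations++;
        try {
            while (transcodeNextAssumingPrepared(writer)) {
                // Each iteration transcodes one value.
            }
        } finally {
            ivmCallback = original; // Restore the reader's own callback.
        }
    }

    /** Private step: assumes the augmented callback is already in place. */
    private boolean transcodeNextAssumingPrepared(List<String> writer) {
        while (input.hasNext()) {
            String item = input.next();
            if (IVM.equals(item)) {
                ivmCallback.run();
                continue;
            }
            writer.add(item);
            return true;
        }
        return false;
    }
}
```

This covers the all-at-once path with a single registration. A public value-by-value method would still need some way to know whether the augmentation has already happened, which is the state that prepareTranscodeTo makes explicit.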

jobarr-amzn

Created a backlog item for the feedback here: #1015

Allows values to be macro-aware transcoded value-by-value, adds support for creating macro-aware readers from InputStream, and improves testing.

8589c4f

tgregg mentioned this pull request Dec 9, 2024

Adds support for macro-aware transcoding from text. #1010

Merged

jobarr-amzn reviewed Dec 9, 2024

jobarr-amzn approved these changes Dec 12, 2024

tgregg merged commit 6722ac7 into ion-11-encoding Dec 12, 2024
17 checks passed

tgregg deleted the ion-11-encoding-by-value-transcode branch December 12, 2024 18:53

tgregg commented Dec 3, 2024

jobarr-amzn Dec 9, 2024

jobarr-amzn Dec 9, 2024

tgregg Dec 9, 2024

jobarr-amzn Dec 10, 2024

tgregg Dec 12, 2024

jobarr-amzn left a comment
