Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allows values to be macro-aware transcoded value-by-value, adds support for creating macro-aware readers from InputStream, and improves testing. #1005

Merged
merged 1 commit into from
Dec 12, 2024

Conversation

tgregg
Copy link
Contributor

@tgregg tgregg commented Dec 3, 2024

Description of changes:

  • Value-by-value transcoding allows for the user to stop transcoding after a certain number of user values have been transcoded. ion-java-benchmark-cli makes use of this when --limit is specified. It also uses it for --ion-flush-period, which requires the writer to be flushed every N values.
  • The reader builder has been generalized to create MacroAwareIonReaders using the same compression and format detection logic as the regular readers. This allows macro-aware transcoding to be performed on both byte[] and InputStream. We now also parameterize the macro-aware transcoding tests in EncodingDirectiveCompilationTest to cover all combinations of input type (byte[] or InputStream) and output format (binary or text). Once support for macro-aware transcoding from text is added, an additional dimension of parameterization will be added for that.
  • I added some tests for transcoding containers with nested literals and macro invocations. The new transcodeValueLIteral method in the reader makes these tests pass by recursively calling transcodeNext() for containers, which ensures that any nested e-expressions will be transcoded as e-expressions and not the expanded values. I've left it as a TODO to find a factoring for this that does not include method recursion, as we've been trying to move away from that. I'm not making that a priority since this is intended as an internal API.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…rt for creating macro-aware readers from InputStream, and improves testing.
@ParameterizedTest(name = "{0},{1}")
@MethodSource("allCombinations")
public void multiValuePartialMacroAwareTranscode(InputType inputType, StreamType outputFormat) throws Exception {
byte[] data = macroInvocationsProduceEncodingDirectivesThatModifyMacroTable(StreamType.BINARY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on macroInvocationsProduceEncodingDirectivesThatModifyMacroTable here adds a lot of specificity to this test but at a remove, which makes the test harder to read/understand. It isn't directly obvious why any of the substringCount values are the number that they are, and determining the correct value from the data generator method is tedious.

If possible it would be great to use smaller test data defined more locally.

Comment on lines +22 to +28
/**
* Prepares the reader to perform a macro-aware transcode to the given
* writer. This must be called before calling [transcodeNext], but is not
* necessary if calling [transcodeAllTo].
* @param writer the writer to which the reader's stream will be transcoded.
*/
fun prepareTranscodeTo(writer: MacroAwareIonWriter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes the prepareTranscodeTo(writer)/transcodeNext() preferable to transcodeTo(writer)? From a glance over it looks like either would work, so I likely missed something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a huge difference, but the current factoring allows for the setup (registering the IVM callback) to happen once, rather than before every value that is transcoded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I see it now. It's really more of a question of whether you know that the callback has been registered, right? So could that state also be carried implicitly, by virtue of a private method that assumes the callback has already been registered and a public method which does the registration then calls the private method? Or am I misunderstanding how this works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that may be possible. We can follow up in #1015. My main goal with this factoring was to avoid repetitive re-setting of the callback. It's not as easy as "is a callback already registered", because a callback is always registered so that the higher-level reader gets notified when IVMs are encountered. Here, we augment that functionality.

Copy link
Contributor

@jobarr-amzn jobarr-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a backlog item for the feedback here: #1015

@tgregg tgregg merged commit 6722ac7 into ion-11-encoding Dec 12, 2024
17 checks passed
@tgregg tgregg deleted the ion-11-encoding-by-value-transcode branch December 12, 2024 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants