Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds IonWriter_1_1 interface #656

Merged
merged 4 commits into from
Nov 27, 2023

Conversation

popematt
Copy link
Contributor

Issue #, if available:

Description of changes:

Adds an interface that can be used for raw writers (and extended for other writer types that build on top of it).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

*
* This interface allows the user to write Ion data without being concerned about which output format is being used.
*/
interface Ion11Writer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change this if necessary, but we should decide on a convention for naming Ion 1.1 classes/methods and stick with it. I think we've both used _1_1 in some places, which is ugly due to the underscores but at least doesn't look like the number eleven. If we go with 11 as a convention, should it be Ion11Writer or IonWriter11?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like Ion11..., but I'll go with IonWriter_1_1 for now to be consistent with what we already have.

fun writeAnnotations(annotation0: Long, annotation1: Long)

/** Writes three or more annotations for the next value. */
fun writeAnnotations(annotation0: Long, annotation1: Long, vararg annotations: Long)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need variants that allow mixing Strings and Longs, for use with the FlexSym encoding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I'll add overloads that accept SymbolTokens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SymbolToken isn't ideal because it may require allocating a new one just to specify that we want text and not a symbol ID, or vice versa. Is there some way to finesse this in Kotlin? Otherwise, maybe the API is addAnnotation with variants for both String and Long...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could create inline value classes.

@JvmInline value class SymbolText(val text: String): SymbolToken {
    override fun getText(): String = text
    override fun assumeText(): String = text
    override fun getSid(): Int = 0
}
@JvmInline value class SymbolId(val id: Int): SymbolToken {
    override fun getText(): String? = null
    override fun assumeText(): Nothing = throw UnknownSymbolException(id)
    override fun getSid(): Int = id
}

If that's not okay, I can change to something like this.

fun addAnnotation(annotation: CharSequence)
fun addAnnotation(annotation: Long)
fun writeAnnotations()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the inline value classes look like a good start. Let's go with that for now. (Note: the SID for SymbolText should be -1 (there's an UNKNOWN_SID = -1 constant somewhere), since 0 is a valid SID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the value classes might not play nice with Java. I'll have to double check that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up just adjusting the contract for writeAnnotations so that it is legal to call more than once preceding a value. I don't want to have a mix of addAnnotations and writeAnnotations because it would bloat the number of APIs for writing annotations and because it could be confusing having two different ways of writing annotations with different semantics.

* Writes the field name for the next value. Must be called while in a struct and must be called before [writeAnnotations].
* @throws com.amazon.ion.IonException if annotations are already written for the value or if not in a struct.
*/
fun writeFieldName(text: String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will this be written inline vs as a symbol ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation detail. The raw writer will write the string method as inline field name and the long method as a SID. A configuration-driven, managed writer may choose to intern the field name in the symbol table, but it also depends on what sort of struct the writer is currently in.

fun writeTimestamp(value: Instant)

fun writeSymbol(id: Long)
fun writeSymbol(text: CharSequence)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use String some places and CharSequence in others here. What's the rationale for each?

When is this written inline vs as a symbol ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The String vs CharSequence issue is something I need to clean up. No rationale for having both.

Re. When inline be symbol ID?
That's an implementation detail. The raw writer will write the string arg as an inline symbol, but a "managed" writer could choose to intern the symbol text and write a numeric ID.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's quite that simple, because (as with all symbol tokens) the user may want to specify that some symbols should be inlined rather than interned. Or, we need to provide a detection mechanism with some configurable options to allow the user to influence when things should be inlined vs. interned. Is that what you had in mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what I had in mind is configuration driven "managed" writers. I say writers (plural) because I think we''ll want managed writers for both text and binary so that the text writer will also be able to write out templates.

I suspect that users who care about how individual symbols are encoded will probably want to manage their own symbol table too, so they can just use the raw writer, which would give them that control.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that users who care about how individual symbols are encoded will probably want to manage their own symbol table too

I don't think that's necessarily true, but I'm happy to move forward for now.

@popematt popematt changed the title Adds Ion11Writer interface Adds IonWriter_1_1 interface Nov 22, 2023
@popematt popematt merged commit 085ea3f into amazon-ion:ion-11-encoding Nov 27, 2023
17 of 27 checks passed
tgregg pushed a commit that referenced this pull request Jan 22, 2024
tgregg pushed a commit that referenced this pull request Feb 10, 2024
tgregg pushed a commit that referenced this pull request May 2, 2024
tgregg pushed a commit that referenced this pull request Jun 28, 2024
tgregg pushed a commit that referenced this pull request Sep 9, 2024
tgregg pushed a commit that referenced this pull request Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants