-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds IonWriter_1_1 interface #656
Conversation
* | ||
* This interface allows the user to write Ion data without being concerned about which output format is being used. | ||
*/ | ||
interface Ion11Writer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can change this if necessary, but we should decide on a convention for naming Ion 1.1 classes/methods and stick with it. I think we've both used _1_1
in some places, which is ugly due to the underscores but at least doesn't look like the number eleven. If we go with 11
as a convention, should it be Ion11Writer
or IonWriter11
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of like Ion11...
, but I'll go with IonWriter_1_1
for now to be consistent with what we already have.
fun writeAnnotations(annotation0: Long, annotation1: Long) | ||
|
||
/** Writes three or more annotations for the next value. */ | ||
fun writeAnnotations(annotation0: Long, annotation1: Long, vararg annotations: Long) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need variants that allow mixing Strings and Longs, for use with the FlexSym encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I'll add overloads that accept SymbolToken
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SymbolToken
isn't ideal because it may require allocating a new one just to specify that we want text and not a symbol ID, or vice versa. Is there some way to finesse this in Kotlin? Otherwise, maybe the API is addAnnotation
with variants for both String and Long...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could create inline value classes.
@JvmInline value class SymbolText(val text: String): SymbolToken {
override fun getText(): String = text
override fun assumeText(): String = text
override fun getSid(): Int = 0
}
@JvmInline value class SymbolId(val id: Int): SymbolToken {
override fun getText(): String? = null
override fun assumeText(): Nothing = throw UnknownSymbolException(id)
override fun getSid(): Int = id
}
If that's not okay, I can change to something like this.
fun addAnnotation(annotation: CharSequence)
fun addAnnotation(annotation: Long)
fun writeAnnotations()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the inline value classes look like a good start. Let's go with that for now. (Note: the SID for SymbolText
should be -1
(there's an UNKNOWN_SID = -1
constant somewhere), since 0 is a valid SID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the value classes might not play nice with Java. I'll have to double check that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up just adjusting the contract for writeAnnotations
so that it is legal to call more than once preceding a value. I don't want to have a mix of addAnnotations
and writeAnnotations
because it would bloat the number of APIs for writing annotations and because it could be confusing having two different ways of writing annotations with different semantics.
* Writes the field name for the next value. Must be called while in a struct and must be called before [writeAnnotations]. | ||
* @throws com.amazon.ion.IonException if annotations are already written for the value or if not in a struct. | ||
*/ | ||
fun writeFieldName(text: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will this be written inline vs as a symbol ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation detail. The raw writer will write the string method as inline field name and the long method as a SID. A configuration-driven, managed writer may choose to intern the field name in the symbol table, but it also depends on what sort of struct the writer is currently in.
fun writeTimestamp(value: Instant) | ||
|
||
fun writeSymbol(id: Long) | ||
fun writeSymbol(text: CharSequence) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use String some places and CharSequence in others here. What's the rationale for each?
When is this written inline vs as a symbol ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The String vs CharSequence issue is something I need to clean up. No rationale for having both.
Re. When inline be symbol ID?
That's an implementation detail. The raw writer will write the string arg as an inline symbol, but a "managed" writer could choose to intern the symbol text and write a numeric ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's quite that simple, because (as with all symbol tokens) the user may want to specify that some symbols should be inlined rather than interned. Or, we need to provide a detection mechanism with some configurable options to allow the user to influence when things should be inlined vs. interned. Is that what you had in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, what I had in mind is configuration driven "managed" writers. I say writers (plural) because I think we''ll want managed writers for both text and binary so that the text writer will also be able to write out templates.
I suspect that users who care about how individual symbols are encoded will probably want to manage their own symbol table too, so they can just use the raw writer, which would give them that control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that users who care about how individual symbols are encoded will probably want to manage their own symbol table too
I don't think that's necessarily true, but I'm happy to move forward for now.
7f36f3e
to
9782d24
Compare
4a2c25b
to
478f534
Compare
Issue #, if available:
Description of changes:
Adds an interface that can be used for raw writers (and extended for other writer types that build on top of it).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.