Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restores the previous binary reader implementation's local symbol table handling behavior for several edge cases. #619

Merged
merged 1 commit into from
Oct 20, 2023

Conversation

tgregg
Copy link
Contributor

@tgregg tgregg commented Oct 20, 2023

Description of changes:
Restores previous behavior for three cases, which map to the three modified unit test classes in the diff.

  1. DatagramTest.java: before this PR, the new binary reader always returned an immutable view of its local symbol table. However, there are cases where the reader's symbol table is transferred either to a writer or to an IonDatagram. In those cases the user can add to the symbol table, requiring it to be mutable. All existing tests covered the transfer of a symbol table from a text reader only. The new test tests transferring a symbol table from the binary reader and verifying that subsequent changes to the transferred table are correctly reflected.
  2. IonReaderContinuableTopLevelBinaryTest.java: The old reader threw UnknownSymbolException (a subclass of IonException) when an out-of-range symbol was encountered. The new implementation threw a base IonException. This case was previously only tested by the ion-tests harness, which asserted failure with any IonException. The added tests assert that an UnknownSymbolException is thrown.
  3. IonBinaryWriterBuilderTest.java: The binary writer builder allows users to supply an initial symbol table to be used by the writer. Before the new reader implementation was added, all local symbol tables were instances of the LocalSymbolTable class, so the writer builder could safely cast the provided local SymbolTable to LocalSymbolTable. The new reader implementation added its own SymbolTable implementation (LocalSymbolTableSnapshot) for local symbol tables to achieve better performance in common cases. This implementation would fail the cast to LocalSymbolTable when provided to the builder. The solution was to extract the new _Private_LocalSymbolTable interface to be implemented by both LocalSymbolTable and LocalSymbolTableSnapshot, and make the builder cast to that instead. Existing tests for the builder's setInitialSymbolTable method all constructed a symbol table manually; the new test provides a symbol table retrieved from the binary reader.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
Copy link

codecov bot commented Oct 20, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Files Coverage Δ
...on/impl/IonReaderContinuableApplicationBinary.java 92.71% <100.00%> (-0.62%) ⬇️
...n/ion/impl/IonReaderContinuableTopLevelBinary.java 94.44% <100.00%> (+0.10%) ⬆️
src/com/amazon/ion/impl/LocalSymbolTable.java 83.39% <ø> (ø)
...azon/ion/impl/_Private_IonBinaryWriterBuilder.java 83.18% <100.00%> (ø)
src/com/amazon/ion/impl/_Private_Utils.java 56.91% <100.00%> (+3.21%) ⬆️

... and 3 files with indirect coverage changes

📢 Thoughts on this report? Let us know!.

if (symbolTableLastTransferred.isLocalTable()) {
// This method is called when transferring the reader's symbol table to either a writer or an IonDatagram.
// Those cases require a mutable copy of the reader's symbol table.
return ((_Private_LocalSymbolTable) symbolTableLastTransferred).makeCopy();
Copy link
Contributor

@popematt popematt Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious—makeCopy() is a synchronized method. Is this in a hot path in the code? Are there any performance concerns with this? If there are and it's feasible, is it worth making a second, unsynchronized copy method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class only deals with _Private_LocalSymbolTable implementations provided by IonReaderContinuableApplicationBinary (i.e., LocalSymbolTableSnapshot), which does not mark this method synchronized. It is synchronized in LocalSymbolTable; I did not change that.

@tgregg
Copy link
Contributor Author

tgregg commented Oct 20, 2023

Since some of the regression detector checks failed, I did a more in-depth run locally and did not observe any impact. Merging.

…le handling behavior for several edge cases.
@tgregg tgregg force-pushed the local-symtab-consistency branch from c763b25 to c7afd21 Compare October 20, 2023 19:08
@tgregg tgregg merged commit 9a74c6a into master Oct 20, 2023
15 of 33 checks passed
@tgregg tgregg deleted the local-symtab-consistency branch October 20, 2023 19:15
Comment on lines +727 to 747
@Test
public void modifySymbolsAfterLoadThenSerialize()
throws Exception
{
IonDatagram dg = system().getLoader().load(TestUtils.ensureBinary(system(), "{foo: bar::baz}".getBytes(StandardCharsets.UTF_8)));
IonStruct struct = (IonStruct) dg.get(0);
IonSymbol baz = (IonSymbol) struct.get("foo");
baz.setTypeAnnotations("bar", "abc");
byte[] serialized = dg.getBytes();
try (IonReader reader = IonReaderBuilder.standard().build(serialized)) {
assertEquals(IonType.STRUCT, reader.next());
reader.stepIn();
assertEquals(IonType.SYMBOL, reader.next());
assertEquals("foo", reader.getFieldName());
String[] annotations = reader.getTypeAnnotations();
assertEquals(2, annotations.length);
assertEquals("bar", annotations[0]);
assertEquals("abc", annotations[1]);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is named after the setup, I have to wade through it to understand the assertions before I can construct backwards an understanding of what property the test is trying to demonstrate. If possible please name tests such that test failure tells you what behavioral expectation has been violated, not the context in which it has been violated.

Imperfect car analogy: if I want to know why someone failed their license test I want the reason "whenTakingTest_shouldNotExceedTheSpeedLimit" not the reason "runningLateToAnAppointment".

Suggested change
@Test
public void modifySymbolsAfterLoadThenSerialize()
throws Exception
{
IonDatagram dg = system().getLoader().load(TestUtils.ensureBinary(system(), "{foo: bar::baz}".getBytes(StandardCharsets.UTF_8)));
IonStruct struct = (IonStruct) dg.get(0);
IonSymbol baz = (IonSymbol) struct.get("foo");
baz.setTypeAnnotations("bar", "abc");
byte[] serialized = dg.getBytes();
try (IonReader reader = IonReaderBuilder.standard().build(serialized)) {
assertEquals(IonType.STRUCT, reader.next());
reader.stepIn();
assertEquals(IonType.SYMBOL, reader.next());
assertEquals("foo", reader.getFieldName());
String[] annotations = reader.getTypeAnnotations();
assertEquals(2, annotations.length);
assertEquals("bar", annotations[0]);
assertEquals("abc", annotations[1]);
}
}
}
@Test
public void whenSymbolTableIsModifiedAfterLoad_thatChangeSucceedsAndWillBePersisted()
throws Exception
{
IonDatagram dg = system().getLoader().load(TestUtils.ensureBinary("old_annotation::foo"));
IonSymbol foo = (IonSymbol) dg.get(0);
// Changing annotations requires mutation of the symbol table associated with the IonValue above, adds a symbol
foo.setTypeAnnotations("new_annotation");
byte[] serialized = dg.getBytes();
try (IonReader reader = IonReaderBuilder.standard().build(serialized)) {
MatcherAssert.assertThat(reader.next(), Matchers.is(IonType.SYMBOL));
assertEquals("foo", reader.stringValue());
MatcherAssert.assertThat(reader.getTypeAnnotations(), arrayContaining("new_annotation"));
}
}

This test assumes some additions to TestUtils:

    public static byte[] ensureBinary(String ionData) {
        return ensureBinary(ionData.getBytes(StandardCharsets.UTF_8));
    }

    public static byte[] ensureBinary(byte[] ionData)
    {
        if (IonStreamUtils.isIonBinary(ionData)) return ionData;

        return IonSystemBuilder.standard().build()
                .getLoader()
                .load(ionData)
                .getBytes();
    }

Comment on lines +773 to +801
@ParameterizedTest(name = "constructFromBytes={0}")
@ValueSource(booleans = {true, false})
public void unknownSymbolInFieldName(boolean constructFromBytes) throws Exception {
reader = readerFor(constructFromBytes, 0xD3, 0x8A, 0x21, 0x01);
assertSequence(next(IonType.STRUCT), STEP_IN, next(IonType.INT));
assertThrows(UnknownSymbolException.class, reader::getFieldNameSymbol);
assertThrows(UnknownSymbolException.class, reader::getFieldName);
reader.close();
}

@ParameterizedTest(name = "constructFromBytes={0}")
@ValueSource(booleans = {true, false})
public void unknownSymbolInAnnotation(boolean constructFromBytes) throws Exception {
reader = readerFor(constructFromBytes, 0xE4, 0x81, 0x8A, 0x21, 0x01);
assertSequence(next(IonType.INT));
assertThrows(UnknownSymbolException.class, reader::getTypeAnnotationSymbols);
assertThrows(UnknownSymbolException.class, reader::getTypeAnnotations);
reader.close();
}

@ParameterizedTest(name = "constructFromBytes={0}")
@ValueSource(booleans = {true, false})
public void unknownSymbolInValue(boolean constructFromBytes) throws Exception {
reader = readerFor(constructFromBytes, 0x71, 0x0A);
assertSequence(next(IonType.SYMBOL));
assertThrows(UnknownSymbolException.class, reader::symbolValue);
assertThrows(UnknownSymbolException.class, reader::stringValue);
reader.close();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are likewise named after their circumstance rather than behavior, though in this case the behavior and setup are simple enough that it matters less. Your mileage may vary, feel free to take or discard this suggestion:

Suggested change
@ParameterizedTest(name = "constructFromBytes={0}")
@ValueSource(booleans = {true, false})
public void unknownSymbolInFieldName(boolean constructFromBytes) throws Exception {
reader = readerFor(constructFromBytes, 0xD3, 0x8A, 0x21, 0x01);
assertSequence(next(IonType.STRUCT), STEP_IN, next(IonType.INT));
assertThrows(UnknownSymbolException.class, reader::getFieldNameSymbol);
assertThrows(UnknownSymbolException.class, reader::getFieldName);
reader.close();
}
@ParameterizedTest(name = "constructFromBytes={0}")
@ValueSource(booleans = {true, false})
public void unknownSymbolInAnnotation(boolean constructFromBytes) throws Exception {
reader = readerFor(constructFromBytes, 0xE4, 0x81, 0x8A, 0x21, 0x01);
assertSequence(next(IonType.INT));
assertThrows(UnknownSymbolException.class, reader::getTypeAnnotationSymbols);
assertThrows(UnknownSymbolException.class, reader::getTypeAnnotations);
reader.close();
}
@ParameterizedTest(name = "constructFromBytes={0}")
@ValueSource(booleans = {true, false})
public void unknownSymbolInValue(boolean constructFromBytes) throws Exception {
reader = readerFor(constructFromBytes, 0x71, 0x0A);
assertSequence(next(IonType.SYMBOL));
assertThrows(UnknownSymbolException.class, reader::symbolValue);
assertThrows(UnknownSymbolException.class, reader::stringValue);
reader.close();
}
@ParameterizedTest(name = "constructFromBytes={0}")
@ValueSource(booleans = {true, false})
public void inFieldName_accessingUnknownSymbol_throwsUnknownSymbolException(boolean constructFromBytes) throws Exception {
reader = readerFor(constructFromBytes, 0xD3, 0x8A, 0x21, 0x01);
assertSequence(next(IonType.STRUCT), STEP_IN, next(IonType.INT));
assertThrows(UnknownSymbolException.class, reader::getFieldNameSymbol);
assertThrows(UnknownSymbolException.class, reader::getFieldName);
reader.close();
}
@ParameterizedTest(name = "constructFromBytes={0}")
@ValueSource(booleans = {true, false})
public void inAnnotation_accessingUnknownSymbol_throwsUnknownSymbolException(boolean constructFromBytes) throws Exception {
reader = readerFor(constructFromBytes, 0xE4, 0x81, 0x8A, 0x21, 0x01);
assertSequence(next(IonType.INT));
assertThrows(UnknownSymbolException.class, reader::getTypeAnnotationSymbols);
assertThrows(UnknownSymbolException.class, reader::getTypeAnnotations);
reader.close();
}
@ParameterizedTest(name = "constructFromBytes={0}")
@ValueSource(booleans = {true, false})
public void asValue_accessingUnknownSymbol_throwsUnknownSymbolException(boolean constructFromBytes) throws Exception {
reader = readerFor(constructFromBytes, 0x71, 0x0A);
assertSequence(next(IonType.SYMBOL));
assertThrows(UnknownSymbolException.class, reader::symbolValue);
assertThrows(UnknownSymbolException.class, reader::stringValue);
reader.close();
}

throws Exception
{
SymbolTable lst;
try (IonReader reader = IonReaderBuilder.standard().build(TestUtils.ensureBinary(IonSystemBuilder.standard().build(), "foo".getBytes(StandardCharsets.UTF_8)))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struck me as odd, specifically the call to ensureBinary with the construction of a throw-away IonSystem. Why does ensureBinary take an IonSystem as a parameter?

It wasn't clear to me from a quick look that there are any uses where we should care about which IonSystem is used, but I'm sure you're more familiar with the implications here than I am so I'll make that a question rather than a statement. Are there any?

EDIT: In any case this test doesn't need to build a system, we can add a helper in TestUtils which does that as in the example above.

Suggested change
try (IonReader reader = IonReaderBuilder.standard().build(TestUtils.ensureBinary(IonSystemBuilder.standard().build(), "foo".getBytes(StandardCharsets.UTF_8)))) {
try (IonReader reader = IonReaderBuilder.standard().build(TestUtils.ensureBinary("foo"))) {

Comment on lines +323 to +327
IonWriter writer = b.build(out);
assertTrue(symbolTableEquals(lst, writer.getSymbolTable()));

writer = b.build(out);
assertTrue(symbolTableEquals(lst, writer.getSymbolTable()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? What does repeated writer builds do for us?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants