feat: Add decimal support to integration tester (#361)
This PR adds decimal support to the integration test utility. Because
the integration test JSON format stores decimal buffers as strings
containing the integer representation of the decimal, nanoarrow needed
a way to convert arbitrarily large integers to and from strings. I
adapted this from Arrow C++ (links in comments next to the
implementation), with a few differences to avoid porting the complete
int128 implementation and the C++ standard library.

- [x] Parse strings containing arbitrarily large integers into decimal
words
- [x] Convert decimal words into arbitrarily large integer strings
- [x] Wire the converters into the integration tester
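
For readers unfamiliar with the technique, the two conversions in the checklist above can be sketched in a few lines of Python. This is only an illustration and not the code in this PR (which is C/C++ adapted from Arrow C++): the function names `decimal_words_from_string` and `decimal_words_to_string` are hypothetical, and the sketch assumes 64-bit words stored least significant word first in two's complement. Python integers are used only as temporaries that never exceed 68 bits.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def _negate_in_place(words):
    # Two's complement negate: invert every word, then add one with carry.
    carry = 1
    for i in range(len(words)):
        value = (~words[i] & WORD_MASK) + carry
        words[i] = value & WORD_MASK
        carry = value >> WORD_BITS


def decimal_words_from_string(text, n_words):
    """Parse a base-10 integer string into n_words 64-bit words.

    Hypothetical helper: words are least significant first (an assumption).
    Only overflow of the unsigned magnitude is detected; the signed range
    is not checked in this sketch.
    """
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an integer string: {text!r}")

    words = [0] * n_words
    for ch in digits:
        # words = words * 10 + digit, propagating the carry word by word
        carry = ord(ch) - ord("0")
        for i in range(n_words):
            value = words[i] * 10 + carry
            words[i] = value & WORD_MASK
            carry = value >> WORD_BITS
        if carry:
            raise ValueError("value does not fit in the requested width")

    if negative:
        _negate_in_place(words)
    return words


def decimal_words_to_string(words):
    """Format least-significant-first 64-bit words as a base-10 string."""
    words = list(words)  # work on a copy
    negative = bool(words[-1] >> (WORD_BITS - 1))
    if negative:
        _negate_in_place(words)  # leaves the unsigned magnitude

    digits = []
    while any(words):
        # Long division of the magnitude by 10, most significant word first;
        # the remainder of each pass is the next least significant digit.
        remainder = 0
        for i in reversed(range(len(words))):
            value = (remainder << WORD_BITS) | words[i]
            words[i] = value // 10
            remainder = value % 10
        digits.append(chr(ord("0") + remainder))

    text = "".join(reversed(digits)) or "0"
    return "-" + text if negative else text


if __name__ == "__main__":
    # Two words for a 128-bit decimal, four words for a 256-bit decimal
    for n_words in (2, 4):
        words = decimal_words_from_string("-123456789012345678901234567890", n_words)
        print(n_words, [hex(w) for w in words], decimal_words_to_string(words))
```

Dividing by 10 one digit at a time keeps the sketch short; a production implementation would more likely divide by a larger power of ten per pass to reduce the number of long divisions.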

The gaps in test coverage are from the big-endian parts, which are
tested as part of weekly verification and which I tested locally with:

```shell
export NANOARROW_ARCH=s390x
docker compose run --rm verify
```

With `archery integration --with-cpp=true --with-nanoarrow=true
--run-c-data`, the decimal tests now pass:

<details>

```
##########################################################
C Data Interface: C++ exporting, C++ importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because producer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: C++ exporting, nanoarrow importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 454, in do_run
    importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 138, in import_schema_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 454, in do_run
    importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 138, in import_schema_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 501, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 501, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because producer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: nanoarrow exporting, C++ importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because consumer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because consumer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because consumer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: nanoarrow exporting, nanoarrow importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 471, in _run_c_schema_test_case
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 453, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 521, in _run_c_array_test_cases
    do_run()
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 498, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/deweydunnington/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
... with record batch #0
... with record batch #1
======================================================================


################# FAILURES #################
FAILED TEST: run_end_encoded C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  C++ consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: run_end_encoded nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view nanoarrow producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

12 failures, 9 skips
```

</details>
paleolimbot authored Jan 25, 2024
1 parent e0a5d9d commit c4844e3
Showing 6 changed files with 525 additions and 2 deletions.
11 changes: 11 additions & 0 deletions src/nanoarrow/nanoarrow.h
@@ -47,6 +47,9 @@
  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowBufferDeallocator)
#define ArrowErrorSet NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowErrorSet)
#define ArrowLayoutInit NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowLayoutInit)
#define ArrowDecimalSetDigits NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDecimalSetDigits)
#define ArrowDecimalAppendDigitsToBuffer \
  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDecimalAppendDigitsToBuffer)
#define ArrowSchemaInit NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowSchemaInit)
#define ArrowSchemaInitFromType \
  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowSchemaInitFromType)
@@ -280,6 +283,14 @@ void ArrowLayoutInit(struct ArrowLayout* layout, enum ArrowType storage_type);
/// \brief Create a string view from a null-terminated string
static inline struct ArrowStringView ArrowCharView(const char* value);

/// \brief Sets the integer value of an ArrowDecimal from a string
ArrowErrorCode ArrowDecimalSetDigits(struct ArrowDecimal* decimal,
                                     struct ArrowStringView value);

/// \brief Get the integer value of an ArrowDecimal as string
ArrowErrorCode ArrowDecimalAppendDigitsToBuffer(const struct ArrowDecimal* decimal,
                                                struct ArrowBuffer* buffer);

/// @}

/// \defgroup nanoarrow-schema Creating schemas
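
As a rough illustration of what a digits-to-words conversion like the two functions declared in this hunk involves, here is a minimal, self-contained C sketch. It is not the nanoarrow implementation: `demo_parse_digits` and `demo_format_digits` are hypothetical names, the layout (little-endian 32-bit words) is an assumption made for the example, and sign handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not nanoarrow API): parse an unsigned base-10 string
 * into n_words little-endian 32-bit words. Returns 0 on success and -1 on an
 * empty string, a non-digit character, or overflow. */
static int demo_parse_digits(const char* digits, uint32_t* words, int n_words) {
  size_t len = strlen(digits);
  memset(words, 0, (size_t)n_words * sizeof(uint32_t));
  if (len == 0) return -1;

  for (size_t pos = 0; pos < len;) {
    /* Take up to 9 digits at a time: 10^9 still fits in 32 bits. */
    size_t chunk_len = (len - pos < 9) ? (len - pos) : 9;
    uint64_t multiple = 1;
    uint64_t chunk = 0;
    for (size_t i = 0; i < chunk_len; i++) {
      char c = digits[pos + i];
      if (c < '0' || c > '9') return -1;
      chunk = chunk * 10 + (uint64_t)(c - '0');
      multiple *= 10;
    }

    /* words = words * multiple + chunk, carrying through every word. */
    uint64_t carry = chunk;
    for (int i = 0; i < n_words; i++) {
      uint64_t tmp = (uint64_t)words[i] * multiple + carry;
      words[i] = (uint32_t)tmp;
      carry = tmp >> 32;
    }
    if (carry != 0) return -1; /* value does not fit in n_words */
    pos += chunk_len;
  }
  return 0;
}

/* Hypothetical helper: write the base-10 digits of the words into out as a
 * NUL-terminated string. The words are consumed (left as zero). Returns the
 * string length, or -1 if out is too small. */
static int demo_format_digits(uint32_t* words, int n_words, char* out,
                              size_t out_size) {
  char reversed[96]; /* 256 bits need at most 78 decimal digits */
  size_t n = 0;
  int quotient_nonzero;
  assert(n_words > 0 && n_words <= 8);

  do {
    /* Divide the whole number by 10^9, most significant word first. */
    uint64_t remainder = 0;
    quotient_nonzero = 0;
    for (int i = n_words - 1; i >= 0; i--) {
      uint64_t cur = (remainder << 32) | words[i];
      words[i] = (uint32_t)(cur / 1000000000u);
      remainder = cur % 1000000000u;
      if (words[i] != 0) quotient_nonzero = 1;
    }

    /* Emit the remainder least significant digit first, zero-padded to 9
     * digits while more significant digits are still to come. */
    int emitted = 0;
    do {
      reversed[n++] = (char)('0' + (int)(remainder % 10));
      remainder /= 10;
      emitted++;
    } while (emitted < 9 && (quotient_nonzero || remainder != 0));
  } while (quotient_nonzero);

  if (n + 1 > out_size) return -1;
  for (size_t i = 0; i < n; i++) out[i] = reversed[n - 1 - i];
  out[n] = '\0';
  return (int)n;
}
```

Under these assumptions, parsing `"258"` into four words gives `{258, 0, 0, 0}`, and formatting those words yields `"258"` again. Working in 9-digit chunks keeps every intermediate product inside a `uint64_t`, which is one way to avoid needing a 128-bit integer type.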
68 changes: 68 additions & 0 deletions src/nanoarrow/nanoarrow_testing.hpp
@@ -850,6 +850,13 @@ class TestingJSONWriter {
        break;
      }

      case NANOARROW_TYPE_DECIMAL128:
        NANOARROW_RETURN_NOT_OK(WriteDecimalData(out, value, 128));
        break;
      case NANOARROW_TYPE_DECIMAL256:
        NANOARROW_RETURN_NOT_OK(WriteDecimalData(out, value, 256));
        break;

      default:
        // Not supported
        return ENOTSUP;
@@ -935,6 +942,37 @@ class TestingJSONWriter {
    }
  }

  ArrowErrorCode WriteDecimalData(std::ostream& out, const ArrowArrayView* view,
                                  int bitwidth) {
    ArrowDecimal value;
    ArrowDecimalInit(&value, bitwidth, 0, 0);
    nanoarrow::UniqueBuffer tmp;

    NANOARROW_RETURN_NOT_OK(WriteDecimalMaybeNull(out, view, 0, &value, tmp.get()));
    for (int64_t i = 1; i < view->length; i++) {
      out << ", ";
      NANOARROW_RETURN_NOT_OK(WriteDecimalMaybeNull(out, view, i, &value, tmp.get()));
    }

    return NANOARROW_OK;
  }

  ArrowErrorCode WriteDecimalMaybeNull(std::ostream& out, const ArrowArrayView* view,
                                       int64_t i, ArrowDecimal* decimal,
                                       ArrowBuffer* tmp) {
    if (ArrowArrayViewIsNull(view, i)) {
      out << R"("0")";
      return NANOARROW_OK;
    } else {
      ArrowArrayViewGetDecimalUnsafe(view, i, decimal);
      tmp->size_bytes = 0;
      NANOARROW_RETURN_NOT_OK(ArrowDecimalAppendDigitsToBuffer(decimal, tmp));
      out << R"(")" << std::string(reinterpret_cast<char*>(tmp->data), tmp->size_bytes)
          << R"(")";
      return NANOARROW_OK;
    }
  }

  void WriteString(std::ostream& out, ArrowStringView value) {
    out << R"(")";

@@ -2115,6 +2153,10 @@ class TestingJSONReader {
        return SetBufferIntervalDayTime(data, buffer, error);
      case NANOARROW_TYPE_INTERVAL_MONTH_DAY_NANO:
        return SetBufferIntervalMonthDayNano(data, buffer, error);
      case NANOARROW_TYPE_DECIMAL128:
        return SetBufferDecimal(data, buffer, 128, error);
      case NANOARROW_TYPE_DECIMAL256:
        return SetBufferDecimal(data, buffer, 256, error);
      default:
        ArrowErrorSet(error, "storage type %s DATA buffer not supported",
                      ArrowTypeString(array_view->storage_type));
@@ -2379,6 +2421,32 @@ class TestingJSONReader {
    return NANOARROW_OK;
  }

  ArrowErrorCode SetBufferDecimal(const json& value, ArrowBuffer* buffer, int bitwidth,
                                  ArrowError* error) {
    NANOARROW_RETURN_NOT_OK(
        Check(value.is_array(), error, "decimal buffer must be array"));

    ArrowDecimal decimal;
    ArrowDecimalInit(&decimal, bitwidth, 0, 0);

    ArrowStringView item_view;

    for (const auto& item : value) {
      NANOARROW_RETURN_NOT_OK(
          Check(item.is_string(), error, "decimal buffer item must be string"));
      auto item_str = item.get<std::string>();
      item_view.data = item_str.data();
      item_view.size_bytes = item_str.size();
      NANOARROW_RETURN_NOT_OK_WITH_ERROR(ArrowDecimalSetDigits(&decimal, item_view),
                                         error);
      NANOARROW_RETURN_NOT_OK_WITH_ERROR(
          ArrowBufferAppend(buffer, decimal.words, decimal.n_words * sizeof(uint64_t)),
          error);
    }

    return NANOARROW_OK;
  }

  void SetArrayAllocatorRecursive(ArrowArray* array) {
    for (int i = 0; i < array->n_buffers; i++) {
      ArrowArrayBuffer(array, i)->allocator = allocator_;
6 changes: 4 additions & 2 deletions src/nanoarrow/nanoarrow_testing_test.cc
@@ -1141,9 +1141,11 @@ TEST(NanoarrowTestingTest, NanoarrowTestingTestFieldFixedSizeBinary) {

 TEST(NanoarrowTestingTest, NanoarrowTestingTestFieldDecimal) {
   TestTypeRoundtrip(
-      R"({"name": "decimal", "bitWidth": 128, "precision": 10, "scale": 3})");
+      R"({"name": "decimal", "bitWidth": 128, "precision": 10, "scale": 3})",
+      R"({"name": null, "count": 3, "VALIDITY": [0, 1, 1], "DATA": ["0", "0", "258"]})");
   TestTypeRoundtrip(
-      R"({"name": "decimal", "bitWidth": 256, "precision": 10, "scale": 3})");
+      R"({"name": "decimal", "bitWidth": 256, "precision": 10, "scale": 3})",
+      R"({"name": null, "count": 3, "VALIDITY": [0, 1, 1], "DATA": ["0", "0", "258"]})");

   TestTypeError(R"({"name": "decimal", "bitWidth": 123, "precision": 10, "scale": 3})",
                 "Type[name=='decimal'] bitWidth must be 128 or 256");
22 changes: 22 additions & 0 deletions src/nanoarrow/nanoarrow_types.h
@@ -916,6 +916,28 @@ static inline void ArrowDecimalSetInt(struct ArrowDecimal* decimal, int64_t valu
decimal->words[decimal->low_word_index] = value;
}

/// \brief Negate the value of this decimal in place
/// \ingroup nanoarrow-utils
static inline void ArrowDecimalNegate(struct ArrowDecimal* decimal) {
uint64_t carry = 1;

if (decimal->low_word_index == 0) {
for (int i = 0; i < decimal->n_words; i++) {
uint64_t elem = decimal->words[i];
elem = ~elem + carry;
carry &= (elem == 0);
decimal->words[i] = elem;
}
} else {
for (int i = decimal->low_word_index; i >= 0; i--) {
uint64_t elem = decimal->words[i];
elem = ~elem + carry;
carry &= (elem == 0);
decimal->words[i] = elem;
}
}
}
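
The negation above is a two's complement negate spread across several words: invert every word, add one to the least significant word, and let the carry continue only while a word wraps to zero. A minimal standalone sketch of the little-endian case follows; `negate_words_le` is a hypothetical name for illustration and is not part of nanoarrow.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not a nanoarrow function): negate a two's complement
// integer stored as n_words 64-bit words, least significant word first.
static void negate_words_le(uint64_t* words, int n_words) {
  assert(n_words > 0);
  uint64_t carry = 1;
  for (int i = 0; i < n_words; i++) {
    // Invert the word and add the pending carry (the "+ 1" of the negate)
    uint64_t elem = ~words[i] + carry;
    // The carry survives only if this word wrapped around to zero
    carry &= (elem == 0);
    words[i] = elem;
  }
}
```

For example, negating the two-word value `{1, 0}` should give `{UINT64_MAX, UINT64_MAX}`, which is -1 as a 128-bit integer.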

/// \brief Copy bytes from a buffer into this decimal
/// \ingroup nanoarrow-utils
static inline void ArrowDecimalSetBytes(struct ArrowDecimal* decimal,
199 changes: 199 additions & 0 deletions src/nanoarrow/utils.c
@@ -231,3 +231,202 @@ struct ArrowBufferAllocator ArrowBufferDeallocator(
allocator.private_data = private_data;
return allocator;
}

static const int kInt32DecimalDigits = 9;

static const uint64_t kUInt32PowersOfTen[] = {
1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL,
100000ULL, 1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL};

// Adapted from Arrow C++ to use 32-bit words for better C portability
// https://github.com/apache/arrow/blob/cd3321b28b0c9703e5d7105d6146c1270bbadd7f/cpp/src/arrow/util/decimal.cc#L524-L544
static void ShiftAndAdd(struct ArrowStringView value, uint32_t* out, int64_t out_size) {
// We use strtoll for parsing, which needs input that is null-terminated
char chunk_string[16];

for (int64_t posn = 0; posn < value.size_bytes;) {
int64_t remaining = value.size_bytes - posn;

int64_t group_size;
if (remaining > kInt32DecimalDigits) {
group_size = kInt32DecimalDigits;
} else {
group_size = remaining;
}

const uint64_t multiple = kUInt32PowersOfTen[group_size];

memcpy(chunk_string, value.data + posn, group_size);
chunk_string[group_size] = '\0';
uint32_t chunk = (uint32_t)strtoll(chunk_string, NULL, 10);

for (int64_t i = 0; i < out_size; i++) {
uint64_t tmp = out[i];
tmp *= multiple;
tmp += chunk;
out[i] = (uint32_t)(tmp & 0xFFFFFFFFULL);
chunk = (uint32_t)(tmp >> 32);
}
posn += group_size;
}
}

ArrowErrorCode ArrowDecimalSetDigits(struct ArrowDecimal* decimal,
struct ArrowStringView value) {
// Check for sign
int is_negative = value.data[0] == '-';
int has_sign = is_negative || value.data[0] == '+';
value.data += has_sign;
value.size_bytes -= has_sign;

// Check that all remaining characters (after any leading sign) are digits
for (int64_t i = 0; i < value.size_bytes; i++) {
char c = value.data[i];
if (c < '0' || c > '9') {
return EINVAL;
}
}

// Skip over leading 0s
int64_t n_leading_zeroes = 0;
for (int64_t i = 0; i < value.size_bytes; i++) {
if (value.data[i] == '0') {
n_leading_zeroes++;
} else {
break;
}
}

value.data += n_leading_zeroes;
value.size_bytes -= n_leading_zeroes;

// Use 32-bit words for portability
uint32_t words32[8];
int n_words32 = decimal->n_words * 2;
NANOARROW_DCHECK(n_words32 <= 8);
memset(words32, 0, sizeof(words32));

ShiftAndAdd(value, words32, n_words32);

if (decimal->low_word_index == 0) {
memcpy(decimal->words, words32, sizeof(uint32_t) * n_words32);
} else {
uint64_t lo;
uint64_t hi;

for (int i = 0; i < decimal->n_words; i++) {
lo = (uint64_t)words32[i * 2];
hi = (uint64_t)words32[i * 2 + 1] << 32;
decimal->words[decimal->n_words - i - 1] = lo | hi;
}
}

if (is_negative) {
ArrowDecimalNegate(decimal);
}

return NANOARROW_OK;
}
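
The parsing approach above can be tried in isolation: take the digits in groups of up to 9, and for each group multiply the whole multi-word accumulator by the matching power of ten, then add the group, carrying through 32-bit words. The sketch below is a simplified illustration under stated assumptions, not the commit's code: `digits_to_words32` is a hypothetical name, it accepts only an unsigned, null-terminated digit string, and it converts each group with a manual digit loop instead of `strtoll`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not a nanoarrow function): accumulate an unsigned
// decimal digit string into out_size little-endian 32-bit words.
// The caller must zero `out` first; overflow beyond out_size is dropped.
static void digits_to_words32(const char* digits, uint32_t* out, int out_size) {
  static const uint32_t kPow10[] = {1U,      10U,      100U,      1000U,      10000U,
                                    100000U, 1000000U, 10000000U, 100000000U, 1000000000U};
  size_t n = strlen(digits);

  for (size_t pos = 0; pos < n;) {
    // Up to 9 digits always fit in a uint32_t
    size_t group = (n - pos > 9) ? 9 : (n - pos);

    uint32_t chunk = 0;
    for (size_t j = 0; j < group; j++) {
      char c = digits[pos + j];
      assert(c >= '0' && c <= '9');
      chunk = chunk * 10 + (uint32_t)(c - '0');
    }

    // out = out * 10^group + chunk, propagating the carry word by word
    uint64_t carry = chunk;
    for (int i = 0; i < out_size; i++) {
      uint64_t tmp = (uint64_t)out[i] * kPow10[group] + carry;
      out[i] = (uint32_t)(tmp & 0xFFFFFFFFULL);
      carry = tmp >> 32;
    }

    pos += group;
  }
}
```

With a zeroed four-word array, parsing `"258"` should leave 258 in the lowest word, and parsing `"18446744073709551616"` (2^64) should set only the third word to 1.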

// Adapted from Arrow C++ for C
// https://github.com/apache/arrow/blob/cd3321b28b0c9703e5d7105d6146c1270bbadd7f/cpp/src/arrow/util/decimal.cc#L365
ArrowErrorCode ArrowDecimalAppendDigitsToBuffer(const struct ArrowDecimal* decimal,
struct ArrowBuffer* buffer) {
int is_negative = ArrowDecimalSign(decimal) < 0;

uint64_t words_little_endian[4];
if (decimal->low_word_index == 0) {
memcpy(words_little_endian, decimal->words, decimal->n_words * sizeof(uint64_t));
} else {
for (int i = 0; i < decimal->n_words; i++) {
words_little_endian[i] = decimal->words[decimal->n_words - i - 1];
}
}

// We've already made a copy, so negate that if needed
if (is_negative) {
uint64_t carry = 1;
for (int i = 0; i < decimal->n_words; i++) {
uint64_t elem = words_little_endian[i];
elem = ~elem + carry;
carry &= (elem == 0);
words_little_endian[i] = elem;
}
}

// Find the most significant word that is non-zero
int most_significant_elem_idx = -1;
for (int i = decimal->n_words - 1; i >= 0; i--) {
if (words_little_endian[i] != 0) {
most_significant_elem_idx = i;
break;
}
}

// If they are all zero, the output is just '0'
if (most_significant_elem_idx == -1) {
NANOARROW_RETURN_NOT_OK(ArrowBufferAppendInt8(buffer, '0'));
return NANOARROW_OK;
}

// Define segments such that each segment represents 9 digits with the
// least significant group of 9 digits first. For example, if the input represents
// 9876543210123456789, then segments will be [123456789, 876543210, 9].
// We handle at most a signed 256 bit integer, whose maximum value occupies 77
// characters. Thus, we need at most 9 segments.
const uint32_t k1e9 = 1000000000U;
int num_segments = 0;
uint32_t segments[9];
memset(segments, 0, sizeof(segments));
uint64_t* most_significant_elem = words_little_endian + most_significant_elem_idx;

do {
// Compute remainder = words_little_endian % 1e9 and words_little_endian =
// words_little_endian / 1e9.
uint32_t remainder = 0;
uint64_t* elem = most_significant_elem;

do {
// Compute dividend = (remainder << 32) | *elem (a virtual 96-bit integer);
// *elem = dividend / 1e9;
// remainder = dividend % 1e9.
uint32_t hi = (uint32_t)(*elem >> 32);
uint32_t lo = (uint32_t)(*elem & 0xFFFFFFFFULL);
uint64_t dividend_hi = ((uint64_t)(remainder) << 32) | hi;
uint64_t quotient_hi = dividend_hi / k1e9;
remainder = (uint32_t)(dividend_hi % k1e9);
uint64_t dividend_lo = ((uint64_t)(remainder) << 32) | lo;
uint64_t quotient_lo = dividend_lo / k1e9;
remainder = (uint32_t)(dividend_lo % k1e9);

*elem = (quotient_hi << 32) | quotient_lo;
} while (elem-- != words_little_endian);

segments[num_segments++] = remainder;
} while (*most_significant_elem != 0 || most_significant_elem-- != words_little_endian);

// The output has at most 9 digits per segment plus a negative sign. Reserve
// extra space beyond the final 9 digits so that snprintf() with n = 21
// (the maximum length of %lu including the null terminator) always writes
// within the reserved capacity.
NANOARROW_RETURN_NOT_OK(ArrowBufferReserve(buffer, num_segments * 9 + 1 + 21 - 9));
if (is_negative) {
buffer->data[buffer->size_bytes++] = '-';
}

// The most significant segment should have no leading zeroes
int n_chars = snprintf((char*)buffer->data + buffer->size_bytes, 21, "%lu",
(unsigned long)segments[num_segments - 1]);
buffer->size_bytes += n_chars;

// Subsequent output needs to be left-padded with zeroes such that each segment
// takes up exactly 9 digits.
for (int i = num_segments - 2; i >= 0; i--) {
int n_chars = snprintf((char*)buffer->data + buffer->size_bytes, 21, "%09lu",
(unsigned long)segments[i]);
buffer->size_bytes += n_chars;
NANOARROW_DCHECK(buffer->size_bytes <= buffer->capacity_bytes);
}

return NANOARROW_OK;
}
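
The formatting direction works the other way around: divide the multi-word value by 1e9 repeatedly, collect each remainder as a 9-digit segment (least significant first), then print the segments from most significant to least, zero-padding all but the first. The sketch below illustrates this under simplifying assumptions and is not the commit's code: `words32_to_digits` is a hypothetical name, it handles only unsigned values, it works on 32-bit words so each division step fits in a `uint64_t`, and it writes to a caller-supplied `char` array instead of an `ArrowBuffer`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not a nanoarrow function): format the unsigned integer
// held in n_words little-endian 32-bit words as decimal text in `out`.
// The input words are consumed (left as zero) by the repeated division.
static void words32_to_digits(uint32_t* words, int n_words, char* out,
                              size_t out_size) {
  assert(n_words > 0 && n_words <= 8);
  assert(out_size >= 82);  // 9 segments * 9 digits + null terminator

  // Find the most significant non-zero word
  int top = n_words - 1;
  while (top >= 0 && words[top] == 0) top--;
  if (top < 0) {
    snprintf(out, out_size, "0");
    return;
  }

  uint32_t segments[9];
  int num_segments = 0;
  while (top >= 0) {
    // Long division by 1e9 from the most significant word down
    uint64_t remainder = 0;
    for (int i = top; i >= 0; i--) {
      uint64_t dividend = (remainder << 32) | words[i];
      words[i] = (uint32_t)(dividend / 1000000000U);
      remainder = dividend % 1000000000U;
    }
    segments[num_segments++] = (uint32_t)remainder;
    while (top >= 0 && words[top] == 0) top--;
  }

  // Most significant segment has no padding; the rest take exactly 9 digits
  int n = snprintf(out, out_size, "%" PRIu32, segments[num_segments - 1]);
  for (int i = num_segments - 2; i >= 0; i--) {
    n += snprintf(out + n, out_size - (size_t)n, "%09" PRIu32, segments[i]);
  }
}
```

Given the words `{0, 0, 1}` (2^64), the segments come out as 709551616, 446744073 and 18, so the text should read `18446744073709551616`.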

0 comments on commit c4844e3