v0.0.0-alpha.1
Changed the format to contain tiny batches (256 numbers each), each with contiguous 4-way interleaved tANS codes and contiguous offsets. This increased the buffer space needed, but allowed decent CPU utilization during tANS decoding and excellent SIMD utilization during offset decoding, for an overall decompression speedup of approximately 30%.
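
The two-pass shape of batch decoding can be sketched roughly as follows. This is an illustrative toy, not the library's actual decoder: `TOY_TABLE`, `decode_bins`, `add_offsets`, and `decode_batch` are hypothetical names, the table is hand-built for three bins with weights 2:1:1, bits are modeled as a Python list of 0/1 values, and offsets are assumed to be already unpacked into integers.

```python
N_STATES = 4    # 4-way interleaving, as described in the entry above
BATCH_N = 256   # numbers per batch, as described in the entry above

# Toy tANS-style decode table (assumption: 4 states, bin weights 2:1:1).
# Each row is (bin index, number of bits to read, base of the next state).
TOY_TABLE = [
    (0, 1, 0),
    (1, 2, 0),
    (0, 1, 2),
    (2, 2, 0),
]


def decode_bins(table, states, bits, n):
    """Pass 1: decode n bin indices, rotating through N_STATES decoder states.

    Number i is handled by state i % N_STATES, so consecutive numbers do not
    depend on each other's state updates.
    Returns the bin indices and the count of bits consumed.
    """
    states = list(states)
    pos = 0
    bins = []
    for i in range(n):
        j = i % N_STATES
        bin_idx, n_bits, base = table[states[j]]
        bins.append(bin_idx)
        extra = 0
        for b in bits[pos:pos + n_bits]:
            extra = (extra << 1) | b
        pos += n_bits
        states[j] = base + extra
    return bins, pos


def add_offsets(bins, offsets, lowers):
    """Pass 2: a branch-free loop that adds each offset to its bin's lower bound."""
    return [lowers[b] + o for b, o in zip(bins, offsets)]


def decode_batch(table, states, bits, offsets, lowers):
    """Decode one batch: all bin codes first, then all offsets."""
    bins, _ = decode_bins(table, states, bits, len(offsets))
    return add_offsets(bins, offsets, lowers)
```

A real batch would hold `BATCH_N` numbers; the same functions work on any length. Keeping four independent states lets the CPU overlap the table lookups of neighboring numbers, and because the offsets sit contiguously, the second pass is a simple elementwise loop of the kind that vectorizes well. The cost is the intermediate list of bin indices held between the two passes, which corresponds to the extra buffer space mentioned above.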