This is a classic implementation of the LZW algorithm, based on Mark Nelson's
1995 book on data compression. It uses variable-width dictionary codes up to
16 bits wide.
Original development was done in 1996-97.
It has since been reworked into a single executable that performs both
compression and decompression; the maximum code width is passed as an argument
to the `Compress2` function.
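To illustrate what "variable-width codes up to a maximum width" means in practice, here is a minimal sketch. It is not taken from the lzw16 sources: `code_width`, `BitWriter` and `put_code` are hypothetical names, and both the exact point at which the width grows and the LSB-first bit order are assumptions that differ between LZW implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: width in bits needed to write any code while the
   dictionary holds `next_code` entries (codes 0 .. next_code-1),
   starting at 9 bits and clamped to `max_bits`. */
unsigned code_width(uint32_t next_code, unsigned max_bits)
{
    unsigned bits = 9;  /* 256 literal bytes plus the first phrases need 9 bits */
    assert(max_bits >= 9 && max_bits <= 16);
    while (bits < max_bits && next_code > (1u << bits))
        ++bits;
    return bits;
}

/* Hypothetical bit packer: appends codes to a byte buffer, LSB-first. */
typedef struct {
    uint8_t *out;    /* destination buffer (caller guarantees capacity) */
    size_t   pos;    /* bytes written so far */
    uint32_t acc;    /* pending bits not yet flushed */
    unsigned nbits;  /* number of valid bits in acc, always < 8 between calls */
} BitWriter;

void put_code(BitWriter *w, uint32_t code, unsigned width)
{
    /* At most 7 pending bits + 16 new bits, so a 32-bit accumulator suffices. */
    w->acc |= code << w->nbits;
    w->nbits += width;
    while (w->nbits >= 8) {
        w->out[w->pos++] = (uint8_t)(w->acc & 0xFF);
        w->acc >>= 8;
        w->nbits -= 8;
    }
}
```

With this scheme, early codes cost 9 bits each, and the cost per code rises only as the dictionary fills, until it reaches the limit given on the command line (for example `-b12`).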
The program has been tested on Ubuntu 18.04 (with gcc and clang) and on Windows
10 (with Visual Studio 2019).
Running `make` produces (1) the executable `lzw16`; (2) the static library
`liblzw16.a`, which exports three functions, `Compress`, `Compress2` and
`Decompress` (see `export.h`); and (3) the test program `lzw_test`, statically
linked against that library.
Type `./lzw16` to see all command line options.
Examples:
`./lzw16 -p sample.txt sample.lzw` (pack sample.txt)
`./lzw16 -u sample.lzw sample_copy.txt` (unpack sample.lzw)
`./lzw16 -tv sample.txt` (test compression/decompression, verbose mode)
`./lzw16 -large` (test synthetic data)
`./lzw16 -b14 -large 10` (test synthetic data of size 10 x 256 Kb, using codes up to 14 bits wide)
`./lzw16 -pv -b12 sample.txt sample.lzw` (pack sample.txt using codes up to
12 bits wide, verbose)
Limitations:
1. It supports input files up to 2 GB in size.
2. It is currently supported on little-endian machines only.
To do:
1. Add big-endian support.
2. The input file size limit can be lifted by replacing the 4-byte file size
header in the compressed file with a 48-bit or 64-bit value.
3. The `BUFFLEN` define (see `common.h`), which is shared between the Compress
and Decompress calls, could be stored in the compressed file header and then
set dynamically by Decompress.
4. Add functions that take binary buffers instead of filenames.
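The first two items could be addressed together by serializing the size header one byte at a time, which gives the same file layout on little-endian and big-endian hosts. The sketch below is only an illustration under stated assumptions: `put_u64_le` and `get_u64_le` are hypothetical helpers, not functions from this code base, and the choice of a little-endian 64-bit field is an assumption rather than the existing file format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store v as 8 bytes, least significant byte first.
   Shifts operate on values, not memory, so host byte order is irrelevant. */
void put_u64_le(unsigned char out[8], uint64_t v)
{
    assert(out != 0);
    for (int i = 0; i < 8; ++i)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Hypothetical helper: rebuild the value from 8 bytes written by put_u64_le. */
uint64_t get_u64_le(const unsigned char in[8])
{
    uint64_t v = 0;
    assert(in != 0);
    for (int i = 0; i < 8; ++i)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}
```

A compressor would pass the 8-byte array to `fwrite` in place of the raw integer, and the decompressor would `fread` the same 8 bytes and decode them with `get_u64_le`.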
////////////////////////////////////////////////////////////////////////////////
The code can be converted relatively easily to support maximum code widths of
17 bits or more. To do so, some variables in Compress must be made 32-bit, and
Decompress's `NOT_CODE` value must be wider than 16 bits, with the
stack/suffix/prefix arrays adjusted to match.
At the same time, as maximum code width grows, we hit the "law of diminishing
returns" on data compression while also increasing memory requirements, so for
most practical purposes it's not really worth doing.
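To give a rough sense of the memory side of this trade-off, the sketch below estimates decoder table size for a given maximum code width. The layout is an assumption made for illustration (one prefix code, one suffix byte and one stack byte per dictionary entry) and `decoder_table_bytes` is a hypothetical function; the actual arrays in this code base may be sized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical estimate of decoder table memory for a given maximum width.
   Assumed layout per dictionary entry: prefix code + suffix byte + stack byte. */
size_t decoder_table_bytes(unsigned max_bits)
{
    assert(max_bits >= 9 && max_bits <= 24);
    size_t entries = (size_t)1 << max_bits;
    /* Codes up to 16 bits fit in uint16_t; wider codes need a 32-bit type. */
    size_t code_size = (max_bits <= 16) ? sizeof(uint16_t) : sizeof(uint32_t);
    return entries * (code_size + 1 + 1);
}
```

Under these assumptions the tables take 16 KiB at 12 bits and 256 KiB at 16 bits, then jump to 768 KiB at 17 bits, because the entry count doubles and each prefix code also grows from 2 to 4 bytes.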
The original versions, written in 1996, capped the code width at 12-13 bits, in
part because of the memory constraints of DOS and early Windows. On modern
computers, a 15-bit maximum gives more than adequate compression for LZW while
taking advantage of the much larger available memory.
License: MIT