Improve block type selection algorithm #100

Open
claudiosdc opened this issue Dec 18, 2020 · 2 comments

@claudiosdc

I am currently developing a library that uses data compression, and to handle that task I have chosen to use the flate2 crate with its default compression backend, miniz_oxide.

While writing unit tests for my library, I noticed that a particular piece of data was not producing the expected compression result. After further investigation, I realized that the compressed output consisted of a single non-compressed (stored) data block. The same data, when compressed using zlib, produces a different result: one compressed data block.

This can be verified using the code snippet below.

    use flate2::Compression;
    use std::io::Read;

    #[test]
    fn it_compress_issue() {
        let data = r#"{"status":"success","data":{"messageId":"mg9x9vCqYMg9YtKdDwQx"}}"#.as_bytes();

        // Compression using 'miniz_oxide' crate directly
        let compressed_data = miniz_oxide::deflate::compress_to_vec(data, 9);

        // The output is larger than the input: a 5-byte stored-block header
        // (block header byte, LEN, NLEN) followed by the unmodified input
        assert!(compressed_data.len() > data.len());
        assert_eq!(&compressed_data.as_slice()[5..], data);

        // Compression using 'flate2' crate with 'zlib' feature enabled
        let mut enc = flate2::read::DeflateEncoder::new(data, Compression::default());
        let mut compressed_data_2 = Vec::new();

        enc.read_to_end(&mut compressed_data_2).unwrap();

        // Here the output is smaller than the input
        assert!(compressed_data_2.len() < data.len());
    }

This might be related to issue #77, I guess.
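For anyone who wants to check which kind of block an encoder emitted, here is a minimal sketch that decodes the 3-bit block header defined in RFC 1951 (BFINAL in bit 0, BTYPE in bits 1-2 of the first byte). The names `BlockType` and `first_block_header` are made up for this example and are not part of miniz_oxide or flate2. It assumes a raw DEFLATE stream with no zlib or gzip wrapper, which is what `compress_to_vec` and `DeflateEncoder` produce.

```rust
/// DEFLATE block types as defined by RFC 1951, section 3.2.3.
#[derive(Debug, PartialEq, Eq)]
enum BlockType {
    Stored,
    FixedHuffman,
    DynamicHuffman,
    Reserved,
}

/// Decodes the header of the first block in a raw DEFLATE stream.
/// Returns `(is_final_block, block_type)`, or `None` for empty input.
/// Hypothetical helper for illustration only.
fn first_block_header(raw_deflate: &[u8]) -> Option<(bool, BlockType)> {
    let first = *raw_deflate.first()?;
    // Bit 0 is BFINAL; bits 1-2 are BTYPE (bits are packed LSB first).
    let is_final = first & 0b1 == 1;
    let block_type = match (first >> 1) & 0b11 {
        0b00 => BlockType::Stored,
        0b01 => BlockType::FixedHuffman,
        0b10 => BlockType::DynamicHuffman,
        _ => BlockType::Reserved,
    };
    Some((is_final, block_type))
}
```

Calling `first_block_header(&compressed_data)` on the two outputs from the test should make the difference visible: a stored block for the first, and a Huffman-coded block for the second.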

@oyvindln
Collaborator

Yeah, it might be similar to what causes the differences in #77: the block selection algorithm being a bit too dumb. You could check by seeing whether you get the same result with the C miniz backend (or C miniz with the same settings).

@oyvindln
Collaborator

oyvindln commented Jan 2, 2021

Yeah, I looked at it a bit; it's due to the simpler block selection algorithm in miniz_oxide (and C miniz). I may change it to do a more thorough check like zlib, though it requires a little restructuring.
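As a rough illustration of what a cost-based check could look like, the sketch below compares the bit cost of a stored block against a fixed-Huffman block that encodes every byte as a literal. This is not the code from miniz_oxide or zlib. The function names are hypothetical, and it makes several simplifying assumptions: the block starts on a byte boundary, the data fits in one stored block (at most 65535 bytes), no LZ77 matches are used, and dynamic Huffman blocks are not considered. The bit counts come from RFC 1951.

```rust
/// Which block type the simplified cost comparison prefers.
#[derive(Debug, PartialEq, Eq)]
enum Choice {
    Stored,
    FixedHuffman,
}

/// Bits for one stored block, assuming it starts on a byte boundary:
/// 3 header bits, 5 padding bits, 32 bits for LEN and NLEN, then the raw bytes.
fn stored_block_bits(len: usize) -> usize {
    3 + 5 + 32 + 8 * len
}

/// Bits for one fixed-Huffman block that emits every byte as a literal.
/// In the fixed code, literals 0..=143 take 8 bits, literals 144..=255
/// take 9 bits, and the end-of-block symbol takes 7 bits.
fn fixed_literal_block_bits(data: &[u8]) -> usize {
    let literal_bits: usize = data
        .iter()
        .map(|&b| if b <= 143 { 8 } else { 9 })
        .sum();
    3 + literal_bits + 7
}

/// Picks the cheaper of the two encodings (hypothetical helper).
fn choose_block(data: &[u8]) -> Choice {
    if fixed_literal_block_bits(data) < stored_block_bits(data.len()) {
        Choice::FixedHuffman
    } else {
        Choice::Stored
    }
}
```

Under these assumptions, input made up only of ASCII bytes, like the JSON in the report, costs 8n + 10 bits as fixed-Huffman literals versus 8n + 40 bits stored, so the comparison would pick the Huffman block even before any matches are counted. A real implementation would also have to account for match lengths and distances and for the dynamic Huffman option.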

@oyvindln oyvindln changed the title Deflate produces unexpected non-compressed output Improve block type selection algorithm Feb 25, 2021