Blosc filters fail silently if not enabled through 'blosc-src' #52

Open
kevin-servaenergy opened this issue Jan 23, 2025 · 1 comment

kevin-servaenergy commented Jan 23, 2025

See aldanor#273

Even with the 'blosc' feature enabled, writing a dataset with blosc compression has no effect for most blosc filters: datasets written with those filters were the same size as datasets written with no filter at all. blosc_blosclz was the only filter that actually compressed anything.

It took me a while to notice this; I only discovered it when I manually created test datasets to compare sizes.
I had assumed that enabling the blosc feature makes all filters available, since the enums and methods are all defined, but that is not the case.
Because this can easily go unnoticed, I thought it was worth reporting.

Several things made this confusing to diagnose:

  • The filters are still reported by hdf5_metno::hl::dataset::Dataset::filters()
  • hdf5_metno::filters::blosc_available() returns true
  • No warnings or errors are emitted
  • There is no way to check whether a filter is available unless blosc-src is explicitly listed as a dependency
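
Since neither filters() nor blosc_available() reveals the problem, comparing each dataset's storage size against the unfiltered size is one way to catch it. Here is a minimal sketch of that check in plain Rust, fed with the sizes from the failing run reported in this issue. The helper names `ineffective_filters` and `sizes_without_blosc_src` are hypothetical and not part of hdf5-metno; in real use the sizes would come from Dataset::storage_size().

```rust
/// Returns the names of datasets whose stored size is not smaller than the
/// unfiltered size, i.e. where the filter apparently did nothing.
/// Hypothetical helper; not part of hdf5-metno.
fn ineffective_filters<'a>(raw_size: u64, stored: &[(&'a str, u64)]) -> Vec<&'a str> {
    stored
        .iter()
        .filter(|(_, size)| *size >= raw_size)
        .map(|(name, _)| *name)
        .collect()
}

/// Sizes copied from the failing run in this issue (no explicit blosc-src
/// dependency). The unfiltered "raw" dataset was 2048 bytes.
fn sizes_without_blosc_src() -> Vec<(&'static str, u64)> {
    vec![
        ("lzf", 31),
        ("blosclz", 58),
        ("blosc-lz4", 2048),
        ("blosc-lz4hc", 2048),
        ("blosc-snappy", 2048),
        ("blosc-zlib", 2048),
        ("blosc-zstd", 2048),
    ]
}
```

Calling `ineffective_filters(2048, &sizes_without_blosc_src())` flags the five blosc variants other than blosclz. Note that a size check like this is only a heuristic: incompressible data could also yield a stored size that is not smaller than the raw size.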

I resolved the issue by explicitly listing blosc-src as a dependency:

hdf5 = { version = "0.9.3", features = ["zlib", "blosc", "lzf", "f16", "complex"], package = "hdf5-metno" }
# Have to add this line
blosc-sys = { version = "0.3", package = "blosc-src", features = ["lz4", "zlib", "zstd", "snappy"]}

While this works, I'd prefer not to have to explicitly list a dependency that I'm not using directly, and would rather enable the codecs through features like:

blosc-lz4 = ["blosc-sys/lz4"]
blosc-zlib = ["blosc-sys/zlib"]
blosc-zstd = ["blosc-sys/zstd"]
blosc-snappy = ["blosc-sys/snappy"]
blosc-all = ["blosc", "blosc-lz4", "blosc-zlib", "blosc-zstd", "blosc-snappy"]

The enums and methods associated with each filter should also not be defined when that filter is not available. I think this would help users understand that the filter is unavailable.
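
To illustrate the idea, here is a rough sketch of what feature-gated definitions could look like. It is not hdf5-metno's actual code: the enum and function names are hypothetical, the variant names are borrowed from the filter output in this report, and placing LZ4HC under the lz4 feature is an assumption.

```rust
/// Hypothetical compressor enum: each variant exists only when the matching
/// Cargo feature (names taken from the proposal in this issue) is enabled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BloscCompressor {
    /// Worked without extra features in this report, so it is left ungated.
    BloscLZ,
    #[cfg(feature = "blosc-lz4")]
    LZ4,
    /// Assumption: LZ4HC ships together with LZ4.
    #[cfg(feature = "blosc-lz4")]
    LZ4HC,
    #[cfg(feature = "blosc-snappy")]
    Snappy,
    #[cfg(feature = "blosc-zlib")]
    ZLib,
    #[cfg(feature = "blosc-zstd")]
    ZStd,
}

/// Lists the compressors compiled into this build.
pub fn available_compressors() -> Vec<BloscCompressor> {
    #[allow(unused_mut)] // stays unmodified when no codec feature is enabled
    let mut v = vec![BloscCompressor::BloscLZ];
    #[cfg(feature = "blosc-lz4")]
    v.extend([BloscCompressor::LZ4, BloscCompressor::LZ4HC]);
    #[cfg(feature = "blosc-snappy")]
    v.push(BloscCompressor::Snappy);
    #[cfg(feature = "blosc-zlib")]
    v.push(BloscCompressor::ZLib);
    #[cfg(feature = "blosc-zstd")]
    v.push(BloscCompressor::ZStd);
    v
}
```

With this layout, code that refers to `BloscCompressor::ZStd` without the `blosc-zstd` feature fails at compile time instead of silently writing uncompressed data, and a builder method such as blosc_zstd() could be gated with the same attribute.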

To Reproduce:

main.rs

// Box<dyn Error> lets `?` propagate both hdf5 and std::io errors.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let filename = std::path::PathBuf::from("test.hdf5");
    let f = hdf5::File::create(&filename)?;
    let grp = f.create_group("test")?;
    let data = ndarray::Array1::from_elem(1024, 1u16);
    let print_ds = |d: hdf5::Dataset| {
        println!(
            "DS: {}\tSize: {}\tFilters: {:?}",
            d.name(),
            d.storage_size(),
            d.filters()
        );
    };

    println!(
        "Blosc Available: {}; nthreads: {}",
        hdf5::filters::blosc_available(),
        hdf5::filters::blosc_get_nthreads()
    );
    let ds = grp.new_dataset_builder().with_data(&data).create("raw")?;
    print_ds(ds);

    let ds = grp
        .new_dataset_builder()
        .with_data(&data)
        .lzf()
        .create("lzf")?;
    print_ds(ds);

    let ds = grp
        .new_dataset_builder()
        .with_data(&data)
        .blosc_blosclz(9, true)
        .create("blosclz")?;
    print_ds(ds);

    let ds = grp
        .new_dataset_builder()
        .with_data(&data)
        .blosc_lz4(9, true)
        .create("blosc-lz4")?;
    print_ds(ds);

    let ds = grp
        .new_dataset_builder()
        .with_data(&data)
        .blosc_lz4hc(9, true)
        .create("blosc-lz4hc")?;
    print_ds(ds);

    let ds = grp
        .new_dataset_builder()
        .with_data(&data)
        .blosc_snappy(9, true)
        .create("blosc-snappy")?;
    print_ds(ds);

    let ds = grp
        .new_dataset_builder()
        .with_data(&data)
        .blosc_zlib(9, true)
        .create("blosc-zlib")?;
    print_ds(ds);

    let ds = grp
        .new_dataset_builder()
        .with_data(&data)
        .blosc_zstd(9, true)
        .create("blosc-zstd")?;
    print_ds(ds);

    f.close()?;
    std::fs::remove_file(filename)?;
    Ok(())
}

Cargo.toml

[dependencies]
hdf5 = { version = "0.9.3", features = ["zlib", "blosc", "lzf", "f16", "complex"], package = "hdf5-metno" }

Output

Blosc Available: true; nthreads: 1
DS: /test/raw   Size: 2048      Filters: []
DS: /test/lzf   Size: 31        Filters: [LZF]
DS: /test/blosclz       Size: 58        Filters: [Blosc(BloscLZ, 9, Byte)]
DS: /test/blosc-lz4     Size: 2048      Filters: [Blosc(LZ4, 9, Byte)]
DS: /test/blosc-lz4hc   Size: 2048      Filters: [Blosc(LZ4HC, 9, Byte)]
DS: /test/blosc-snappy  Size: 2048      Filters: [Blosc(Snappy, 9, Byte)]
DS: /test/blosc-zlib    Size: 2048      Filters: [Blosc(ZLib, 9, Byte)]
DS: /test/blosc-zstd    Size: 2048      Filters: [Blosc(ZStd, 9, Byte)]

Fixed

Cargo.toml

[dependencies]
hdf5 = { version = "0.9.3", features = ["zlib", "blosc", "lzf", "f16", "complex"], package = "hdf5-metno" }
blosc-sys = { version = "0.3", package = "blosc-src", features = ["lz4", "zlib", "zstd", "snappy"]}

Output

Blosc Available: true; nthreads: 1
DS: /test/raw   Size: 2048      Filters: []
DS: /test/lzf   Size: 31        Filters: [LZF]
DS: /test/blosclz       Size: 58        Filters: [Blosc(BloscLZ, 9, Byte)]
DS: /test/blosc-lz4     Size: 56        Filters: [Blosc(LZ4, 9, Byte)]
DS: /test/blosc-lz4hc   Size: 56        Filters: [Blosc(LZ4HC, 9, Byte)]
DS: /test/blosc-snappy  Size: 132       Filters: [Blosc(Snappy, 9, Byte)]
DS: /test/blosc-zlib    Size: 62        Filters: [Blosc(ZLib, 9, Byte)]
DS: /test/blosc-zstd    Size: 46        Filters: [Blosc(ZStd, 9, Byte)]

magnusuMET (Collaborator) commented

This is the same issue as aldanor#273.

When I tried flipping the flag to mandatory, I triggered an assert in hdf5-c, so the fix is slightly more convoluted than just that.
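
For context, and assuming "the flag" refers to HDF5's per-filter optional/mandatory setting (H5Z_FLAG_OPTIONAL versus H5Z_FLAG_MANDATORY): HDF5 documents that when an optional filter fails during a write, the filter is skipped for that chunk and the data is stored unfiltered, whereas a failing mandatory filter makes the write itself fail. That would be consistent with the unchanged 2048-byte sizes in the report. The following is only a toy model of those semantics in plain Rust; none of these names exist in hdf5-metno or in the HDF5 C library.

```rust
/// Toy model of HDF5's optional/mandatory filter flag (hypothetical type).
#[derive(Clone, Copy)]
enum FilterFlag {
    Optional,
    Mandatory,
}

/// Simulates writing one chunk through a single filter.
/// `filter` returns None when it cannot process the chunk.
fn write_chunk(
    chunk: &[u8],
    flag: FilterFlag,
    filter: impl Fn(&[u8]) -> Option<Vec<u8>>,
) -> Result<Vec<u8>, String> {
    match (filter(chunk), flag) {
        // The filter succeeded: store its output.
        (Some(out), _) => Ok(out),
        // Optional filter failed: store the chunk unfiltered, no error.
        (None, FilterFlag::Optional) => Ok(chunk.to_vec()),
        // Mandatory filter failed: the write fails.
        (None, FilterFlag::Mandatory) => Err("filter failed".to_string()),
    }
}

/// Stand-in for a codec that was not compiled in: it always fails.
fn unavailable_codec(_chunk: &[u8]) -> Option<Vec<u8>> {
    None
}
```

In this model, a 2048-byte chunk written with `FilterFlag::Optional` and `unavailable_codec` comes back as 2048 bytes with no error, while `FilterFlag::Mandatory` surfaces the failure to the caller.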
