You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm using ArchGDAL to save climate data as large 3D GeoTIFFs, but I've noticed that files created in ArchGDAL are roughly 3x larger than those created with gdal_* binaries. As a diagnostic I eliminated all my calculations and wrote an ArchGDAL function that just resaves its input file using a given compression method. Then I wrapped this in a test function that also recompresses the ArchGDAL output using gdal_translate and compares resulting file sizes.
Spoiler: small input GeoTIFFs have the same output size, but very large input files (4 GB) become much larger after passing through ArchGDAL.
First my test code:
using ArchGDAL, GDAL_jll
functioncopy_gtiff(infile, outfile; compressmethod="ZSTD")
ArchGDAL.readraster(infile) do origdataset
nbands = ArchGDAL.nraster(origdataset)
band1 = ArchGDAL.getband(origdataset, 1)
nodata = ArchGDAL.getnodatavalue(band1)
isfile(outfile) &&rm(outfile)
origdata = origdataset[:,:,:]
ArchGDAL.create(outfile,
driver = ArchGDAL.getdriver("GTiff"),
width = ArchGDAL.width(origdataset),
height = ArchGDAL.height(origdataset),
nbands = nbands,
dtype = ArchGDAL.pixeltype(band1),
options = ["BIGTIFF=YES", "COMPRESS=$compressmethod"]
) do dataset
geotransform = ArchGDAL.getgeotransform(origdataset)
proj = ArchGDAL.getproj(origdataset)
ArchGDAL.setgeotransform!(dataset, geotransform)
ArchGDAL.setproj!(dataset, proj)
for b =1:nbands
band = ArchGDAL.getband(dataset, b)
ArchGDAL.setnodatavalue!(band, nodata)
ArchGDAL.write!(band, origdata[:,:,b])
endendreturn origdata
endendfunctionrecompress_gtiff(infile, outfile; compressmethod="ZSTD")
gdal_translate_path() do gdal_translate
run(`$gdal_translate -q -co BIGTIFF=YES -co COMPRESS=$compressmethod$infile$outfile`)
endendfilesizeMB(filename) =round(filesize(filename)/1024^2, digits=1)
functionreadalldata(filename)
ArchGDAL.readraster(filename) do dataset
return dataset[:,:,:]
endendfunctiontestcopy(compressmethod="ZSTD")
for nbands in [3, 30, 300, 1000, 2920]
infile ="test_ZSTD_$(nbands)bands.tif"
outfile1, outfile2 =tempname(), tempname()
println("\nTesting with $nbands bands, compression method = $compressmethod:")
print(" copying $infile with ArchGDAL...")
time =round(@elapsed origdata =copy_gtiff(infile, outfile1; compressmethod); digits=1)
println(" ($time s). Original file $(filesizeMB(infile)) MB, copy $(filesizeMB(outfile1)) MB.")
print(" recompressing with gdal_translate...")
time =round(@elapsedrecompress_gtiff(outfile1, outfile2; compressmethod); digits=1)
println(" ($time s). Recompressed file $(filesizeMB(outfile2)) MB.")
print(" reading copied file...")
time =round(@elapsed newdata =readalldata(outfile1); digits=1)
println(" ($time s). Identical to original file: $(all(origdata .== newdata)).")
print(" reading recompressed file...")
time =round(@elapsed newdata =readalldata(outfile2); digits=1)
println(" ($time s). Identical to original file: $(all(origdata .== newdata)).")
rm(outfile1)
rm(outfile2)
endend
Running the tests using ZSTD compression:
julia> testcopy("ZSTD")
Testing with 3 bands, compression method = ZSTD:
copying test_ZSTD_3bands.tif with ArchGDAL... (1.3 s). Original file 3.9 MB, copy 3.9 MB.
recompressing with gdal_translate... (1.2 s). Recompressed file 3.9 MB.
reading copied file... (0.0 s). Identical to original file: true.
reading recompressed file... (0.0 s). Identical to original file: true.
Testing with 30 bands, compression method = ZSTD:
copying test_ZSTD_30bands.tif with ArchGDAL... (2.1 s). Original file 40.2 MB, copy 40.2 MB.
recompressing with gdal_translate... (2.1 s). Recompressed file 40.2 MB.
reading copied file... (0.2 s). Identical to original file: true.
reading recompressed file... (0.3 s). Identical to original file: true.
Testing with 300 bands, compression method = ZSTD:
copying test_ZSTD_300bands.tif with ArchGDAL... (8.6 s). Original file 405.8 MB, copy 405.8 MB.
recompressing with gdal_translate... (8.4 s). Recompressed file 405.8 MB.
reading copied file... (1.6 s). Identical to original file: true.
reading recompressed file... (1.6 s). Identical to original file: true.
Testing with 1000 bands, compression method = ZSTD:
copying test_ZSTD_1000bands.tif with ArchGDAL... (23.1 s). Original file 1355.7 MB, copy 1355.7 MB.
recompressing with gdal_translate... (22.3 s). Recompressed file 1355.7 MB.
reading copied file... (5.6 s). Identical to original file: true.
reading recompressed file... (5.8 s). Identical to original file: true.
Testing with 2920 bands, compression method = ZSTD:
copying test_ZSTD_2920bands.tif with ArchGDAL... (171.5 s). Original file 3968.5 MB, copy 10906.3 MB.
recompressing with gdal_translate... (61.3 s). Recompressed file 3968.5 MB.
reading copied file... (15.1 s). Identical to original file: true.
reading recompressed file... (15.6 s). Identical to original file: true.
My 5 test files are available in a Box folder here. I get similar results for other compression methods. Some valid values are DEFLATE, LZW and NONE. My test results for these are in a log file in that Box folder. My installed versions: ArchGDAL v0.7.4 and GDAL_jll v300.202.100+0.
I suspect the problem is some kind of interaction with the BIGTIFF setting, which is required for creating GeoTIFFs larger than 4 GB.
The text was updated successfully, but these errors were encountered:
Ha I meant that that this is more likely a OSGeo/gdal issue, in that there is unlikely to be any Julia wrapping code that behaves differently somewhere between 1000 and 2920 bands.
It might be some work, but perhaps you can reproduce this with generated data, such that people don't need to download such a large raster? Then you can also see if it also happens if you have say 2 rows and 2 columns. And one way to verify is to write this test in Python, to see if it occurs there as well. GDAL devs are more likely to be able to run those reproducers as well.
For compression it can play a big role whether your file is band or pixel interleaved and whether or not it is tiled. gdal_translate might copy this info from the source file, while your code is not.
Originally posted at discourse, full text below including test code and links to sample data. I opened the issue here because visr suggested the problem might be in GDAL.jl instead of ArchGDAL.
https://discourse.julialang.org/t/inefficient-compression-of-very-large-geotiffs-with-archgdal-compared-to-gdal-binaries/70687/1
I'm using ArchGDAL to save climate data as large 3D GeoTIFFs, but I've noticed that files created in ArchGDAL are roughly 3x larger than those created with gdal_* binaries. As a diagnostic I eliminated all my calculations and wrote an ArchGDAL function that just resaves its input file using a given compression method. Then I wrapped this in a test function that also recompresses the ArchGDAL output using gdal_translate and compares resulting file sizes.
Spoiler: small input GeoTIFFs have the same output size, but very large input files (4 GB) become much larger after passing through ArchGDAL.
First my test code:
Running the tests using ZSTD compression:
My 5 test files are available in a Box folder here. I get similar results for other compression methods. Some valid values are DEFLATE, LZW and NONE. My test results for these are in a log file in that Box folder. My installed versions: ArchGDAL v0.7.4 and GDAL_jll v300.202.100+0.
I suspect the problem is some kind of interaction with the BIGTIFF setting, which is required for creating GeoTIFFs larger than 4 GB.
The text was updated successfully, but these errors were encountered: