Applying Dust and largest_k dtype output option #100

Open
doiko opened this issue Sep 3, 2022 · 2 comments
Labels: performance (Lower memory or faster computation.), question (Further information is requested)

Comments

doiko commented Sep 3, 2022

Hi,
I apply dust before largest_k; is this the right order, or should largest_k be applied first for better performance?
My input is a boolean array.
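
To make the question concrete, here is a minimal sketch of the order I currently use (the volume shape, threshold, and k values are made up for illustration):

    import numpy as np
    import cc3d

    # Made-up boolean volume standing in for my real input.
    labels_in = np.random.rand(512, 512, 512) > 0.5

    # Current order: remove small specks first, then keep only the k largest components.
    dusted = cc3d.dust(labels_in, threshold=1000, connectivity=26)
    label_out = cc3d.largest_k(dusted, k=3000, connectivity=26)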

Would you consider casting the largest_k output to the smallest dtype that the k value allows?
If k in largest_k is less than 65535 there is no need for label_out to be uint32; uint16 is sufficient, and for k < 255, label_out can be uint8.
Could this be considered to reduce memory requirements?
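
As a user-side illustration of what I mean (not something the library does today), the output could be downcast after the fact; smallest_label_dtype here is just a hypothetical helper:

    import numpy as np

    def smallest_label_dtype(k):
        # Narrowest unsigned dtype that can hold labels 0..k (0 reserved for background).
        if k < 2**8:
            return np.uint8
        if k < 2**16:
            return np.uint16
        return np.uint32

    label_out = label_out.astype(smallest_label_dtype(3000))  # uint16 is enough for k = 3000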

Dimitris

william-silversmith (Contributor) commented Sep 3, 2022

Hi Dimitris,

Can you let me know how large your array is, how fast it is executing (and what you expected), and how much memory it is using (and what you expected)? Unfortunately, even if the final array fits within 255 labels, usually at least 10x that number of provisional labels are assigned during the calculation, so using uint8 is rarely possible except for fairly simple images.
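
To make that concrete with invented numbers:

    # Even if the final output only has 200 labels (which would fit in uint8),
    # the first labeling pass might assign roughly 10x as many provisional
    # labels before equivalent labels are merged.
    final_labels = 200
    provisional_labels = 10 * final_labels  # ~2000
    # 2000 > 255, so the intermediate labeling already needs at least uint16.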

Will

william-silversmith added the question (Further information is requested) and performance (Lower memory or faster computation.) labels on Sep 3, 2022
doiko (Author) commented Sep 4, 2022

Hi Will,
Indeed, 256 labels is unlikely to be useful, but asking for the 3000 largest objects is a realistic case.
My volumes, binary semantic segmentation results, range from as small as 1k x 1k x 1k voxels up to 10k x 10k x 4k voxels, with the most common sizes somewhere in the middle of that range.
More important for me, and possibly for others, is understanding the maximum expected memory requirement of the algorithm for a given volume size.
Currently I am using an overlapping-chunk workaround to process volumes that do not fit in memory.
I would like to be able to use something like:

    z_chunk_size = min(
        int(max_memory_footprint / (z_size * (dest_cube.dtype.itemsize + source_cube.dtype.itemsize))),
        z_size,
    )

to estimate the chunk size along the z direction that fits the memory budget given by the max_memory_footprint parameter. Understanding what has to go in the denominator (currently z_size * (dest_cube.dtype.itemsize + source_cube.dtype.itemsize)) would be a great help.
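
For example, I imagine a rough estimator along these lines, where overhead_factor is a made-up stand-in for the algorithm's internal working memory per voxel (the number I am actually asking about) and the denominator uses the x*y slice size:

    import numpy as np

    def estimate_z_chunk_size(shape, in_dtype, out_dtype, max_memory_footprint,
                              overhead_factor=2.0):
        # shape is (x, y, z); overhead_factor is a guess for the internal
        # working memory per voxel on top of the input and output arrays.
        x_size, y_size, z_size = shape
        bytes_per_z_slice = (x_size * y_size *
                             (np.dtype(in_dtype).itemsize + np.dtype(out_dtype).itemsize) *
                             overhead_factor)
        return min(int(max_memory_footprint // bytes_per_z_slice), z_size)

    # e.g. a 4k x 4k x 2k boolean volume with uint32 output and a 64 GiB budget
    print(estimate_z_chunk_size((4096, 4096, 2048), np.bool_, np.uint32, 64 * 2**30))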
Best,
Dimitris
