
A bug about batch_size #10

Open
AprilYuge opened this issue Mar 2, 2023 · 2 comments

AprilYuge commented Mar 2, 2023

I found that if one dataset contains fewer than 1/4 as many cells as the other dataset, the algorithm trains without using the smaller dataset at all. See line 142 in scDART.py for the batch_size assignment and line 151 in train.py.
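Here is a minimal sketch of how this can happen, assuming the training loop iterates PyTorch DataLoaders with drop_last=True (the names and sizes below are illustrative, not the actual scDART code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative sizes: the second dataset has fewer cells than
# 1/4 of the first one.
n_rna, n_atac = 4000, 900
rna = TensorDataset(torch.randn(n_rna, 50))
atac = TensorDataset(torch.randn(n_atac, 50))

# batch_size derived from the *larger* dataset, as reported above.
batch_size = int(max(n_rna, n_atac) / 4)  # 1000, which exceeds n_atac

# With drop_last=True, a loader over the smaller dataset yields
# zero batches, so training never sees those cells.
atac_loader = DataLoader(atac, batch_size=batch_size,
                         shuffle=True, drop_last=True)
print(len(atac_loader))  # prints 0
```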

I am also curious about the way you set batch_size (1/4 of the larger dataset): wouldn't that be unnecessarily large, in terms of memory usage, for datasets containing hundreds of thousands of cells? Would it be a concern to fix batch_size at a smaller number like 128 (which is what I have seen other autoencoder-based methods use)?

PeterZZQ (Owner) commented Mar 6, 2023

Hi,

Thanks for pointing out the issue. We have updated the batch_size assignment at line 142: instead of using the max, we now use the min. And yes, you can set your own batch size; that option is available as the batch_size parameter in the scDART class initialization. We chose a large batch size because it performed better in the tests for the manuscript.
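For reference, a sketch of the fix as described, continuing the illustrative example above (check the repository for the exact code):

```python
# Derive batch_size from the *smaller* dataset instead, so both
# loaders always produce at least one batch.
batch_size = int(min(n_rna, n_atac) / 4)  # 225 with the sizes above

atac_loader = DataLoader(atac, batch_size=batch_size,
                         shuffle=True, drop_last=True)
print(len(atac_loader))  # prints 4 -- the smaller dataset is now used

# A user-specified batch size can also be passed at model construction,
# e.g. scDART(..., batch_size=128); see the class docstring for the
# exact signature.
```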

AprilYuge (Author)

Gotcha, thx!
