
A bug about batch_size #10

Open
AprilYuge opened this issue Mar 2, 2023 · 2 comments

AprilYuge commented Mar 2, 2023

I found that if one dataset contains fewer than 1/4 as many cells as the other dataset, the algorithm trains without using the smaller dataset at all. See line 142 in scDART.py for the batch_size assignment and line 151 in train.py.
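Here is a minimal sketch of how this can happen, assuming the training loop iterates PyTorch DataLoaders with drop_last=True (the names and sizes below are illustrative, not the actual scDART code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative sizes: the second dataset has fewer cells than
# 1/4 of the first one.
n_rna, n_atac = 4000, 900
rna = TensorDataset(torch.randn(n_rna, 50))
atac = TensorDataset(torch.randn(n_atac, 50))

# batch_size derived from the *larger* dataset, as reported above.
batch_size = int(max(n_rna, n_atac) / 4)  # 1000, which exceeds n_atac

# With drop_last=True, a loader over the smaller dataset yields
# zero batches, so training never sees those cells.
atac_loader = DataLoader(atac, batch_size=batch_size,
                         shuffle=True, drop_last=True)
print(len(atac_loader))  # prints 0
```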

I am also curious about the way you set batch_size (1/4 of the larger dataset): wouldn't that be unnecessarily large, in terms of memory usage, for datasets containing hundreds of thousands of cells? Would it be a concern to fix batch_size at a smaller number like 128 (which is what I have seen other autoencoder-based methods use)?

PeterZZQ (Owner) commented Mar 6, 2023

Hi,

Thanks for pointing out the issue. We have updated the batch_size assignment at line 142: instead of using the max, we now use the min. And yes, you can set your own batch size; that option is available as the batch_size parameter in the scDART class initialization. We chose a large batch size because it performed better in the tests for the manuscript.
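For reference, a sketch of the fix as described, continuing the illustrative example above (check the repository for the exact code):

```python
# Derive batch_size from the *smaller* dataset instead, so both
# loaders always produce at least one batch.
batch_size = int(min(n_rna, n_atac) / 4)  # 225 with the sizes above

atac_loader = DataLoader(atac, batch_size=batch_size,
                         shuffle=True, drop_last=True)
print(len(atac_loader))  # prints 4 -- the smaller dataset is now used

# A user-specified batch size can also be passed at model construction,
# e.g. scDART(..., batch_size=128); see the class docstring for the
# exact signature.
```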

AprilYuge (Author)

Gotcha, thx!
