Training issues #3
Comments
Sorry, it should be "guided-diffusion_64_256_upsampler.pt". README and scripts have been updated.
Got it! Thanks so much for such a quick reply!
Also, I would like to know which file "landscape_linear1000_16x64x64_shiftT_window148_lr1e-4_ema_100000.pt" in multimodal_train.sh refers to.
Training the multimodal-generation model requires no initialization; the script has been updated now.
Thanks for your reply!
Hello, sorry to disturb you again. I saw in the supplementary material of the paper that training uses 32 V100s with a batch size of 128, but the current open-source training script sets a batch size of 4 and uses a single GPU. Could you please provide the training script you used in your experiments? I look forward to your reply, thank you very much.
The batch size is per GPU. For example, with "--GPU 0,1,2,3,4,5,6,7 mpiexec -n 8 python ...", the total batch size is 4*8=32. Our training uses 4 nodes, i.e., 32 GPUs in total; you need to adapt the scripts to run across multiple nodes according to the requirements of your own cluster.
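To make the scaling concrete, here is a minimal single-node launch sketch: the per-GPU batch size is multiplied by the number of MPI ranks, so 8 GPUs at batch size 4 give 32 samples per step, and 4 such nodes give the paper's effective batch size of 128. The script path and flag names below are placeholders (they follow the guided-diffusion convention this code appears to build on) and may differ from the released multimodal_train.sh:

```bash
# Single-node sketch: 8 GPUs x per-GPU batch size 4 -> effective batch size 32.
# Four such nodes reproduce the paper's effective batch size of 128.
# Script path, data path, and flag names are assumptions; check multimodal_train.sh for the real ones.
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
mpiexec -n 8 python py_scripts/multimodal_train.py \
    --data_dir /path/to/landscape \
    --batch_size 4 \
    --lr 1e-4
```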
Got it, thank you!
Hello, I am training on the AIST++ dataset with 8 A100 cards; each card has a batch size of 12, so the overall batch size is 96. After 10,000 training steps, the sampled video and audio are still pure noise. I am not sure what the reason is at the moment.
Looking forward to your reply, thank you!
In my experiments, results become meaningful after about 50,000 steps. You can follow this advice and continue training from your current checkpoints.
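A minimal sketch of continuing from the latest saved checkpoint is below, assuming the training entry point exposes a --resume_checkpoint flag as in OpenAI's guided-diffusion, which the released checkpoint names suggest this code builds on. The script path, checkpoint file name, and other flags are placeholders:

```bash
# Resume-from-checkpoint sketch (assumption: --resume_checkpoint exists as in guided-diffusion).
# Adjust the script path, checkpoint file, and other flags to match multimodal_train.sh.
mpiexec -n 8 python py_scripts/multimodal_train.py \
    --data_dir /path/to/aist++ \
    --batch_size 12 \
    --resume_checkpoint ./checkpoints/model010000.pt   # continue past 10k steps toward ~50k
```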
Thank you very much for your reply; I am sure it will help with my experiments!
Hello @ludanruan, thanks for sharing this information. I was wondering what the average time in hours is to reach meaningful results (or the average step time) on the Landscape or AIST++ datasets?
In my experiments, 50,000 iterations bring meaningful results.
@ludanruan Perhaps my question was not clear. I meant the average training time in hours/days needed to reach those 50,000 iterations with the V100 GPUs described in the paper. Thanks, best
In my experimental setting (32 x 32 GB V100s), 30,000 iterations take one day, so meaningful results (about 50,000 iterations) are reached within roughly two days.
Thank you for the information! I needed it since I am planning to do research on this :)
Hi,
I found a file "landscape_linear1000_16x64x64_shiftT_window148_lr1e-4_ema_100000.pt" in the training script. Is this file the landscape.pt in the open-source model? I am looking forward to your answer, thank you very much.