Error when using DDP for multi-GPU training #115

Open
zhangchi9 opened this issue Jan 15, 2025 · 3 comments

Comments

@zhangchi9

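First of all, thank you very much for open-sourcing this library; I have learned a lot from it.

Single-GPU training works fine, but multi-GPU training fails with an error. Details:
OS: Windows 11
Driver Version: 560.94
CUDA Version: 12.6
GPUs: 3080 + 3060
Command: torchrun --nproc_per_node 2 1-pretrain.py
Error: RuntimeError: use_libuv was requested but PyTorch was build without libuv support

How can I resolve this? Thanks.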

@heliar-k

heliar-k commented Jan 16, 2025

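As things stand, PyTorch's support for distributed training on Windows is fairly poor, and there is currently no general fix for this error. These two links may help:
https://discuss.pytorch.org/t/how-to-enable-libuv-with-pytorch-on-windows/208836
https://discuss.ray.io/t/executing-ray-train-with-pytorch/21363/2

PS: Running it inside WSL may also be worth considering.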
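
For readers who want to stay on native Windows, one route documented by PyTorch is to turn off the libuv TCPStore backend by setting the environment variable `USE_LIBUV=0` before launching. The sketch below only prepares the environment and the command from the original report. Whether it is enough for this project is an assumption, since nobody in this thread tested it, and the helper names `build_launch_env` and `build_torchrun_cmd` are hypothetical.

```python
import os
import platform


def build_launch_env(base_env=None, system=None):
    """Return a copy of the environment to use when launching torchrun.

    Assumption: on native Windows, USE_LIBUV=0 makes PyTorch fall back to
    the non-libuv TCPStore backend. Other platforms are left unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    system = platform.system() if system is None else system
    if system == "Windows":
        env["USE_LIBUV"] = "0"
    return env


def build_torchrun_cmd(script="1-pretrain.py", nproc_per_node=2):
    """Build the same torchrun command line used in the original report."""
    return ["torchrun", "--nproc_per_node", str(nproc_per_node), script]
```

A launch would then look like `subprocess.run(build_torchrun_cmd(), env=build_launch_env(), check=True)`. Note that the NCCL backend is not available on native Windows, so the training script may also need the `gloo` backend there.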

@jingyaogong
Owner

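> As things stand, PyTorch's support for distributed training on Windows is fairly poor, and there is currently no general fix for this error. These two links may help: https://discuss.pytorch.org/t/how-to-enable-libuv-with-pytorch-on-windows/208836 https://discuss.ray.io/t/executing-ray-train-with-pytorch/21363/2
>
> PS: Running it inside WSL may also be worth considering.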

Yes, installing WSL is recommended; see the suggestion above. @zhangchi9

@zhangchi9
Author

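Thanks to @heliar-k and @jingyaogong for the suggestions. After installing WSL on Windows 11, DDP ran successfully.

Steps:

  • Open a terminal as administrator
  • Run wsl --install
  • Remember to enable CPU virtualization in the BIOS; otherwise the installation fails with an error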
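
After the steps above, a small preflight check can confirm that the training script is really running inside WSL rather than native Windows, and can choose a process-group backend to match. This is a minimal sketch and not part of the project: `is_wsl` and `pick_backend` are hypothetical helpers, and the WSL detection is a heuristic based on the kernel release string.

```python
import platform


def is_wsl(release=None):
    """Heuristic: WSL kernel release strings usually contain 'microsoft'."""
    release = platform.uname().release if release is None else release
    return "microsoft" in release.lower()


def pick_backend(system=None, cuda_available=True):
    """Choose a torch.distributed backend name.

    NCCL is not available on native Windows, so fall back to gloo there
    and whenever CUDA is not usable.
    """
    system = platform.system() if system is None else system
    if system == "Windows" or not cuda_available:
        return "gloo"
    return "nccl"
```

The returned name can be passed to `torch.distributed.init_process_group(backend=...)`. Whether `1-pretrain.py` exposes the backend as a setting is not stated in this thread, so treat that wiring as an assumption.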
