about train question #447

Open
GuGuGuGun opened this issue Nov 7, 2024 · 2 comments

@GuGuGuGun

When the model is trained with LoRA, is the visual part trained as well?
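One way to check this directly is to count which parameters have `requires_grad` set after the LoRA adapters are attached. Whether the visual part is included depends on which modules the LoRA configuration targets, which is not stated in this thread. The sketch below is only an illustration: `count_params` is a hypothetical helper, the keyword `"visual"` and the parameter names in `demo` are assumptions (real names depend on the model), and `_FakeParam` is a stand-in so the snippet runs without PyTorch.

```python
def count_params(named_params, keyword):
    """Return (trainable, total) element counts for parameters whose name
    contains `keyword`.

    `named_params` is any iterable of (name, param) pairs where `param`
    exposes `.requires_grad` and `.numel()`, for example the result of
    `model.named_parameters()` on a PyTorch module.
    """
    trainable = 0
    total = 0
    for name, param in named_params:
        if keyword not in name:
            continue
        n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
    return trainable, total


class _FakeParam:
    """Stand-in for a PyTorch parameter so this sketch runs without torch."""

    def __init__(self, n, requires_grad):
        self._n = n
        self.requires_grad = requires_grad

    def numel(self):
        return self._n


# Hypothetical parameter names; a real model's names will differ.
demo = [
    ("visual.blocks.0.attn.qkv.weight", _FakeParam(100, False)),
    ("model.layers.0.self_attn.q_proj.weight", _FakeParam(200, False)),
    ("model.layers.0.self_attn.q_proj.lora_A.weight", _FakeParam(8, True)),
]

visual_trainable, visual_total = count_params(demo, "visual")
print(f"visual: {visual_trainable} trainable of {visual_total}")
```

With a real model, passing `list(model.named_parameters())` and the substring that identifies the vision tower in that model's parameter names should show whether any of those weights are being updated.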

When I use 2×A800 GPUs to fully train the model, the command prompt stays at 'True' with no further output, although GPU memory has already been allocated.
[Two screenshots attached]

@GuGuGuGun

It then raises this alert: "Watchdog caught a collective operation timeout: WorkNCCL(SeqNum=1, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1808543 milliseconds before timing out."
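The numbers in that message are easier to read once converted: the timeout of 1800000 ms is 30 minutes, and `SeqNum=1` with `OpType=BROADCAST` suggests (as an interpretation, not something confirmed in this thread) that the very first collective never completed, so the hang is at startup rather than mid-training. A minimal sketch that pulls those fields out of the line, where `parse_watchdog` and the regular expression are hypothetical helpers written only for this message format:

```python
import re

# The watchdog message exactly as quoted in the comment above.
LOG_LINE = (
    "Watchdog caught a collective operation timeout: "
    "WorkNCCL(SeqNum=1, OpType=BROADCAST, Timeout(ms)=1800000) "
    "ran for 1808543 milliseconds before timing out."
)

# Pattern written for this one message format; other versions of the
# message may be worded differently.
PATTERN = re.compile(
    r"WorkNCCL\(SeqNum=(?P<seq>\d+), OpType=(?P<op>\w+), "
    r"Timeout\(ms\)=(?P<timeout_ms>\d+)\) "
    r"ran for (?P<elapsed_ms>\d+) milliseconds"
)


def parse_watchdog(line):
    """Extract sequence number, operation type, and times in minutes.

    Returns None if the line does not match the expected format.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "seq": int(match["seq"]),
        "op": match["op"],
        "timeout_min": int(match["timeout_ms"]) / 60_000,
        "elapsed_min": int(match["elapsed_ms"]) / 60_000,
    }


info = parse_watchdog(LOG_LINE)
print(info)
```

A low sequence number like this one points the search at process-group setup and inter-GPU communication rather than at the training loop itself.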

@GuGuGuGun

Update: it now fails with an out-of-memory (OOM) error.
:(

2 participants