When I run the following code with direct=True, it appears to deadlock while spawning multiple processes. The code below is equivalent to calling llm.import_ckpt directly from a main function.
from nemo.collections import llm
import nemo_run as run

# model_path is defined elsewhere
ckpt_import = run.Partial(
    llm.import_ckpt,
    model=run.Config(llm.LlamaModel, config=run.Config(llm.Llama31Config8B)),
    source=f'hf://{model_path}',
    overwrite=True,
)
local_executor = run.LocalExecutor()
run.run(ckpt_import, executor=local_executor, direct=True)  # hangs
However, if I remove direct=True and let the local_executor run the importer, the checkpoint is converted successfully.
run.run(ckpt_import, executor=local_executor)  # succeeds
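To help narrow down where the direct=True run gets stuck, a stack dump of the calling process could be captured with the standard library's faulthandler module. This is only a minimal sketch: the helper names dump_stacks_if_stuck and cancel_stack_dump and the 120-second timeout are illustrative assumptions, and the dump covers only the threads of the process that calls run.run, not any child processes it spawns.

```python
import faulthandler
import sys


def dump_stacks_if_stuck(timeout_s, file=None):
    """Arm a watchdog that writes all thread tracebacks after timeout_s seconds.

    Hypothetical helper for diagnosis only. `file` must expose fileno();
    it defaults to sys.stderr.
    """
    target = sys.stderr if file is None else file
    # repeat=False: dump once; exit=False: keep the process alive afterwards.
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=target, exit=False)


def cancel_stack_dump():
    """Disarm the watchdog, e.g. once the call has returned normally."""
    faulthandler.cancel_dump_traceback_later()


# Intended use around the hanging call (assumed timeout of 120 seconds):
#   dump_stacks_if_stuck(120)
#   run.run(ckpt_import, executor=local_executor, direct=True)
#   cancel_stack_dump()
```

If the call hangs, the traceback printed after the timeout should show which line the main process is blocked on, which may indicate whether it is waiting on a spawned worker.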
Can you let me know whether this behavior is expected, or whether there is a bug in the ModelConnector?
Thank you!