Can I load from non-FSDP optimizer state with FSDP2? #765

Closed
syncdoth opened this issue Dec 31, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@syncdoth

I have been running training in a different framework with FSDP1, where I saved the states with FULL_STATE_DICT, so the optimizer states are in a plain torch.save format. I'd love to resume from this checkpoint. Is this currently supported by FSDP2 / DCP? When I naively tried dcp.load, it resulted in a shard-index-out-of-range error.
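For context, a minimal sketch of how a checkpoint like this is typically produced with FSDP1 (the `model`/`optimizer` variables, the file name, and the rank-0-only save are assumptions on my side, not the exact training code):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullOptimStateDictConfig,
    FullStateDictConfig,
    StateDictType,
)

# Gather unsharded (full) state dicts, offloaded to CPU on rank 0 only.
with FSDP.state_dict_type(
    model,
    StateDictType.FULL_STATE_DICT,
    FullStateDictConfig(offload_to_cpu=True, rank0_only=True),
    FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=True),
):
    model_sd = model.state_dict()
    optim_sd = FSDP.optim_state_dict(model, optimizer)

# The result is a plain dict of regular tensors, hence the torch.save format.
if dist.get_rank() == 0:
    torch.save(optim_sd, "optimizer_full_state.pt")
```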

@syncdoth syncdoth changed the title Can I load from non-FSDP optimizer state? Can I load from non-FSDP optimizer state with FSDP2? Dec 31, 2024
@awgu
Contributor

awgu commented Dec 31, 2024

There should be a way to load it with DCP. cc: @fegin @mori360

Full state dicts are state dicts without FSDP sharding. To make them loadable by FSDP2, you just need to iterate over the tensors in the optimizer state (which should match the parameter sharding) and shard them on dim-0 as DTensors. This can be done natively with some relatively simple code, but I will let @fegin or others comment on the right way to do this with the DCP APIs.
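A rough sketch of that dim-0 sharding, assuming every rank has loaded the full optimizer state dict (`full_optim_sd`) and a 1-D device mesh over all ranks; the names and mesh layout here are illustrative assumptions, not an official API:

```python
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

# Assumed: a 1-D mesh matching FSDP2's per-parameter dim-0 sharding.
mesh = init_device_mesh("cuda", (dist.get_world_size(),))

def shard_full_optim_state(full_optim_sd: dict) -> dict:
    """Replace full (unsharded) optimizer state tensors with dim-0 sharded DTensors.

    Must be called collectively on all ranks, each holding the same full tensors.
    """
    for param_state in full_optim_sd["state"].values():
        for key, value in param_state.items():
            # Skip scalar state such as `step`; shard only real tensors.
            if isinstance(value, torch.Tensor) and value.dim() > 0:
                param_state[key] = distribute_tensor(value, mesh, [Shard(0)])
    return full_optim_sd
```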

@tianyu-l added the question label Jan 2, 2025
@fegin
Contributor

fegin commented Jan 8, 2025

Yes, you can write a script to do the conversion offline: simply load the torch.save'd optimizer state_dict and then call DCP.save. The saved checkpoint should then be loadable with FSDP2 + DCP. This is a more complicated version of the same idea: https://github.com/pytorch/torchtitan/blob/main/scripts/convert_llama_to_dcp.py. In your case, if everything stays the same (e.g., the parameter groups), a plain torch.load -> DCP.save should work.
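A minimal sketch of that offline conversion (the file names and the single-process invocation are assumptions):

```python
import torch
import torch.distributed.checkpoint as dcp

# Load the full (unsharded) optimizer state dict that was saved with torch.save.
full_optim_sd = torch.load("optimizer_full_state.pt", map_location="cpu")

# Re-save it in the DCP on-disk format. The resulting checkpoint directory can
# then be loaded with FSDP2 + dcp.load, which reshards to the current setup.
dcp.save(full_optim_sd, checkpoint_id="optimizer_dcp_ckpt")
```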

@fegin
Contributor

fegin commented Jan 28, 2025

@syncdoth I'm going to close the issue. Please let me know if you have any further questions.

@fegin fegin closed this as completed Jan 28, 2025