Add modal launch hf-download utility #2744

Open
Conversation

@mwaskom (Contributor) commented Jan 9, 2025

Adds a new modal launch utility intended to make it simple to cache an asset from the Hugging Face hub onto a modal.Volume.

Backward/forward compatibility checks

Check these boxes or delete any item (or this section) if not relevant for this PR.

  • Client+Server: this change is compatible with old servers
  • Client forward compatibility: this change ensures client can accept data intended for later versions of itself

Note on protobuf: a protobuf message change in one place may impact multiple
entities (client, server, worker, database). See the points above.


Changelog

  • Added a modal launch hf-download utility. This provides a simple CLI for caching an asset from the Hugging Face Hub onto a Modal Volume:

    modal launch hf-download stabilityai/stable-diffusion-xl-base-1.0 hf-hub-cache --secret=hugging-face
    

    To use the assets, mount the Volume at a location corresponding to the Hugging Face cache directory, as specified by the HF_HUB_CACHE environment variable or passed as the cache_dir argument to the relevant functions.
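For context on where the Volume needs to be mounted: the hub cache location resolves with a documented precedence (explicit HF_HUB_CACHE, then HF_HOME, then a default under the user's home directory). A minimal sketch of that resolution logic, where resolve_hf_hub_cache is a hypothetical helper and not part of this utility:

```python
import os
from pathlib import Path


def resolve_hf_hub_cache() -> Path:
    """Sketch of how the Hugging Face hub cache directory is resolved.

    Precedence (mirroring huggingface_hub's documented behavior):
    1. HF_HUB_CACHE, if set
    2. $HF_HOME/hub, if HF_HOME is set
    3. ~/.cache/huggingface/hub otherwise
    """
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get(
        "HF_HOME",
        os.path.join(os.path.expanduser("~"), ".cache", "huggingface"),
    )
    return Path(hf_home) / "hub"
```

So mounting the Volume at, say, /cache and setting HF_HUB_CACHE=/cache in the container would make downstream huggingface_hub calls find the pre-cached assets.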

@erikbern (Contributor) commented

I'm not entirely sold on this as a utility vs. just making sure our examples show how to download things. As a tool, it feels like it hides the more general principle of how to run data prep on Modal. Also, with HF you can rely on lazy caching in most cases.

@mwaskom (Contributor, Author) commented Jan 10, 2025

I agree that we should make it clear that Modal can be used throughout the project lifecycle! But this step doesn't seem like a very interesting "preprocessing" step to demonstrate; it's something you'd need to put in every App and it's pure boilerplate.

Relying on implicit caching can also work, but there are a couple of advantages here:

  • We're enabling the Rust-based hf-transfer download, which is a lot faster but requires a few extra steps that aren't otherwise relevant for Apps that use the models
  • Writing a Model class so that you can trigger a download as a side effect of spinning up a container isn't always trivial. You need to write some nullary endpoint (which might not be the natural way to write your inference interface), and the download will happen in a container that requests the resources you need to actually serve the model, which might be overkill (e.g., it's wasteful to have GPUs idling while the download happens)
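The "few extra steps" for hf-transfer amount to installing the hf-transfer package and opting in via an environment variable before huggingface_hub is imported. A minimal sketch, assuming the documented HF_HUB_ENABLE_HF_TRANSFER switch (hf_transfer_requested is a hypothetical helper for illustration):

```python
import os

# hf-transfer must be opted into explicitly, and the variable has to be
# set before huggingface_hub is imported (the setting is read early).
# The package itself would be installed separately, e.g. `pip install hf-transfer`.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"


def hf_transfer_requested() -> bool:
    # Hypothetical helper: checks the env var using the kind of truthy
    # values huggingface_hub accepts for its boolean settings.
    value = os.environ.get("HF_HUB_ENABLE_HF_TRANSFER", "")
    return value.upper() in {"1", "ON", "YES", "TRUE"}
```

Baking these steps into the utility means users get the fast download path without carrying this boilerplate in every App.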
