llama2-finetuning-slurm YAML blueprint: schedmd-slurm-gcp-v7-partition not found #3149

Open
xibinliu opened this issue Oct 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments


xibinliu commented Oct 18, 2024

Describe the bug

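Running gcluster create on hpc-slurm-llama2.yaml fails because the blueprint references the module schedmd-slurm-gcp-v7-partition, which cannot be found.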

Steps to reproduce

Steps to reproduce the behavior:

  1. Install the latest cluster-toolkit

  2. Clone the scientific-computing-examples repository and create the deployment for llama2-finetuning-slurm:

> cd scientific-computing-examples/llama2-finetuning-slurm
> gcluster create hpc-slurm-llama2.yaml --vars project_id=$(gcloud config get-value project) -w --vars bucket_model=llama2

Expected behavior

The command should complete successfully.

Actual behavior

Your active configuration is: [cloudshell-13783]
Error: failed to get info using tfconfig for terraform module at community/modules/compute/schedmd-slurm-gcp-v7-partition: source is not a terraform or packer module: community/modules/compute/schedmd-slurm-gcp-v7-partition
375:     source: community/modules/compute/schedmd-slurm-gcp-v7-partition
         ^

Version (gcluster --version)

> gcluster --version
gcluster version v1.40.1
Built from 'main' branch.
Commit info: v1.40.1-0-geb002543
Terraform version: 1.5.7

Blueprint

If applicable, attach or paste the blueprint YAML used to produce the bug.

scientific-computing-examples/llama2-finetuning-slurm/hpc-slurm-llama2.yaml

Execution environment

  • OS: cloudshell
  • Shell (from ps -p $$): bash
> ps -p $$
    PID TTY          TIME CMD
    493 pts/2    00:00:00 bash
  • go version: go1.23.2 linux/amd64
> go version
go version go1.23.2 linux/amd64
xibinliu added the bug label Oct 18, 2024

Aryido commented Oct 22, 2024

harshthakkar01 (Contributor) commented

There was a typo in GoogleCloudPlatform/scientific-computing-examples@edfaf52

Cluster Toolkit does not have v7 modules yet, so the blueprint's reference to schedmd-slurm-gcp-v7-partition cannot be resolved.
I discussed this with the author, and it should be fixed today.
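
In the meantime, a quick check along these lines may help spot blueprint sources that do not exist in the toolkit. It is only a rough sketch: missing_sources is a hypothetical helper (not part of gcluster), it assumes a local cluster-toolkit checkout at the same version as the installed gcluster, and it only understands plain, unquoted "source: <path>" lines with no trailing comment.

```shell
# Hypothetical helper (not a gcluster command): print every repo-relative
# module source in a blueprint that has no matching directory in a local
# Cluster Toolkit checkout.
#   $1 = blueprint YAML file
#   $2 = path to the toolkit checkout (assumed to match the gcluster version)
missing_sources() {
  blueprint="$1"
  toolkit_root="$2"
  # Collect the value of each "source:" key, with or without a leading "- ".
  awk '$1 == "source:" { print $2 }
       $1 == "-" && $2 == "source:" { print $3 }' "$blueprint" |
    # Keep only paths under modules/ or community/modules/; skip remote sources.
    grep -E '^(community/)?modules/' |
    sort -u |
    while IFS= read -r src; do
      # Report the source when the directory is absent from the checkout.
      [ -d "$toolkit_root/$src" ] || printf '%s\n' "$src"
    done
}
```

For example, missing_sources hpc-slurm-llama2.yaml ~/cluster-toolkit (the checkout location is an assumption) should list community/modules/compute/schedmd-slurm-gcp-v7-partition if no such directory exists in that checkout, and print nothing once the blueprint is corrected.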
