The SageMaker HyperPod CLI is a command-line tool for creating and managing training jobs on SageMaker HyperPod clusters orchestrated by Amazon EKS.
Data scientists can train foundation models on a SageMaker HyperPod cluster that uses an Amazon EKS cluster as its orchestrator. With the SageMaker HyperPod CLI, they can find available SageMaker HyperPod clusters, submit training jobs (which run as Pods), and manage their workloads. The CLI submits jobs from a training job schema file and provides commands for listing, describing, canceling, and executing jobs. Scientists can also use Kubeflow Training Operator, Kueue (a Kubernetes tool for job queuing), and SageMaker-managed MLflow to manage ML experiments and training runs.