The HyperPod CLI now supports HyperPod recipes.
HyperPod recipes enable customers to get started training and fine-tuning popular publicly available foundation models, such as Llama 3.1 405B, in minutes. Learn more: https://github.com/aws/sagemaker-hyperpod-recipes
Introducing job scheduling integration with SageMaker managed quota allocation policies
Learn more: https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-eks-operate-console-ui-governance.html
- New default scheduler type, “SageMaker”:
  - Autofills the command with an accessible SageMaker managed namespace
  - Autofills the command with the SageMaker managed queue name
  - Validates priority and namespace values, and suggests valid options when invalid values are detected
- Auto-discovery of the namespace across all HyperPod commands:
  - The namespace is resolved in the following order of precedence (highest first): the namespace provided as a CLI parameter, then the namespace provided when connecting to the cluster, and finally a namespace the system identifies and configures dynamically, so that SageMaker resources can operate without manual intervention.
- Get available clusters and the total accelerator quota allocated per namespace:
  - Users can specify a namespace when invoking get-clusters; the HyperPod CLI then reads the corresponding cluster queue and displays the available and total accelerators allocated to that queue.
- List jobs with priority:
  - Each job summary in the job listing now includes an extra attribute showing the WorkloadPriorityClass specified for that job.
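The namespace precedence described above can be sketched as a small shell function. This is a minimal illustration, assuming the three candidate namespaces are already known as plain strings; the function name resolve_namespace and its arguments are hypothetical and do not reflect the HyperPod CLI's actual internals.

```shell
#!/bin/sh
# Hypothetical sketch of the namespace precedence described above.
# Not the HyperPod CLI's real implementation; names are illustrative only.
#   $1 - namespace passed as a CLI parameter (may be empty)
#   $2 - namespace given when connecting to the cluster (may be empty)
#   $3 - namespace discovered automatically by the system (may be empty)
resolve_namespace() {
    cli_ns=$1
    connect_ns=$2
    discovered_ns=$3

    if [ -n "$cli_ns" ]; then
        # Highest precedence: explicit CLI parameter
        printf '%s\n' "$cli_ns"
    elif [ -n "$connect_ns" ]; then
        # Next: namespace supplied when connecting to the cluster
        printf '%s\n' "$connect_ns"
    elif [ -n "$discovered_ns" ]; then
        # Lowest: namespace identified automatically
        printf '%s\n' "$discovered_ns"
    else
        echo "error: no namespace could be resolved" >&2
        return 1
    fi
}
```

For example, resolve_namespace "" "default" "team-a-ns" prints default: once a namespace is set at connect time, the auto-discovered one is never consulted, which is why connecting with the default namespace preserves the 1.0 behavior.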
Important note:
In version 1.0, if the user does not explicitly specify a namespace parameter when running commands (e.g., submitting a job), the CLI automatically maps the Kubernetes namespace to default. Starting with the 2.0 release, if no namespace parameter is specified, the HyperPod CLI auto-discovers a namespace the user has access to. To replicate the 1.0 behavior in 2.0, specify the default namespace when connecting to the cluster; this prevents the HyperPod CLI from auto-discovering a namespace. Because the default scheduler type is now SageMaker, also override it when submitting jobs by adding --scheduler-type Kueue if you want to use Kueue. If you don’t want to use a scheduler at all, set --scheduler-type None.
- Example of explicitly connecting to the cluster using the default namespace:
hyperpod connect-cluster --namespace default
- Example of using Kueue in version 2.0:
hyperpod start-job \
--job-name my-training-job \
--scheduler-type Kueue \
--image my-docker-image:latest \
--volume /data:/mnt/data
- Example of not using a scheduler in version 2.0:
hyperpod start-job \
--job-name my-training-job \
--scheduler-type None \
--image my-docker-image:latest \
--volume /data:/mnt/data
Helm Chart Changes
- Enhanced Helm chart support for team-level role association