Releases: aws/sagemaker-hyperpod-cli

SageMaker V2.0.0

04 Dec 16:14
bb25aed

The HyperPod CLI now supports HyperPod recipes.

HyperPod recipes let customers start training and fine-tuning popular publicly available foundation models, such as Llama 3.1 405B, in minutes. Learn more: https://github.com/aws/sagemaker-hyperpod-recipes

Introducing job scheduling integration with SageMaker managed quota allocation policies

Learn more: https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-eks-operate-console-ui-governance.html

  1. New default scheduler type, “SageMaker”:
  • Autofills commands with an accessible SageMaker managed namespace
  • Autofills commands with the SageMaker managed queue name
  • Validates priority and namespace, and suggests valid options if invalid values are detected
  2. Auto-discovery of the namespace across all HyperPod commands:
  • The namespace is resolved in the following order of precedence (highest first): the namespace provided as a CLI parameter; the namespace provided when connecting to the cluster; otherwise, the namespace where SageMaker resources should operate, which the system identifies and configures dynamically without manual intervention.
  3. Get available clusters and total accelerator quota allocation per namespace:
  • Users can specify the namespace when invoking get-clusters; the HyperPod CLI then reads the corresponding cluster queue and displays the available and total accelerators allocated to the queue
  4. List jobs with priority:
  • Listing jobs now includes an extra attribute in each job summary that shows the WorkloadPriorityClass specified for the job
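
The namespace precedence described in the auto-discovery item can be sketched as a small shell function. This is only an illustration of the ordering, not the CLI's actual implementation: the function name resolve_namespace and its three positional arguments are hypothetical, and the auto-discovered value is assumed to be supplied by the caller.

```shell
# Hypothetical helper illustrating the precedence order only.
# $1 = namespace passed as a CLI parameter (may be empty)
# $2 = namespace given when connecting to the cluster (may be empty)
# $3 = namespace found by auto-discovery (may be empty)
resolve_namespace() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"          # highest priority: explicit CLI parameter
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"          # next: namespace chosen at connect time
  elif [ -n "$3" ]; then
    printf '%s\n' "$3"          # last: auto-discovered namespace
  else
    echo "no namespace could be resolved" >&2
    return 1
  fi
}

# No CLI parameter given, so the connect-time namespace wins.
resolve_namespace "" "default" "team-a-ns"   # prints: default
```

Under this ordering, auto-discovery only takes effect when neither of the two user-provided values is set.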

Important note:

In version 1.0, if the user did not explicitly specify a namespace parameter when running commands (e.g., submitting a job), the CLI automatically mapped the Kubernetes namespace to default. Starting with the 2.0 release, if no namespace parameter is specified, the HyperPod CLI auto-discovers a namespace the user has access to. To replicate the 1.0 behavior in 2.0, specify the default namespace when connecting to the cluster, which prevents the HyperPod CLI from auto-discovering. When submitting jobs, also override the default scheduler type by adding --scheduler-type Kueue in order to use Kueue. If you do not want to use a scheduler at all, set --scheduler-type None.

  1. Example of explicitly connecting to the cluster using the default namespace:
hyperpod connect-cluster --namespace default
  2. Example of using Kueue in version 2.0:
hyperpod start-job \
  --job-name my-training-job \
  --scheduler-type Kueue \
  --image my-docker-image:latest \
  --volume /data:/mnt/data
  3. Example of not using a scheduler in version 2.0:
hyperpod start-job \
  --job-name my-training-job \
  --scheduler-type None \
  --image my-docker-image:latest \
  --volume /data:/mnt/data
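
For scripts that submit jobs, the scheduler choice can be validated before calling the CLI. The following is a minimal sketch, assuming the only accepted values are the three named in these notes (SageMaker, Kueue, None); the helper name scheduler_flag is hypothetical and is not part of the HyperPod CLI.

```shell
# Hypothetical helper: check a scheduler type against the three values
# named in these release notes and print the matching flag.
# With no argument it falls back to the 2.0 default, SageMaker.
scheduler_flag() {
  case "${1:-SageMaker}" in
    SageMaker|Kueue|None)
      printf '%s %s\n' '--scheduler-type' "${1:-SageMaker}"
      ;;
    *)
      echo "invalid scheduler type '$1'; valid options: SageMaker, Kueue, None" >&2
      return 1
      ;;
  esac
}

scheduler_flag Kueue   # prints: --scheduler-type Kueue
```

The printed flag can then be appended to a start-job invocation such as the ones above.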

Helm Chart Changes

  1. Enhanced Helm chart support for team-level role association

SageMaker HyperPod CLI v1.0.0

10 Sep 00:05
f365f57

SageMaker HyperPod CLI is a command line tool that helps create and manage training jobs on the SageMaker HyperPod clusters orchestrated by Amazon EKS.

Data scientists can train foundation models using the EKS cluster set as the orchestrator for the SageMaker HyperPod cluster. They use the SageMaker HyperPod CLI to find available SageMaker HyperPod clusters, submit training jobs (Pods), and manage their workloads. The SageMaker HyperPod CLI enables job submission using a training job schema file, and provides capabilities for job listing, description, cancellation, and execution. Scientists can use Kubeflow Training Operator, Kueue (a Kubernetes tool for job queuing), and SageMaker-managed MLflow to manage ML experiments and training runs.