diff --git a/.gitignore b/.gitignore
index 497c021..a2cc488 100644
--- a/.gitignore
+++ b/.gitignore
@@ -17,3 +17,7 @@ __pycache__/
 /doc/_apidoc/
 /build
+
+# Ignore all contents of result and results directories
+/result/
+/results/
\ No newline at end of file
diff --git a/.gitmodules b/.gitmodules
index 5955f18..63ec6f5 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,4 +1,3 @@
-[submodule "src/hyperpod_cli/custom_launcher/launcher/nemo/nemo_framework_launcher"]
- path = src/hyperpod_cli/custom_launcher/launcher/nemo/nemo_framework_launcher
- url = https://github.com/NVIDIA/NeMo-Framework-Launcher.git
- branch = 3d41c31
+[submodule "src/hyperpod_cli/sagemaker_hyperpod_recipes"]
+ path = src/hyperpod_cli/sagemaker_hyperpod_recipes
+ url = https://github.com/aws/sagemaker-hyperpod-recipes.git
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 96b92a5..c4e9997 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,11 @@
 # Changelog
+## v2.0.0 (2024-12-04)
+
+### Features
+
+- feature: The HyperPod CLI now support ([Hyperpod recipes](https://github.com/aws/sagemaker-hyperpod-recipes.git)). The HyperPod recipes enable customers to get started training and fine-tuning popular publicly-available foundation models like Llama 3.1 405B in minutes. Learn more ([here](https://github.com/aws/sagemaker-hyperpod-recipes.git)).
+
 ## v1.0.0 (2024-09-09)
 ### Features
diff --git a/README.md b/README.md
index 85728df..98d76d3 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ This documentation serves as a reference for the available HyperPod CLI commands
 ## Overview
-The SageMaker HyperPod CLI is a tool that helps submit training jobs to the Amazon SageMaker HyperPod clusters orchestrated by Amazon EKS. It provides a set of commands for managing the full lifecycle of training jobs, including submitting, describing, listing, and canceling jobs, as well as accessing logs and executing commands within the job's containers. The CLI is designed to abstract away the complexity of working directly with Kubernetes for these core actions of managing jobs on SageMaker HyperPod clusters orchestrated by Amazon EKS.
+The SageMaker HyperPod CLI is a tool that helps submit training jobs to the Amazon SageMaker HyperPod clusters orchestrated by Amazon EKS. It provides a set of commands for managing the full lifecycle of training jobs, including submitting, describing, listing, patching and canceling jobs, as well as accessing logs and executing commands within the job's containers. The CLI is designed to abstract away the complexity of working directly with Kubernetes for these core actions of managing jobs on SageMaker HyperPod clusters orchestrated by Amazon EKS.
 ## Prerequisites
@@ -76,6 +76,10 @@ SageMaker HyperPod CLI currently supports start training job with:
 ```
 hyperpod get-clusters
 ```
+ - Get your HyperPod clusters to show their capacities and quota allocation info for a team.
+ ```
+ hyperpod get-clusters -n hyperpod-ns-
+ ```
 - Connect to one HyperPod cluster and specify a namespace you have access to.
 ```
 hyperpod connect-cluster --cluster-name
@@ -104,11 +108,12 @@ The HyperPod CLI provides the following commands:
 This command lists the available SageMaker HyperPod clusters and their capacity information.
 ```
-hyperpod get-clusters [--region ] [--clusters ] [--orchestrator ] [--output ]
+hyperpod get-clusters [--region ] [--clusters ] [--namespace ] [--orchestrator ] [--output ]
 ```
 * `region` (string) - Optional. The region that the SageMaker HyperPod and EKS clusters are located. If not specified, it will be set to the region from the current AWS account credentials.
 * `clusters` (list[string]) - Optional. A list of SageMaker HyperPod cluster names that users want to check the capacity for. This is useful for users who know some of their most commonly used clusters and want to check the capacity status of the clusters in the AWS account.
+* `namespace` (string) - Optional. The namespace that users want to check the quota with. Only the SageMaker managed namespaces are supported.
 * `orchestrator` (enum) - Optional. The orchestrator type for the cluster. Currently, `'eks'` is the only available option.
 * `output` (enum) - Optional. The output format. Available values are `table` and `json`. The default value is `json`.
@@ -122,19 +127,19 @@ hyperpod connect-cluster --cluster-name [--region ] [--na
 * `cluster-name` (string) - Required. The SageMaker HyperPod cluster name to configure with.
 * `region` (string) - Optional. The region that the SageMaker HyperPod and EKS clusters are located. If not specified, it will be set to the region from the current AWS account credentials.
-* `namespace` (string) - Optional. The namespace that you want to connect to. If not specified, this command uses the [Kubernetes namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) of the Amazon EKS cluster associated with the SageMaker HyperPod cluster in your AWS account.
+* `namespace` (string) - Optional. The namespace that you want to connect to. If not specified, Hyperpod cli commands will auto discover the accessible namespace.
 ### Submitting a Job
 This command submits a new training job to the connected SageMaker HyperPod cluster.
 ```
-hyperpod start-job --job-name [--namespace ] [--job-kind ] [--image ] [--command ] [--entry-script