Release v0.6.0 requirements #523
v1 API?
Thanks for the reminder, added to must-haves.
Removed API v1 graduation from this release, pending a finalized design for Kubeflow Training v2, which may require JobSet API changes.
Before cutting the 0.6 release, I'm waiting for the cherry-picks of kubernetes/kubernetes#126046 to be completed and a patch release of k8s.io/api to be published, so we can bump the dependency in JobSet. This upstream bug blocks our ability to use configurable failure policy rules with PodFailurePolicies, which is the main use case for the new configurable failure policy feature (#262). Configurable failure policy is the main feature in 0.6, and one that many customers and users are waiting on, so I don't want to cut the release with it in an incomplete state.
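To illustrate why the PodFailurePolicy interaction matters, here is a minimal Python sketch of how configurable failure policy rules might be evaluated when a child Job fails. It assumes a simplified model: rules are checked in order and the first match wins, an empty `on_job_failure_reasons` or `target_replicated_jobs` list matches everything, and a failure that no rule matches falls back to a `RestartJobSet`-style restart bounded by `max_restarts`. The action strings follow JobSet's failure policy API as we understand it, and `PodFailurePolicy` / `BackoffLimitExceeded` are Kubernetes Job failure reasons; `FailurePolicyRule`, `decide`, and `example_rules` are hypothetical names, not JobSet code.

```python
from dataclasses import dataclass, field
from typing import List

# Action names follow JobSet's failure policy API; everything else here is a
# simplified, hypothetical model of the controller's behavior.
FAIL_JOBSET = "FailJobSet"
RESTART_JOBSET = "RestartJobSet"
RESTART_IGNORE_MAX = "RestartJobSetAndIgnoreMaxRestarts"


@dataclass
class FailurePolicyRule:
    name: str
    action: str
    # An empty list matches any child Job failure reason.
    on_job_failure_reasons: List[str] = field(default_factory=list)
    # An empty list matches any replicated job.
    target_replicated_jobs: List[str] = field(default_factory=list)

    def matches(self, reason: str, replicated_job: str) -> bool:
        reason_ok = not self.on_job_failure_reasons or reason in self.on_job_failure_reasons
        job_ok = not self.target_replicated_jobs or replicated_job in self.target_replicated_jobs
        return reason_ok and job_ok


def decide(rules: List[FailurePolicyRule], max_restarts: int,
           restarts: int, reason: str, replicated_job: str) -> str:
    """Return 'fail' or 'restart' for a JobSet whose child Job just failed."""
    action = RESTART_JOBSET  # assumed default when no rule matches
    for rule in rules:  # rules are checked in order; the first match wins
        if rule.matches(reason, replicated_job):
            action = rule.action
            break
    if action == FAIL_JOBSET:
        return "fail"
    if action == RESTART_IGNORE_MAX:
        return "restart"
    # RestartJobSet: restart only while the restart budget is not exhausted.
    return "restart" if restarts < max_restarts else "fail"


# Example: fail fast when a Job was failed by its podFailurePolicy,
# otherwise restart up to max_restarts times.
example_rules = [
    FailurePolicyRule("fail-on-pod-failure-policy", FAIL_JOBSET, ["PodFailurePolicy"]),
]
```

With `example_rules`, a Job failed by its podFailurePolicy fails the whole JobSet immediately, while a `BackoffLimitExceeded` failure triggers a restart until `max_restarts` is reached. This only works if the Job's failure reason is reported reliably, which is why the upstream bug blocks the feature.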
> supported:
- foreach + parallel
- parallel with Argo
- dynamically set worker counts
- should work with @timeout / @project / @card etc.
- retries working with native Argo
- fully self-contained JobSet with Argo support
- requires JobSet v0.6.0 [kubernetes-sigs/jobset#523]
> not supported:
- support for catch
> Notes
- not using `{{retries}}` like we do in container templates
- instead, passing down `{{retries}}` as an `inputs.parameters` entry, which will be accessible in the JobSet manifest
- temporary tweak to the boto dependency to ensure that boto install failures don't fail deployment
- instead of relying on the kubernetes object, we freshly create an object in the ArgoContainer templates
- code in the same style as the kubernetes/argo integrations, with explicit filling of variables and decoupled abstractions
- setting annotations explicitly, as they won't be passed down from the WorkflowTemplate level
- support for JobSet-native success conditions (requires JobSet v0.6 on the controller)
- REFACTORS THAT HAVE GONE INTO THIS COMMIT:
  - [argo][feedback] refactor DAG template parameter/output setting: just move the conditional block around
  - [argo][feedback] refactor references to `task_id_base` to `task_id_entropy`: these are set/used in the Argo outputs and variable names
  - [argo][feedback] refactor references to `task-id-base` to `task-id-entropy`: these are used as Argo parameter names
  - [argo][feedback] refactor to match code style
  - [argo][feedback] refactor to match code style (refactor some conditionals)
  - [argo][feedback] remove k8s client and make `KubernetesArgoJobSet` directly use `kubernetes_sdk`
  - [argo][feedback] added `environment_variables_from_selectors` for code simplification
  - [argo][feedback] fix comment
  - [argo][feedback] refactor condition for readability
  - [argo][feedback] roll back temp boto3 installation change in metaflow env
  - [argo][feedback] remove rogue type hint
Update: the upstream k8s issue has been fixed and the cherry-picks merged. The fix will be included in the patch release on 08/13, at which point we can bump our k8s API dependency packages.
The k8s API 1.31 packages with the fix mentioned above have been released, but after attempting to bump our dependencies I ran into a compatibility issue with controller-runtime, which others have hit as well: kubernetes-sigs/controller-runtime#2925. The maintainers say controller-runtime v0.19.0, which will support k8s API v0.31.0, will be released soon, so we'll have to wait a bit longer for the JobSet v0.6.0 release. In the meantime, I've added a couple more features to the "must have" list for this release, since these delays gave us time to implement them and include them in the release.
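Each controller-runtime minor release targets one k8s.io minor release, which is why bumping k8s.io alone breaks the build. A small Python check like the one below can flag a mismatched bump early. It assumes the pairing seen in recent releases (controller-runtime v0.N with k8s.io v0.(N+12), e.g. v0.18 with v0.30 and v0.19 with v0.31); that offset is an observed convention rather than a documented guarantee, and `parse_minor` / `minors_aligned` are hypothetical helpers.

```python
import re

# Assumption: recent controller-runtime releases pair v0.N with k8s.io v0.(N+12).
# This is an observed convention, not an official compatibility guarantee.
MINOR_OFFSET = 12


def parse_minor(version: str) -> int:
    """Extract the minor number from a version like 'v0.31.0' or '0.30.4'."""
    match = re.fullmatch(r"v?0\.(\d+)\.\d+", version)
    if match is None:
        raise ValueError(f"unexpected version format: {version!r}")
    return int(match.group(1))


def minors_aligned(controller_runtime: str, k8s_api: str) -> bool:
    """True if the two versions follow the assumed minor-version pairing."""
    return parse_minor(controller_runtime) + MINOR_OFFSET == parse_minor(k8s_api)
```

Under this rule, moving k8s.io to v0.31.0 while staying on a controller-runtime v0.18.x release is flagged, consistent with the incompatibility described above; pairing it with v0.19.0 passes.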
Update: the changes for #617 and #649 are needed urgently by customers, and we cannot wait for the dependency compatibility issues described in #523 to be resolved. So I will publish the v0.6.0 release today, then include the dependency version bumps in a v0.6.1 patch release once they are ready. After upgrading the k8s.io packages to v0.31.0 and controller-runtime to v0.19.0, I ran into this issue, which I haven't debugged yet.
Actually, since @mimowo completed the cherry-picks for the fix, we should be able to use the k8s v0.30.4 packages, which we just bumped to before the v0.6.0 release. Testing this now.
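Whether a given k8s.io version is usable comes down to whether it carries the cherry-picked fix. The Python sketch below encodes that check, assuming (from this thread only) that v0.30.4 is the first fixed patch on the 1.30 line and that v0.31.0 already contains the fix; `FIRST_FIXED_PATCH` and `includes_fix` are hypothetical names, and the table should be verified against the upstream changelogs.

```python
import re
from typing import Optional, Tuple

# Assumed from this thread: first k8s.io patch containing the fix on each minor line.
# Other minor lines are not covered here.
FIRST_FIXED_PATCH = {30: 4, 31: 0}


def parse_version(version: str) -> Tuple[int, int]:
    """Split 'v0.30.4' into (minor, patch)."""
    match = re.fullmatch(r"v?0\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unexpected version format: {version!r}")
    return int(match.group(1)), int(match.group(2))


def includes_fix(version: str) -> Optional[bool]:
    """True/False for the listed lines, True for newer lines, None if unknown."""
    minor, patch = parse_version(version)
    if minor in FIRST_FIXED_PATCH:
        return patch >= FIRST_FIXED_PATCH[minor]
    if minor > max(FIRST_FIXED_PATCH):
        return True  # assumes later minor releases carry the fix forward
    return None  # older lines: not known from this thread
```

Returning `None` for older minor lines keeps the sketch honest: the thread does not say which other release branches received the cherry-picks.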
Release v0.6.0 published.
Targeting release around June 1st, 2024.
Must-haves
Nice-to-haves