v0.1 API Review #154

Open · kfswain wants to merge 3 commits into v0.0

Conversation

@kfswain (Collaborator) commented Jan 6, 2025

This PR is not intended to be merged; it is merely a point of reference for review.

Slides: https://docs.google.com/presentation/d/1gtOJS1YA0Ax8KvsGPrHiyZR2dBoWnlLd9aACyojmk68/edit#slide=id.p

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 6, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kfswain
Once this PR has been reviewed and has the lgtm label, please assign nikhita for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 6, 2025
@kfswain (Collaborator, Author) commented Jan 6, 2025

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 6, 2025
@mrbobbytables mrbobbytables removed their request for review January 6, 2025 21:13
@shaneutt (Member) commented Jan 8, 2025

/cc

@k8s-ci-robot k8s-ci-robot requested a review from shaneutt January 8, 2025 17:18
Comment on lines +134 to +136
// Sheddable defines the lowest level of criticality. Requests to this band will be shed before
// all other bands.
Sheddable Criticality = "Sheddable"

nit: The name Sheddable exposes the outcome of this Criticality level rather than describing the level itself. I suggest using NonCritical or some other descriptive name.

Collaborator Author

I do agree that it describes what the criticality mechanism does.

I'm having a hard time coming up with an adequate name that wouldn't be synonymous. Non-Critical could be viewed as anything that isn't at the Critical level, which would include both Default and Sheddable.

I suppose we could use BestEffort to be in line with the Pod QoS classes. But we don't have an equivalent for Guaranteed, and only a loose equivalent for Burstable.

Definitely open to better names though. LMKWYT

Member

Also open to other name ideas here. Currently we have:

  • Critical
  • Default
  • Sheddable

I don't really love Criticality: NonCritical or Criticality: Critical. Maybe there are 3 different terms that clearly indicate 3 distinct layers of criticality? A naive example would be High, Medium, Low, but that's not the best.

cc @ahg-g

Contributor

Thoughts on changing Criticality to Priority or introducing a CriticalityClass concept similar to PriorityClass, where the level of criticality can be user-defined? For example:

apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: CriticalityClass
metadata:
  name: non-essential
value: 100
---
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: inferencemodel-sample
spec:
  criticalityClassName: non-essential
  modelName: tweet-summary
  poolRef:
    name: vllm-llama2-7b-pool
  targetModels:
  - name: tweet-summary-0

Collaborator Author

I'm so torn here. I agree with the idea of a user defined priority system... However:

I worry how that might interact with an algo. If we open the gates to user defined prio, we are essentially letting untested priority sets run in prod and implicitly supporting that. I'm no Load Balancing algo expert, but from my limited experience, it seems that could be incredibly challenging to guarantee that it works correctly.
If we keep a discrete set, we maintain the bounds on what is possible, and are able to provide stronger promises.

I also could be worrying about nothing. LMKWYT

@ahg-g (Contributor) Jan 9, 2025

Just to summarize today's discussion during the community meeting:

  1. There is some agreement on continuing to use an enum and a predefined set of criticality levels
  2. Change it to a reference and not default it in the api to allow extending this in the future with another parameter forming a oneOf semantics.

We have another comment thread on changing the name for Default, but regarding Sheddable I feel it offers clear semantics as to what it means: that it is the first to be dropped when capacity is constrained. I don't have better suggestions, though, to address @candita's point, which is completely valid!
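
To make the second point concrete, here is a minimal sketch of what it could look like: the current predefined levels kept as an enum, the field made an optional pointer, and no kubebuilder default. Field placement and doc comments are illustrative, not the final API.

package v1alpha1

// Criticality defines how important it is to serve the model compared to
// other models referencing the same pool.
type Criticality string

const (
	// Critical is the highest band; these requests are served first.
	Critical Criticality = "Critical"
	// Default is the current middle band (naming under discussion in this thread).
	Default Criticality = "Default"
	// Sheddable requests are shed before all other bands.
	Sheddable Criticality = "Sheddable"
)

type InferenceModelSpec struct {
	// Criticality is deliberately left without a kubebuilder default so the
	// field can later evolve into a oneOf/union with another parameter.
	//
	// +optional
	Criticality *Criticality `json:"criticality,omitempty"`

	// ...other fields omitted for brevity...
}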


Throwing out an idea. Instead of Criticality there could be a simple Sheddable boolean, described with caveats like: when model workloads exceed resources, some model workloads may be dropped (or moved to the end of a scheduling queue) in order to avoid starvation of higher value model workloads. If this model allows for best-effort scheduling, marking it as Sheddable will help obtain results faster from higher value model workloads.

This begs the question about whether the scheduling priority belongs on workloads rather than models. Is there an API object for workloads?
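
Purely as an illustration of the boolean alternative floated above (the field name and wording are hypothetical, not proposed API):

// Sheddable indicates that, when model workloads exceed available resources,
// requests for this model may be dropped or moved to the end of the
// scheduling queue to avoid starving higher-value model workloads.
//
// +optional
Sheddable *bool `json:"sheddable,omitempty"`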

Collaborator Author

The InferenceModel object is intended to represent a model as an (inference) workload. We elected not to use the term workload, as K8s already has a pretty well-established definition: https://kubernetes.io/docs/concepts/workloads/

All that to say, I think we are in agreement.

api/inferencemodel_types.go (outdated comment thread, resolved)
type InferenceModelStatus struct {
// Conditions track the state of the InferenceModel.
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
@candita Jan 8, 2025

I suggest using a specifically typed condition as the type of the array, rather than a generic one. This helps to set expectations about what types of conditions are generated. For example, start with a set that describes the availability and degradation of the InferenceModel?

Collaborator Author

Ack! I left the array as metav1.Condition to match Gateway's prior art: https://github.com/kubernetes-sigs/gateway-api/blob/0e465a76f43137e1d39771ea1a75b6190e7b4ac6/apis/v1/gateway_types.go#L735

But I have created typed condition types/reasons.
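
For context, typed condition types/reasons following Gateway API's prior art could look roughly like this; the condition and reason names below are illustrative, not necessarily the final set:

// InferenceModelConditionType is a type of condition for the InferenceModel.
type InferenceModelConditionType string

// InferenceModelConditionReason is a reason for an InferenceModel condition.
type InferenceModelConditionReason string

const (
	// ModelConditionAccepted indicates whether the InferenceModel has been
	// accepted by a referenced InferencePool.
	ModelConditionAccepted InferenceModelConditionType = "Accepted"

	// ModelReasonAccepted is the expected reason when the model is accepted.
	ModelReasonAccepted InferenceModelConditionReason = "Accepted"

	// ModelReasonPending is the initial reason before a controller reconciles.
	ModelReasonPending InferenceModelConditionReason = "Pending"
)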

Member

@candita I completely agree that some sample typed conditions would be helpful here so we can have consistent status across implementations.

As far as a new Condition type, I don't think we've done that before. In Gateway API, we've exclusively used metav1.Condition and then added typed conditions as recommendations for what we'd expect to be set within those conditions. As far as I know, we've never actually added custom validation or types for these conditions though.


@kfswain @robscott sounds good. Having a default/starting condition like set in https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1/gateway_types.go#L734 would be great.

Member

+1, completely agree, this also feels like something we should try to get in before releasing v0.1
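
A minimal sketch of such a starting condition, modeled on the Gateway example linked above; the exact type/reason names would follow whatever typed conditions we settle on:

// Conditions track the state of the InferenceModel.
//
// +optional
// +listType=map
// +listMapKey=type
// +kubebuilder:validation:MaxItems=8
// +kubebuilder:default={{type: "Accepted", status: "Unknown", reason: "Pending", message: "Waiting for controller", lastTransitionTime: "1970-01-01T00:00:00Z"}}
Conditions []metav1.Condition `json:"conditions,omitempty"`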

api/inferencemodel_types.go (outdated comment thread, resolved)
// InferencePoolStatus defines the observed state of InferencePool
type InferencePoolStatus struct {
// Conditions track the state of the InferencePool.
Conditions []metav1.Condition `json:"conditions,omitempty"`
Collaborator Author

The same applies to this section as well.

Comment on lines +58 to +61
// The number must be in the range 1 to 65535.
//
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=65535

If applicable, I suggest reducing this port range to a smaller list of well-known ports that users can rely on for firewall configuration purposes. Also, don't allow overlap with other well-known ports like those used for DNS, HTTP/S, etc.

Collaborator Author

Holding off on changing this one, just to gather consensus on what range we should limit it to. But I do agree with the idea.

It's possible that we could start with a small range and relax it as needed, as going the other direction would be nigh impossible.

Member

@candita this is meant to be a reference to a port number on a Pod. I can't think of any reasonable way to limit that, since Kubernetes has likely scaled far enough that there's probably at least one case of each individual port being in use across the many clusters that exist.

@robscott (Member) commented Jan 9, 2025

Thanks to everyone for the help reviewing this! Any/all comments are very appreciated.

We also went over this at a high level in a meeting with SIG-Net TLs (and others), walking through some slides as a reference point. Since this is starting as an x-k8s.io API, formal API review is optional, but we're trying to get as much feedback as we can early in the process. Barring any blocking feedback, we're hoping to release v0.1 next week.

Copying SIG-Net TLs and chairs if any have time for additional review cycles.

/cc @aojea @danwinship @MikeZappa87 @shaneutt @thockin

@shaneutt (Member) commented Jan 9, 2025

Copying SIG-Net TLs and chairs if any have time for additional review cycles.

I've set aside time tomorrow (Friday) to review 👍

// defaults to 1.
//
// +optional
// +kubebuilder:default=1
@danehans (Contributor) Jan 9, 2025

Based on API review feedback from @smarterclayton, this default should be removed. Defaults are easier to add later as the API matures.

Contributor

Additionally, weight is a concept specific to a load-balancing/scheduling extension. During the API review, we discussed the need to document this dependency.

Member

+1 on all points. I think we can tackle all of these items with smaller individual PRs instead of trying to solve everything in one go, so if anyone's looking for ways to jump in, feel free to grab some of these actionable comments and turn them into PRs.

Collaborator Author

Ah, I was going to capture them here and then just copy the contents of the files back into their original homes. Handling them piecewise works also. Open to whatever.

// +kubebuilder:default=1
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=1000000
Weight int32 `json:"weight,omitempty"`
Contributor

If defaulting is removed, should this field be a pointer since it's optional? API conventions state this should be a pointer if it's optional.

Member

Yep, agree on both points - default should be removed and this should become a pointer.
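
Sketch of the field after both changes; the doc comment is paraphrased, and the note about the load-balancing/scheduling extension reflects the dependency discussed earlier in this thread:

// Weight influences the proportion of traffic routed to this target model.
// Note: weight is only meaningful when a load-balancing/scheduling extension
// honors it; see the documentation note discussed above.
//
// +optional
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=1000000
Weight *int32 `json:"weight,omitempty"`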

// Criticality defines how important it is to serve the model compared to other models referencing the same pool.
//
// +optional
// +kubebuilder:default="Default"
@danehans (Contributor) Jan 9, 2025

Based on feedback from @smarterclayton, this default should be removed to allow this concept to evolve by introducing oneOf (union types). If this default is being removed, should we also remove the associated "Default" enum?

Member

That's a good point - if we're no longer defaulting, we can't really call the middle value "default", so we likely need at least one new name here. Any ideas?

cc @ahg-g

Contributor

Agreed, based on today's discussion we should remove this defaulting.

Regarding "Default", I think we have two options:

  1. Rename to Medium or BestEffort
  2. Remove it completely and stick with two only: Critical and Sheddable. This is perhaps acceptable now that we have a clearer path to extending the scope in the future.

wdyt?

Collaborator Author

If we are boiling this down to two values, do we want to make the Criticality type an indirection to bool rather than string, and then just rename it to Critical?

@candita mentions the same: #154 (comment)

Contributor

I am still trying to understand the intent of criticality. Can someone help me understand what "serve" means in the godocs? Does "serve" mean that, if the Node does not have enough resources (e.g. memory), model servers of an InferenceModel with criticality: BestEffort will be evicted? Or is criticality a networking classification concept?

Collaborator Author

Fair question. We used to have a glossary to define some of these terms (where we call it priority).

But to quickly summarize: Criticality is intended to be a load-balancing concept. So if there is always enough capacity to place a request on any model server, criticality does nothing. But should we need to queue requests, criticality will prioritize the requests that come from a critical workload. We can't give exact guarantees the way we can with scheduling on a node, as traffic can wax and wane much more quickly.
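
Purely to illustrate that intent (this is not the extension's actual implementation), the queueing behavior described above could be pictured as:

package sketch

// request is a placeholder for a queued inference request.
type request struct{ model string }

// bands lists the criticality levels, highest priority first; the names
// match the current enum values.
var bands = []string{"Critical", "Default", "Sheddable"}

// nextRequest drains higher-criticality queues first, so Sheddable requests
// are the last to be served (and effectively the first to be shed) when
// capacity is constrained.
func nextRequest(queues map[string][]request) (request, bool) {
	for _, band := range bands {
		if q := queues[band]; len(q) > 0 {
			return q[0], true
		}
	}
	return request{}, false
}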

Contributor

@kfswain thanks for the glossary reference. Consider including points from the glossary or ^ to the Criticality godocs to better define the concept. Without the additional details, I can see someone interpreting Criticality as pod priority, network priority, etc.

api/inferencepool_types.go (outdated comment thread, resolved)
@k8s-ci-robot (Contributor)

@kfswain: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test: pull-gateway-api-inference-extension-test-unit-main
Commit: 613fd37
Required: true
Rerun command: /test pull-gateway-api-inference-extension-test-unit-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
