v0.1 API Review #154
base: v0.0
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: kfswain. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment. (Other bot commands such as `/hold` and `/cc` are listed in the linked documentation.)
```go
// Sheddable defines the lowest level of criticality. Requests to this band will be shed before
// all other bands.
Sheddable Criticality = "Sheddable"
```
nit: Using the name `Sheddable` exposes the result of this Criticality type rather than a descriptive type. I suggest using `NonCritical` or some other descriptive type name.
I do agree that it describes what the criticality mechanism does.
I'm having a hard time coming up with an adequate name that wouldn't be synonymous. `NonCritical` could be viewed as anything that isn't of the `Critical` level, which would include both `Default` and `Sheddable`.
I suppose we could use `BestEffort` to be in line with the Pod QoS classes. But we don't have an equivalent for `Guaranteed`, and maybe only a loose equivalent for `Burstable`.
Definitely open to better names though. LMKWYT
Also open to other name ideas here. Currently we have:
- Critical
- Default
- Sheddable
I don't really love `Criticality: NonCritical` or `Criticality: Critical`; maybe there are 3 different terms that clearly indicate 3 distinct layers of criticality? A naive example would be High, Medium, Low, but that's not the best.
cc @ahg-g
Thoughts on changing `Criticality` to `Priority`, or introducing a CriticalityClass concept similar to PriorityClass, where the level of criticality can be user-defined? For example:
```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: CriticalityClass
metadata:
  name: non-essential
value: 100
---
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: inferencemodel-sample
spec:
  criticalityClassName: non-essential
  modelName: tweet-summary
  poolRef:
    name: vllm-llama2-7b-pool
  targetModels:
  - name: tweet-summary-0
```
I'm so torn here. I agree with the idea of a user-defined priority system... However:
I worry about how that might interact with an algorithm. If we open the gates to user-defined priority, we are essentially letting untested priority sets run in prod and implicitly supporting that. I'm no load-balancing algorithm expert, but from my limited experience, it seems that could be incredibly challenging to guarantee that it works correctly.
If we keep a discrete set, we maintain the bounds on what is possible and are able to provide stronger promises.
I also could be worrying about nothing. LMKWYT
Just to summarize today's discussion during the community meeting:
- There is some agreement on continuing to use an enum and a predefined set of criticality levels
- Change it to a reference and don't default it in the API, to allow extending this in the future with another parameter, forming oneOf semantics.
We have another comment thread on changing the name for Default, but regarding Sheddable I feel it offers clear semantics as to what it means: that it is the first to be dropped when capacity is constrained. I don't have better suggestions, though, to address @candita's concern, which is completely valid!
Throwing out an idea. Instead of `Criticality` there could be a simple `Sheddable` boolean, described with caveats like: when model workloads exceed resources, some model workloads may be dropped (or moved to the end of a scheduling queue) in order to avoid starvation of higher-value model workloads. If this model allows for best-effort scheduling, marking it as Sheddable will help obtain results faster from higher-value model workloads.
This begs the question of whether the scheduling priority belongs on workloads rather than models. Is there an API object for workloads?
The `InferenceModel` object is intended to represent a model as a workload (inference). We elected not to use the term workload, as K8s already has a pretty well-established definition: https://kubernetes.io/docs/concepts/workloads/
All that to say, I think we are in agreement.
```go
type InferenceModelStatus struct {
	// Conditions track the state of the InferencePool.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```
I suggest using a specifically typed condition as the type of the array, rather than a generic one. This helps to set expectations about what types of conditions are generated. For example, start with a set that describes the availability and degradation of the InferenceModel?
Ack! I left the array as `metav1.Condition` to match Gateway's prior art: https://github.com/kubernetes-sigs/gateway-api/blob/0e465a76f43137e1d39771ea1a75b6190e7b4ac6/apis/v1/gateway_types.go#L735
But I have created typed condition types/reasons.
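For illustration, typed condition types/reasons in that style might look like the following; the names here are hypothetical examples, not the merged set:

```go
package main

import "fmt"

// InferenceModelConditionType and InferenceModelConditionReason follow the
// Gateway API pattern: the status array stays []metav1.Condition, and these
// typed constants serve as recommendations for what implementations set.
type InferenceModelConditionType string

type InferenceModelConditionReason string

const (
	// ModelConditionAccepted would indicate whether the model has been
	// accepted by a controller for the referenced pool. (Hypothetical.)
	ModelConditionAccepted InferenceModelConditionType = "Accepted"

	// Reasons a controller might report for the Accepted condition.
	ModelReasonAccepted InferenceModelConditionReason = "Accepted"
	ModelReasonPending  InferenceModelConditionReason = "Pending"
)

func main() {
	fmt.Printf("condition=%s reason=%s\n", ModelConditionAccepted, ModelReasonPending)
}
```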
@candita I completely agree that some sample typed conditions would be helpful here so we can have consistent status across implementations.
As far as a new Condition type, I don't think we've done that before. In Gateway API, we've exclusively used `metav1.Condition` and then added typed conditions as recommendations for what we'd expect to be set within those conditions. As far as I know, we've never actually added custom validation or types for these conditions, though.
@kfswain @robscott sounds good. Having a default/starting condition like set in https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1/gateway_types.go#L734 would be great.
+1, completely agree, this also feels like something we should try to get in before releasing v0.1
```go
// InferencePoolStatus defines the observed state of InferencePool
type InferencePoolStatus struct {
	// Conditions track the state of the InferencePool.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
```
Likewise with this section.
```go
// The number must be in the range 1 to 65535.
//
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=65535
```
If applicable, I suggest reducing this port range to a smaller list of well-known ports that users can rely on for firewall configuration purposes. Also, don't allow overlap with other well-known ports like those used for DNS, HTTP(S), etc.
Holding off on changing this one, just to gather consensus on what range we should limit it to. But I do agree with the idea.
It's possible that we could start with a small range and relax it as needed, as the other direction would be nigh impossible.
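If a narrower range were chosen, it could be expressed with the same kubebuilder markers; the 8000-9000 band below is purely illustrative, not a range anyone proposed:

```go
// The number must be in the range 8000 to 9000 (illustrative range only),
// avoiding overlap with well-known ports such as 53 (DNS) and 80/443 (HTTP/S).
//
// +kubebuilder:validation:Minimum=8000
// +kubebuilder:validation:Maximum=9000
```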
@candita this is meant to be a reference to a port number on a Pod. I can't think of any reasonable way to limit that, since I think Kubernetes has likely scaled far enough that there's probably at least one case of each individual port being in use across the many Kubernetes clusters that exist.
Thanks to everyone for the help reviewing this! Any/all comments are very appreciated. We also went over this at a high level in a meeting with SIG-Net TLs (and others), walking through some slides as a reference point. Copying SIG-Net TLs and chairs in case any have time for additional review cycles.
I've set aside time tomorrow (Friday) to review 👍
```go
// defaults to 1.
//
// +optional
// +kubebuilder:default=1
```
Based on API review feedback from @smarterclayton, this default should be removed. Defaults are easier to add later as the API matures.
Additionally, `weight` is a concept specific to a load-balancing/scheduling extension. During the API review, we discussed the need to document this dependency.
+1 on all points. I think we can tackle all of these items with smaller individual PRs instead of trying to solve everything in one go, so if anyone's looking for ways to jump in, feel free to grab some of these actionable comments and turn them into PRs.
Ah, I was going to capture them here and then just copy the contents of the files back in to their original homes. Handling them piecewise works also. Open to whatever
```go
// +kubebuilder:default=1
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=1000000
Weight int32 `json:"weight,omitempty"`
```
If defaulting is removed, should this field be a pointer since it's optional? API conventions state this should be a pointer if it's optional.
Yep, agree on both points - default should be removed and this should become a pointer.
api/inferencemodel_types.go
```go
// Criticality defines how important it is to serve the model compared to other models referencing the same pool.
//
// +optional
// +kubebuilder:default="Default"
```
Based on feedback from @smarterclayton, this default should be removed to allow this concept to evolve by introducing oneOf (union types). If this default is being removed, should we also remove the associated "Default" enum?
That's a good point - if we're no longer defaulting, we can't really call the middle value "default", so we likely need at least one new name here. Any ideas?
cc @ahg-g
Agreed, based on today's discussion we should remove this defaulting.
Regarding "Default", I think we have two options:
- Rename it to `Medium` or `BestEffort`
- Remove it completely and stick with only two: `Critical` and `Sheddable`. This is perhaps acceptable now that we have a clearer path to extending the scope in the future.

wdyt?
If we are boiling this down to two values, do we want to make the Criticality type an indirection to bool rather than string, and then just rename it to `Critical`?
@candita mentions the same: #154 (comment)
I am still trying to understand the intent of `criticality`. Can someone help me understand what "serve" means in the godocs? Does "serve" mean that, if the Node does not have enough resources (i.e. memory), model servers of an InferenceModel with `criticality: BestEffort` will be evicted, or is `criticality` a networking classification concept?
Fair question. We used to have a glossary to define some of these terms (we call it priority there).
But to quickly summarize: `Criticality` is intended to be a load-balancing concept. So if there is always enough capacity to place a request on any model server, criticality does nothing. But should we need to queue requests, criticality will prioritize the requests that come from a `Critical` workload. We can't make exact guarantees like we can with scheduling on a node, as traffic can wax and wane much more quickly.
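A toy sketch of that queueing behavior; the ordering logic below is illustrative only, as the extension's actual algorithm is not defined in this thread:

```go
package main

import (
	"fmt"
	"sort"
)

// Criticality bands as named in the API under review.
type Criticality string

const (
	Critical  Criticality = "Critical"
	Default   Criticality = "Default"
	Sheddable Criticality = "Sheddable"
)

// rank is an illustrative ordering: lower rank is served first.
var rank = map[Criticality]int{Critical: 0, Default: 1, Sheddable: 2}

type request struct {
	id string
	c  Criticality
}

// prioritize sketches the behavior described above: when requests must
// queue, Critical work is dispatched first and Sheddable work is served
// (or shed) last. Under ample capacity, criticality has no effect.
func prioritize(queue []request) []request {
	sort.SliceStable(queue, func(i, j int) bool {
		return rank[queue[i].c] < rank[queue[j].c]
	})
	return queue
}

func main() {
	q := prioritize([]request{
		{"a", Sheddable}, {"b", Critical}, {"c", Default},
	})
	for _, r := range q {
		fmt.Println(r.id, r.c)
	}
}
```

The stable sort preserves arrival order within a band, so two `Default` requests still dequeue first-come, first-served.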
@kfswain thanks for the glossary reference. Consider including points from the glossary or the summary above in the `Criticality` godocs to better define the concept. Without the additional details, I can see someone interpreting `Criticality` as pod priority, network priority, etc.
@kfswain: The following test failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
This PR is not intended to be merged, merely a point of reference for review.
Slides: https://docs.google.com/presentation/d/1gtOJS1YA0Ax8KvsGPrHiyZR2dBoWnlLd9aACyojmk68/edit#slide=id.p