Surface cpu and mem requests forbidden errors (and other ones too) in KSVC creation #14453
/assign @dprotaso
@dprotaso I found out that sometimes when the code gets to this point: https://github.com/knative/serving/blob/main/pkg/reconciler/revision/reconcile_resources.go#L77 The: I simply modified the IsActivationRequired() function as follows: so the logger returned the following logs whenever the error was successfully surfaced: -> logger message: So we rethink how we are surfacing the deployment status here. Removing the WDYT?
That's weird - prior to reconciliation we preprocess the resource and initialize the conditions. Is there something that's deleting the condition erroneously?
I think it's because RevisionActive is not part of the Revision's api.LivingConditionSet. One thing about it: when I add it, the cpu limit error doesn't surface anymore.
Ah yeah that's it. Looks like we never mark it unknown hmmm.. Maybe that's something we should do
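To illustrate the point above, here is a minimal sketch of why a condition that is not part of the living condition set stays nil after initialization. The types `ConditionType`, `Condition`, and `Status` are hypothetical stand-ins written for this example, not the real knative.dev/pkg API, and the condition names are assumed to match the Revision conditions discussed in this thread.

```go
package main

import "fmt"

// ConditionType, Condition and Status are hypothetical stand-ins for the
// condition machinery; they are NOT the real knative.dev/pkg types.
type ConditionType string

type Condition struct {
	Type   ConditionType
	Status string // "True", "False" or "Unknown"
}

type Status struct {
	Conditions []Condition
}

// GetCondition returns nil when the condition has never been set.
func (s *Status) GetCondition(t ConditionType) *Condition {
	for i := range s.Conditions {
		if s.Conditions[i].Type == t {
			return &s.Conditions[i]
		}
	}
	return nil
}

// InitializeConditions marks every type in the living set as Unknown and
// leaves conditions that are already present untouched.
func (s *Status) InitializeConditions(livingSet ...ConditionType) {
	for _, t := range livingSet {
		if s.GetCondition(t) == nil {
			s.Conditions = append(s.Conditions, Condition{Type: t, Status: "Unknown"})
		}
	}
}

func main() {
	s := &Status{}
	// Assumption: "Active" is deliberately left out of the living set.
	s.InitializeConditions("ResourcesAvailable", "ContainerHealthy")

	fmt.Println(s.GetCondition("ResourcesAvailable").Status) // Unknown
	fmt.Println(s.GetCondition("Active") == nil)             // true
}
```

Any code that branches on the Active condition right after initialization therefore has to handle the nil case, which seems to be what the reconciler hit here.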
…there is something wrong (like low resources request and limits), since it was just being propagated when the revision status was nil and the deployment existed trying to surface replicaset creation errors
Force-pushed from dfe2bd7 to 933ff82.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #14453 +/- ##
==========================================
- Coverage 86.11% 86.02% -0.10%
==========================================
Files 196 197 +1
Lines 14790 14922 +132
==========================================
+ Hits 12736 12836 +100
- Misses 1747 1776 +29
- Partials 307 310 +3
…not nil after the revision is created
Since we have the scale-to-zero edge case, I think it makes sense to leave it outside the Revision condition set. I think the solution to propagate the status is good as it is right now; finishing the tests so I can remove the WIP.
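A rough sketch of the scale-to-zero edge case mentioned above: if readiness is computed only from the dependent conditions, a revision scaled to zero can be Ready while Active is False. The helper `isReady` and the map-based status are hypothetical and only illustrate the reasoning; the condition names are assumptions.

```go
package main

import "fmt"

// isReady is a hypothetical helper: the revision counts as ready only if
// every dependent condition is "True".
func isReady(conds map[string]string, dependents []string) bool {
	for _, d := range dependents {
		if conds[d] != "True" {
			return false
		}
	}
	return true
}

func main() {
	dependents := []string{"ResourcesAvailable", "ContainerHealthy"}

	// A revision that was healthy and then scaled to zero.
	scaledToZero := map[string]string{
		"ResourcesAvailable": "True",
		"ContainerHealthy":   "True",
		"Active":             "False",
	}

	// With Active outside the set, the revision stays Ready.
	fmt.Println(isReady(scaledToZero, dependents)) // true

	// With Active inside the set, scaling to zero would make it not Ready.
	withActive := append([]string{"Active"}, dependents...)
	fmt.Println(isReady(scaledToZero, withActive)) // false
}
```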
…can have Ready Inactive Revisions (scale to zero) * added docs and tests for this and the replicaset failure propagation
/retest istio-latest-no-mesh_serving_main
@gabo1208: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test istio-latest-no-mesh
Found a flaky test, so changing this to WIP again
Hmm true! Checking that failed test case
… service creation test
/retest istio-latest-no-mesh_serving_main
/test istio-latest-no-mesh
… error so adding that case too
/test istio-latest-no-mesh
/test istio-latest-no-mesh
/test istio-latest-no-mesh
/hold cancel
/lgtm
/test istio-latest-no-mesh
@gabo1208: new pull request created: #14618
I think the flake surfaces that the
/cherry-pick release-1.12
@gabo1208: #14453 failed to apply on top of branch "release-1.12":
modify rest of ut

Update net-kourier nightly (knative#14605)
bumping knative.dev/net-kourier 6e4d79d...fb8da93:
> fb8da93 upgrade to latest dependencies (# 1151)
Signed-off-by: Knative Automation <automation@knative.team>

Update net-gateway-api nightly (knative#14602)
bumping knative.dev/net-gateway-api a8d56a3...0aa321a:
> 0aa321a upgrade to latest dependencies (# 579)
Signed-off-by: Knative Automation <automation@knative.team>

upgrade to latest dependencies (knative#14608)
bumping knative.dev/pkg 5c9b7a8...35011d4:
> 35011d4 upgrade to latest dependencies (# 2892)
bumping knative.dev/networking 18529fd...e0bee34:
> e0bee34 upgrade to latest dependencies (# 889)
bumping knative.dev/hack 8834794...5deadde:
> 5deadde 🐛 Set latest release only when publishing to Github (# 346)
bumping knative.dev/caching c642577...b3781bc:
> b3781bc upgrade to latest dependencies (# 806)
Signed-off-by: Knative Automation <automation@knative.team>

Update net-contour nightly (knative#14612)
bumping knative.dev/net-contour 467a573...d2054f2:
> d2054f2 upgrade to latest dependencies (# 999)
Signed-off-by: Knative Automation <automation@knative.team>

Update net-certmanager nightly (knative#14611)
bumping knative.dev/net-certmanager 11e6219...8b2a470:
> 8b2a470 upgrade to latest dependencies (# 625)
> 2248405 upgrade to latest dependencies (# 624)
Signed-off-by: Knative Automation <automation@knative.team>

Update net-kourier nightly (knative#14613)
bumping knative.dev/net-kourier fb8da93...ad58d90:
> ad58d90 upgrade to latest dependencies (# 1155)
Signed-off-by: Knative Automation <automation@knative.team>

Update net-istio nightly (knative#14614)
bumping knative.dev/net-istio 7f77e97...e3db912:
> e3db912 upgrade to latest dependencies (# 1209)
> 1e021c8 upgrade to latest dependencies (# 1208)
Signed-off-by: Knative Automation <automation@knative.team>

Update net-gateway-api nightly (knative#14616)
bumping knative.dev/net-gateway-api 0aa321a...29bf0b9:
> 29bf0b9 upgrade to latest dependencies (# 581)
> cd26216 upgrade to latest dependencies (# 580)
Signed-off-by: Knative Automation <automation@knative.team>

Surface cpu and mem requests forbidden errors (and other ones too) in KSVC creation (knative#14453)
* reconciling the revision so the deployment status is propagated when there is something wrong (like low resources request and limits), since it was just being propagated when the revision status was nil and the deployment existed trying to surface replicaset creation errors
* added revision condition active to revision live condition set so is not nil after the revision is created
* removing RevisionConditionActive from Revision ConditionSet since we can have Ready Inactive Revisions (scale to zero)
* added docs and tests for this and the replicaset failure propagation
* fixing lint
* adjusted e2e and unit tests for the replica failure errors propagation, improved propagation logic + left todos regarding revision conditionset
* removed todo from revision lifecycle since the discussion has settled
* added test case for revision fast failures when it comes to replicaset failing to create
* fixed resource quota error, now it never waits for progress deadline and it fails fast, so removing the bits where it can go one way or another in the e2e resource_quota_error_test
* finishing the replicaset deployment status failure bubbling up to the revision table test
* removed unused test methods from revision testing package
* adding condition to wait for container creation in the resource quota service creation test
* with some istio cases this could fail with progress deadline exceeded error so adding that case too
* Update resource_quota_error_test.go
* formatted the test file

fix deploy-replicaset-failure UT rename tests remove comment remove comment wip want go correct need fix ScaleTargetInitialized only extra patch patch scale to 0 more test helper changes new tc ut WithPubService activation failure is unreachable copy inactive condition comments comment comment refactor computeActiveCondition allow scale to 0 with no metrics if unreachable remove comment add UT: initial scale zero: with ready pods change to 1 ready pod replicas 1 change logic order readd activeThreshold change Failed message scaler_test handle no metrics case mend silly typo fix UT shorten test cases scale to 0 if unreachable comment + logic remove controls podinformer as its not needed revert computeActiveCondition refactor use reachability variables init scale 0 stays no traffic undo imports reorder add comments pa change Inactive reason to Unreachable
Changed the activation-required condition used while reconciling the revision so that the deployment status is propagated when something is wrong (like low resource requests and limits). Previously it was only propagated when the revision status was nil and the deployment existed.
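As a sketch of the propagation described above, the following copies a deployment's `ReplicaFailure` condition onto a revision status so the user sees the failure immediately instead of waiting for a progress deadline. `DeploymentCondition` only mirrors the shape of the Kubernetes apps/v1 type, and `RevisionStatus` and `propagateReplicaFailure` are hypothetical names for this example, not the code merged in this PR. The reason and message values in `main` are illustrative.

```go
package main

import "fmt"

// DeploymentCondition mirrors the shape of the Kubernetes apps/v1
// DeploymentCondition; it is redefined here to keep the sketch standalone.
type DeploymentCondition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// RevisionStatus is a hypothetical, flattened stand-in for the revision's
// ResourcesAvailable condition.
type RevisionStatus struct {
	ResourcesAvailable string
	Reason             string
	Message            string
}

// propagateReplicaFailure copies a ReplicaFailure=True deployment condition
// onto the revision and reports whether anything was propagated.
func propagateReplicaFailure(conds []DeploymentCondition, rev *RevisionStatus) bool {
	for _, c := range conds {
		if c.Type == "ReplicaFailure" && c.Status == "True" {
			rev.ResourcesAvailable = "False"
			rev.Reason = c.Reason
			rev.Message = c.Message
			return true
		}
	}
	return false
}

func main() {
	rev := &RevisionStatus{ResourcesAvailable: "Unknown"}
	conds := []DeploymentCondition{{
		Type:    "ReplicaFailure",
		Status:  "True",
		Reason:  "FailedCreate",                         // illustrative value
		Message: "pods are forbidden by a LimitRange",   // illustrative value
	}}

	if propagateReplicaFailure(conds, rev) {
		fmt.Printf("%s: %s (%s)\n", rev.ResourcesAvailable, rev.Reason, rev.Message)
	}
}
```

The key design point is that the check runs regardless of whether the revision already has an Active condition, so a forbidden cpu or memory request is surfaced on the first reconcile.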
Fixes #9857 #4416
Proposed Changes
Release Note