
"Object 'Kind' is missing in 'null'" #3743

Open
alex-berger opened this issue Dec 15, 2024 · 8 comments
Labels: bug (Something isn't working)

@alex-berger
Contributor

What steps did you take and what happened:

After upgrading to Gatekeeper 3.18.0 we are observing the error message below (exactly 4 messages every 2 minutes), and now I am wondering what might cause this.

{
    "level": "error",
    "ts": 1734272869.0688941,
    "logger": "webhook",
    "msg": "error while excluding namespace",
    "hookType": "validation",
    "error": "Object 'Kind' is missing in 'null'",
    "stacktrace": "github.com/open-policy-agent/gatekeeper/v3/pkg/webhook.(*validationHandler).Handle\n\t/go/src/github.com/open-policy-agent/gatekeeper/pkg/webhook/policy.go:172\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2220\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2220\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2220\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2747\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3210\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2092"
}

Here is a formatted version of the error message's stack trace:

github.com/open-policy-agent/gatekeeper/v3/pkg/webhook.(*validationHandler).Handle
    /go/src/github.com/open-policy-agent/gatekeeper/pkg/webhook/policy.go:172
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle
    /go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:169
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP
    /go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go:119
sigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1
    /go/src/github.com/open-policy-agent/gatekeeper/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:60
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2220
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1
    /go/src/github.com/open-policy-agent/gatekeeper/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:147
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2220
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2
    /go/src/github.com/open-policy-agent/gatekeeper/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:109
net/http.HandlerFunc.ServeHTTP
    /usr/local/go/src/net/http/server.go:2220
net/http.(*ServeMux).ServeHTTP
    /usr/local/go/src/net/http/server.go:2747
net/http.serverHandler.ServeHTTP
    /usr/local/go/src/net/http/server.go:3210
net/http.(*conn).serve
    /usr/local/go/src/net/http/server.go:2092

It looks like this happens because skipExcludedNamespace ends up calling deserializer.Decode(nil, ...), as req.Object.Raw seems to be nil:

func (h *webhookHandler) skipExcludedNamespace(req *admissionv1.AdmissionRequest, excludedProcess process.Process) (bool, error) {
	obj := &unstructured.Unstructured{}
	if _, _, err := deserializer.Decode(req.Object.Raw, nil, obj); err != nil {
		return false, err
	}
	// ...
	return isNamespaceExcluded, err
}
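To illustrate why a nil Raw payload yields exactly this message, here is a stdlib-only sketch that mimics what the Kubernetes universal deserializer does (empty input is treated as JSON "null", which contains no "kind" field). The `guessKind` helper and `typeMeta` struct are hypothetical stand-ins, not Gatekeeper or apimachinery code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeMeta mimics the minimal fields the deserializer looks for.
type typeMeta struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
}

// guessKind sketches the deserializer's behavior: empty or nil input
// is treated as the JSON document "null", which has no "kind" field,
// hence the "Object 'Kind' is missing in 'null'" error.
func guessKind(raw []byte) (string, error) {
	if len(raw) == 0 {
		raw = []byte("null") // what YAML-to-JSON conversion yields for nil input
	}
	var tm typeMeta
	if err := json.Unmarshal(raw, &tm); err != nil {
		return "", err
	}
	if tm.Kind == "" {
		return "", fmt.Errorf("Object 'Kind' is missing in '%s'", string(raw))
	}
	return tm.Kind, nil
}

func main() {
	_, err := guessKind(nil) // simulates req.Object.Raw == nil
	fmt.Println(err)         // Object 'Kind' is missing in 'null'

	kind, _ := guessKind([]byte(`{"apiVersion":"v1","kind":"Pod"}`))
	fmt.Println(kind) // Pod
}
```

So the error text is not about a resource literally named "null"; it is the decoder complaining about an absent request payload.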

What did you expect to happen:

No errors :-)


Environment:

  • Gatekeeper version: 3.18.0
  • Kubernetes version: v1.30.6-eks-7f9249a
@alex-berger alex-berger added the bug Something isn't working label Dec 15, 2024
@ritazh
Member

ritazh commented Dec 16, 2024

Thanks for the report @alex-berger!
To help us reproduce this issue, how did you install Gatekeeper v3.18.0? via helm chart or deploy/gatekeeper.yaml?
There is a known issue (#3738 (comment)) where --operation=generate needs to be added to the audit deployment.

There are a few reasons why req.Object.Raw could be nil:

  • operation type is DELETE or CONNECT
  • if a CRD does not have a proper schema or preserveUnknownFields: true, the object might fail to serialize
  • bad webhook configuration
  • if request does not modify or need the entire object data (like GET or LIST requests), the Object.Raw field might not be populated
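The first bullet suggests one defensive option: pick the payload that is actually populated for the operation instead of always decoding Object.Raw. This is a stdlib-only sketch; `rawObjectFor` is a hypothetical helper using plain strings in place of the admissionv1 operation constants:

```go
package main

import "fmt"

// rawObjectFor is a hypothetical helper: it selects which raw payload
// (if any) to decode for a given admission operation, rather than
// unconditionally reading Object.Raw.
func rawObjectFor(op string, objectRaw, oldObjectRaw []byte) []byte {
	switch op {
	case "DELETE":
		// On DELETE, Object is unset; OldObject carries the last
		// known state of the resource.
		return oldObjectRaw
	case "CONNECT":
		// CONNECT requests carry no object payload at all.
		return nil
	default:
		return objectRaw
	}
}

func main() {
	old := []byte(`{"kind":"Pod"}`)
	fmt.Printf("%s\n", rawObjectFor("DELETE", nil, old))  // {"kind":"Pod"}
	fmt.Println(rawObjectFor("CONNECT", nil, old) == nil) // true
}
```

A caller could then skip the namespace-exclusion decode entirely when the selected payload is nil, instead of surfacing a decode error.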

To help further debug the issue, please check the k8s API server logs to inspect the request that ended in error, e.g. its operation, and if it targets a custom resource, whether the CRD has a proper schema. If you can share the resource to help us reproduce, that would be great.

@alex-berger
Contributor Author

To help us reproduce this issue, how did you install Gatekeeper v3.18.0? via helm chart or deploy/gatekeeper.yaml?

@ritazh In our case Gatekeeper v3.18.0 was installed via Helm chart; it was actually an upgrade from version v3.15.1 (via FluxCD / GitOps).

To help further debug the issue, please check the k8s API server logs to inspect the request object that ended in error e.g. operation and if it's a CRD does it have a proper schema.

Interestingly, we have not found anything in the k8s API server logs that would indicate failing webhook calls. But that might well be because the webhook's Handle function does not fail; it just logs the error and then continues (i.e. does not immediately return):

	isExcludedNamespace, err := h.skipExcludedNamespace(&req.AdmissionRequest, process.Webhook)
	if err != nil {
		h.log.Error(err, "error while excluding namespace")
	}

@JaydipGabani
Contributor

@alex-berger can you provide the steps to reproduce? I am not able to reproduce this on my end.

Did you update to 3.18.1? If so, are you still facing the same issue on 3.18.1?

@shashank-shridhar

I'm facing the same issue @JaydipGabani @ritazh .

Gatekeeper Version: 3.18.0
Kubernetes Version: 1.31

@JaydipGabani
Contributor

@shashank-shridhar can you share steps to repro? I am not able to reproduce this on my side.

@shashank-shridhar

Hi @JaydipGabani, we updated Gatekeeper through the Helm chart from version 3.16.3.
Once we upgraded, the controller-manager pods went into a CrashLoopBackOff state, and the same "Object 'Kind' is missing in 'null'" issue arose.
We updated Gatekeeper by updating our Helm charts, templates, and CRDs to the ones present in 3.18.0, then updating the Helm chart in our environment, after which we noticed the issue.

@JaydipGabani
Contributor

@shashank-shridhar I tried upgrading from 3.16.3 to 3.18.0 using the Helm chart with the ConstraintTemplate and Constraint below on the cluster.

ConstraintTemplate -

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredlabels
      violation[{"msg": msg, "details": {"missing_labels": missing}}] {
        provided := {label | input.review.object.metadata.labels[label]}
        required := {label | label := input.parameters.labels[_]}
        missing := required - provided
        count(missing) > 0
        msg := sprintf("you must provide labels: %v", [missing])
      }

Constraint -

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: pod-must-have-test
spec:
  match:
    scope: "Namespaced"
    namespaces: [ "nginx" ]
    kinds:
    - apiGroups: [ "" ]
      kinds: [ "Pod" ]
  parameters:
    labels: [ "test" ]

I had one violating pod running on the cluster, and I created a violating pod on 3.16 and again after the upgrade to 3.18 (both times the request got denied).

I was still not able to reproduce the issue. The GK pods never went into CrashLoopBackOff status and I never got the "Object 'Kind' is missing in 'null'" error.

Here is how I installed 3.16.3 -
helm install gatekeeper/gatekeeper --name-template=gatekeeper --namespace gatekeeper-system --create-namespace --version 3.16.3

And here is how I upgraded to 3.18.0 -
helm upgrade gatekeeper gatekeeper/gatekeeper --version 3.18.0 -n gatekeeper-system

If you can, please share the ConstraintTemplates and Constraints you are using, along with any other information that I can use to reproduce this issue and debug it further.

cc: @alex-berger

@alex-berger
Contributor Author

@JaydipGabani Unfortunately, we have way too many Constraint Templates and Constraints and I cannot narrow down which of those might cause this.

However, I suspect that req.Object.Raw can be nil under certain circumstances.

  • When the webhook is invoked for a DELETE operation, the Object.Raw field is not set because the object no longer exists in the API server’s store at that point.
  • When the webhook is invoked for operations on subresources (e.g., /status or /scale), the AdmissionRequest.Object field may not be set depending on the subresource type.
  • If the request is a dry-run, Kubernetes may skip populating Object.Raw in some cases.
  • Maybe certain Kubernetes system-generated events might trigger webhooks without providing an actual object payload.

Maybe we should extend the logging in https://github.com/open-policy-agent/gatekeeper/blob/v3.18.0/pkg/webhook/policy.go#L172 to log some details about the AdmissionRequest (at trace or debug level).
