
[Bug]: Kafka Pod crashes due to insufficient permission when removing stale log files after runAsUser is changed #11085

Open
kos-team opened this issue Jan 28, 2025 · 1 comment

@kos-team

Bug Description

When we set spec.kafka.template.pod.securityContext.runAsUser to 1000 on an existing cluster, the Kafka operator restarts the Pods with the updated security context. However, the restarted Kafka Pods crash immediately with the following error:

```
2025-01-23 20:03:53,386 INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util) [main]
All of the log directories are already formatted.
KRaft storage formatting is done
Removing quorum-state file
rm: cannot remove '/var/lib/kafka/data-0/kafka-log0/__cluster_metadata-0/quorum-state': Permission denied
```

We found that this happens because the file Kafka is trying to remove is owned by kafka:root with permission bits rw-r--r--, and the kafka user has a UID of 1001. The restarted Kafka Pod, now running as UID 1000, therefore lacks sufficient permission to delete the file.
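To illustrate the mismatch, here is a small sketch (not from the report) that evaluates the classic Unix permission classes for the quorum-state file as described above: owner UID 1001, group root, mode rw-r--r--. The new UID 1000 falls into the "other" class and gets read-only access. (Note that unlinking a file is additionally governed by write permission on the containing directory, which in this scenario is likely also owned by the old UID.)

```python
import stat

def perms_for(uid, file_uid, file_gid, mode, user_gids=()):
    """Return the (read, write, execute) triple that `uid` gets on a file
    with owner `file_uid`, group `file_gid`, and permission bits `mode`."""
    if uid == file_uid:
        shift = 6          # owner class
    elif file_gid in user_gids:
        shift = 3          # group class
    else:
        shift = 0          # other class
    bits = (mode >> shift) & 0o7
    return bool(bits & 4), bool(bits & 2), bool(bits & 1)

# rw-r--r-- = 0o644, values taken from the issue report
mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH

print(perms_for(1001, 1001, 0, mode))  # old kafka UID 1001: (True, True, False)
print(perms_for(1000, 1001, 0, mode))  # new runAsUser 1000: (True, False, False)
```

The second call shows why the Pod running as UID 1000 can only read, not modify or remove, files created by the previous UID.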

Steps to reproduce

  1. Deploy the Kafka operator, node pool, and the Kafka CR without any SecurityContext set
  2. Change the securityContext, setting runAsUser to 1000

Expected behavior

The Kafka operator should correctly reconfigure the Kafka cluster with the new security context.

Strimzi version

quay.io/strimzi/operator:0.45.0

Kubernetes version

v1.28.0

Installation method

YAML

Infrastructure

kind v0.21.0 go1.22.6 linux/amd64

Configuration files and logs

  1. Initial Kafka CR file
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  annotations:
    strimzi.io/kraft: enabled
    strimzi.io/node-pools: enabled
  name: test-cluster
spec:
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafka:
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.min.isr: 2
      transaction.state.log.replication.factor: 3
    listeners:
    - name: plain
      port: 9092
      tls: false
      type: internal
    - name: tls
      port: 9093
      tls: true
      type: internal
    metadataVersion: 3.9-IV0
    version: 3.9.0
```
  2. Kafka CR file to change the securityContext
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  annotations:
    strimzi.io/kraft: enabled
    strimzi.io/node-pools: enabled
  name: test-cluster
spec:
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafka:
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.min.isr: 2
      transaction.state.log.replication.factor: 3
    listeners:
    - name: plain
      port: 9092
      tls: false
      type: internal
    - name: tls
      port: 9093
      tls: true
      type: internal
    metadataVersion: 3.9-IV0
    template:
      pod:
        securityContext:
          runAsUser: 1000
    version: 3.9.0
```

Additional context

No response

@scholzj (Member) commented Jan 28, 2025

This is not a Strimzi bug. It is your responsibility to make sure that the user you choose in the security context has the required access to the files stored on the volumes.
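One common way to handle such ownership transitions (a sketch, not an official Strimzi recommendation) is to also set `fsGroup` in the same pod template. For volume types that support ownership management, the kubelet then makes the volume's files group-owned by that GID and group-writable on mount, so the new UID can still modify them:

```yaml
# Hedged sketch: extends the pod template from the report above.
# Whether fsGroup is applied depends on the volume type / CSI driver.
spec:
  kafka:
    template:
      pod:
        securityContext:
          runAsUser: 1000
          fsGroup: 1000
```

With this, a process running as UID 1000 would gain access through the group permission class rather than file ownership.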
