-
Notifications
You must be signed in to change notification settings - Fork 623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Add build-in TLS support #3368
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
# RFC-NNNN Built-in TLS support for Source and Notification Controllers | ||
|
||
**Status:** provisional | ||
|
||
**Creation date:** 2022-12-05 | ||
|
||
**Last update:** 2022-12-05 | ||
|
||
## Summary | ||
|
||
Create built-in TLS support for both Source and Notification controllers, | ||
so that in-cluster cross-component communications are always encrypted. | ||
|
||
## Motivation | ||
|
||
Due to internal company policies, Data Protection laws and regulations, | ||
Flux users may be required to enforce encryption of data whilst in-transit. | ||
As of `v0.37.0`, there is no built-in way to provide TLS for Flux' endpoints. | ||
|
||
The existing endpoints being: | ||
|
||
1. Source Controller: artifacts storage, used by Kustomize and HelmRelease | ||
controllers. | ||
2. Notification Controller: events receivers, used by all other Flux components | ||
to submit events. | ||
3. Notification Controller: webhooks for external services to contact Flux. | ||
|
||
The webhooks endpoing would be typically used for external use only, and in | ||
that use case the recommended approach is to enable TLS at ingress. | ||
However, this proposal covers the first two endpoints to allow always encrypted | ||
in-cluster communications. | ||
|
||
### Goals | ||
|
||
- Add TLS support for Notification and Source controllers. | ||
- Establish a migration process when enabling TLS on existing clusters. | ||
- Define best practices on enabling the feature. | ||
|
||
### Non-Goals | ||
|
||
- Change how CA trust is established across Flux containers. | ||
- Define TLS rotation mechanisms. | ||
|
||
## Proposal | ||
|
||
Controllers hosting web servers will introduce a new flag to configure the | ||
[TLS secret]: `--tls-secret`. The Secret type must be `kubernetes.io/tls`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why take dependency on a specific K8s secret hosting approach directly in the components? Can you please use a file reference instead. Then it will be easier to support other secret hosting approaches such as cloud key vaults. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How are you using Flux then? All our APIs rely on Kubernetes Secrets, you can’t connect to Git/OCI/Helm repositories unless you store the auth in a secret. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
So you're using Secrets Store CSI Driver and mounting them directly into containers? I see why this might be a reasonable policy but as Stefan said, Flux doesn't work without Secrets at the moment and likely never will. Otherwise you'd have to mount all the Secrets any user might ever need into the Flux controller containers. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @makkes - correct, we use Secret CSI drivers, or you can create a wrapper/starter script to launch the binaries and this starter script pulls the secrets from key vaults and places them on tempfs volumes. @stefanprodan - based on the documentation I see that Flux can use cloud identities to access artifact storages. We can still use K8s secrets as a config store for not secret material - such as public keys, application IDs, etc. But we cannot use K8s secrets for storing secret material like private keys. Consuming secret material directly from K8s secret resources is a gap that must be fixed for Flux to be used in different deployments that rely on consuming secrets as files. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consuming secrets as files breaks Flux multi-tenancy. We rely on Kubernetes RBAC to isolate tenant, reading files from the container FS is not subject to RBAC while reading secrets from the API is. Have you considered enabling encryption at rest for etcd? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I might be confused here. But wouldn't the TLS k8s secret also mounted as a file in the pod. The certwatcher project suggested allows file based certs only. Thus not sure how giving an option for a TLS on file-based key inside the pod can be less secure as even with secrets we need to do the same. I understand going default with secret to not add any additional dependencies for the default secure path. But if the customer is willing to do the work to use a Secret CSI drivers to mount a file directly, shouldn't that be allowed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Secrets aren't mounted into the Flux Pods but read into memory through the Kubernetes API at runtime. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could add a flag for pointing to a cert on disk, then it's up to the Flux admin to figure out the mounts and so on. With this flag, we'll read the cert at startup only, the Flux admin can bump some pod annotation in Git to restart the controllers to reload the cert. This flag e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks , this should resolve the cases where admin want to mount the cert on their own. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ah I haven't looked into that, if it supports it, then sure. |
||
|
||
Invalid secret types, or content, will cause the controller to fail to start. | ||
|
||
### Source Controller | ||
|
||
When TLS is enabled, the File Storage endpoint will serve both HTTP and HTTPS. | ||
Existing source objects previously reconciled won't be patched to a new | ||
`.status.artifact.url`, but all new reconciliations will cause that field to | ||
start with `https://`. | ||
|
||
The HTTP endpoint will only serve a permanent redirect (301) to the HTTPS endpoint, | ||
enabling a smooth transition when users enable this feature in existing clusters. | ||
|
||
The HTTPS endpoint will be served on port `9443` by default, which will be | ||
configurable via new flag `--storage-https-addr` or environment variable | ||
`STORAGE_HTTPS_ADDR`. The existing `source-controller` service will bind port | ||
`443` to `9443`. | ||
|
||
### Notification Controller | ||
|
||
When TLS is enabled, the HTTP endpoints will be disabled. | ||
|
||
#### Events endpoint | ||
|
||
The events HTTPS endpoint will be served on port `9443` by default, which will be | ||
configurable via new flag `--events-https-addr`. The existing `notification-controller` | ||
service will bind port `443` to `9443`. | ||
|
||
All flux components will need to be started with the (existing) flag below, for the | ||
TLS only endpoint to be used: | ||
`--events-addr=https://notification-controller.flux-system.svc.cluster.local./` | ||
|
||
### User Stories | ||
|
||
<!-- | ||
Optional if existing discussions and/or issues are linked in the motivation section. | ||
--> | ||
|
||
### Alternatives | ||
|
||
#### Delegate in-trasit data encryption to CNI or Service Mesh | ||
|
||
Instead of built-in support, it was considered the option of delegating this to | ||
external components that support the enablement of end-to-end encryption via TLS | ||
or mTLS. One of the issues with this approach is that the data would still come | ||
out of Flux unencrypted, and therefore privileged co-located components in the | ||
same worker node could be privy to the exchanged plain-text. | ||
|
||
Another point considered for not pursuing this alternative was that, being one of | ||
the first workloads inside a cluster, Flux should be self-reliant in assuring the | ||
confidentiality and integrity of its communications. | ||
|
||
#### Auto-detection of TLS events endpoint | ||
|
||
Instead of having to change the `--event-addr` on each Flux component, an auto | ||
detection mechanism could be implemented to prefer the TLS endpoint whenever the | ||
notification controller's service has one working. | ||
|
||
The challenge with such approach is that the security of cross component | ||
communication would be non deterministic from a client perspective, and would depend | ||
on the Notification controller having an active TLS endpoint at the time in which each | ||
Flux component started. The current approach requires additional configuration, | ||
but fails safe. | ||
|
||
## Design Details | ||
|
||
### TLS Versions and Ciphers Supported | ||
|
||
For reduced complexity and increased security, only TLS 1.3 is supported. | ||
|
||
### Detect TLS Certificate changes | ||
|
||
Controllers will watch the TLS secret for changes, once detected it will trigger a | ||
pod restart to refresh the certificates being used. | ||
Comment on lines
+120
to
+121
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can use https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/certwatcher/certwatcher.go and reload the cert without a pod restart. Especially for source-controller, a restart is very expensive and should be avoided. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like the idea. This would cause all mid-flight requests to be dropped, which is a minor impact compared to rebuilding the storage. Besides, the mid-fight requests would be automatically retried within seconds anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, definitely that will be a considerable saving over the expensive restarts. |
||
|
||
### Expand TLS support for the webhooks endpoint | ||
|
||
The TLS support could be expanded to the Notification controller webhooks endpoint | ||
to enable secure in-cluster communications. | ||
Comment on lines
+125
to
+126
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The main issue I see with supporting this is the hybrid scenario in which we need to support in-cluster and external (via ingress). In such scenarios, the ingress would need to trust the CA used by Flux, which can be problematic from a security stand-point. |
||
|
||
## Implementation History | ||
|
||
<!-- | ||
Major milestones in the lifecycle of the RFC such as: | ||
- The first Flux release where an initial version of the RFC was available. | ||
- The version of Flux where the RFC graduated to general availability. | ||
- The version of Flux where the RFC was retired or superseded. | ||
--> | ||
|
||
[TLS secret]: https://kubernetes.io/docs/concepts/configuration/secret/#tls-secrets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the proposed architecture be compatible with SPIFFE/SPIRE? I can envisage a scenario where I would like to manage identity using SPIFFE SVIDs delivered via SPIRE rather than via Kubernetes TLS secrets. In practice I think this would mean adding a unix domain socket flag to the controllers with an accompanying listener to receive SVID bundles from the SPIRE agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question @phoban01. I see the built-in TLS support as orthogonal to the support of SPIFFE/SPIRE. The idea here is that users would be able to provision Flux with secure comms without requiring any additional components.
On that point, some service meshes should be able to implement that (mTLS based on SPIFFE/SPIRE) transparently on top of Flux, without any changes required to the source code. However, if there are good use cases for built-in support, that should be a RFC on its own right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spire support is nice, but will require for the Spire itself to be managed outside Flux. I think using native PKI is a good way of avoiding this as mentioned in the Alternatives section.