-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Cluster API deployment method for TAS #108
base: master
Are you sure you want to change the base?
Conversation
I've seen that in the meantime you uploaded a CONTRIBUTORS.md. It asks to sign-off using our real name, but I signed-off using my pseudonym "criscola". Should I amend/resubmit the PR? |
Hi! |
FYI I stumbled upon CAPD, or CAPI in Docker. It's different because it uses the new |
903efc3
to
63ea8d3
Compare
Avoids failures when applying the TAS Service Account through ClusterResourceSet. Relevant error message was: "failed to create object /v1, Kind=ServiceAccount /telemetry-aware-scheduling-service-account: an empty namespace may not be set during creation". Signed-off-by: Cristiano Colangelo <git@criscola.dev>
Signed-off-by: Cristiano Colangelo <git@criscola.dev>
Signed-off-by: Madalina Lazar <madalina.lazar@intel.com>
a15fd67
to
4d7d6df
Compare
I amended my commits and signed-off with my name. Let me know if there is anything else. I managed to make CAPD work as well but discovered something that will likely require a PR on CAPD side, so I'm holding it off for now as I could be adding outdated information by then. |
Signed-off-by: Madalina Lazar <madalina.lazar@intel.com>
Hi, |
Quick update: I will add also the instructions on the CAPD (Cluster API Docker) deployment for local testing/development soon. |
Hi! I added the specific notes for a Docker deployment, it's very similar to a deployment on a generic provider like GCP, but there are some caveats because it uses an experimental feature called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had issues setting up TAS with cluster API after following this wiki. There are one/two issues that might be the problem (i.e. naming of capi-quickstart.yaml / your-manifest.yaml).
The rest of these comments are regarding the readability/structure of the wiki.
telemetry-aware-scheduling/deploy/cluster-api/docker/capi-docker.md
Outdated
Show resolved
Hide resolved
telemetry-aware-scheduling/deploy/cluster-api/docker/capi-docker.md
Outdated
Show resolved
Hide resolved
telemetry-aware-scheduling/deploy/cluster-api/docker/kubeadmcontrolplanetemplate-patch.yaml
Outdated
Show resolved
Hide resolved
telemetry-aware-scheduling/deploy/cluster-api/docker/capi-docker.md
Outdated
Show resolved
Hide resolved
telemetry-aware-scheduling/deploy/cluster-api/docker/capi-docker.md
Outdated
Show resolved
Hide resolved
ClusterResourceSets resources are already given to you in `clusterresourcesets.yaml`. | ||
Apply them to the management cluster with `kubectl apply -f clusterresourcesets.yaml` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this refer to docker/clusterresourcesets.yaml? And for generic, it also refers to it's own clusterresourcesets.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CRS resources are actually the same. Do you think it would make more sense to have only one file for both vs duplicating it, if yes any idea where to put the CRS file in the folder tree?
Edit: I created a shared folder and referenced the common resources in generic and docker guides. Should be more maintainable. If they go out of sync we can always move them out of the shared folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the shared folder and just referencing the files is a good idea. Do we still need the clusterresourcessets.yaml file in both (docker, generic) folders?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope I think we can just have it in the shared folder, it's also linked in both guides so users should find it easily.
Add Calico CRS resources for simplicity. Update docs relative to Calico CRS.
Move cluster-patch.yaml to shared folder and rename docs.
Forgot to rename capi-quickstart somewhere.
@madalazar I addressed all the comments - thanks for the detailed feedback. I also improved a few parts/solved a few problems. Feel free to continue with the review. |
> capi-quickstart.yaml | ||
``` | ||
|
||
If Kind was running correctly, and the Docker provider was initialized with the previous command, the command will return nothing to indicate success. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed, seems like a remnant from the Docker/Kind installation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be something I added after this comment, should it be removed?
I want to go through the installation steps in the Readme one last time just to check that everything is in order. I'm planning to do that this week/Tuesday the latest. I will update the PR |
@criscola HI, sorry it took so long for me to get back to you with a reply. Between then and now I've been looking at the security side of this solution (the new components that are brought in are they secure(d), if not what's missing, the infrastructure that we are creating is that secure etc.). The biggest problem that I found (and that is because I have been able to test only the CAPI Docker provider part of the solution) is with a component that CAPD is introducing: capi-quickstart-lb. This component opens up a port to the outside world on the host that it's installed on and even though the connection is secure, the cyphers use are out-of-date. There are 5 more issues, but I think they should be more manageable to address:
With most of the issues being the presence of 'secret', 'key' in the config maps:
For this particular issues, could we instead of config maps just install the components as they come:
I'm in the process of releasing a new version of TAS and tried to get this PR added to it, but because of these issues it might not happen. Let me know what you think of my suggestions and apologies again for the delay, |
Hi! Yeah I see about the docker container. Keep in mind this is not supposed to be used for any production release, only development, so I wouldn't worry about it (but good you opened an issue, let's see if anything moves there). For the configmaps, those are necessary if you wanted to automate the deployment through CRS resources. The user could gitignore them if it wants. Not sure if I would remove them since the purpose of this is a bit to have everything in declarative configuration. Moreover I put a disclaimer on the docs to tell the user to not commit TLS secrets to their repository as it's bad security practice. |
Hi, here are a couple of updates from my side:
In the capi-quickstart.yaml, we need to make the following changes:
Apologies for the delay, |
Hi Madalina, thanks for your answer. Sorry I'm a bit tied up with another task right now but as soon as I'm done with it I want to address your points/update the PR. Thanks for the patience. |
Hi Madalina, one question about your last point:
In my opinion, this defeats a bit the purpose of the Cluster API deployment option, because it prevents a fully automatic deployment of the TAS. When we use ClusterResourceSet, we effectively store the resources to deploy in ConfigMap which will then used to deploy to workload clusters from the management cluster. Everything is then stored in git, in a true GitOps way. There are other ways to do this, but since CRS are from Kubernetes SIG, this seems a good default method to include here, in our case, we use ArgoCD to manage Prometheus/Adapter but we started from CRS. Is the concern mainly with an increased maintenance burden on the contributors' side? As long as helm charts are used, I believe the user will always need to render the resources in the same way, so there should be no additional work to be done in the future on our side. Of course if something changes to the other configs (like the scheduler's) then it needs to be changed in the CAPI manifests but the change should be well localized. |
This PR adds a new deployment method for the Telemetry Aware Scheduler using Cluster API.
It would be great if someone could try it out and let me know if there's any issue/suggestion to improve it. The deployment was tested on Kubernetes v1.25 with GCP provider.