Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flux diff artifact: Print the differences in human readable form. #4916

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

octo
Copy link

@octo octo commented Aug 7, 2024

I was hoping to use flux diff artifact as part of a CI pipeline to show the difference between the merge request and the currently deployed artifact. The existing implementation doesn't work for us, since it only compares the checksums.

This commit changes the output produced by flux diff artifact to a list of changes in human readable form. The code is using the dyff package to produce a semantic diff of the YAML files. That means, for example, that changes in the order of map fields are ignored, while changes in the order of lists are not.

Example output:

$ ./bin/flux diff artifact "oci://${IMAGE}" --path=example-service/

spec.replicas  (apps/v1/Deployment/example-service-t205j6/backend-production)
  ± value change
    - 1
    + 7

✗ "oci://registry.gitlab.com/${REDACTED}/example-service-t205j6/deploy:production" and "example-service/" differ

The new --brief / -q flag enables users to revert to the previous behavior of only printing a has changed/has not changed line.

Closes: #3839

@stefanprodan
Copy link
Member

@octo you fork is behind, please rebase with upstream main and force push, also please fix the diff test that's currently failing.

@stefanprodan stefanprodan added area/oci OCI related issues and pull requests area/ux In pursuit of a delightful user experience labels Aug 7, 2024
@octo
Copy link
Author

octo commented Aug 8, 2024

@stefanprodan

✅ rebased
✅ fixed unit test – I was quite confused by the fact that go test ./... passes but make test fails. I suspect that the test harness could be improved, but I haven't tracked down the problem.

@stefanprodan
Copy link
Member

@octo what happens if Helm charts are inside the artifact? Are the templated YAMLs excluded or the CLI will crash?

@stefanprodan
Copy link
Member

We can't assume an artifact has Kubernetes YAMLs, Flux artifacts are used for many other things besides YAML, for example Terraform modules that work with tofu-controller.

I think we need to put the YAML diff behind a flag e.g. --content-type=yaml, by default the diff shouldn't assume YAMLs, the logic in this PR should be opt-in by users via the flag.

@octo
Copy link
Author

octo commented Aug 8, 2024

@octo what happens if Helm charts are inside the artifact? Are the templated YAMLs excluded or the CLI will crash?

I don't know, since I haven't tried. My suspicion is that the YAML parsing will fail and that the command errors out.

In that case, we're probably better off using a non-semantic diff, something akin to using diff -u on the command line. I'll see if I can come up with something.

@stefanprodan
Copy link
Member

I think there is value in showing a semantic diff for YAML content, but we need to make it opt-in and exclude templated files, like those in Helm charts.

@octo
Copy link
Author

octo commented Aug 8, 2024

@stefanprodan I added code that uses diff -ur to compare the two directories, which is now the default.

The --semantic-diff yaml flag can be used to trigger the dyff based semantic diff.

I unified the handling of the two arguments, allowing each to be a directory, .tgz file, or OCI url. This allows users to diff in any direction they desire. With this change, it feld more natural to use two positional arguments. I left the --path flag in for backwards compatibility though.

Unfortunately, the --ignore-paths flag is no longer honored. I'm not sure I'll have the time to fix this before going on vacation. I'm happy to look into it, but I'll be out for three weeks.

Example: `# Check if local files differ from remote
flux diff artifact oci://ghcr.io/stefanprodan/manifests:podinfo:6.2.0 --path=./kustomize`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change, please undo it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous flux diff artifact <path1> --path=<path2> syntax is still supported for backwards compatibility. This simply makes flux diff artifact <path1> <path2> the documented syntax. If you don't like it, I can revert back to the --path syntax, but compatibility should not be broken.

cmd/flux/diff_artifact.go Outdated Show resolved Hide resolved
cmd/flux/diff_artifact.go Outdated Show resolved Hide resolved
args = append(args, c.flags...)
args = append(args, fromDir, toDir)

cmd := exec.CommandContext(ctx, c.name, args...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discussed in the flux dev meeting recently. We have avoided executing external commands so far. Since this is part of the CLI, the default diff utility may not be present in platforms like Windows and the alternative tools may have their own ways of displaying diff. I think it would be better if we can have consistency when we can.
I believe for non-yaml content, we want to have text diff in unified format. Upstream Go has a private diff implementation https://github.com/golang/go/blob/master/src/internal/diff/diff.go which provides a unified diff result given two file contents. For an example, try running this snippet in the Go playground https://go.dev/play/p/TPPc4BAMKDp . The example shows a diff of some terraform configuration.

In the past, we have taken other such code from upstream Go and adapted it for our needs, for example https://github.com/fluxcd/pkg/blob/tar/v0.8.0/tar/tar.go#L6. We can do something similar for this. We can have a new diff package in https://github.com/fluxcd/pkg with some niceness for our use case. It would be nice to add an option to apply colors to added and removed lines. The same can be surfaced as a flag to toggle colored output.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ done – the --differ flag now takes one of three arguments:

  1. dyff: Use the dyff package to generate a semantic YAML diff.

  2. external: execute the command in the FLUX_EXTERNAL_DIFF environment variable. Using --differ external without setting FLUX_EXTERNAL_DIFF is now an error, there is no default command being executed. That means flux makes no assumptions about the system is runs on.

  3. unified: uses github.com/hexops/gotextdiff to generate a unified diff without invoking external commands.

    I looked into unified diffs and Golang quite a bit more than I wanted to — the situation is surprisingly bad. I opted to use github.com/hexops/gotextdiff, which is a clone of one of the internal-only packages by the Go authors. The repo is marked as archived as of earlier this year, but overall this seems like the least bad option.

In the past, we have taken other such code from upstream Go and adapted it for our needs

As a maintainer myself, I'd work hard to avoid that. Copying and adapting code rather than referencing it means that all the maintenance overhead is on the Flux maintainers, which can snowball quickly.

Copy link
Contributor

@darkowlzz darkowlzz Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the implementation. Sorry for the late reply, we were busy with a release and I wanted to discuss this in the dev meeting before further discussion. I'm adding section headers below to make it easier to see the topics discussed below, as I have heard my long replies are difficult to parse for some 😄

External command execution

We discussed about the external command execution today and we still would like to not execute an external command for the same reasons as before. We don't execute other binaries/commands anywhere in Flux yet and this can open up questions about similar considerations elsewhere. We can have a consistent diff result with our diff code independently. If there's a need to use any other diff program, users can always combine flux pull artifact with their program to achieve that. If needed, we can document it in our docs or even publish blog post about such use case.
We can discuss this further if you would like to.

Unified diff implementation

Regarding the unified diff implementation, in my initial research, I did come across gotextdiff which used the code from golang/tools and found it to be archived. Then I found a fork of it https://github.com/Shogoki/gotextdiff which uses the code from golang/go, but doesn't seem to have much going on, stale README from the original repo.
I did further research today around the diff algorithms, more specifically the Git diff algorithms, refer https://git-scm.com/docs/diff-options#Documentation/diff-options.txt---diff-algorithmpatienceminimalhistogrammyers. Found this nice article that describes the differences between the algorithms https://luppeng.wordpress.com/2020/10/10/when-to-use-each-of-the-git-diff-algorithms/.
The default Git diff and GNU diff algorithm is myers diff algorithm. The algorithm used in golang/tools also implements myers diff algorithm. The algorithm in golang/go implements anchored diff algorithm, based on the patience diff algorithm. Patience diff algorithm seems to produce more clear and efficient result based on multiple sources I have read. This finding will be used below.

In today's meeting, we discussed about depending on an archived package. Since the package is archived, we have no option than to take the ownership of the code. If we own the code, we can fix any bugs we find and also add improvements to it as we need them. Hence, we can't depend on the archived gotextdiff package. In addition, it may be better to use patience diff algorithm over myers diff algorithm. Since we have to copy the diff code due to different reasons, it seems better to use the one from golang/go, which implements patience diff algorithm, and add the modifications I mentioned above regarding coloring of the unified diff results.
I hope this provides enough details about our considerations. We can discuss more if you have any other concerns and ideas about it.

Diff result presentation

In the meeting, we also discussed about the presentation of the diff result. The current output looks like the following

fromFiles = {"kustomize/deployment.yaml", "kustomize/hpa.yaml", "kustomize/kustomization.yaml", "kustomize/service.yaml"}
toFiles = {"kustomize/hpa.yaml"}
Only in /tmp/flux-diff-artifact410276048: kustomize/deployment.yaml
...

I did notice that fromFiles and toFiles are written to stderr, maybe to separate the debug output from the actual result.

I don't have a good example to share of what we would like it to be yet, I can work on that and share later. But it's along the lines of a summary of which files were modified, added, deleted, similar to what git status shows, as this would be more relevant for the use case of a Flux OCI user, in addition to the actual diff output. Maybe the it would involve another flag to show the full diff and only show a summary by default. I'll see if I can share some good examples after thinking more about it. If this gives you any idea, please feel free to share and we can discuss about it.

Copy link
Contributor

@darkowlzz darkowlzz Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I came up with for the output. I have not discussed the details of this with anyone, so this is more as a proposal to gather some feedback.

I took some liberty in what constitutes a diff result, as mentioned above about presenting a simplified summary by default and using an additional option to show the full diff of what changed. I think it's okay for a flux user who may just need a brief summary of which files changed.

In the following, I used oci://ghcr.io/stefanprodan/manifests/podinfo:6.2.0 as the artifact. I deleted service.yaml, introduced ingress.yaml and modified hpa.yaml.

For the default diff result, we can show something like the following:

$ flux diff artifact oci://ghcr.io/stefanprodan/manifests/podinfo:6.2.0 ./test-manifests/
new file: kustomize/ingress.yaml
deleted:  kustomize/service.yaml
modified: kustomize/hpa.yaml

Maybe we can color these too, green for new file and red for deleted, and maybe for modified too. Or maybe there's no need for coloring at all here.
Unmodified files are not listed.

Update: An advantage of showing a summary with just the file names is that it can also list binary files which can't be diffed.

To show details of the changes, a new flag can be passed, I'm calling this --full, --long may also work, but we can discuss and change it if we do things this way.

$ flux diff artifact oci://ghcr.io/stefanprodan/manifests/podinfo:6.2.0 ./test-manifests/ --full
--- kustomize/hpa.yaml
+++ kustomize/hpa.yaml
@@ -2,6 +2,7 @@
 kind: HorizontalPodAutoscaler
 metadata:
   name: podinfo
+  namespace: default
 spec:
   scaleTargetRef:
     apiVersion: apps/v1
--- /dev/null
+++ kustomize/ingress.yaml
@@ -1 +1,6 @@
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: minimal-ingress
+  annotations:
+    nginx.ingress.kubernetes.io/rewrite-target: /
...

This is the detailed diff. Only changes are shown. Unmodified files are not shown. New file content have all the file content with + prefix and deleted file content have - prefix.
For a modified file, the diff starts with the relative name of the file with respect to the artifact:

--- kustomize/hpa.yaml
+++ kustomize/hpa.yaml
...

For new file or deleted file, similar to what git diff does, we can show the empty part as /dev/null at the top:

--- /dev/null
+++ kustomize/ingress.yaml
...
...
--- kustomize/service.yaml
+++ /dev/null
...

For binary files, we can either skip it as we can't show any diff of it or, we can also do what git diff does, print a line about it, something like:

...
Binary files /dev/null and b/abinaryfile differ
...

This may be overkill but this is what I think of it as of now. All these are just for the plain text diff. YAML manifests will just use what dyff provides. But maybe we can modify that output too if we agree on some enhancements we find here. Update: Thinking more about it, I think regardless of the differ, doing the summary by default seems better and only use dyff or text diff for detailed diff.

Open for any suggestion and discussion about it.

octo added a commit to octo/fluxcd-pkg that referenced this pull request Sep 19, 2024
This option allows to exclude certain files from extraction.

This is going to be used by `flux diff artifact` to only extract "interesting"
files from an archive for comparison with another source.

See also: fluxcd/flux2#4916

Signed-off-by: Florian Forster <fforster@gitlab.com>
octo added 10 commits September 23, 2024 11:08
I was hoping to use `flux diff artifact` as part of a CI pipeline to show the
difference between the merge request and the currently deployed artifact. The
existing implementation doesn't work for us, since it only compares the
checksums.

This commit changes the output produced by `flux diff artifact` to a list of
changes in human readable form. The code is using the `dyff` package to produce
a semantic diff of the YAML files. That means, for example, that changes in the
order of map fields are ignored, while changes in the order of lists are not.

Example output:

```
$ ./bin/flux diff artifact "oci://${IMAGE}" --path=example-service/

spec.replicas  (apps/v1/Deployment/example-service-t205j6/backend-production)
  ± value change
    - 1
    + 7

✗ "oci://registry.gitlab.com/${REDACTED}/example-service-t205j6/deploy:production" and "example-service/" differ
```

The new `--brief` / `-q` flag enables users to revert to the previous behavior
of only printing a has changed/has not changed line.

Signed-off-by: Florian Forster <fforster@gitlab.com>
This fixes the `TestDiffArtifact` unit test.

Signed-off-by: Florian Forster <fforster@gitlab.com>
Also updates the list of options passed to `dyff.CompareInputFiles` to be the
same as in the internal `build` package.

Signed-off-by: Florian Forster <fforster@gitlab.com>
Signed-off-by: Florian Forster <fforster@gitlab.com>
Signed-off-by: Florian Forster <fforster@gitlab.com>
Artifacts may contain other files types, not just YAML files, meaning the
semantic YAML diff provided by `dyff` is not a safe default.

This change implements purely textual diffing using the `diff` command line
tool. This tool can be overridden by users using the `FLUX_EXTERNAL_DIFF`
environment variable.

Users that store Kubernetes resource manifests in the artifact can re-enable
the semantic YAML diff behavior using the `--semantic-diff yaml` flag.

The arguments to the diff subcommand may be:

* A directory
* A .tar.gz or .tgz file
* An OCI url
* An individual file

The two arguments to the command are treated the same way, allowing users to
diff in either direction.

Signed-off-by: Florian Forster <fforster@gitlab.com>
Signed-off-by: Florian Forster <fforster@gitlab.com>
Signed-off-by: Florian Forster <fforster@gitlab.com>
Signed-off-by: Florian Forster <fforster@gitlab.com>
Signed-off-by: Florian Forster <fforster@gitlab.com>
@octo
Copy link
Author

octo commented Sep 23, 2024

@stefanprodan @darkowlzz Thank you so much for your reviews! I have addressed all review comments and rebased my branch on top of main.

@@ -6,6 +6,7 @@ go 1.22.4
replace gopkg.in/yaml.v3 => gopkg.in/yaml.v3 v3.0.1

require (
bitbucket.org/creachadair/stringset v0.0.14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This project is not Apache 2.0 licensed, so we can't use it.

Copy link
Contributor

@darkowlzz darkowlzz Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my reading, it seems set union operation is the primary need for this package in the code, which is implemented in https://bitbucket.org/creachadair/stringset/src/v0.0.14/stringset.go#lines-156:168. I believe we should be able to use Set from apimachinery https://github.com/kubernetes/apimachinery/blob/v0.31.0/pkg/util/sets/set.go#L143, or implement the same using the their data types.

@@ -157,6 +159,7 @@ require (
github.com/hashicorp/go-version v1.7.0 // indirect
github.com/hashicorp/golang-lru/arc/v2 v2.0.5 // indirect
github.com/hashicorp/golang-lru/v2 v2.0.5 // indirect
github.com/hexops/gotextdiff v1.0.3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package is no longer maintained, so we can't use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/oci OCI related issues and pull requests area/ux In pursuit of a delightful user experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow diffing of two OCI artifacts
3 participants