Prometheus is not fully compatible with OpenMetrics tests #56
Thank you for this detailed report. The Prometheus OpenMetrics parser is optimized for performance rather than strict validation. It is not "OpenMetrics compatible" in the sense of rejecting every bad exposition format. My understanding of the OpenMetrics test suite is that it is made for exposing libraries.
Nice (and unexpected) find, thanks @alolita! I see several paths forward; this will need some debate. I will also add it to the backlog of the dev summits.
Super nice to see more usage of the compliance suite.
This is expected, for the reasons Julien gave. It's designed to accept all valid inputs, and reject what it efficiently can.
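(For illustration, a minimal sketch of what "reject what it efficiently can" means in practice: feeding an invalid exposition through Prometheus's OpenMetrics parser. The import path and constructor signature of `textparse.NewOpenMetricsParser` have changed across Prometheus releases, so treat this as an approximation rather than version-accurate code.)

```go
package main

import (
	"fmt"
	"io"

	"github.com/prometheus/prometheus/model/textparse"
)

func main() {
	// A negative counter value: invalid per OpenMetrics, but cheap to
	// parse, so the parser may accept it anyway.
	input := []byte("# TYPE a counter\na_total -1\n# EOF\n")

	// NOTE: newer Prometheus releases require extra constructor
	// arguments (e.g. a symbol table); this matches older releases.
	p := textparse.NewOpenMetricsParser(input)
	for {
		_, err := p.Next()
		if err == io.EOF {
			fmt.Println("parsed without error")
			return
		}
		if err != nil {
			fmt.Println("rejected:", err)
			return
		}
	}
}
```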
Aye; I no longer actively had this on my radar. Subjective unexpectedness, not an objective one. Agreed that emitting is different from parsing; that's one of the foundations of the IETF and TCP/IP, as per https://en.wikipedia.org/wiki/Robustness_principle. To keep the playing field level, we need to make it clearer what is MUST/SHOULD/MAY in the suite; I created a list of all tests for convenience, and that needs to be split out into more specific lists.
As suggested by @brian-brazil in our last OpenTelemetry Prometheus Workgroup meeting, we re-ran the tests. All positive test cases from OpenMetrics pass with an 'up' value of 1. For the negative test cases from OpenMetrics, there are 59 test cases that are scraped successfully and have an 'up' value of 1.
cc: @alolita
From a quick look, bad_timestamp_4, bad_timestamp_5, and bad_timestamp_7 are all within a single line, so they could reasonably be caught by the parser efficiently. The rest require handling metadata across lines, which would be difficult to code maintainably and efficiently, given that the code is shared with the Prometheus text format parser, which allows lines in completely random order. If that weren't an issue, we could probably do enough metadata tracking for validation without having to generate a ton of garbage on every scrape.
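(A hypothetical sketch of the kind of token-local check that could catch a bad timestamp within a single line; this is not Prometheus's actual code, and the real bad_timestamp_4/5/7 inputs may be malformed in other ways.)

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// validOpenMetricsTimestamp is a hypothetical helper: because an
// OpenMetrics timestamp is a plain decimal number of seconds on the
// same line as the sample, it can be validated without tracking any
// metadata across lines.
func validOpenMetricsTimestamp(tok string) bool {
	ts, err := strconv.ParseFloat(tok, 64)
	if err != nil {
		return false // not a parseable number (or out of range)
	}
	return !math.IsNaN(ts) && !math.IsInf(ts, 0) // reject non-finite values
}

func main() {
	for _, tok := range []string{"1622380800", "1.5", "NaN", "1e999", "later"} {
		fmt.Printf("%q valid=%v\n", tok, validOpenMetricsTimestamp(tok))
	}
}
```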
We discussed this topic yesterday at the dev summit. We intend to make sure that emitting libraries are fully compatible in what they emit, while scraping libraries will be allowed to be more liberal to ensure good performance. From the rolling document:
As per the previous comment, I am moving this issue to the compliance repository.
For future reference, this is no longer true as of prometheus/prometheus#11982.
What did you do?
We want to ensure OpenMetrics / Prometheus compatibility in the OpenTelemetry Collector. We have been building compatibility tests to verify that the OpenMetrics spec is fully supported by both the OpenTelemetry Collector's Prometheus receiver and Prometheus remote-write (PRW) exporter, as well as by Prometheus itself.
We used the OpenMetrics metrics test data available at https://github.com/OpenObservability/OpenMetrics/tree/main/tests/testdata/parsers
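(A minimal harness along these lines can reproduce the numbers below against a checkout of the test data. The `<root>/<case>/metrics` layout is taken from the repository above, and the `textparse.NewOpenMetricsParser` signature varies by Prometheus release, so this is a sketch rather than exact code.)

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"

	"github.com/prometheus/prometheus/model/textparse"
)

func main() {
	root := os.Args[1] // e.g. OpenMetrics/tests/testdata/parsers
	dirs, err := os.ReadDir(root)
	if err != nil {
		panic(err)
	}
	for _, d := range dirs {
		if !d.IsDir() {
			continue
		}
		b, err := os.ReadFile(filepath.Join(root, d.Name(), "metrics"))
		if err != nil {
			continue // skip cases without a metrics file
		}
		// Drive the parser to completion and record whether it errored.
		p := textparse.NewOpenMetricsParser(b)
		var perr error
		for perr == nil {
			_, perr = p.Next()
		}
		if perr == io.EOF {
			perr = nil
		}
		fmt.Printf("%s accepted=%v\n", d.Name(), perr == nil)
	}
}
```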
Out of a total of 161 negative tests in OpenMetrics:
- 94 tests pass (these tests are dropped) with an 'up' value of 0;
- 67 tests are not dropped and have an 'up' value of 1, and 22 of those have incorrectly ingested metrics.
To test Prometheus itself, we set up an HTTP metrics endpoint that exposes invalid metrics from the OpenMetrics tests. We then configured Prometheus 2.31.0 to scrape that endpoint.
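(A minimal sketch of the kind of endpoint used, serving one of the bad test files verbatim. The file path and port are placeholders; the Content-Type header announces the OpenMetrics format so Prometheus exercises its OpenMetrics parser.)

```go
package main

import (
	"net/http"
	"os"
)

func main() {
	// Placeholder path: a checked-out OpenMetrics negative test case.
	body, err := os.ReadFile("testdata/parsers/bad_counter_values_1/metrics")
	if err != nil {
		panic(err)
	}
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		// Announce OpenMetrics so the scraper uses its OpenMetrics parser.
		w.Header().Set("Content-Type",
			"application/openmetrics-text; version=1.0.0; charset=utf-8")
		w.Write(body)
	})
	panic(http.ListenAndServe(":9101", nil)) // placeholder port for the scrape target
}
```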
What did you expect to see?
Expected result:
The scrape should fail since the target has invalid metrics, and the appropriate error should be reported.
For example, with the following metric data: bad_counter_values_1 (https://raw.githubusercontent.com/OpenObservability/OpenMetrics/main/tests/testdata/parsers/bad_counter_values_1/metrics)
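(Reconstructed for reference from the test name and the behavior described below; the exact content is at the URL above. The fixture exposes a counter with a negative value, roughly:)

```text
# TYPE a counter
a_total -1
# EOF
```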
What did you see instead? Under which circumstances?
Current behavior:
Scrape is successful. There are multiple bad test cases that are scraped successfully by Prometheus.
For example, using bad_counter_values_1 (#5 listed below) does not show an error even though it contains a negative counter value. According to the OpenMetrics tests, this metric should not be parsed. You can see that no error has been reported and the scrape is successful.
Similar to the bad_counter_values_1 test case, there are multiple bad test cases where the scrape is successful and metrics are ingested by Prometheus:
Environment
Darwin 20.6.0 x86_64
version=2.31.0
cc: @PaurushGarg @mustafain117