
Prometheus is not fully compatible with OpenMetrics tests #56

Open
alolita opened this issue Nov 8, 2021 · 10 comments

alolita commented Nov 8, 2021

What did you do?

We want to ensure OpenMetrics / Prometheus compatibility in the OpenTelemetry Collector. We have been building compatibility tests to verify that the OpenMetrics spec is fully supported by the OpenTelemetry Collector's Prometheus receiver and Prometheus remote-write (PRW) exporter, as well as by Prometheus itself.

We used the OpenMetrics metrics test data available at https://github.com/OpenObservability/OpenMetrics/tree/main/tests/testdata/parsers

Out of a total of 161 negative tests in OpenMetrics, 94 tests pass (the scrape is dropped and the target's 'up' value is 0), while 67 tests are not dropped and have an 'up' value of 1, and 22 tests have incorrectly ingested metrics (listed below).
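For reference, a minimal sketch (not part of the original test harness; it assumes Prometheus is listening on the default localhost:9090 and uses the job name from the configuration below) of how the per-target 'up' value can be read back through the Prometheus instant-query API:

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "net/url"
)

func main() {
    // Query the per-target 'up' series for the scrape job used in these tests.
    q := url.QueryEscape(`up{job="open-metrics-scrape"}`)
    resp, err := http.Get("http://localhost:9090/api/v1/query?query=" + q)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Instant-query responses look like:
    // {"status":"success","data":{"result":[{"metric":{...},"value":[<ts>,"<val>"]}]}}
    var body struct {
        Data struct {
            Result []struct {
                Metric map[string]string `json:"metric"`
                Value  [2]interface{}    `json:"value"`
            } `json:"result"`
        } `json:"data"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
        log.Fatal(err)
    }
    for _, r := range body.Data.Result {
        // "0" means the scrape was rejected (the negative test passes),
        // "1" means the scrape was accepted.
        fmt.Printf("%s up=%v\n", r.Metric["instance"], r.Value[1])
    }
}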

In order to test Prometheus itself, we set up a metrics HTTP endpoint that exposes invalid/bad metrics from the OpenMetrics tests. We then configured Prometheus 2.31.0 to scrape the metrics endpoint.
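A minimal sketch of such an endpoint (the actual harness is not included in this issue; the file path below is illustrative), serving one OpenMetrics test file on localhost:3000 to match the scrape configuration in the Environment section:

package main

import (
    "log"
    "net/http"
    "os"
)

func main() {
    // Serve one of the bad test files verbatim, e.g. bad_counter_values_1/metrics
    // from a local checkout of the OpenMetrics repository (path is illustrative).
    payload, err := os.ReadFile("tests/testdata/parsers/bad_counter_values_1/metrics")
    if err != nil {
        log.Fatal(err)
    }
    http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
        w.Write(payload)
    })
    log.Fatal(http.ListenAndServe(":3000", nil))
}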

What did you expect to see?

Expected result:
The scrape should fail since the target exposes an invalid metric, and the appropriate error should be reported.

For example, with the following metric data from bad_counter_values_1 (https://raw.githubusercontent.com/OpenObservability/OpenMetrics/main/tests/testdata/parsers/bad_counter_values_1/metrics):

# TYPE a counter
a_total -1
# EOF
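For illustration only (this sketch is not how the Prometheus parser is implemented), a naive line-oriented check over an exposition like the one above is already enough to flag this particular failure mode, a counter sample with a negative value:

package main

import (
    "fmt"
    "strconv"
    "strings"
)

func main() {
    exposition := "# TYPE a counter\na_total -1\n# EOF\n"

    counters := map[string]bool{} // metric families declared as counters
    for _, line := range strings.Split(exposition, "\n") {
        fields := strings.Fields(line)
        switch {
        case len(fields) == 0:
            continue
        case fields[0] == "#":
            // "# TYPE <name> counter" declares a counter family.
            if len(fields) == 4 && fields[1] == "TYPE" && fields[3] == "counter" {
                counters[fields[2]] = true
            }
        default:
            if len(fields) < 2 {
                continue
            }
            // Sample line: "<name>[{labels}] <value> [...]"; counters expose "<name>_total".
            family := strings.TrimSuffix(strings.SplitN(fields[0], "{", 2)[0], "_total")
            if v, err := strconv.ParseFloat(fields[1], 64); err == nil && counters[family] && v < 0 {
                fmt.Printf("invalid: counter sample %q has negative value %v\n", fields[0], v)
            }
        }
    }
}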

What did you see instead? Under which circumstances?

Current behavior:
The scrape succeeds, and there are multiple bad test cases that Prometheus scrapes successfully.

For example, using bad_counter_values_1 (#5 in the list below) shows no error even though it contains a negative counter value. According to the OpenMetrics tests, this metric should not be parsed.

[Screenshot, 2021-11-03 2:49 PM]

You can see no error has been reported and the scrape is successful.

[Screenshot, 2021-11-03 2:50 PM]

Similar to the bad_counter_values_1 test case, there are multiple bad test cases where the scrape succeeds and the metrics are ingested by Prometheus:

  1. bad_missing_or_extra_commas_0
  2. bad_metadata_in_wrong_place_1
  3. bad_counter_values_18
  4. bad_grouping_or_ordering_9
  5. bad_counter_values_1
  6. bad_histograms_2
  7. bad_counter_values_16
  8. bad_value_1
  9. bad_missing_or_extra_commas_2
  10. bad_invalid_labels_6
  11. bad_grouping_or_ordering_8
  12. bad_metadata_in_wrong_place_0
  13. bad_grouping_or_ordering_10
  14. bad_grouping_or_ordering_0
  15. bad_value_2
  16. bad_metadata_in_wrong_place_2
  17. bad_text_after_eof_1
  18. bad_value_3
  19. bad_counter_values_0
  20. bad_grouping_or_ordering_3
  21. bad_histograms_3
  22. bad_blank_line

Environment

  • System information:

Darwin 20.6.0 x86_64

  • Prometheus version:

version=2.31.0

  • Prometheus configuration file:
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "open-metrics-scrape"
    static_configs:
      - targets: ["localhost:3000"]

cc: @PaurushGarg @mustafain117

roidelapluie (Member) commented Nov 8, 2021

Thank you for this detailed report.

The Prometheus OpenMetrics parser is optimized for performance rather than correctness. It is not "OpenMetrics compatible" in the sense of rejecting every bad exposition.
We should have a fully compliant OpenMetrics parser in the future (prometheus/common#321), but it is unlikely that we will use it in Prometheus itself for scraping. It would be used e.g. in the node_exporter textfile collector, and potentially in promtool check metrics, both of which are out of the hot paths.

My understanding of the OpenMetrics test suite is that it is made for exposing libraries.

RichiH (Member) commented Nov 10, 2021

Nice (and unexpected) find, thanks @alolita!

I see several paths forward; this will need some debate. I will also add it to the backlog of the dev summits.

RichiH (Member) commented Nov 10, 2021

Super nice to see more usage of the compliance suite.

brian-brazil commented

> Nice (and unexpected) find, thanks @alolita!

This is expected, for the reasons Julien gave. It's designed to accept all valid inputs, and reject what it efficiently can.

RichiH (Member) commented Nov 10, 2021

Aye; I just didn't actively have this on my radar any more. Subjective unexpectedness, not an objective one.

Agreed that emitting is different from parsing; that's one of the foundations of the IETF and TCP/IP, as per https://en.wikipedia.org/wiki/Robustness_principle

To keep the playing field level, we need to make it clearer what is a MUST/SHOULD/MAY in the suite; I created a list of all tests for convenience, and that needs to be split out into more specific lists.

mustafain117 commented Nov 12, 2021

As suggested by @brian-brazil in our last OpenTelemetry Prometheus Workgroup meeting, we tested with the Content-Type header set to application/openmetrics-text.
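For context, the only change this requires on a test endpoint like the sketch above is advertising the OpenMetrics content type, so that Prometheus selects its OpenMetrics parser during content negotiation rather than the classic text-format parser (the content-type string, including the version parameter, is the one given in the OpenMetrics specification; the file path is again illustrative):

package main

import (
    "log"
    "net/http"
    "os"
)

func main() {
    payload, err := os.ReadFile("tests/testdata/parsers/bad_counter_values_1/metrics")
    if err != nil {
        log.Fatal(err)
    }
    http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
        // Advertise the OpenMetrics exposition format explicitly.
        w.Header().Set("Content-Type", "application/openmetrics-text; version=1.0.0; charset=utf-8")
        w.Write(payload)
    })
    log.Fatal(http.ListenAndServe(":3000", nil))
}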

All positive test cases from OpenMetrics pass with an up metric value of 1.

For negative test cases from OpenMetrics, there are 59 test cases that are scraped successfully and have an 'up' metric value of 1:

  1. bad_clashing_names_0
  2. bad_clashing_names_1
  3. bad_clashing_names_2
  4. bad_counter_values_0
  5. bad_counter_values_1
  6. bad_counter_values_10
  7. bad_counter_values_11
  8. bad_counter_values_12
  9. bad_counter_values_13
  10. bad_counter_values_14
  11. bad_counter_values_15
  12. bad_counter_values_16
  13. bad_counter_values_17
  14. bad_counter_values_18
  15. bad_counter_values_19
  16. bad_counter_values_2
  17. bad_counter_values_3
  18. bad_counter_values_5
  19. bad_counter_values_6
  20. bad_exemplars_on_unallowed_samples_2
  21. bad_grouping_or_ordering_0
  22. bad_grouping_or_ordering_10
  23. bad_grouping_or_ordering_2
  24. bad_grouping_or_ordering_3
  25. bad_grouping_or_ordering_4
  26. bad_grouping_or_ordering_5
  27. bad_grouping_or_ordering_6
  28. bad_grouping_or_ordering_7
  29. bad_grouping_or_ordering_8
  30. bad_grouping_or_ordering_9
  31. bad_histograms_0
  32. bad_histograms_1
  33. bad_histograms_2
  34. bad_histograms_3
  35. bad_histograms_6
  36. bad_histograms_7
  37. bad_histograms_8
  38. bad_info_and_stateset_values_0
  39. bad_info_and_stateset_values_1
  40. bad_metadata_in_wrong_place_0
  41. bad_metadata_in_wrong_place_1
  42. bad_metadata_in_wrong_place_2
  43. bad_missing_or_invalid_labels_for_a_type_1
  44. bad_missing_or_invalid_labels_for_a_type_3
  45. bad_missing_or_invalid_labels_for_a_type_4
  46. bad_missing_or_invalid_labels_for_a_type_6
  47. bad_missing_or_invalid_labels_for_a_type_7
  48. bad_repeated_metadata_0
  49. bad_repeated_metadata_1
  50. bad_repeated_metadata_3
  51. bad_stateset_info_values_0
  52. bad_stateset_info_values_1
  53. bad_stateset_info_values_2
  54. bad_stateset_info_values_3
  55. bad_timestamp_4
  56. bad_timestamp_5
  57. bad_timestamp_7
  58. bad_unit_6
  59. bad_unit_7

cc: @alolita

brian-brazil commented

From a quick look, bad_timestamp_4, bad_timestamp_5, and bad_timestamp_7 are all within a single line, so they could reasonably be caught efficiently by the parser. The rest require handling metadata across lines, which would be difficult to code maintainably and efficiently, given that the code is shared with the Prometheus text-format parser, which allows lines in completely arbitrary order. If that weren't an issue, we could probably do enough metadata handling for validation without having to generate a ton of garbage on every scrape.
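As an illustration of that distinction (a sketch, not Prometheus code): rejecting metadata that appears after a family's samples requires state carried across lines, whereas a malformed value or timestamp can be rejected from the current line alone:

package main

import (
    "fmt"
    "strconv"
    "strings"
)

func validate(exposition string) error {
    seenSamples := map[string]bool{} // families that have already emitted samples

    for _, line := range strings.Split(strings.TrimSpace(exposition), "\n") {
        fields := strings.Fields(line)
        if len(fields) == 0 {
            continue
        }
        if fields[0] == "#" {
            // Metadata line: "# TYPE|HELP|UNIT <name> ..." (or "# EOF").
            if len(fields) >= 3 && seenSamples[fields[2]] {
                // Cross-line check: only possible because we remembered earlier sample lines.
                return fmt.Errorf("metadata for %q after its samples", fields[2])
            }
            continue
        }
        // Sample line: "<name>[{labels}] <value> [<timestamp>]".
        if len(fields) >= 2 {
            if _, err := strconv.ParseFloat(fields[1], 64); err != nil {
                // Single-line check: no state from other lines is needed.
                return fmt.Errorf("invalid value %q", fields[1])
            }
        }
        // (Counter/histogram suffix handling omitted for brevity.)
        seenSamples[strings.SplitN(fields[0], "{", 2)[0]] = true
    }
    return nil
}

func main() {
    // A bad_metadata_in_wrong_place-style input: TYPE appears after the sample.
    fmt.Println(validate("a 1\n# TYPE a gauge\n# EOF"))
}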

roidelapluie (Member) commented

We discussed this topic yesterday at the dev summit. We intend to make sure that emitting libraries are fully compatible in what they emit, and scraping libraries will be able to be more liberal to ensure good performance.

From the rolling document:

CONSENSUS: As per RFC 761, Postel's law, “be conservative in what you do, be liberal in what you accept from others”. We will make certain that the compliance tests treat all implementations the same way and fairly.

roidelapluie (Member) commented Nov 19, 2021

As per the previous comment, I am moving this issue to the compliance repository.

roidelapluie transferred this issue from prometheus/prometheus on Nov 19, 2021
gouthamve (Member) commented

> We intend to make sure that emitting libraries are fully compatible in what they emit, and scraping libraries will be able to be more liberal to ensure good performance.

For future reference, this is no longer true as of prometheus/prometheus#11982.
