Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/awsfirehose] Refactor to use pdata unmarshaler interfaces #37361

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

axw
Copy link
Contributor

@axw axw commented Jan 21, 2025

Description

Refactor unmarshallers to fit into the encoding framework.

The internal unmarshallers now implement plog.Unmarshaler and pmetric.Unmarshaler. This will enable us to use encoding extensions in a followup, and will enable us to extract the unmarshallers later as encoding extensions.

As a result of the interface change, the unmarshallers now unmarshal a single record at a time, which means we cannot merge resources/metrics as we go, and only within each record. This impacts performance, so to offset that we implement various optimisations:

  • Use json-iterator for decoding JSON
  • Use klauspost/compress for decompressing gzip
  • Pool gzip readers
  • Remove pointer type from cwMetricValue to avoid allocation
  • Don't read the whole request body into memory
  • Reuse buffer for decoding base64; decode as we go

There are more optimisations we can make to reduce memory allocations, e.g. avoid reflection when decoding JSON.

There's a fix for a subtle bug in the cwmetrics unmarshaller where the unit of a metric was not considered part of its identity, and so two metrics that differed only by unit would be merged.

Link to tracking issue

Preparation for #37113

Testing

  • Added tests for consuming requests containing multiple records.
  • Added benchmarks for the full request/response flow, for cwlogs and cwmetrics record types. There's an increase in memory usage for cwmetrics unmarshalling, and decrease for cwlogs unmarshalling. CPU usage is better across the board.
goos: linux
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver
cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics     
                                                              │ /tmp/old.txt  │            /tmp/new.txt             │
                                                              │    sec/op     │    sec/op     vs base               │
LogsConsumer_cwlogs/10resources_10records_1logs-16              142.03µ ±  8%   75.85µ ± 16%  -46.60% (p=0.002 n=6)
LogsConsumer_cwlogs/10resources_10records_10logs-16              413.1µ ±  5%   208.0µ ± 23%  -49.65% (p=0.002 n=6)
LogsConsumer_cwlogs/10resources_100records_1logs-16             1435.8µ ±  8%   728.0µ ± 11%  -49.30% (p=0.002 n=6)
LogsConsumer_cwlogs/10resources_100records_10logs-16             4.200m ±  2%   1.875m ±  4%  -55.34% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_10records_1metrics-16     104.30µ ± 16%   70.26µ ± 12%  -32.64% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_10records_10metrics-16     821.7µ ±  7%   505.5µ ±  5%  -38.48% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_100records_1metrics-16     780.2µ ±  5%   647.0µ ±  8%  -17.07% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_100records_10metrics-16    7.823m ±  5%   5.297m ± 11%  -32.29% (p=0.002 n=6)
geomean                                                          809.9µ         475.7µ        -41.26%

                                                              │ /tmp/old.txt  │             /tmp/new.txt             │
                                                              │     B/op      │     B/op      vs base                │
LogsConsumer_cwlogs/10resources_10records_1logs-16              437.78Ki ± 0%   61.70Ki ± 1%   -85.91% (p=0.002 n=6)
LogsConsumer_cwlogs/10resources_10records_10logs-16              533.7Ki ± 0%   124.9Ki ± 2%   -76.60% (p=0.002 n=6)
LogsConsumer_cwlogs/10resources_100records_1logs-16             4319.1Ki ± 0%   550.1Ki ± 0%   -87.26% (p=0.002 n=6)
LogsConsumer_cwlogs/10resources_100records_10logs-16             5.167Mi ± 0%   1.172Mi ± 1%   -77.31% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_10records_1metrics-16      47.88Ki ± 5%   83.71Ki ± 0%   +74.84% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_10records_10metrics-16     390.0Ki ± 1%   415.4Ki ± 1%    +6.51% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_100records_1metrics-16     358.9Ki ± 0%   772.9Ki ± 0%  +115.37% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_100records_10metrics-16    3.730Mi ± 0%   3.962Mi ± 0%    +6.23% (p=0.002 n=6)
geomean                                                          779.7Ki        391.8Ki        -49.76%

                                                              │ /tmp/old.txt │            /tmp/new.txt            │
                                                              │  allocs/op   │  allocs/op   vs base               │
LogsConsumer_cwlogs/10resources_10records_1logs-16                406.0 ± 2%    348.0 ± 2%  -14.29% (p=0.002 n=6)
LogsConsumer_cwlogs/10resources_10records_10logs-16              2.025k ± 0%   1.971k ± 1%   -2.64% (p=0.002 n=6)
LogsConsumer_cwlogs/10resources_100records_1logs-16              2.922k ± 0%   3.094k ± 0%   +5.89% (p=0.002 n=6)
LogsConsumer_cwlogs/10resources_100records_10logs-16             18.46k ± 0%   19.46k ± 1%   +5.45% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_10records_1metrics-16       548.5 ± 8%    653.0 ± 0%  +19.05% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_10records_10metrics-16     3.795k ± 3%   4.729k ± 1%  +24.63% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_100records_1metrics-16     3.281k ± 0%   6.161k ± 0%  +87.78% (p=0.002 n=6)
MetricsConsumer_cwmetrics/10resources_100records_10metrics-16    28.02k ± 0%   46.90k ± 0%  +67.39% (p=0.002 n=6)
geomean                                                          3.098k        3.722k       +20.16%

Documentation

N/A

Benchmark the entire HTTP request handling
to avoid fixating on irrelevant details.
The internal unmarshalers now implement plog.Unmarshaler
and pmetric.Unmarshaler. This will enable extracting them
later as encoding extensions; for now they remain embedded
within the receiver.

As a result of the interface change, the unmarshalers now
unmarshal a single record at a time, which means we cannot
merge resources/metrics as we go, but only after each record.

This also fixes a bug in the cwmetrics unmarshaller where
the unit of a metric was not considered part of its identity,
and so two metrics that differed only by unit would be merged.

Optimise all the things:

- Use json-iterator for decoding JSON
- Use klauspost/compress for decompressing gzip
- Pool gzip readers
- Remove pointer type from cwMetricValue to avoid allocation
- Don't read the whole request body into memory
- Reuse buffer for decoding base64; decode as we go
- Implement more efficient metrics merging
@axw axw force-pushed the firehose-unmarshal-record branch from 3e3d392 to 941cbef Compare January 21, 2025 06:49
@axw axw marked this pull request as ready for review January 21, 2025 07:23
@axw axw requested a review from a team as a code owner January 21, 2025 07:23
@axw axw requested a review from ChrsMark January 21, 2025 07:23
axw and others added 6 commits February 5, 2025 09:23
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
This would imply corruption at the gzip level,
since the gzip reader is working off an in-memory
buffer. Unlikely, but swallow the error to keep
it in line with existing code.
@axw axw requested a review from Aneurysm9 February 5, 2025 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants