The SymbolizationComplete mechanism still reflects old semantics regarding trace transmission, processing and timestamping that no longer exist.
As its documentation string describes, it is essentially meant to signal that all Traces up to a specific KTime have been processed, so that dependent events in userspace (such as cleaning up process metadata) can execute without loss of information.
However, when we switched away from batched Trace processing and introduced high-resolution timestamps for each individual trace (as opposed to timestamping the entire batch of Traces in userspace), we broke that contract: events delivered to userspace, and subsequently processed, are no longer guaranteed to arrive in KTime order. This is due to how events are delivered to userspace: our perf event processing loop drains the per-CPU buffers one after another without taking KTime into account. For example, if trace events were generated on two cores, we first process all events for core 0 and then all events for core 1, so SymbolizationComplete can see time jump backwards.
Additionally, we call SymbolizationComplete for every trace received, which can amount to thousands of calls per second depending on sampling frequency and core count. This is not a big issue in the common case where SymbolizationComplete returns immediately, but in corner cases it could add noticeable processing load.
Finally, the SymbolizationComplete mechanism is currently tied to interpreted traces (even though we needlessly call it for native traces too), but the problem it solves is more generic than that. It would be useful to have a mechanism that lets userspace know when it is safe to perform actions that depend on all Traces up to a specific KTime having been processed. For example, a generic mechanism that provides this guarantee could be used to fix #278.
I'm looking into reworking the existing mechanism as follows:
Make SymbolizationComplete generic and decouple it from traceHandler.HandleTrace. Instead, call it at a fixed, reduced frequency (e.g. twice a second) from the tracer polling loop.
Instead of passing the current KTime, introduce artificial lag by keeping track of the lowest KTime seen during the previous interval of that loop. This is simpler than batching and sorting trace events.
With the current polling interval of 250ms, the artificial lag is expected to stay under 1s, which makes PID reuse within that window highly unlikely, even with a reduced PID space of 32768 values.