-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
For many small stream writes, RecyclableMemoryStream is slower than other implementations #339
Comments
Ben surely has the definitive answer here.
I can share some of my findings - a few months ago I profiled a scenario
where multiple little chunks were written to the stream by a BinaryWriter
instance. The profiler signaled as the hottest path, the logic inside
RecyclableMemoryStream that determines which reusable memory block(s)
should receive the write operations, which in essence, involved a division
to find the block, followed by a modulus operation to find the offset in
the block (both operations are relatively expensive, even on modern CPUs).
At a later revision, this was modified to take advantage of a Math.DivRem
operation that apparently gives the division and the reminder on a single
operation; unfortunately, this is not as fast as it could be - behind the
scenes, some versions of the runtime still perform a division operation
followed by a modulus operation, while others are a little bit faster, as
they compute the reminder by performing a subtraction and a
multiplication. There is an open request for an optimization (e.g. in x86,
the IDIV operation returns the quotient and the reminder on a single
operation), however it seems that the design of the IL jitter makes taking
both results difficult to implement). Here is the link to that request: Can
"Math.DivRem" do division operation only once? · Issue #4155 ·
dotnet/runtime (github.com) <dotnet/runtime#4155>.
Perhaps this could be optimized for the case of sequential writes (by
tracking the current write position and the reminding available bytes?),
but in my use cases, the relative "slowness" of the write operations is
more than compensated by the massive drop in GC pressure.
Our internal RPC framework virtually became allocation-free (as well as
copy-free, as packet interleaving for channel multiplexing is directly
writing the recyclable blocks to the socket) on both ends, thanks to
RecyclableMemoryStream.
…On Wed, Mar 27, 2024 at 2:43 AM Martin Regen ***@***.***> wrote:
For a project which implements its own buffered memory stream I added the int
Read(Span<byte> buffer) and void Write(ReadOnlySpan<byte> buffer)
signatures and compared the performance of MemoryStream, ArraySegmentStream
(custom based on ArrayPool) and RecyclableMemoryStream.
To my surprise, RecyclableMemoryStream is always outperformed by
MemoryStream and ArraySegmentStream by a factor of two, as if the
Write(ReadOnlyMemory) is not properly implemented or disabled in
RecyclableMemoryStream?
see benchmark results here: OPCFoundation/UA-.NETStandard#2556
<OPCFoundation/UA-.NETStandard#2556>
see results of BinaryEncoderRecyclableMemoryStream vs
BinaryEncoderArraySegmentStream and BinaryEncoderMemoryStream.
Any idea what could be wrong in the benchmark or why
RecyclableMemoryStream.Write is always slower?
—
Reply to this email directly, view it on GitHub
<#339>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABRR7UQTHKOCAOOBKITDSTLY2J2BNAVCNFSM6AAAAABFKNPV62VHI2DSMVQWIX3LMV43ASLTON2WKOZSGIYTAMJZGY4TSNA>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Thanks @sgorozco, this is in fact whats happening here, a BinaryWriter writes a lot of little chunks (e.g. int, byte, etc) for a OPC UA binary encoder. The Write implementation should only copy the small chunks into the buffer, keep two indexes (one for the actual buffer, one for the write pointer into the actual buffer). An overflow may split the write and allocates a new buffer or moves to the next buffer. Thats it. I would be interested to see why a division is necessary. Update -- I checked: --> GetBlockAndRelativeOffset uses a division. I guess its a design decision to make write faster or the calculation of the position, or what to use as the source of truth. |
As an additional sidenote, turns out that in my JSONEncoder Tests where writing uses StreamWriter in contrary to BinaryWriter above, RecyclableMemoryStream outperforms all other stream implementations... this is currently our main use case so technically I'm a happy camper with the current implementation. |
Yeah, the issue with lots of small writes is something that has come up a few times before. Unfortunately, it just wasn't optimized for that scenario. The workaround would be to operate directly on the buffer, but not all APIs will support that, and that does defeat the purpose of using a stream. A way to fix this would be to change the source of truth for the current I'd have to think about any negative performance ramifications (e.g., calling Position is much more expensive), but perhaps it's not so bad. I'll ponder it. |
For a project which implements its own buffered memory stream I added the
int Read(Span<byte> buffer)
andvoid Write(ReadOnlySpan<byte> buffer)
signatures and compared the performance of MemoryStream, ArraySegmentStream (custom based on ArrayPool) and RecyclableMemoryStream.To my surprise, RecyclableMemoryStream is always outperformed by MemoryStream and ArraySegmentStream by a factor of two, as if the
Write(ReadOnlyMemory<byte>)
is not properly implemented or disabled in RecyclableMemoryStream?see benchmark results here: OPCFoundation/UA-.NETStandard#2556
see results of
BinaryEncoderRecyclableMemoryStream
vsBinaryEncoderArraySegmentStream
andBinaryEncoderMemoryStream
.Any idea what could be wrong in the benchmark or why RecyclableMemoryStream.Write is always slower?
The text was updated successfully, but these errors were encountered: