Function calls become slow when the used memory grows #3591

hazae41 · 2023-09-01T13:18:45Z


Suppose we have a function that takes a Vec<u8> (or Box<[u8]>) as input...

use wasm_bindgen::prelude::*;

// Takes ownership of the bytes; they are dropped (freed) when f returns.
#[wasm_bindgen]
pub fn f(bytes: Vec<u8>) {}

...and custom glue code that just passes the bytes in...

export function f(bytes) {
    const ptr0 = passArray8ToWasm0(bytes, wasm.__wbindgen_malloc);
    const len0 = WASM_VECTOR_LEN;
    wasm.f(ptr0, len0);
}

...that I run with the following code (using any benchmarking lib)

const bytes = crypto.getRandomValues(new Uint8Array(1024))

bench("f", () => f(bytes))
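For readers without a benchmarking library at hand, here is a minimal sketch of how average/minimum/maximum figures like the ones in this post can be collected. The `measure` function is a hypothetical helper, not the API of any particular library, and `performance.now()` is too coarse to time a single sub-microsecond call reliably, so treat the numbers it produces as rough.

```javascript
// Hypothetical helper (not a real library API): times `fn` over `samples`
// runs and reports average, minimum and maximum in milliseconds.
function measure(fn, samples = 100_000) {
  let minimum = Infinity;
  let maximum = 0;
  let total = 0;

  for (let i = 0; i < samples; i++) {
    const start = performance.now();
    fn();
    const elapsed = performance.now() - start;

    if (elapsed < minimum) minimum = elapsed;
    if (elapsed > maximum) maximum = elapsed;
    total += elapsed;
  }

  return { samples, average: total / samples, minimum, maximum };
}

// Usage: replace the callback with () => f(bytes) to time the wasm call.
console.log(measure(() => Math.sqrt(2), 10_000));
```

Tracking the maximum separately matters here: a handful of very slow calls barely moves the minimum but dominates the maximum and pulls up the average.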

When benchmarking this, we get the following results

┌─────────┬──────────────────┬────────────┬────────────┐
│ (index) │     average      │  minimum   │  maximum   │
├─────────┼──────────────────┼────────────┼────────────┤
│  wasm   │ '143.31 ns/iter' │ '83.00 ns' │ '12.62 μs' │
└─────────┴──────────────────┴────────────┴────────────┘

If I change the function to return a boolean and keep the same glue code...

#[wasm_bindgen]
pub fn f(bytes: Vec<u8>) -> bool {
    true
}

...the benchmark is almost the same

┌─────────┬──────────────────┬────────────┬─────────────┐
│ (index) │     average      │  minimum   │   maximum   │
├─────────┼──────────────────┼────────────┼─────────────┤
│  wasm   │ '141.14 ns/iter' │ '83.00 ns' │ '19.83 μs'  │
└─────────┴──────────────────┴────────────┴─────────────┘

If I change the function to return the bytes as a custom "pointer" struct, and still keep the same glue code...

#[wasm_bindgen]
pub struct Pointer {
    ptr: *const u8,
    len: usize,
}

#[wasm_bindgen]
pub fn f(bytes: Vec<u8>) -> Pointer {
    Pointer {
        ptr: bytes.as_ptr(),
        len: bytes.len(),
    }
}

...the benchmark is still great

┌─────────┬──────────────────┬────────────┬─────────────┐
│ (index) │     average      │  minimum   │   maximum   │
├─────────┼──────────────────┼────────────┼─────────────┤
│  wasm   │ '162.39 ns/iter' │ '83.00 ns' │ '162.37 μs' │
└─────────┴──────────────────┴────────────┴─────────────┘

But if I return the bytes as a Vec or Box<[u8]>, and still keep the same glue code...

#[wasm_bindgen]
pub fn f(bytes: Vec<u8>) -> Vec<u8> {
    bytes
}

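...the minimum is still great, but the maximum is about 8x worse than in the previous case (1.36 ms vs 162.37 μs), and the average is about 33x worse (5.39 μs vs 162.39 ns)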

┌─────────┬────────────────┬────────────┬─────────────┐
│ (index) │    average     │  minimum   │   maximum   │
├─────────┼────────────────┼────────────┼─────────────┤
│  wasm   │ '5.39 μs/iter' │ '84.00 ns' │  '1.36 ms'  │
└─────────┴────────────────┴────────────┴─────────────┘

Why?

hazae41 · 2023-09-01T14:17:43Z

Author

I think it has to do with the way memory is managed:

When benchmarking, I originally used 100k samples, but I noticed that the fewer samples I use, the lower the maximum; with 1 sample, the maximum is low.
When pinning the bytes in memory with ManuallyDrop::new (so they are never freed) in the custom "Pointer" version, I get the same issue as when returning Vec/Box.

use std::mem::ManuallyDrop;

#[wasm_bindgen]
pub struct Pointer {
    ptr: *const u8,
    len: usize,
}

#[wasm_bindgen]
pub fn f(bytes: Vec<u8>) -> Pointer {
    // Variant A: the copy is dropped (freed) when f returns -> low maximum
    // let result = bytes.to_vec();

    // Variant B: the copy is never dropped (leaked) -> high maximum, like Vec/Box
    let result = ManuallyDrop::new(bytes.to_vec());

    Pointer {
        ptr: result.as_ptr(),
        len: result.len(),
    }
}
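One possible contributing factor, which nothing in this thread confirms: if leaked allocations force the module's linear memory to grow, every growth detaches the previous ArrayBuffer, so any cached typed-array view over it has to be recreated. The sketch below only demonstrates that standard WebAssembly.Memory behaviour in isolation; whether it explains the spikes measured here is an assumption.

```javascript
// One wasm page is 64 KiB; start with a single page of linear memory.
const memory = new WebAssembly.Memory({ initial: 1 });

let view = new Uint8Array(memory.buffer);
const before = view.byteLength; // size of the view before growing

// Grow by one page. The old ArrayBuffer is detached by this call.
memory.grow(1);
const stale = view.byteLength; // the old view now covers a detached buffer

// A fresh view must be created over the new buffer.
view = new Uint8Array(memory.buffer);
const after = view.byteLength;

console.log(before, stale, after); // 65536 0 131072
```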


hazae41 · 2023-09-01T14:37:49Z

Author

The same issue occurs on the latest Chrome and Firefox, Node 20.3.1, and Deno 1.36.1, with this dependency:

wasm-bindgen = { version = "0.2.87", default-features = false, features = ["std"] }


hazae41 · 2023-09-01T16:05:47Z

Author

I found out that when the glue code frees the returned bytes, like wasm-bindgen does by default, the benchmark results are normal again

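try {
    // Reserve 16 bytes of wasm stack space for the returned (pointer, length) pair
    const retptr = wasm.__wbindgen_add_to_stack_pointer(-16);
    // Copy both inputs into wasm memory
    const ptr0 = passArray8ToWasm0(bytes, wasm.__wbindgen_malloc);
    const len0 = WASM_VECTOR_LEN;
    const ptr1 = passArray8ToWasm0(mask, wasm.__wbindgen_malloc);
    const len1 = WASM_VECTOR_LEN;
    wasm.xor_mod_unsafe(retptr, ptr0, len0, ptr1, len1);
    // Read back the returned pointer (r0) and length (r1)
    var r0 = getInt32Memory0()[retptr / 4 + 0];
    var r1 = getInt32Memory0()[retptr / 4 + 1];
    // Free the returned buffer instead of leaking it
    wasm.__wbindgen_free(r0, r1 * 1);
} finally {
    // Release the reserved stack space
    wasm.__wbindgen_add_to_stack_pointer(16);
}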

┌─────────┬──────────────────┬─────────────┬─────────────┐
│ (index) │     average      │   minimum   │   maximum   │
├─────────┼──────────────────┼─────────────┼─────────────┤
│  wasm   │ '448.74 ns/iter' │ '416.00 ns' │ '584.00 ns' │
└─────────┴──────────────────┴─────────────┴─────────────┘

So it seems that the more objects are kept in memory, the slower the function becomes. Why is this happening?


daxpedda · 2023-09-02T11:54:52Z

Maintainer

My first guess would be that it isn't slower because there are more objects in memory, but because it can't reuse memory: it has to allocate new memory on every call. Allocation performance is a common thing to optimize first in your application.

In theory, your browser's developer tools should be able to tell you exactly what it's spending time on. Did you try that yet?
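The reuse argument can be illustrated with a toy simulation. This is a sketch under stated assumptions: `ToyHeap` is a hypothetical first-fit allocator written for this example, not the allocator that Rust or wasm-bindgen actually uses, and `simulate` merely mimics the malloc/free pattern of the glue code with 1024-byte inputs.

```javascript
// Toy first-fit allocator over a simulated heap (illustrative only).
class ToyHeap {
  constructor() {
    this.top = 0;       // high-water mark: bytes ever claimed from the heap
    this.freeList = []; // freed blocks available for reuse: { ptr, len }
  }

  malloc(len) {
    const i = this.freeList.findIndex((block) => block.len >= len);
    if (i !== -1) return this.freeList.splice(i, 1)[0].ptr; // reuse a freed block
    const ptr = this.top; // nothing reusable: claim fresh memory
    this.top += len;
    return ptr;
  }

  free(ptr, len) {
    this.freeList.push({ ptr, len });
  }
}

// Simulates `calls` invocations where the input buffer is handed back to
// the caller, who either frees it or ignores (leaks) it.
function simulate(calls, freeReturned) {
  const heap = new ToyHeap();
  for (let i = 0; i < calls; i++) {
    const ptr = heap.malloc(1024);          // input copied into wasm memory
    if (freeReturned) heap.free(ptr, 1024); // caller frees the returned buffer
  }
  return heap.top;
}

console.log(simulate(100_000, true));  // 1024
console.log(simulate(100_000, false)); // 102400000
```

With freeing, the same 1 KiB block is reused on every call. Without it, 100k samples claim about 100 MB of fresh memory, which fits the observation that the maximum grows with the number of samples.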

