Any example of online inference of S4 block? #158

Open
traidn opened this issue Dec 27, 2024 · 5 comments

traidn commented Dec 27, 2024

Are there any examples of how to run inference on an S4 block in recurrent mode? I tried using the step function, but it raises errors. I'm attaching my script. What could be the problem?

import torch
from s4 import S4
from sashimi import ResidualBlock

def s4_block(dim):
    layer = S4(
        d_model=dim,
        d_state=16,
        bidirectional=False,
        dropout=0.0,
        transposed=True,
    )
    return ResidualBlock(
        d_model=dim,
        layer=layer,
        dropout=0.0,
    )

# Set up recurrent stepping on every S4 module
model = s4_block(16)
for module in model.modules():
    if hasattr(module, 'setup_step'):
        module.setup_step(mode="diagonal")
model.eval()

input_seg = torch.randn(1, 16, 100)  # (batch, d_model, seq_len)

# Full-sequence (convolutional) forward pass
full_out, _ = model(input_seg)
print(full_out)

# Streaming (recurrent) inference, one timestep at a time
s4_state = model.default_state()
stream_res = []
for i in range(input_seg.shape[-1]):
    part_input = input_seg[:, :, i]
    print(part_input.shape)
    part_res, s4_state = model.step(part_input, s4_state)
    stream_res.append(part_res)

stream_res = torch.cat(stream_res, dim=2)
print(stream_res)
print(torch.allclose(full_out, stream_res))

sendeniz commented Jan 4, 2025

Hi @traidn, I am working on something similar at the moment. Could you post your error? I tried running your code; it runs despite this error at the beginning:

Diagonalization error: tensor(0.2134, grad_fn=<DistBackward0>)
Diagonalization error: tensor(0.2134, grad_fn=<DistBackward0>)

Is this the error you are referring to? If so, you can find the following in the documentation of sashimi.py:

S4 recurrence mode. Using `diagonal` can speed up generation by 10-20%.
`linear` should be faster theoretically but is slow in practice since it
dispatches more operations (could benefit from fused operations).
Note that `diagonal` could potentially be unstable if the diagonalization is numerically unstable
(although we haven't encountered this case in practice), while `dense` should always be stable.

So setting it to module.setup_step(mode="linear") instead of "diagonal" resolves this issue, even though it is slower than "diagonal". Please let me know if this works for you.
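For reference, a minimal sketch of that change, reusing the s4_block helper from your script above (treat it as a starting point rather than a verified fix):

model = s4_block(16)
for module in model.modules():
    # "linear" avoids the diagonalization that produced the warning above;
    # it is slower than "diagonal" but numerically safer.
    if hasattr(module, 'setup_step'):
        module.setup_step(mode="linear")
model.eval()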

Best Deniz


traidn commented Jan 13, 2025

Hello, thanks for the answer. But I still have problems. I'm trying to run the code above on Windows and on CPU; I guess that could be the reason. But I'm going to run inference on CPU in the future, so I need to check it now. My error looks like this:

CUDA extension for cauchy multiplication not found. Install by going to extensions/cauchy/ and running python setup.py install. This should speed up end-to-end training by 10-50%
[KeOps] Warning : 
    The default C++ compiler could not be found on your system.
    You need to either define the CXX environment variable or a symlink to the g++ command.
    For example if g++-8 is the command you can do
      import os
      os.environ['CXX'] = 'g++-8'
[KeOps] Warning : Cuda libraries were not detected on the system or could not be loaded ; using cpu only mode
Falling back on slow Cauchy kernel. Install at least one of pykeops or the CUDA extension for efficiency.
Falling back on slow Vandermonde kernel. Install pykeops for improved memory efficiency.
tensor([[[-1.7425, -0.1773, -0.7354,  ...,  1.5515, -0.7987, -0.0329],
         [-1.5545, -1.2401,  1.4183,  ...,  3.3138,  0.4148,  1.6980],
         [ 0.9791,  0.7177, -1.2344,  ..., -0.8843, -2.8430,  1.2774],
         ...,
         [-0.0802, -1.6018, -0.6896,  ..., -0.0973,  0.6955,  0.4760],
         [-0.3425,  0.6922,  0.0774,  ..., -0.2113,  1.2491,  0.0076],
         [-1.1684,  1.2272,  0.6717,  ...,  0.1415,  1.3729,  0.5358]]],
       grad_fn=<AddBackward0>)
torch.Size([1, 16])
Traceback (most recent call last):
  File "D:\Python_Projects\aTENNuate_test\S4M\dummy_func.py", line 34, in <module>
    part_res, s4_state = model.step(part_input, s4_state)
  File "D:\Python_Projects\aTENNuate_test\S4M\sashimi.py", line 196, in step
    z, state = self.layer.step(z, state, **kwargs)
  File "D:\Python_Projects\aTENNuate_test\S4M\s4.py", line 1557, in step
    y, next_state = self.kernel.step(u, state) # (B C H)
  File "D:\Python_Projects\aTENNuate_test\S4M\s4.py", line 1353, in step
    y, state = self.kernel.step(u, state, **kwargs)
  File "D:\Python_Projects\aTENNuate_test\S4M\s4.py", line 1021, in step
    new_state = self._step_state(u, state)
  File "D:\Python_Projects\aTENNuate_test\S4M\s4.py", line 928, in _step_state
    b = self.input_contraction(self.dB, u)
  File "D:\Python_Projects\aTENNuate_test\.venv\lib\site-packages\opt_einsum\contract.py", line 763, in __call__
    return self._contract(ops, out, backend, evaluate_constants=evaluate_constants)
  File "D:\Python_Projects\aTENNuate_test\.venv\lib\site-packages\opt_einsum\contract.py", line 693, in _contract
    return _core_contract(list(arrays),
  File "D:\Python_Projects\aTENNuate_test\.venv\lib\site-packages\opt_einsum\contract.py", line 591, in _core_contract
    new_view = _einsum(einsum_str, *tmp_operands, backend=backend, **einsum_kwargs)
  File "D:\Python_Projects\aTENNuate_test\.venv\lib\site-packages\opt_einsum\sharing.py", line 151, in cached_einsum
    return einsum(*args, **kwargs)
  File "D:\Python_Projects\aTENNuate_test\.venv\lib\site-packages\opt_einsum\contract.py", line 353, in _einsum
    return fn(einsum_str, *operands, **kwargs)
  File "D:\Python_Projects\aTENNuate_test\.venv\lib\site-packages\opt_einsum\backends\torch.py", line 45, in einsum
    return torch.einsum(equation, operands)
  File "D:\Python_Projects\aTENNuate_test\.venv\lib\site-packages\torch\functional.py", line 381, in einsum
    return einsum(equation, *_operands)
  File "D:\Python_Projects\aTENNuate_test\.venv\lib\site-packages\torch\functional.py", line 386, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: einsum(): the number of subscripts in the equation (1) does not match the number of dimensions (2) for operand 0 and no ellipsis was given

Do you have any idea? Or do I have to switch to Linux and CUDA?


sendeniz commented Jan 13, 2025

Hey @traidn, you have one warning and one error.
I think you can ignore the KeOps warning for now, as it only impacts efficiency or speed; we can look into that later. The error that causes the code to abort is the mismatch in dimensions. It's hard to debug without seeing your code. Could you share how the S4 model is defined and how you initialize it in your code? I think I have an idea of how to solve it.


traidn commented Jan 13, 2025

@sendeniz Oh, it seems I found the error: I was using an old version of S4 that I found in another repo. When I use the code from this repo, the script above works fine. I'll check one more time and close the issue.
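For anyone landing here from the issue title, here is a minimal end-to-end sketch of online (recurrent) inference with the S4 block from this repo. It assumes model.step consumes and returns (batch, d_model) tensors per timestep; if your version returns a trailing time dimension instead, concatenate with torch.cat along the last dimension as in the original script.

import torch
from s4 import S4
from sashimi import ResidualBlock

dim = 16
layer = S4(d_model=dim, d_state=16, bidirectional=False, dropout=0.0, transposed=True)
model = ResidualBlock(d_model=dim, layer=layer, dropout=0.0)

# Prepare recurrent stepping ("linear" is slower but sidesteps diagonalization issues)
for module in model.modules():
    if hasattr(module, 'setup_step'):
        module.setup_step(mode="linear")
model.eval()

x = torch.randn(1, dim, 100)  # (batch, d_model, seq_len)

with torch.no_grad():
    # Full-sequence (convolutional) pass for reference
    full_out, _ = model(x)

    # Streaming (recurrent) pass, one timestep at a time
    state = model.default_state()
    outs = []
    for t in range(x.shape[-1]):
        y_t, state = model.step(x[:, :, t], state)  # assumed (batch, d_model) in and out
        outs.append(y_t)
    stream_out = torch.stack(outs, dim=-1)  # back to (batch, d_model, seq_len)

print(torch.allclose(full_out, stream_out, atol=1e-4))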

@sendeniz

@traidn any updates? Did it work successfully? Is the performance acceptable? Let me know.
