The main architectural change compared to WeatherMesh-1 is the use of neighborhood attention instead of Swin transformer blocks.
I have a draft implementation of the key parts of the architecture, based on the blog, at Brayden-Zhang/WeatherMesh, although I'm not fully confident in its correctness.
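For reference, here is a minimal 1-D sketch of the neighborhood-attention idea (each query attends only to keys within a fixed radius, rather than Swin's shifted windows). This is my own toy illustration in NumPy, not the authors' implementation; the function name and the single-head, unbatched shapes are assumptions for clarity.

```python
import numpy as np

def neighborhood_attention_1d(q, k, v, radius=1):
    """Toy 1-D neighborhood attention (hypothetical sketch):
    each query position i attends only to keys within `radius`
    positions of i, instead of a global or Swin-style window.
    q, k, v: (n, d) arrays for a single head, no batching."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # local attention logits
        w = np.exp(scores - scores.max())          # stable softmax
        w /= w.sum()
        out[i] = w @ v[lo:hi]                      # weighted local values
    return out
```

With `radius` at least the sequence length, this reduces to ordinary full attention, which is a useful sanity check; the real model applies the 2-D/3-D analogue over the latent grid.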
Arxiv/Blog/Paper Link
https://windbornesystems.com/blog/weathermesh-2-technical-blog
Detailed Description
An interesting model: trained entirely on RTX 4090s (per the blog) at only 180M parameters, yet with strong forecasting performance and the ability to swap different encoders, decoders, and processors together. Rollouts extend out to six days during training. Most processing happens in the latent space with pure transformer blocks, so there is no decode/encode step at each rollout step, which reduces accumulated error.
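The latent-rollout structure can be sketched as follows. This is a hedged illustration of the idea described above, not the authors' code: encode once, step the processor repeatedly in latent space, and decode only when a forecast is needed. The function names (`encode`, `processor`, `decode`) are placeholders I chose.

```python
import numpy as np

def latent_rollout(encode, processor, decode, x0, n_steps):
    """Sketch of a latent-space autoregressive rollout (assumed names):
    encode the initial state once, advance entirely in latent space,
    and decode per lead time -- no decode/encode round trip each step."""
    z = encode(x0)                 # single encode into latent space
    forecasts = []
    for _ in range(n_steps):
        z = processor(z)           # one latent time step (e.g. a transformer block stack)
        forecasts.append(decode(z))  # decode only to produce the forecast
    return forecasts
```

Keeping the state in latent space across steps is what avoids repeatedly paying the encoder/decoder's approximation error at every step.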
U-Net encoding into the latent space, with separate encoders for surface variables and pressure-level variables. The full latent space is ordered from low pressure to high pressure, with the bottom layer holding the surface variables.
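A minimal sketch of that encoder layout, assuming (hypothetically) that each encoder maps its input to a per-level latent grid of the same spatial shape and the latents are stacked along a level axis with the surface latent as the bottom (last) layer:

```python
import numpy as np

def encode_state(surface, pressure_levels, enc_sfc, enc_pl):
    """Sketch (assumed interface): separate encoders for surface and
    pressure-level variables, latents stacked low -> high pressure with
    the surface latent as the bottom layer."""
    z_pl = [enc_pl(x) for x in pressure_levels]  # ordered low -> high pressure
    z_sfc = enc_sfc(surface)                     # surface variables, own encoder
    return np.stack(z_pl + [z_sfc], axis=0)      # (n_levels + 1, H, W, ...)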
Uses CPU checkpointing, offloading activations to CPU memory during long rollouts in training, so GPU memory doesn't grow with rollout length.
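The offloading pattern can be illustrated with a small host-memory stand-in (my own simplified sketch; in PyTorch the analogous built-in is the `torch.autograd.graph.save_on_cpu` saved-tensor hook, which the blog may or may not use):

```python
class OffloadedActivations:
    """Sketch of CPU activation offloading for long rollouts: each step's
    activations are copied off the accelerator after the forward pass
    (simulated here by a host-side list) and fetched back for backward.
    This is an illustration of the idea, not a real device transfer."""
    def __init__(self):
        self._host = []                  # stand-in for (pinned) CPU memory

    def stash(self, activation):
        self._host.append(activation)    # "device -> host" copy after forward
        return len(self._host) - 1       # handle used to retrieve it later

    def fetch(self, handle):
        return self._host[handle]        # "host -> device" copy for backward
```

The point is that only the current step's activations need to live on the GPU, so a six-day rollout costs host memory and transfer bandwidth rather than GPU memory.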
Context
Good, interesting performance from a relatively small model that also seems relatively easy to train. Unfortunately, it doesn't look like the model will be open-sourced, but the report contains a fair amount of detail.