Suitability and parameters for frequently updated data storage #477
Unanswered · alexander0042 asked this question in Q&A
Replies: 1 comment · 7 replies
- PR here adds a docs page to the website + improved switches: #482
I'm the developer behind Pirate Weather, a weather API that takes NOAA forecasts and shares them through a free, open, documented API. Currently, my back end works by overwriting the weather data (saved as Zarr arrays) on disk; however, because there are two processes (a sync process that keeps the arrays updated and a second process that handles queries and returns data), I'm hitting occasional file-locking issues and would rather switch to a database solution.
The speed, licensing, and hybrid storage make Garnet look ideal for this sort of task. The data I'm storing consists of ~50 GB of ~1 MB pages, updated roughly every few hours and stored on AWS direct-attached NVMe drives. I don't need any persistence or recovery, since all the data is saved elsewhere, but I do need to make sure that old values aren't kept on disk for long after new values are written.
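To make the access pattern concrete, here is roughly what the two processes do. The key names, paths, and port are made up for illustration, and I'm using redis-cli only because Garnet speaks RESP, so any Redis client would work:

```bash
# Hypothetical sync-side loop: every forecast cycle overwrites the same keys,
# so the previous values become dead data on disk until compaction reclaims them.
for chunk in /data/gfs/*.chunk; do
  redis-cli -p 6379 -x SET "zarr:gfs:$(basename "$chunk")" < "$chunk"
done

# The query process then does plain point reads, e.g.:
redis-cli -p 6379 GET "zarr:gfs:0.0.12" > /tmp/chunk.bin
```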
I'm trialling Garnet in Docker using these commands:
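Roughly along these lines — the memory and compaction values below are illustrative placeholders rather than my exact invocation, and I'm assuming the ghcr.io/microsoft/garnet image with extra arguments passed through to the server:

```bash
# Illustrative placeholder values, not a tuned configuration
docker run --network=host --ulimit memlock=-1 \
  ghcr.io/microsoft/garnet \
  --port 6379 \
  --memory 16g \
  --compaction-freq 1800 \
  --compaction-max-segments 32
```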
I've read through the docs in detail and am a little confused about the difference between the compaction types (I want to delete anything that's been overwritten), and about how the `--memory`, `--compaction-freq`, and `--compaction-max-segments` parameters relate: which one actually drives compaction? Are checkpoints required for compaction? I wasn't planning on using that feature.
I know this is a bit of an unusual use case, but as Zarr becomes more popular I could imagine it becoming a more common one! Thank you in advance for any help, and I'm happy to clarify anything about my setup.