Optimization: Provide a way to reduce kuzu db disk file (data.kz) size (VACUUM) #4798
Comments
hi @bakkiaraj, may I know whether in your use case you are performing lots of …
@ray6080, at work I am building an application that determines how the IPs inside an SoC are interconnected. This data is stored in Kuzu as nodes and relationships, where the relationships represent IP bus types such as AXI. The resulting Kuzu DB is an asset that multiple downstream build jobs use to generate RTL code and to run further analyses such as power consumption and security. I do not have many nodes; they number fewer than 1,000. For now we are not deleting or updating nodes, but that is planned. To give some perspective: I have a node table (the nodes have a few properties, including STRUCT properties), and Python code that CREATEs the nodes in a loop and then CREATEs the relationships. For 72 nodes, the data.kz file is ~13 MB (on Linux), which feels like too much. That is why I was asking for a compact/vacuum facility.
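To track figures like the ~13 MB above repeatably (for example before and after a schema change, or before committing the DB to Git), a small stdlib-only Python helper can report the on-disk footprint. This is a sketch, not part of Kuzu's API: `db_disk_usage` and `human` are hypothetical names, and the code assumes only that the database path is either a single file or a directory of files (which layout you get may depend on the Kuzu version).

```python
import os
import sys
from pathlib import Path


def db_disk_usage(db_path):
    """Return (total_bytes, {relative_name: size_in_bytes}) for a database path.

    Works whether the path is a single file or a directory of files.
    """
    path = Path(db_path)
    if path.is_file():
        size = path.stat().st_size
        return size, {path.name: size}
    sizes = {}
    # Walk the directory tree and record each file's apparent size (st_size).
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = Path(root) / name
            sizes[str(full.relative_to(path))] = full.stat().st_size
    return sum(sizes.values()), sizes


def human(n_bytes):
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    n = float(n_bytes)
    for unit in ("B", "KiB", "MiB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} GiB"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python db_size.py <path-to-kuzu-db>
    total, per_file = db_disk_usage(sys.argv[1])
    for name, size in sorted(per_file.items(), key=lambda kv: -kv[1]):
        print(f"{human(size):>10}  {name}")
    print(f"{human(total):>10}  total")
```

It reports apparent file sizes (`st_size`), which is what Git stores and transfers; the blocks actually allocated on disk can differ, for example for sparse files.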
hi @bakkiaraj, may I know your table schema? I'd like to try reproducing this on my side. The unexpectedly large data.kz file may be because we reserve some extra space when there are few tuples, since we usually expect more tuples to arrive and amortize the space usage. But maybe we can optimize this to be less aggressive.
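The amortization argument in the comment above can be made concrete with a toy cost model. Everything here is hypothetical: the `TableModel` class, the 256 KiB reservation and the 64-byte tuple size are made-up placeholders, not Kuzu's actual page sizes or allocation policy, and the model assumes each table pays its own up-front reservation.

```python
from dataclasses import dataclass


@dataclass
class TableModel:
    # HYPOTHETICAL parameters: not Kuzu's real constants.
    reserved_bytes: int   # space pre-allocated for the table up front
    bytes_per_tuple: int  # space one tuple actually needs

    def footprint(self, n_tuples: int) -> int:
        # The reservation is paid even if it is mostly empty; the table only
        # grows beyond it once the tuples no longer fit.
        return max(self.reserved_bytes, n_tuples * self.bytes_per_tuple)

    def bytes_per_stored_tuple(self, n_tuples: int) -> float:
        # Effective cost per tuple once the reservation is spread over them.
        return self.footprint(n_tuples) / n_tuples


def total_footprint(tables: list[tuple[TableModel, int]]) -> int:
    # Assumes every table (node or rel) carries its own reservation.
    return sum(model.footprint(n) for model, n in tables)


if __name__ == "__main__":
    t = TableModel(reserved_bytes=256 * 1024, bytes_per_tuple=64)
    for n in (72, 4096, 1_000_000):
        print(n, round(t.bytes_per_stored_tuple(n), 1))
```

With these placeholder numbers, 72 tuples cost about 3,641 bytes each, while from 4,096 tuples onward the reservation is full and the cost settles at the raw 64 bytes. With several REL tables, the fixed cost is paid once per table, so `total_footprint` grows with the number of tables even when each holds only a handful of tuples.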
@ray6080, here is a representative schema:
There are more REL schemas, but they are similar to the AXIBUS_REL above.
@bakkiaraj thanks for sharing this! I think this is mainly because we optimistically reserve pages even when there are few tuples. We'll see if there is a better way to handle this.
@ray6080 awesome. Thanks. Will wait for the update. |
Description
Is there a way to vacuum the Kuzu DB data files? They take up more space than expected in my simple use cases. I am looking for a "VACUUM"-style command to repack the data and reduce the DB size, so that I can commit the DB to Git. In my use case, the DB is collateral that downstream jobs refer to, and it needs to be safeguarded in version control for build reproducibility.
P.S.: I am using Git LFS for now, but I feel there should be a way to reduce the on-disk size of a Kuzu DB.