Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shards to valid_encodings to enable sharded Zarr writing #9948

Merged
merged 10 commits into from
Jan 27, 2025

Conversation

jacobbieker
Copy link
Contributor

@jacobbieker jacobbieker commented Jan 13, 2025

Adds shards to the list of valid_encodings in the zarr backend, so that sharded Zarr V3s can be written.

Copy link

welcome bot commented Jan 13, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@jhamman
Copy link
Member

jhamman commented Jan 17, 2025

Thanks @jacobbieker for opening a PR. Do you have time to add a roundtrip test for this? This test would be a good test to emulate:

def test_chunk_encoding(self) -> None:
# These datasets have no dask chunks. All chunking specified in
# encoding
data = create_test_data()
chunks = (5, 5)
data["var2"].encoding.update({"chunks": chunks})
with self.roundtrip(data) as actual:
assert chunks == actual["var2"].encoding["chunks"]

@jacobbieker
Copy link
Contributor Author

jacobbieker commented Jan 17, 2025

Thanks @jacobbieker for opening a PR. Do you have time to add a roundtrip test for this? This test would be a good test to emulate:

def test_chunk_encoding(self) -> None:
# These datasets have no dask chunks. All chunking specified in
# encoding
data = create_test_data()
chunks = (5, 5)
data["var2"].encoding.update({"chunks": chunks})
with self.roundtrip(data) as actual:
assert chunks == actual["var2"].encoding["chunks"]

Yep! I think I've added one that works for that, and updated the Zarr V3 loading to include the shards. It seems to pass the tests locally so far at least.

Edit: Ah it seems to cause some issues with threads on 3.12 potentially?

@jhamman
Copy link
Member

jhamman commented Jan 21, 2025

I seem to remember fixing this in the Dask test suite by manually cleaning up thread resources. This may help: https://github.com/dask/dask/blob/e04734b4d8959ba259801f2e2a490cb4ee8d891f/dask/tests/test_distributed.py#L338-L358

@kmuehlbauer
Copy link
Contributor

kmuehlbauer commented Jan 21, 2025

@jhamman The dask distributed zarr integration test is also failing on main. Would be great to have a fix applied.

@kmuehlbauer
Copy link
Contributor

I seem to remember fixing this in the Dask test suite by manually cleaning up thread resources. This may help: https://github.com/dask/dask/blob/e04734b4d8959ba259801f2e2a490cb4ee8d891f/dask/tests/test_distributed.py#L338-L358

See #9967

@jacobbieker
Copy link
Contributor Author

I seem to remember fixing this in the Dask test suite by manually cleaning up thread resources. This may help: https://github.com/dask/dask/blob/e04734b4d8959ba259801f2e2a490cb4ee8d891f/dask/tests/test_distributed.py#L338-L358

See #9967

@jhamman yeah, this change in #9967 fixes the tests and the failing ones pass.

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @jacobbieker !

@jacobbieker
Copy link
Contributor Author

Thanks for this @jacobbieker !

No problem! Excited to use this a lot! I am not able to merge this PR, is it possible you can? Its saying I'm not authorized.

@jhamman jhamman requested a review from dcherian January 24, 2025 14:21
@jhamman
Copy link
Member

jhamman commented Jan 24, 2025

We'll let @dcherian do a final review and/or merge.

@dcherian
Copy link
Contributor

Looks great. Thanks @jacobbieker, welcome to Xarray!

@dcherian dcherian enabled auto-merge (squash) January 27, 2025 20:05
@dcherian dcherian added the plan to merge Final call for comments label Jan 27, 2025
@dcherian dcherian merged commit 2870710 into pydata:main Jan 27, 2025
34 checks passed
Copy link

welcome bot commented Jan 27, 2025

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again! celebration gif

@jhamman
Copy link
Member

jhamman commented Jan 27, 2025

Thanks @jacobbieker for the contribution here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug plan to merge Final call for comments topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to write to sharded Zarr V3 because shards not in valid_encodings
5 participants