
Add more unit tests for GPU buffer #2978


Merged: 6 commits merged into zarr-developers:main on May 8, 2025

Conversation

@maxrjones (Member)

This PR adds some tests for the GPU buffer prototype to unlock #2738.

github-actions bot added the "needs release notes" label Apr 11, 2025
@maxrjones (Member, Author)

@TomAugspurger, would you expect the sharding codec to work with the GPU buffer prototype? If so, I could debug the failure more, but I don't want to spend time setting up an environment with CUDA unnecessarily.

@TomAugspurger (Contributor)

I haven't used or looked at the sharding code, so I don't know offhand, sorry.

If setting up a dev env with cupy is a hassle, I can take a look sometime.

Just looking at the traceback, one thing to check: try wrapping the tests in with zarr.config.enable_gpu()? Maybe we're grabbing a buffer type from the default config somewhere where we should be basing it off another Buffer class?
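
Something like this is what I mean (a minimal sketch, assuming cupy is available and that enable_gpu() works as a context manager; the test name, store, and shapes are made up):

```python
import cupy as cp

import zarr
from zarr.storage import MemoryStore


def test_sharded_write_with_gpu_buffer() -> None:
    # Hypothetical test: enable_gpu() points the default buffer/ndbuffer
    # config entries at the GPU implementations; used as a context manager,
    # the change stays scoped to this block.
    with zarr.config.enable_gpu():
        arr = zarr.create_array(
            store=MemoryStore(),
            shape=(16, 16),
            chunks=(4, 4),
            shards=(8, 8),  # shards != chunks exercises the sharding codec
            dtype="float32",
        )
        arr[:] = cp.ones((16, 16), dtype="float32")
        # If the GPU prototype is honored end to end, reads should come
        # back as cupy arrays.
        assert isinstance(arr[:], cp.ndarray)
```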

@jakirkham (Member)

> would you expect the sharding codec to work with the GPU buffer prototype?

cc @madsbk @akshaysubr (in case either of you have thoughts on this question)

@akshaysubr (Contributor)

It's been a while since I looked at the sharding codec, but it was not expected to work with the GPU buffer prototype. It might be worth another, closer look, but it seemed like some things in the sharding codec still need to be generalized to work with arbitrary buffer prototypes.

@d-v-b (Contributor) commented Apr 23, 2025

The sharding codec is unique because it does I/O, and thus interacts with a store, which might be the root of the complication here. (I would argue it is not really a codec in the conventional sense, but rather a special implementation of an array.)

@TomAugspurger (Contributor) commented May 7, 2025

Looking into this a bit today. For some reason get_ndbuffer_class() is returning the CPU buffer class in sharding.py:

-> await get_pipeline_class()
(Pdb) ll
650         async def _encode_shard_index(self, index: _ShardIndex) -> Buffer:
651             index_bytes = next(
652                 iter(
653  ->                 await get_pipeline_class()
654                     .from_codecs(self.index_codecs)
655                     .encode(
656                         [
657                             (
658                                 get_ndbuffer_class().from_numpy_array(index.offsets_and_lengths),
659                                 self._get_index_chunk_spec(index.chunks_per_shard),
660                             )
661                         ],
662                     )
663                 )
664             )
665             assert index_bytes is not None
666             assert isinstance(index_bytes, Buffer)
667             return index_bytes
(Pdb) pp get_ndbuffer_class()
<class 'zarr.core.buffer.cpu.NDBuffer'>

despite us setting it to gpu.NDBuffer in the test setup. I'll push a fix later.
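
For reference, here's the shape a fix could take (a sketch, not the actual commit; it assumes _get_index_chunk_spec attaches the caller's BufferPrototype to the returned ArraySpec):

```python
# Hypothetical sketch: derive the NDBuffer class from the index chunk
# spec's BufferPrototype instead of reading the module-level default.
async def _encode_shard_index(self, index: _ShardIndex) -> Buffer:
    index_chunk_spec = self._get_index_chunk_spec(index.chunks_per_shard)
    index_bytes = next(
        iter(
            await get_pipeline_class()
            .from_codecs(self.index_codecs)
            .encode(
                [
                    (
                        # prototype.nd_buffer honors whatever Buffer/NDBuffer
                        # pair the caller configured (e.g. the GPU classes),
                        # unlike get_ndbuffer_class(), which reads zarr.config.
                        index_chunk_spec.prototype.nd_buffer.from_numpy_array(
                            index.offsets_and_lengths
                        ),
                        index_chunk_spec,
                    )
                ],
            )
        )
    )
    assert index_bytes is not None
    assert isinstance(index_bytes, Buffer)
    return index_bytes
```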

@TomAugspurger (Contributor)

188e501 seems to fix that test.

I notice a bunch of other uses of numpy_buffer_prototype() in that file, which AFAICT uses host memory by design. We might need to audit this a bit further to see what all can / should be using default_buffer_prototype() instead.
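
To illustrate the difference (expected behavior, not verified output): numpy_buffer_prototype() is pinned to host memory, while default_buffer_prototype() resolves through the config at call time:

```python
import zarr
from zarr.core.buffer import default_buffer_prototype
from zarr.core.buffer.cpu import numpy_buffer_prototype

with zarr.config.enable_gpu():
    # Pinned to host memory regardless of config -- right when the bytes
    # must live on the CPU (e.g. metadata), wrong for chunk payloads.
    host_proto = numpy_buffer_prototype()
    # Resolves buffer/ndbuffer through zarr.config, so it should pick up
    # the GPU classes inside this block.
    dyn_proto = default_buffer_prototype()
    print(host_proto.nd_buffer)  # zarr.core.buffer.cpu.NDBuffer
    print(dyn_proto.nd_buffer)   # zarr.core.buffer.gpu.NDBuffer (expected)
```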

@TomAugspurger (Contributor)

@maxrjones is this in a good state now from your side?

@maxrjones (Member, Author)

> @maxrjones is this in a good state now from your side?

Thanks for looking into this @TomAugspurger and pushing a fix! Yes, seems good to me. I agree with looking into the other numpy_buffer_prototype() calls separately.

github-actions bot removed the "needs release notes" label May 8, 2025
TomAugspurger enabled auto-merge (squash) May 8, 2025 00:25
TomAugspurger merged commit 2609748 into zarr-developers:main May 8, 2025
29 of 30 checks passed