
Add more unit tests for GPU buffer #2978


Merged: 6 commits merged into zarr-developers:main on May 8, 2025

Conversation

@maxrjones (Member)

This PR adds some tests for the GPU buffer prototype to unlock #2738.

github-actions bot added the "needs release notes" label Apr 11, 2025
@maxrjones (Member, Author)

@TomAugspurger, would you expect the sharding codec to work with the GPU buffer prototype? If so, I could debug the failure more, but I don't want to spend time setting up an environment with CUDA unnecessarily.

@TomAugspurger (Contributor)

I haven't used or looked at the sharding code, so I don't know offhand, sorry.

If setting up a dev env with cupy is a hassle, I can take a look sometime.

Just looking at the traceback, one thing to check: try wrapping the tests in with zarr.config.enable_gpu()? Maybe we're grabbing a buffer type from the default config somewhere where we should be basing it off another Buffer class?
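
Something like this is what I mean (a minimal sketch, assuming cupy is available and that enable_gpu() works as a context manager; the test name, store, and shapes are made up):

```python
import cupy as cp

import zarr
from zarr.storage import MemoryStore


def test_sharded_write_with_gpu_buffer() -> None:
    # Hypothetical test: enable_gpu() points the default buffer/ndbuffer
    # config entries at the GPU implementations; used as a context manager,
    # the change stays scoped to this block.
    with zarr.config.enable_gpu():
        arr = zarr.create_array(
            store=MemoryStore(),
            shape=(16, 16),
            chunks=(4, 4),
            shards=(8, 8),  # shards != chunks exercises the sharding codec
            dtype="float32",
        )
        arr[:] = cp.ones((16, 16), dtype="float32")
        # If the GPU prototype is honored end to end, reads should come
        # back as cupy arrays.
        assert isinstance(arr[:], cp.ndarray)
```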

@jakirkham (Member)

> would you expect the sharding codec to work with the GPU buffer prototype?

cc @madsbk @akshaysubr (in case either of you have thoughts on this question)

@akshaysubr (Contributor)

It's been a while since I looked at the sharding codec, but it was not expected to work with the GPU buffer prototype. It might be worth another, closer look, but it seemed like some things in the sharding codec still need to be generalized to work with arbitrary buffer prototypes.

@d-v-b (Contributor) commented Apr 23, 2025

The sharding codec is unique because it does I/O, and thus interacts with a store, which might be the root of the complication here. (I would argue it is not really a codec in the conventional sense, but rather a special implementation of an array.)

@TomAugspurger (Contributor) commented May 7, 2025

Looking into this a bit today. For some reason get_ndbuffer_class() is returning the CPU buffer class in sharding.py:

-> await get_pipeline_class()
(Pdb) ll
650         async def _encode_shard_index(self, index: _ShardIndex) -> Buffer:
651             index_bytes = next(
652                 iter(
653  ->                 await get_pipeline_class()
654                     .from_codecs(self.index_codecs)
655                     .encode(
656                         [
657                             (
658                                 get_ndbuffer_class().from_numpy_array(index.offsets_and_lengths),
659                                 self._get_index_chunk_spec(index.chunks_per_shard),
660                             )
661                         ],
662                     )
663                 )
664             )
665             assert index_bytes is not None
666             assert isinstance(index_bytes, Buffer)
667             return index_bytes
(Pdb) pp get_ndbuffer_class()
<class 'zarr.core.buffer.cpu.NDBuffer'>

despite us setting it to gpu.NDBuffer in the test setup. I'll push a fix later.
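
For reference, here's the shape a fix could take (a sketch, not the actual commit; it assumes _get_index_chunk_spec attaches the caller's BufferPrototype to the returned ArraySpec):

```python
# Hypothetical sketch: derive the NDBuffer class from the index chunk
# spec's BufferPrototype instead of reading the module-level default.
async def _encode_shard_index(self, index: _ShardIndex) -> Buffer:
    index_chunk_spec = self._get_index_chunk_spec(index.chunks_per_shard)
    index_bytes = next(
        iter(
            await get_pipeline_class()
            .from_codecs(self.index_codecs)
            .encode(
                [
                    (
                        # prototype.nd_buffer honors whatever Buffer/NDBuffer
                        # pair the caller configured (e.g. the GPU classes),
                        # unlike get_ndbuffer_class(), which reads zarr.config.
                        index_chunk_spec.prototype.nd_buffer.from_numpy_array(
                            index.offsets_and_lengths
                        ),
                        index_chunk_spec,
                    )
                ],
            )
        )
    )
    assert index_bytes is not None
    assert isinstance(index_bytes, Buffer)
    return index_bytes
```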

@TomAugspurger (Contributor)

188e501 seems to fix that test.

I notice a bunch of other uses of numpy_buffer_prototype() in that file, which AFAICT uses host memory by design. We might need to audit this a bit further to see what all can / should be using default_buffer_prototype() instead.
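
To illustrate the difference (expected behavior, not verified output): numpy_buffer_prototype() is pinned to host memory, while default_buffer_prototype() resolves through the config at call time:

```python
import zarr
from zarr.core.buffer import default_buffer_prototype
from zarr.core.buffer.cpu import numpy_buffer_prototype

with zarr.config.enable_gpu():
    # Pinned to host memory regardless of config -- right when the bytes
    # must live on the CPU (e.g. metadata), wrong for chunk payloads.
    host_proto = numpy_buffer_prototype()
    # Resolves buffer/ndbuffer through zarr.config, so it should pick up
    # the GPU classes inside this block.
    dyn_proto = default_buffer_prototype()
    print(host_proto.nd_buffer)  # zarr.core.buffer.cpu.NDBuffer
    print(dyn_proto.nd_buffer)   # zarr.core.buffer.gpu.NDBuffer (expected)
```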

@TomAugspurger (Contributor)

@maxrjones is this in a good state now from your side?

@maxrjones (Member, Author)

> @maxrjones is this in a good state now from your side?

Thanks for looking into this @TomAugspurger and pushing a fix! Yes, seems good to me. I agree with looking into the other numpy_buffer_prototype() calls separately.

github-actions bot removed the "needs release notes" label May 8, 2025
TomAugspurger enabled auto-merge (squash) May 8, 2025 00:25
TomAugspurger merged commit 2609748 into zarr-developers:main May 8, 2025
29 of 30 checks passed