feat: docker gpu image CI builds #3103

canardleteer · 2023-09-09T23:25:44Z

Enables the GPU-enabled container images to be built and pushed alongside the CPU containers, so that casual experimenters no longer have to build the GPU images locally. This also addresses some of the CI concerns I raised in #1461 and #3044.

This doesn't validate the GPU-enabled binary in the container, only that the declarations in place to build the container and binary are functional. It therefore doesn't need any GPU infrastructure and can be run as a GitHub Action. This generally normalizes the delivery of GPU containers to match the CPU-only ones.

As I'm neither the maintainer of the primary repository nor the owner of the DockerHub repository, I cannot run a full validation of these changes: the workflow only runs for master, using @ggerganov's DockerHub credentials. I have a slightly different variation of this Action (with push: false) that confirms the changes generally. You can view that validation in my repository: Branch with similar change, Action Validation.
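
For readers unfamiliar with this kind of setup, a build-only variation could look roughly like the sketch below. It is an illustration under stated assumptions, not the workflow from this pull request: the workflow name, tag names, and Dockerfile paths are hypothetical placeholders. Only the action names and the `context`, `file`, `platforms`, and `push` inputs of `docker/build-push-action` are real.

```yaml
# Hypothetical validation-only workflow: builds the images but never pushes them.
# Workflow name, tags, and Dockerfile paths are illustrative assumptions.
name: docker-build-check

on:
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        config:
          # CPU image: built for both architectures.
          - { tag: "cpu", dockerfile: "path/to/cpu.Dockerfile", platforms: "linux/amd64,linux/arm64" }
          # CUDA image: linux/arm64 omitted here because that build is very slow.
          - { tag: "cuda", dockerfile: "path/to/cuda.Dockerfile", platforms: "linux/amd64" }
    steps:
      - uses: actions/checkout@v3
      # QEMU and Buildx are needed for multi-platform builds.
      - uses: docker/setup-qemu-action@v2
      - uses: docker/setup-buildx-action@v2
      - uses: docker/build-push-action@v4
        with:
          context: .
          file: ${{ matrix.config.dockerfile }}
          platforms: ${{ matrix.config.platforms }}
          push: false  # build only; no registry credentials required
```

Because nothing is pushed, a workflow of this shape needs no DockerHub login step, which is what makes it usable from a fork.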

Not Addressed By This Pull Request:

- Multiple {CUDA,ROCm} Library Version Support
- Tailored GPU Architecture Support
- Pipeline support for validating the binaries in the images work
  - This is true of the current CPU image as well.

Known Issues:

The linux/arm64 build for CUDA is really slow, but hasn't timed out on me (yet). I don't know why it is slow, but I don't find the pipeline delay acceptable, so I have disabled it for now.

The value of opening up these builds and pushes:

1. Making sure the Dockerfiles don't go out of date, and that changes don't break builds.
2. Containers tagged with a version:
   - Can now generally be used from any GPU Cloud provider without a consumer having to build & push their own.
     - This is a huge value proposition for project popularity & adoption by GPU Cloud users.
3. Containers tagged with a commit hash / branch name from an MR (not done in this MR):
   - Generally opens doors for much more robust CI infrastructure in/on containers, which I'd love to help with, but don't have time for at the moment (feel free to loop me into conversations).
   - CAN be made available for testing via GPU-enabled k8s clusters via an API trigger.
   - Tests COULD be launched in these GPU-enabled containers via an API call before a merge.
     - Depending on how the GPU Cluster Access infrastructure & Grants evolve.
   - CAN reduce the need for VMs with "always acquired" GPU infrastructure and/or maintenance of the GPU Docker runtimes, to validate GPU builds.

I don't think I will have time to help set up the third item on this list, but that shouldn't stop us from gaining value from the first two. Most of the effort is in setting up additional infrastructure for a third party (like myself) to validate the process; the code changes are just a matter of finessing the tags and triggers.


canardleteer · 2023-09-10T00:06:32Z

Here's an example of how slow the Action is with linux/arm64. It seems to take about 1 hour and 22 minutes, which isn't acceptable (imo), and is why it's disabled:

Slow Arm Action.

Enables the GPU enabled container images to be built and pushed alongside the CPU containers. Co-authored-by: canardleteer <[email protected]>

feat: docker gpu image CI builds

2e974cf

Enables the GPU enabled container images to be built and pushed alongside the CPU containers.

ggerganov approved these changes Sep 14, 2023

ggerganov merged commit 980ab41 into ggml-org:master Sep 14, 2023
