feat: docker gpu image CI builds #3103

Merged: 1 commit, Sep 14, 2023

@canardleteer canardleteer commented Sep 9, 2023

Enables the GPU-enabled container images to be built and pushed alongside the CPU containers, so casual experimenters no longer have to build the GPU images locally. This also addresses some of the CI concerns I raised in #1461 and #3044.

This doesn't validate the GPU-enabled binary in the container, only that the declarations in place to build the
container and binary are functional. It therefore doesn't need any GPU infrastructure and can be run as a GitHub Action. This generally
normalizes the delivery of GPU containers to match the CPU-only ones.

As I'm neither the maintainer of the primary repository nor the owner of the DockerHub repository, I cannot run a full
validation of these changes: the push only runs for master with @ggerganov's DockerHub credentials. Instead, I have a slightly different
variation of this Action (with push: false) that confirms the changes generally. You can view that validation in my repository: Branch with similar change, Action Validation.

Not Addressed By This Pull Request:

  • Multiple {CUDA,ROCm} Library Version Support
  • Tailored GPU Architecture Support
  • Pipeline support for validating that the binaries in the images work
    • This is true of the current CPU image as well.

Known Issues:

  • The linux/arm64 build for CUDA is really slow, but hasn't timed out on me (yet). I don't know why it is slow, but I don't find the pipeline delay acceptable, so I have it disabled for now.
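
To illustrate the platform choice described above, here is a minimal sketch of how a build matrix could drop linux/arm64 for the CUDA image only. It is not the workflow from this PR: the variant names, Dockerfile paths, and the `build_matrix` helper are all hypothetical, and the real change is expressed in the GitHub Action definition rather than in Python.

```python
from dataclasses import dataclass

# All names below (variants, tags, Dockerfile paths) are hypothetical
# placeholders for illustration, not the values used in this PR.
ALL_PLATFORMS = ("linux/amd64", "linux/arm64")


@dataclass(frozen=True)
class ImageBuild:
    tag: str
    dockerfile: str
    platforms: str  # comma-separated list, the form `docker buildx build --platform` accepts


def build_matrix(variants, skip_arm64=("cuda",)):
    """Return one ImageBuild per variant, dropping linux/arm64 where it is too slow."""
    matrix = []
    for name, dockerfile in variants.items():
        platforms = [
            p for p in ALL_PLATFORMS
            if not (p == "linux/arm64" and name in skip_arm64)
        ]
        matrix.append(ImageBuild(name, dockerfile, ",".join(platforms)))
    return matrix


if __name__ == "__main__":
    variants = {
        "cpu": "docker/cpu.Dockerfile",    # hypothetical path
        "cuda": "docker/cuda.Dockerfile",  # hypothetical path
    }
    for build in build_matrix(variants):
        print(f"{build.tag}: {build.dockerfile} [{build.platforms}]")
```

Keeping the exclusion as data (the `skip_arm64` argument) means re-enabling the arm64 CUDA build later would be a one-line change, assuming the slowness is ever resolved.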

The value of opening up these builds and pushes:

  1. Making sure the Dockerfiles don't go out of date, and changes don't break builds.
  2. Containers tagged with a version:
    • Can now generally be used from any GPU Cloud provider without a consumer having to build & push their own.
      • This is a huge value proposition for project popularity & adoption by GPU Cloud users.
  3. Containers tagged with a commit hash / branch name from a PR (not done in this PR):
    • Generally opens doors for much more robust CI infrastructure in/on containers, which I'd love to help with, but don't have time to at the moment (but feel free to loop me into conversations).
    • CAN be made available for testing via GPU enabled k8s clusters via an API trigger.
    • Tests COULD be launched in these GPU enabled containers via an API call before a merge.
      • Depending on how the GPU Cluster Access infrastructure & Grants evolve.
    • CAN reduce the need for VMs with "always acquired" GPU infrastructure, and/or for maintenance of the GPU Docker runtimes, in order to validate GPU builds.

I don't think I will have time to help set up the third item on this list, but that shouldn't stop us from gaining value from the first two. Most of the effort is in setting up additional infrastructure for a third party (like myself) to validate the process; the code changes are just a matter of finessing the tags and triggers.

Enables the GPU enabled container images to be built and pushed
alongside the CPU containers.

@canardleteer (Contributor Author)

Here's an example of how slow the Action is with linux/arm64: it seems to take about 1 hour and 22 minutes, which isn't acceptable (imo), and is why it's disabled.

@ggerganov ggerganov merged commit 980ab41 into ggml-org:master Sep 14, 2023
pkrmf pushed a commit to morlockstudios-com/llama.cpp that referenced this pull request Sep 26, 2023