
ci : switch cudatoolkit install on windows to networked #3236


Merged
merged 1 commit on Sep 18, 2023

Conversation

Green-Sky
Collaborator

@Green-Sky Green-Sky commented Sep 17, 2023

As proposed in #3232, this switches to the networked installer, which should reduce traffic. In my testing it cuts the time from 15-20 min down to ~10 min.
Even if it does not reduce the time by much, it reduces the cache size significantly.
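For reference, the change boils down to switching the cuda-toolkit action's install method in the workflow. A minimal sketch, assuming the `method` and `sub-packages` inputs of the Jimver/cuda-toolkit action; the version pin and sub-package list here are illustrative, not necessarily what was merged:

```yaml
# .github/workflows/build.yml (sketch, not the exact merged diff)
- name: Install CUDA toolkit (network installer)
  uses: Jimver/cuda-toolkit@v0.2.11   # version pin is illustrative
  with:
    cuda: '11.7.1'
    method: 'network'                 # fetch only selected sub-packages
    sub-packages: '["nvcc", "cudart", "cublas", "visual_studio_integration"]'
```

The `method: 'network'` mode downloads individual sub-packages instead of the full multi-gigabyte local installer, which is what shrinks both the download and the cache.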

run: https://github.com/Green-Sky/llama.cpp/actions/runs/6215910973/job/16869261559

@staviq
Contributor

staviq commented Sep 17, 2023

I think if you manage to get the install files cached it will be even faster (the network method downloads about 230 MB of additional files besides the already cached ~30 MB).
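Caching those downloads could look something like the sketch below, using `actions/cache`. The installer download path and cache key are hypothetical; the real location the action downloads to would need to be checked:

```yaml
# Sketch: cache the network installer's downloaded sub-packages.
- name: Cache CUDA installer files
  uses: actions/cache@v3
  with:
    path: C:\cuda_installer_cache          # hypothetical download location
    key: cuda-installer-${{ runner.os }}-11.7.1
```

With a hit on this cache, only the install step itself would remain on the critical path.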

@Green-Sky Green-Sky merged commit 7ddf185 into ggml-org:master Sep 18, 2023
@staviq
Contributor

staviq commented Sep 18, 2023

It's still using the ~3 GB files from cache; perhaps the cache should be cleaned?

This 100% worked on my self-hosted runner, where I didn't have a cache.


@Green-Sky
Collaborator Author

@staviq they should all be pull requests. Remind me in a week to clear the caches :)

@Green-Sky Green-Sky deleted the ci_cuda_window_network branch September 18, 2023 12:11
@staviq
Contributor

staviq commented Sep 18, 2023

> @staviq they should all be pull requests. Remind me in a week to clear the caches :)

I was testing a PR for this but you were faster :)

I don't think the cache problem can be solved via a PR; that seems to be a GitHub thing.

@Green-Sky
Collaborator Author

> I was testing a PR for this but you were faster :)

:)

> I don't think the cache problem can be solved via a PR; that seems to be a GitHub thing.

No, I meant that the old cudatoolkit caches are still used by PRs that have not yet pulled the workflow changes from master.

@staviq
Contributor

staviq commented Sep 18, 2023

OK, it works now. The actual CUDA installer alone takes ~13 min according to the CI logs.

At this point, the only thing that can speed this up is a runner with preinstalled CUDA.

I might be able to set one up (an actual HP server), but I'll know for sure tomorrow.

@staviq
Contributor

staviq commented Sep 20, 2023

In the meantime, the CUDA CI, and in fact most CI jobs, run cmake --build non-parallel.

Adding -j (without a value) here: https://github.com/ggerganov/llama.cpp/blob/7eb41179edc56083ef4eb2df7967ac9ff38b34fb/.github/workflows/build.yml#L418

This speeds up the CUDA CI build time by ~25% (on my runner), though GitHub-hosted runners are only dual-core from what I've found, so it might not make a difference there.
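The suggested change is a one-flag edit to the workflow's build step. A sketch (the step name is illustrative; the `cmake --build` invocation follows the pattern used in build.yml):

```yaml
# Sketch: pass -j with no value so cmake picks the native tool's
# default parallel job count (all available cores).
- name: Build
  run: |
    cmake --build . --config Release -j
```

Per the CMake documentation, `--parallel`/`-j` without a value delegates the job count to the underlying build tool, so this is safe on runners with any core count.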

I have my runner server set up with Windows Server 2022 and I'm testing it, but it turns out that even if I pre-install CUDA, the Jimver/cuda-toolkit action is a bit dumb and reinstalls CUDA either way. Even though it only takes ~4 min to reinstall CUDA on my runner, that is still 4 min completely wasted.

I tried disabling the cuda-toolkit action to use only the preinstalled CUDA, and together with -j this brought the time for that entire CI run down to ~3 min.

Except that the workflow uses env variables provided by the cuda-toolkit action, and the final packaging step fails.
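One possible workaround, sketched under the assumption that the packaging step reads the conventional `CUDA_PATH`-style variables that the cuda-toolkit action exports (the exact names it sets would need checking against build.yml), is to export them manually when CUDA is preinstalled:

```yaml
# Sketch: point the workflow at a preinstalled CUDA instead of the action.
# Variable names and install path are assumptions, not taken from build.yml.
- name: Use preinstalled CUDA
  shell: pwsh
  run: |
    $cuda = 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7'
    echo "CUDA_PATH=$cuda" >> $env:GITHUB_ENV
    echo "CUDA_PATH_V11_7=$cuda" >> $env:GITHUB_ENV
```

If the packaging step only needs the toolkit location, this would let a self-hosted runner skip the install entirely.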

So that's my current progress (screenshot omitted).

pkrmf pushed a commit to morlockstudios-com/llama.cpp that referenced this pull request on Sep 26, 2023.
Development

Successfully merging this pull request may close these issues.

GitHub CI: CUDA build times can be significantly reduced (roughly by 80-90%)
3 participants