ci : add LoRA test to CI #2650

Merged: 5 commits merged into master from lora-ci on Aug 27, 2023

Conversation

@slaren (Member) commented on Aug 18, 2023

Downloads a LoRA trained on shakespeare.txt and compares the perplexity on this dataset with and without applying the LoRA.

Only for the 3B f16 model currently; if it looks OK, I can try training another LoRA for 7B, and possibly add tests for quantized models.

Fixes #2634
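
For reference, a minimal sketch of the kind of comparison the test performs, assuming the usual llama.cpp `perplexity` tool; the model and adapter file names below are placeholders, not the exact ones used in ci/run.sh:

```bash
# Baseline perplexity of the f16 model on shakespeare.txt
./bin/perplexity -m models/open-llama-3b-f16.gguf -f shakespeare.txt -c 128 -b 128

# Perplexity with the LoRA adapter applied; since the adapter was trained on
# this exact text, the perplexity should be noticeably lower than the baseline
./bin/perplexity -m models/open-llama-3b-f16.gguf -f shakespeare.txt -c 128 -b 128 \
    --lora shakespeare-lora.bin
```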

@slaren (Member, Author) commented on Aug 18, 2023

It would be good to test LoRA with quantized models as well, both with and without an f16 --lora-base model, but it looks like we are already very close to the 20-minute time limit. Can we do anything about that?
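
To make the two cases concrete, here is roughly what the two quantized runs would look like (`--lora` and `--lora-base` are existing llama.cpp options; the file names are illustrative):

```bash
# LoRA applied directly on top of the quantized weights
./bin/perplexity -m models/open-llama-3b-q4_0.gguf -f shakespeare.txt \
    --lora shakespeare-lora.bin

# LoRA applied against the f16 base weights via --lora-base, which is slower
# but should give results closer to applying the LoRA to the f16 model
./bin/perplexity -m models/open-llama-3b-q4_0.gguf -f shakespeare.txt \
    --lora shakespeare-lora.bin --lora-base models/open-llama-3b-f16.gguf
```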

@ggerganov (Member) commented on Aug 18, 2023

I can easily increase it, but the runs will eventually start to take forever as we add more tests.

I can deploy more nodes, so another solution is to split the tests into groups and have different nodes run different groups. To do that, we have to make run.sh check an env variable to determine which group the node is serving.

Here are the current env variables on the CUDA node for example:

https://github.com/ggml-org/ci/tree/results/llama.cpp/f6/03b287bec853b69f6e963377626f26ec560d92/ggml-4-x86-cuda-v100#environment

We can add GG_BUILD_GROUP and use it in the script to run or skip tests.
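
A hypothetical sketch of how this could look in ci/run.sh; GG_BUILD_GROUP does not exist yet, and the group names and test names below are only illustrative:

```bash
# Run a test set only if this node serves that group (or no group is set)
run_group() {
    local group=$1
    [ -z "${GG_BUILD_GROUP}" ] || [ "${GG_BUILD_GROUP}" = "${group}" ]
}

if run_group main; then
    test $ret -eq 0 && gg_run ctest_debug
    test $ret -eq 0 && gg_run ctest_release
fi

if run_group models; then
    test $ret -eq 0 && gg_run open_llama_3b_v2
fi
```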

@slaren (Member, Author) commented on Aug 18, 2023

I have added a test with q8_0 only for now; hopefully it is not too slow. This is CPU-only, since the CUDA backend only supports LoRA with f16 models.

slaren marked this pull request as ready for review on August 18, 2023 17:04
@slaren (Member, Author) commented on Aug 18, 2023

Looks like it didn't time out. This should be good enough for now; we can add the rest of the quantized models once we figure out the build groups.

Some things to review:

  • I am not sure if I am following the naming convention of the files very well
  • I noticed that the 7B CUDA perplexity tests use -t 1 but the generation tests do not, so I added it to those as well (see the sketch below)
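
For illustration, assuming the generation tests call ./bin/main with full GPU offload like the perplexity tests do, the change is just appending -t 1, presumably so only one CPU thread is kept busy while the work runs on the GPU; the actual command lines in ci/run.sh may differ:

```bash
# before (hypothetical): generation test without an explicit thread count
./bin/main -m ${model_f16} -p "I believe the meaning of life is" -n 64 -ngl 999

# after: pin the CPU side to a single thread, matching the perplexity tests
./bin/main -m ${model_f16} -p "I believe the meaning of life is" -n 64 -ngl 999 -t 1
```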

@ggerganov (Member) commented
Let's update this PR after the #2398 merge and after updating the convert-lora-to-ggml.py script to export .gguf.

@ggerganov (Member) commented
I've bumped the CI timeout to 30 minutes.

For now, we can keep just the F16 and Q8_0 LoRAs, as I think this covers a large portion of the functionality and keeps the time slot small. Will merge this if the CI passes.

ggerganov merged commit 789c8c9 into master on Aug 27, 2023
ggerganov deleted the lora-ci branch on August 27, 2023 07:03
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
* ci : add lora test

ggml-ci

* move lora summary to the top, add lora logs

ggml-ci

* ci : decrease CPU ppl runs to 2 to avoid 20 min timeout

ggml-ci

* add 7b lora test

use 1 thread for CUDA generation tests

ggml-ci

* add test with q8_0 (cpu only)

ggml-ci

---------

Co-authored-by: Georgi Gerganov <[email protected]>