tests : add test-model-random #14139
base: master
Conversation
* tests : fix integer types in test-model-random
I had a discussion a while ago with @ggerganov about a similar idea; it's nice to see you doing this! My idea was to have a full pipeline of:
The random model can be stored on ggml-org. For example, we can easily generate a tiny random HF model like this one. I think this PR can be an interesting step toward that idea. Lmk if I can help!
@ngxson I was going to say that I feel like regressions are less likely to happen with conversion, but that's not true given the recent change of using …
We might want to start adding a way to list links of models of a given architecture so that either they can be tested directly or their config can be used to generate random models to test. Not sure yet how to make the link lists look less like endorsement, and more like "here are some reference models for this architecture, which are expected to always convert properly". We might need a way to make sparse model files (with holes where the tensors data would be) and load that with mmap, to at least make sure the shapes load properly, or some other way to test that.
I would prefer the random models to be generated locally, so that the tests don't need a fast network on every new environment they are put on. But I agree testing … Since that would depend on Python and … Some things are more convenient to test separately, though, like tokenizers, chat templates, quantization, backend ops, etc.
My intention here is mainly to test batch splits and consistency of … Since this will test that multi-user batches work correctly, testing for the correctness of model graphs will really only need to handle trivial single-user batches. In a way, this will reduce the scope of logits correctness testing (and make it simpler).
I will keep this in mind, but for now the scope of …
This generates random models and then tests different concurrencies of batches to check if the output is consistent.
This can detect when e.g. the recurrent cache has been broken (which has been a problem in recent months, see #13834 (comment)), or anything else that would affect the consistency of the output when running inference on multiple distinct sequences.
More architectures will be added, but for now this starts with Mamba.
Eventually, consistency of pooled embeddings will also be tested.
The goal is to reduce accidental regressions by making it easy to quickly test a lot of edge cases on the supported architectures, without having to download any model.
Draft for now because it's very much a work-in-progress, although it's kind of usable.
Example output
(takes around 11 seconds on a fast laptop, which means the sizes might need to be reduced once more architectures are added to the tests)
TODO
- Specify which `arch` to test from command-line args
- Replace `tmpfile()`, because the file needs to have a known name
- Test `seq_cp` and `seq_rm`
- `tests/test-backend-ops.cpp`
- Architectures:
  - `LLM_ARCH_LLAMA`
  - `LLM_ARCH_LLAMA4`
  - `LLM_ARCH_GEMMA2`
  - `LLM_ARCH_RWKV6`
  - `LLM_ARCH_RWKV7`
  - `LLM_ARCH_MAMBA`
  - `LLM_ARCH_MAMBA2`