Commit 04d7e65

kompute : disable GPU offload for Mixtral
We haven't implemented the necessary GPU kernels yet. Fixes this crash:

    ggml_vk_graph_compute: error: unsupported op 'ARGSORT'
    GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-kompute.cpp:1508: !"unsupported op"
1 parent 9954f11 commit 04d7e65
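
Why a Mixtral-class model trips this op at all: mixture-of-experts routing has to pick the top-k experts for each token, and that top-k selection shows up in the compute graph as an ARGSORT node, which is exactly the op the Kompute backend asserts on above. A minimal, illustrative C++ sketch of the idea (standalone and hypothetical: top_k_experts and its signature are invented for this example, not taken from the codebase):

    #include <algorithm>
    #include <numeric>
    #include <vector>

    // Indices of the k largest router logits: an argsort in descending
    // order, truncated to k entries. A GPU backend needs an ARGSORT
    // kernel to do the equivalent on-device.
    static std::vector<int> top_k_experts(const std::vector<float> & router_logits, int k) {
        std::vector<int> idx(router_logits.size());
        std::iota(idx.begin(), idx.end(), 0);
        std::sort(idx.begin(), idx.end(),
                  [&](int a, int b) { return router_logits[a] > router_logits[b]; });
        idx.resize(k);
        return idx;
    }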

File tree: 1 file changed (+1 −0 lines)


src/llama.cpp

Lines changed: 1 addition & 0 deletions
@@ -9050,6 +9050,7 @@ static int llama_model_load(const std::string & fname, llama_model & model, llam
             model.using_gpu = false;
         } else if (
             !(model.arch == LLM_ARCH_LLAMA || model.arch == LLM_ARCH_FALCON)
+            || model.hparams.n_expert > 0
             || !(
                 model.ftype == LLAMA_FTYPE_ALL_F32 ||
                 model.ftype == LLAMA_FTYPE_MOSTLY_F16 ||
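
Read as a whole, the gate after this patch says: offload with Kompute only when the architecture is LLaMA or Falcon, the model has no experts (n_expert == 0, so Mixtral-style MoE stays on the CPU), and the file type is one of the supported ones. A paraphrase as a standalone predicate (the name kompute_can_offload is invented, it leans on the repo's llama_model type, and the ftype list continues beyond what the hunk shows):

    // Illustrative paraphrase of the offload gate, not the actual code path.
    static bool kompute_can_offload(const llama_model & model) {
        const bool arch_ok  = model.arch == LLM_ARCH_LLAMA || model.arch == LLM_ARCH_FALCON;
        const bool is_moe   = model.hparams.n_expert > 0;   // the check added by this commit
        const bool ftype_ok = model.ftype == LLAMA_FTYPE_ALL_F32
                           || model.ftype == LLAMA_FTYPE_MOSTLY_F16
                           /* || ... further ftypes accepted by the original condition ... */;
        return arch_ok && !is_moe && ftype_ok;
    }

When the gate fails, the surrounding code falls back to CPU inference (the context line at 9050 shows the neighbouring branch setting model.using_gpu = false) instead of reaching the unsupported-op assert in ggml_vk_graph_compute.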

0 commit comments
