Commit 51b3b56

Prevent offloading of more than 33 layers
1 parent 27d0c11 commit 51b3b56

File tree

1 file changed (+8, -0)

llama.cpp

Lines changed: 8 additions & 0 deletions
@@ -3020,6 +3020,14 @@ static void llm_load_tensors(
         ggml_backend_type backend_norm;
         ggml_backend_type backend_output;
 
+        // Don't allow offloading of more than 33 layers.
+        // Offloading 34 layers causes the model to respond with the letter 'E'.
+        // Offloading 35 layers doesn't work because of a missing CUDA implementation for rope:
+        // GGML_ASSERT: ggml-cuda.cu:6402: ne00 == n_dims && "ne00 != n_dims is not implemented for CUDA yet"
+        if (n_gpu_layers > 33) {
+            n_gpu_layers = 33;
+        }
+
         if (n_gpu_layers > int(n_layer)) {
             // norm is not performance relevant on its own but keeping it in VRAM reduces data copying
             // on Windows however this is detrimental unless everything is on the GPU
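For context, the new guard is just a saturating clamp on the requested layer count. Below is a minimal standalone sketch of the same pattern; the clamp_gpu_layers helper and MAX_OFFLOAD_LAYERS constant are illustrative names invented here, not identifiers from llama.cpp:

#include <algorithm>
#include <cstdio>

// Illustrative constant mirroring the commit's hard-coded limit of 33;
// the name is an assumption, not part of llama.cpp.
constexpr int MAX_OFFLOAD_LAYERS = 33;

// Silently reduce an over-large request, matching the commit's behavior:
// no error is reported when the requested count exceeds the cap.
int clamp_gpu_layers(int n_gpu_layers) {
    return std::min(n_gpu_layers, MAX_OFFLOAD_LAYERS);
}

int main() {
    printf("%d\n", clamp_gpu_layers(40)); // a request for 40 layers yields 33
    printf("%d\n", clamp_gpu_layers(20)); // requests at or below the cap pass through
    return 0;
}

In the actual commit the clamp is applied in place inside llm_load_tensors, so a value of n_gpu_layers above 33 (for example, one requested via llama.cpp's -ngl command-line option) is quietly reduced rather than rejected with an error.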
