
Commit 8f0272c

Author: Lorenzo Toniazzi
Commit message: update branch notes
1 parent 284e665 · commit 8f0272c

File tree: 2 files changed, +54 -8 lines changed


BRANCH_SETUP.md renamed to _BRANCH_SETUP.md

Lines changed: 49 additions & 6 deletions
````diff
@@ -32,13 +32,14 @@ Run main with base model and lora adapter to hot-swap
 ```bash
 ./main -m ./models/open-llama/ggml-model-f16.gguf \
 --hot-lora models/open-llama/lora-ggml-model-q8_0-hot-lora-LATEST.bin \
--ngl 0 \
+-ngl 99 \
+-n 128
+```
+```bash
+./main -m ./models/open-llama/ggml-model-f16.gguf \
+-ngl 99 \
 -n 128
 ```
-
-Working but `ggml_metal_get_buffer: error: tensor 'blk.16.attn_v.weight.loraB' buffer is nil`
-
-With `ngl > 0` the code breaks. Probably because the Lora tensors try to interact with the base tensors (as in `lora_mul_mat`), but the lora tensors are not moved to the gpu buffer of the base tensors.
 
 # Logic
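````

The note removed above pins the Metal failure on buffer placement: the adapter tensors are created on the CPU while the base weights live in a GPU backend buffer, so `ggml_mul_mat` finds a nil buffer for `loraB`. A minimal sketch of that idea, assuming a `ggml_context` holding the adapter tensors; the names `lora_ctx`, `base_tensor`, `loraA_data`, and `loraB_data` are illustrative, not from this commit:

```cpp
// Sketch: allocate the LoRA tensors in the same backend buffer type as the
// base weights, so the graph can multiply them on the GPU backend.
// Assumes `base_tensor` is an already-allocated base weight and `lora_ctx`
// is a ggml_context containing the not-yet-allocated loraA/loraB tensors.
ggml_backend_buffer_type_t buft = ggml_backend_buffer_get_type(base_tensor->buffer);
ggml_backend_buffer_t lora_buf  = ggml_backend_alloc_ctx_tensors_from_buft(lora_ctx, buft);

// Copy the adapter weights into the freshly allocated buffer
// (ggml_backend_tensor_set works for CPU and GPU backends alike).
ggml_backend_tensor_set(loraA, loraA_data, 0, ggml_nbytes(loraA));
ggml_backend_tensor_set(loraB, loraB_data, 0, ggml_nbytes(loraB));
```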

````diff
@@ -299,4 +300,46 @@ int main() {
 //
 
 }
-```
+```
+
+
+
+```bash
+# Convert base model to gguf
+python3 convert-hf-to-gguf.py models/open-llama/ && \
+# Quantize base model
+./quantize ./models/open-llama/ggml-model-f16.gguf ./models/open-llama/ggml-model-q4.gguf Q4_K && \
+# Obtain Lora adapter
+./finetune --model-base models/open-llama/ggml-model-q4.gguf \
+--checkpoint-in models/open-llama/chk-lora-ggml-model-q4-hot-lora-LATEST.gguf \
+--checkpoint-out models/open-llama/chk-lora-ggml-model-q4-hot-lora-ITERATION.gguf \
+--lora-out models/open-llama/lora-ggml-model-q4-hot-lora-ITERATION.bin \
+--train-data "data/hot-lora.txt" \
+--save-every 1 \
+--threads 1 \
+--adam-iter 1 \
+--batch 1 \
+--ctx 16 \
+--use-checkpointing
+```
+
+</details>
+
+## 1. Run main with adapter
+
+- Run main with base model and lora adapter to hot-swap
+```bash
+./main -m ./models/open-llama/ggml-model-q4.gguf \
+--hot-lora models/open-llama/lora-ggml-model-q4-hot-lora-LATEST.bin \
+-ngl 99 \
+-n 128
+```
+
+- Do not pass the flag `--hot-lora` and the adapter is ignored:
+```bash
+./main -m ./models/open-llama/ggml-model-q4.gguf \
+-ngl 99 \
+-n 128
+```
+
+make clean && make -j 8 LLAMA_DEBUG=1
````

llama.cpp

Lines changed: 5 additions & 2 deletions
```diff
@@ -9731,8 +9731,11 @@ struct llm_build_context {
 ggml_tensor * loraB = it->second.loraB;
 
 ggml_tensor * t_lora = ggml_mul_mat(ctx0,
-        ggml_mul_mat(ctx0, loraA, loraB),
-        cur
+        loraA,
+        ggml_mul_mat(ctx0,
+            ggml_transpose(ctx0, loraB),
+            cur
+        )
 );
 
 if (lctx.lora_scale != 1.0f) {
```
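For the record, my reading of why the product is re-associated (the commit itself gives no rationale): with a rank-r adapter, applying the two factors one after the other avoids materializing the full-size product matrix, and the transpose on `loraB` presumably matches ggml's `ggml_mul_mat` storage convention. Schematically, with illustrative shapes A of size d×r, B of size n×r, and activations x of size n:

```latex
% Schematic cost comparison (shapes are illustrative assumptions,
% not taken from the commit):
%   before: t = (A B^T) x   -- forms a d x n matrix first
%   after:  t = A (B^T x)   -- only rank-r intermediates
\[
  t_{\text{lora}} = A\left(B^{\top}x\right)
  \qquad\text{instead of}\qquad
  t_{\text{lora}} = \left(A B^{\top}\right)x ,
\]
\[
  \underbrace{O(nr) + O(dr)}_{\text{factor by factor}}
  \;\ll\;
  \underbrace{O(dnr) + O(dn)}_{\text{materialize } A B^{\top} \text{ first}} .
\]
```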
