
Commit 17bb928

readme : remove --memory-f32 references (#9925)

1 parent: 9f45fc1

2 files changed: 3 additions & 7 deletions

examples/main/README.md

Lines changed: 0 additions & 4 deletions
```diff
@@ -297,10 +297,6 @@ These options help improve the performance and memory usage of the LLaMA models.
 
 These flags attempt optimizations that help on some systems with non-uniform memory access. This currently consists of one of the above strategies, and disabling prefetch and readahead for mmap. The latter causes mapped pages to be faulted in on first access instead of all at once, and in combination with pinning threads to NUMA nodes, more of the pages end up on the NUMA node where they are used. Note that if the model is already in the system page cache, for example because of a previous run without this option, this will have little effect unless you drop the page cache first. This can be done by rebooting the system or on Linux by writing '3' to '/proc/sys/vm/drop_caches' as root.
 
-### Memory Float 32
-
-- `--memory-f32`: Use 32-bit floats instead of 16-bit floats for memory key+value. This doubles the context memory requirement and cached prompt file size but does not appear to increase generation quality in a measurable way. Not recommended.
-
 ### Batch Size
 
 - `-b N, --batch-size N`: Set the batch size for prompt processing (default: `2048`). This large batch size benefits users who have BLAS installed and enabled it during the build. If you don't have BLAS enabled ("BLAS=0"), you can use a smaller number, such as 8, to see the prompt progress as it's evaluated in some situations.
```
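The NUMA paragraph kept above notes that a warm page cache hides the effect of these flags unless the cache is dropped first. A minimal Python sketch of that step, assuming Linux and root privileges; the helper name `drop_page_cache` is ours, not part of llama.cpp:

```python
import os

def drop_page_cache() -> None:
    # Flush dirty pages to disk first so nothing is lost when caches drop.
    os.sync()
    # Writing '3' frees the page cache plus dentries and inodes, so the next
    # mmap'd model load faults pages in fresh (see the README text above).
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

if __name__ == "__main__":
    drop_page_cache()
```

Run it between benchmark runs so each run with NUMA options starts from a cold cache; rebooting achieves the same reset.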

scripts/run-with-preset.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -15,7 +15,7 @@
     "export", "file", "frequency-penalty", "grammar", "grammar-file", "hellaswag",
     "hellaswag-tasks", "ignore-eos", "in-prefix", "in-prefix-bos", "in-suffix",
     "interactive", "interactive-first", "keep", "logdir", "logit-bias", "lora", "lora-base",
-    "low-vram", "main-gpu", "memory-f32", "mirostat", "mirostat-ent", "mirostat-lr", "mlock",
+    "low-vram", "main-gpu", "mirostat", "mirostat-ent", "mirostat-lr", "mlock",
     "model", "multiline-input", "n-gpu-layers", "n-predict", "no-mmap", "no-mul-mat-q",
     "np-penalize-nl", "numa", "ppl-output-type", "ppl-stride", "presence-penalty", "prompt",
     "prompt-cache", "prompt-cache-all", "prompt-cache-ro", "repeat-last-n",
@@ -25,12 +25,12 @@
 ]
 
 CLI_ARGS_LLAMA_BENCH = [
-    "batch-size", "memory-f32", "low-vram", "model", "mul-mat-q", "n-gen", "n-gpu-layers",
+    "batch-size", "low-vram", "model", "mul-mat-q", "n-gen", "n-gpu-layers",
     "n-prompt", "output", "repetitions", "tensor-split", "threads", "verbose"
 ]
 
 CLI_ARGS_LLAMA_SERVER = [
-    "alias", "batch-size", "ctx-size", "embedding", "host", "memory-f32", "lora", "lora-base",
+    "alias", "batch-size", "ctx-size", "embedding", "host", "lora", "lora-base",
     "low-vram", "main-gpu", "mlock", "model", "n-gpu-layers", "n-probs", "no-mmap", "no-mul-mat-q",
     "numa", "path", "port", "rope-freq-base", "timeout", "rope-freq-scale", "tensor-split",
     "threads", "verbose"
```

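These CLI_ARGS_* lists are whitelists of the preset keys that run-with-preset.py forwards to each binary, so deleting "memory-f32" keeps old presets from emitting the removed flag. A hypothetical sketch of how such a whitelist might be applied; `build_args` and its exact semantics are our assumption, not code from the actual script:

```python
def build_args(preset: dict, allowed: list[str]) -> list[str]:
    """Turn preset key/value pairs into CLI flags, keeping only allowed keys."""
    args: list[str] = []
    for key, value in preset.items():
        if key not in allowed:
            continue  # stale keys such as "memory-f32" are silently dropped
        if value is True:
            args.append(f"--{key}")        # boolean flag, no value
        elif value is not False:
            args.extend([f"--{key}", str(value)])
    return args

# A preset that still sets memory-f32 simply loses that flag:
print(build_args({"batch-size": 512, "memory-f32": True, "mlock": True},
                 ["batch-size", "mlock"]))
# ['--batch-size', '512', '--mlock']
```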