Commit fc690b0

docs: fix links in development docs [no ci] (#8481)
Fixes a few in-repo links that were broken in the reorganization of the documentation in #8325.
1 parent 16bdfa4 commit fc690b0

File tree: 2 files changed (+9, -9 lines)


docs/development/HOWTO-add-model.md

Lines changed: 8 additions & 8 deletions
@@ -9,15 +9,15 @@ Adding a model requires few steps:
 After following these steps, you can open PR.

 Also, it is important to check that the examples and main ggml backends (CUDA, METAL, CPU) are working with the new architecture, especially:
-- [main](../examples/main)
-- [imatrix](../examples/imatrix)
-- [quantize](../examples/quantize)
-- [server](../examples/server)
+- [main](/examples/main/)
+- [imatrix](/examples/imatrix/)
+- [quantize](/examples/quantize/)
+- [server](/examples/server/)

 ### 1. Convert the model to GGUF

 This step is done in python with a `convert` script using the [gguf](https://pypi.org/project/gguf/) library.
-Depending on the model architecture, you can use either [convert_hf_to_gguf.py](../convert_hf_to_gguf.py) or [examples/convert_legacy_llama.py](../examples/convert_legacy_llama.py) (for `llama/llama2` models in `.pth` format).
+Depending on the model architecture, you can use either [convert_hf_to_gguf.py](/convert_hf_to_gguf.py) or [examples/convert_legacy_llama.py](/examples/convert_legacy_llama.py) (for `llama/llama2` models in `.pth` format).

 The convert script reads the model configuration, tokenizer, tensor names+data and converts them to GGUF metadata and tensors.
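For orientation, the converter referenced in this hunk is typically run as a single command. The sketch below is hypothetical: the model directory, output filename, and `--outtype` value are placeholders, and the exact flags accepted by `convert_hf_to_gguf.py` should be checked with `--help`.

```shell
# Convert a Hugging Face model directory to GGUF (paths and precision are illustrative)
python convert_hf_to_gguf.py ./models/MyModel-7B \
    --outfile ./models/mymodel-7b-f16.gguf \
    --outtype f16
```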

@@ -31,7 +31,7 @@ class MyModel(Model):
     model_arch = gguf.MODEL_ARCH.GROK
 ```

-2. Define the layout of the GGUF tensors in [constants.py](../gguf-py/gguf/constants.py)
+2. Define the layout of the GGUF tensors in [constants.py](/gguf-py/gguf/constants.py)

 Add an enum entry in `MODEL_ARCH`, the model human friendly name in `MODEL_ARCH_NAMES` and the GGUF tensor names in `MODEL_TENSORS`.
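To make the three additions named in this hunk concrete, here is a minimal, self-contained sketch in the style of `gguf-py/gguf/constants.py`. The `MYMODEL` architecture and its tensor list are invented placeholders; the real enums contain many more entries.

```python
from enum import IntEnum, auto


class MODEL_ARCH(IntEnum):
    LLAMA = auto()
    MYMODEL = auto()      # 1) new enum entry for the architecture


class MODEL_TENSOR(IntEnum):
    TOKEN_EMBD = auto()
    OUTPUT_NORM = auto()
    ATTN_Q = auto()


# 2) human friendly name, written into the GGUF metadata
MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
    MODEL_ARCH.LLAMA: "llama",
    MODEL_ARCH.MYMODEL: "mymodel",
}

# 3) GGUF tensors used by the new architecture
MODEL_TENSORS: dict[MODEL_ARCH, list[MODEL_TENSOR]] = {
    MODEL_ARCH.MYMODEL: [
        MODEL_TENSOR.TOKEN_EMBD,
        MODEL_TENSOR.OUTPUT_NORM,
        MODEL_TENSOR.ATTN_Q,
    ],
}
```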

@@ -54,7 +54,7 @@ Example for `falcon` model:

 As a general rule, before adding a new tensor name to GGUF, be sure the equivalent naming does not already exist.

-Once you have found the GGUF tensor name equivalent, add it to the [tensor_mapping.py](../gguf-py/gguf/tensor_mapping.py) file.
+Once you have found the GGUF tensor name equivalent, add it to the [tensor_mapping.py](/gguf-py/gguf/tensor_mapping.py) file.

 If the tensor name is part of a repetitive layer/block, the key word `bid` substitutes it.
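The `bid` keyword mentioned in this hunk stands for the block (layer) index. Below is a small, self-contained sketch of how such a templated mapping resolves per-block names; the table imitates the style of `tensor_mapping.py` but holds placeholder entries only.

```python
from __future__ import annotations

# one template entry covers every repeated layer/block of the model;
# "{bid}" is substituted with the concrete block index
block_mappings: dict[str, tuple[str, ...]] = {
    "blk.{bid}.attn_q": ("model.layers.{bid}.self_attn.q_proj",),
}


def gguf_name_for(hf_name: str, n_blocks: int) -> str | None:
    """Return the GGUF name for a per-block HF tensor name, if a mapping exists."""
    for gguf_tmpl, hf_tmpls in block_mappings.items():
        for hf_tmpl in hf_tmpls:
            for bid in range(n_blocks):
                if hf_name == hf_tmpl.format(bid=bid):
                    return gguf_tmpl.format(bid=bid)
    return None


print(gguf_name_for("model.layers.3.self_attn.q_proj", n_blocks=32))  # -> blk.3.attn_q
```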

@@ -100,7 +100,7 @@ Have a look at existing implementation like `build_llama`, `build_dbrx` or `buil

 When implementing a new graph, please note that the underlying `ggml` backends might not support them all, support for missing backend operations can be added in another PR.

-Note: to debug the inference graph: you can use [llama-eval-callback](../examples/eval-callback).
+Note: to debug the inference graph: you can use [llama-eval-callback](/examples/eval-callback/).

 ## GGUF specification
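As a hypothetical illustration of the debugging tip in this hunk, `llama-eval-callback` can be pointed at a model and a short prompt to print the intermediate tensors of the inference graph. The model path, prompt, and token count below are placeholders; consult the example's README for the current flags.

```shell
# Dump intermediate tensors while generating a few tokens (arguments are illustrative)
./llama-eval-callback -m ./models/mymodel-7b-f16.gguf -p "hello" -n 8
```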

docs/development/token_generation_performance_tips.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Token generation performance troubleshooting

 ## Verifying that the model is running on the GPU with CUDA
-Make sure you compiled llama with the correct env variables according to [this guide](../README.md#CUDA), so that llama accepts the `-ngl N` (or `--n-gpu-layers N`) flag. When running llama, you may configure `N` to be very large, and llama will offload the maximum possible number of layers to the GPU, even if it's less than the number you configured. For example:
+Make sure you compiled llama with the correct env variables according to [this guide](/docs/build.md#cuda), so that llama accepts the `-ngl N` (or `--n-gpu-layers N`) flag. When running llama, you may configure `N` to be very large, and llama will offload the maximum possible number of layers to the GPU, even if it's less than the number you configured. For example:
 ```shell
 ./llama-cli -m "path/to/model.gguf" -ngl 200000 -p "Please sir, may I have some "
 ```
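For context on the "compiled llama with the correct env variables" wording, here is a sketch of a CUDA-enabled build followed by the run shown above, assuming the CMake flow described in /docs/build.md; option and binary names may differ between versions.

```shell
# Build with CUDA support, then offload as many layers as fit on the GPU
# (options follow docs/build.md at the time of writing and may change)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
./build/bin/llama-cli -m "path/to/model.gguf" -ngl 200000 -p "Please sir, may I have some "
```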
