Commit cb2c688

Update doc for MUSA

Signed-off-by: Xiaodong Ye <[email protected]>
1 parent a977c11

File tree

2 files changed: +14 −0


README.md

1 addition & 0 deletions

@@ -405,6 +405,7 @@ Please refer to [Build llama.cpp locally](./docs/build.md)
 | [BLAS](./docs/build.md#blas-build) | All |
 | [BLIS](./docs/backend/BLIS.md) | All |
 | [SYCL](./docs/backend/SYCL.md) | Intel and Nvidia GPU |
+| [MUSA](./docs/build.md#musa) | Moore Threads GPU |
 | [CUDA](./docs/build.md#cuda) | Nvidia GPU |
 | [hipBLAS](./docs/build.md#hipblas) | AMD GPU |
 | [Vulkan](./docs/build.md#vulkan) | GPU |

docs/build.md

13 additions & 0 deletions

@@ -181,6 +181,19 @@ The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/c
 | GGML_CUDA_PEER_MAX_BATCH_SIZE | Positive integer | 128 | Maximum batch size for which to enable peer access between multiple GPUs. Peer access requires either Linux or NVLink. When using NVLink enabling peer access for larger batch sizes is potentially beneficial. |
 | GGML_CUDA_FA_ALL_QUANTS | Boolean | false | Compile support for all KV cache quantization type (combinations) for the FlashAttention CUDA kernels. More fine-grained control over KV cache size but compilation takes much longer. |

+### MUSA
+
+- Using `make`:
+  ```bash
+  make GGML_MUSA=1
+  ```
+- Using `CMake`:
+
+  ```bash
+  cmake -B build -DGGML_MUSA=ON
+  cmake --build build --config Release
+  ```
+
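The added CMake commands above can be wrapped in a small guard that checks for the MUSA toolchain before configuring. This is only a sketch: the assumption that the Moore Threads compiler is named `mcc` and is on `PATH` once the MUSA SDK is installed is mine, not part of this commit.

```shell
#!/bin/sh
# Hedged sketch: only attempt a MUSA build when the toolchain appears to be present.
# Assumption (not from the commit): the MUSA compiler binary is `mcc`.
if command -v mcc >/dev/null 2>&1; then
  musa_available=1
else
  musa_available=0
fi

if [ "$musa_available" -eq 1 ]; then
  # These two commands are the ones the commit adds to docs/build.md.
  cmake -B build -DGGML_MUSA=ON
  cmake --build build --config Release
else
  echo "MUSA toolchain not found; install the MUSA SDK before building with GGML_MUSA." >&2
fi
```

Without the guard, `cmake -B build -DGGML_MUSA=ON` would fail at configure time on a machine lacking the SDK; the check just makes the failure mode explicit.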
 ### hipBLAS

 This provides BLAS acceleration on HIP-supported AMD GPUs.
