| Backend | Target devices |
| --- | --- |
| [Metal](docs/build.md#metal-build) | Apple Silicon |
| [BLAS](docs/build.md#blas-build) | All |
| [BLIS](docs/backend/BLIS.md) | All |
| [SYCL](docs/backend/SYCL.md) | Intel and Nvidia GPU |
| [MUSA](docs/build.md#musa) | Moore Threads MTT GPU |
| [CUDA](docs/build.md#cuda) | Nvidia GPU |
| [hipBLAS](docs/build.md#hipblas) | AMD GPU |
| [Vulkan](docs/build.md#vulkan) | GPU |
| [CANN](docs/build.md#cann) | Ascend NPU |

## Building the project
The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:
- Clone this repository and build locally, see [how to build](docs/build.md)
- On macOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
- Use a Docker image, see [documentation for Docker](docs/docker.md)
- Download pre-built binaries from [releases](https://github.com/ggerganov/llama.cpp/releases)

## Obtaining and quantizing models
The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`:
The Hugging Face platform provides a variety of online tools for converting, quantizing and hosting models with `llama.cpp`:
- Use the [GGUF-editor space](https://huggingface.co/spaces/CISCai/gguf-editor) to edit GGUF meta data in the browser (more info: https://github.com/ggerganov/llama.cpp/discussions/9268)
- Use the [Inference Endpoints](https://ui.endpoints.huggingface.co/) to directly host `llama.cpp` in the cloud (more info: https://github.com/ggerganov/llama.cpp/discussions/9669)
To learn more about model quantization, [read this documentation](examples/quantize/README.md)
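As a rough intuition for what quantization does, each block of weights can be stored as one shared floating-point scale plus small integers. The sketch below is a toy illustration of that idea only — the function names are invented for this example, and the scheme is merely in the spirit of GGUF's simpler symmetric formats, not llama.cpp's actual code:

```python
# Toy block quantization: store one float scale per block plus small
# signed integers, instead of one float per weight.
def quantize_block(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit
    amax = max(abs(w) for w in weights) or 1.0
    scale = amax / qmax                 # symmetric: zero maps to zero
    return scale, [round(w / scale) for w in weights]

def dequantize_block(scale, quants):
    return [scale * q for q in quants]

scale, quants = quantize_block([0.12, -0.5, 0.33, 0.9])
restored = dequantize_block(scale, quants)
# `restored` is close to the input; lowering `bits` trades accuracy for size
```

Real GGUF formats add block sizes, bit packing and per-format tricks on top of this idea, which is why the quantization documentation linked above is the authoritative reference.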
## [`llama-cli`](examples/main)

#### A CLI tool for accessing and experimenting with most of `llama.cpp`'s functionality.
- <details open>
  <summary>Run simple text completion</summary>

  ```bash
  llama-cli -m model.gguf -p "I believe the meaning of life is" -n 128

  # I believe the meaning of life is to find your own truth and to live in accordance with it. For me, this means being true to myself and following my passions, even if they don't align with societal expectations. I think that's what I love about yoga – it's not just a physical practice, but a spiritual one too. It's about connecting with yourself, listening to your inner voice, and honoring your own unique journey.
  ```

  </details>
- <details>
  <summary>Run in conversation mode</summary>

  ```bash
  llama-cli -m model.gguf -p "You are a helpful assistant" -cnv

  # > hi, who are you?
  # Hi there! I'm your helpful assistant! I'm an AI-powered chatbot designed to assist and provide information to users like you. I'm here to help answer your questions, provide guidance, and offer support on a wide range of topics. I'm a friendly and knowledgeable AI, and I'm always happy to help with anything you need. What's on your mind, and how can I assist you today?
  #
  # > what is 1+1?
  # Easy peasy! The answer to 1+1 is... 2!
  ```

  </details>
- <details>
  <summary>Run with custom chat template</summary>

  ```bash
  # use the "chatml" template
  llama-cli -m model.gguf -p "You are a helpful assistant" -cnv --chat-template chatml

  # use a custom template
  llama-cli -m model.gguf -p "You are a helpful assistant" -cnv --in-prefix 'User: ' --reverse-prompt 'User:'
  ```

  </details>
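For a rough picture of what a chat template such as `chatml` does, the sketch below lays messages out the way ChatML-style prompts are structured before tokenization. The helper is hypothetical, written for this example; it is not llama.cpp's template engine:

```python
# Hypothetical helper showing the ChatML-style layout: each message is
# wrapped in <|im_start|>role ... <|im_end|> markers, and the prompt ends
# with an open assistant turn for the model to complete.
def apply_chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = apply_chatml([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "hi, who are you?"},
])
```

Other templates differ only in the markers and ordering they use, which is why picking the template that matches the model's fine-tuning matters.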
- <details>
  <summary>Constrain the output with a custom grammar</summary>

  `llama.cpp` can constrain the output of the model via custom grammars. For example, you can force the model to output only JSON:

  ```bash
  llama-cli -m model.gguf -n 256 --grammar-file grammars/json.gbnf -p 'Request: schedule a call at 8pm; Command:'

  # {"appointmentTime": "8pm", "appointmentDetails": "schedule a a call"}
  ```

  The [grammars/](grammars/) folder contains a handful of sample grammars. To write your own, check out the [GBNF Guide](grammars/README.md).

  For authoring more complex JSON grammars, check out https://grammar.intrinsiclabs.ai/

  </details>

## Web server (`llama-server`)

The [llama-server](examples/server/README.md) is a lightweight [OpenAI API](https://github.com/openai/openai-openapi) compatible HTTP server that can be used to serve local models and easily connect them to existing clients.

Example usage:

```bash
llama-server -m model.gguf --port 8080

# Basic web UI can be accessed via browser: http://localhost:8080
```
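Conceptually, a grammar file such as `grammars/json.gbnf` works by masking out, at every sampling step, any candidate token the grammar cannot accept at that position. The toy sketch below illustrates only that masking idea — it is not llama.cpp's GBNF engine, and the names are invented for this example:

```python
# Toy grammar-constrained sampling: filter candidates through a predicate
# standing in for the grammar's parser state, then pick the best survivor.
def constrained_pick(scores, grammar_allows):
    allowed = {tok: s for tok, s in scores.items() if grammar_allows(tok)}
    if not allowed:
        raise ValueError("grammar rejects every candidate token")
    return max(allowed, key=allowed.get)

# With a JSON grammar, the first token must start a JSON value, so a
# chatty candidate like "Sure" is dropped despite its higher score.
choice = constrained_pick({"Sure": 3.1, "{": 2.4, "[": 1.9},
                          lambda tok: tok in ("{", "["))
# choice == "{"
```

A real grammar engine advances a parser state after every accepted token, so the set of allowed tokens changes at each step rather than being a fixed predicate.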
If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
- LLaMA:
- GPT-3.5 / InstructGPT / ChatGPT:
  - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
  - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)