
Commit 93a9019

Merge branch 'main' of github.com:abetlen/llama_cpp_python into Maximilian-Winter/main
2 parents: f315b82 + 7499fc1

File tree

18 files changed: +1107 -565 lines

.github/ISSUE_TEMPLATE/bug_report.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''

---

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [ ] I am running the latest code. Development is very rapid, so there are no tagged versions as of now.
- [ ] I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/blob/main/README.md).
- [ ] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [ ] I reviewed the [Discussions](https://github.com/abetlen/llama-cpp-python/discussions), and have a new bug or useful enhancement to share.

# Expected Behavior

Please provide a detailed written description of what you were trying to do, and what you expected `llama-cpp-python` to do.

# Current Behavior

Please provide a detailed written description of what `llama-cpp-python` did instead.

# Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except under certain specific conditions.

* Physical (or virtual) hardware you are using, e.g. for Linux:

`$ lscpu`

* Operating System, e.g. for Linux:

`$ uname -a`

* SDK version, e.g. for Linux:

```
$ python3 --version
$ make --version
$ g++ --version
```

# Failure Information (for bugs)

Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.

# Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

1. step 1
2. step 2
3. step 3
4. etc.

**Note: Many issues seem to be about performance issues / differences with `llama.cpp`. In these cases we need to confirm that you're comparing against the version of `llama.cpp` that was built with your Python package, and which parameters you're passing to the context.**

# Failure Logs

Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.

Also, please try to **avoid using screenshots** if at all possible. Instead, copy/paste the console output and use [GitHub's markdown](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) to cleanly format your logs for easy readability.

Example environment info:
```
llama-cpp-python$ git log | head -1
commit 47b0aa6e957b93dbe2c29d53af16fbae2dd628f2

llama-cpp-python$ python3 --version
Python 3.10.10

llama-cpp-python$ pip list | egrep "uvicorn|fastapi|sse-starlette"
fastapi           0.95.0
sse-starlette     1.3.3
uvicorn           0.21.1
```
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.

README.md

Lines changed: 49 additions & 15 deletions
@@ -15,7 +15,7 @@ This package provides:
- OpenAI-like API
- LangChain compatibility

-## Installation
+## Installation from PyPI (recommended)

Install from PyPI (requires a c compiler):

@@ -26,11 +26,37 @@ pip install llama-cpp-python
The above command will attempt to install the package and build `llama.cpp` from source.
This is the recommended installation method as it ensures that `llama.cpp` is built with the available optimizations for your system.

-This method defaults to using `make` to build `llama.cpp` on Linux / MacOS and `cmake` on Windows.
-You can force the use of `cmake` on Linux / MacOS setting the `FORCE_CMAKE=1` environment variable before installing.
+
+### Installation with OpenBLAS / cuBLAS / CLBlast
+
+`llama.cpp` supports multiple BLAS backends for faster processing.
+Use the `FORCE_CMAKE=1` environment variable to force the use of `cmake` and install the pip package for the desired BLAS backend.
+
+To install with OpenBLAS, set the `LLAMA_OPENBLAS=1` environment variable before installing:
+
+```bash
+LLAMA_OPENBLAS=1 FORCE_CMAKE=1 pip install llama-cpp-python
+```
+
+To install with cuBLAS, set the `LLAMA_CUBLAS=1` environment variable before installing:
+
+```bash
+LLAMA_CUBLAS=1 FORCE_CMAKE=1 pip install llama-cpp-python
+```
+
+To install with CLBlast, set the `LLAMA_CLBLAST=1` environment variable before installing:
+
+```bash
+LLAMA_CLBLAST=1 FORCE_CMAKE=1 pip install llama-cpp-python
+```
+

## High-level API

+The high-level API provides a simple managed interface through the `Llama` class.
+
+Below is a short example demonstrating how to use the high-level API to generate text:
+
```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/7B/ggml-model.bin")
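The high-level example in this hunk is cut off by the diff context right after the `Llama` constructor. For orientation only (not part of this commit), a completion call with the high-level API might look like the sketch below; the keyword arguments and the OpenAI-style `choices` field are assumptions about the `Llama` call interface:

```python
>>> # Illustrative sketch: generate text with the Llama instance created above.
>>> # max_tokens, stop and echo are assumed keyword arguments of the call interface.
>>> output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
>>> # The result is assumed to follow the OpenAI completion format.
>>> print(output["choices"][0]["text"])
```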
@@ -64,18 +90,9 @@ This allows you to use llama.cpp compatible models with any OpenAI compatible cl

To install the server package and get started:

-Linux/MacOS
```bash
pip install llama-cpp-python[server]
-export MODEL=./models/7B/ggml-model.bin
-python3 -m llama_cpp.server
-```
-
-Windows
-```cmd
-pip install llama-cpp-python[server]
-SET MODEL=..\models\7B\ggml-model.bin
-python3 -m llama_cpp.server
+python3 -m llama_cpp.server --model models/7B/ggml-model.bin
```

Navigate to [http://localhost:8000/docs](http://localhost:8000/docs) to see the OpenAPI documentation.
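Because the server mimics the OpenAI REST API, a quick smoke test from Python is possible once it is running. The sketch below is illustrative and not part of this commit; the `/v1/completions` route and the request/response fields are assumptions based on the OpenAI API shape the server follows:

```python
# Illustrative sketch: POST a completion request to the local llama_cpp.server.
# Assumes the server started above is listening on http://localhost:8000.
import json
import urllib.request

request = urllib.request.Request(
    "http://localhost:8000/v1/completions",  # assumed OpenAI-style completions route
    data=json.dumps({
        "prompt": "Q: Name the planets in the solar system? A: ",
        "max_tokens": 32,
        "stop": ["Q:", "\n"],
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())

print(result["choices"][0]["text"])  # assumed OpenAI-style response body
```

Any OpenAI-compatible client library should behave the same way once it is pointed at the local server instead of api.openai.com.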
@@ -90,8 +107,25 @@ docker run --rm -it -p8000:8000 -v /path/to/models:/models -eMODEL=/models/ggml-

## Low-level API

-The low-level API is a direct `ctypes` binding to the C API provided by `llama.cpp`.
-The entire API can be found in [llama_cpp/llama_cpp.py](https://github.com/abetlen/llama-cpp-python/blob/master/llama_cpp/llama_cpp.py) and should mirror [llama.h](https://github.com/ggerganov/llama.cpp/blob/master/llama.h).
+The low-level API is a direct [`ctypes`](https://docs.python.org/3/library/ctypes.html) binding to the C API provided by `llama.cpp`.
+The entire low-level API can be found in [llama_cpp/llama_cpp.py](https://github.com/abetlen/llama-cpp-python/blob/master/llama_cpp/llama_cpp.py) and directly mirrors the C API in [llama.h](https://github.com/ggerganov/llama.cpp/blob/master/llama.h).
+
+Below is a short example demonstrating how to use the low-level API to tokenize a prompt:
+
+```python
+>>> import llama_cpp
+>>> import ctypes
+>>> params = llama_cpp.llama_context_default_params()
+# use bytes for char * params
+>>> ctx = llama_cpp.llama_init_from_file(b"./models/7b/ggml-model.bin", params)
+>>> max_tokens = params.n_ctx
+# use ctypes arrays for array params
+>>> tokens = (llama_cpp.llama_token * int(max_tokens))()
+>>> n_tokens = llama_cpp.llama_tokenize(ctx, b"Q: Name the planets in the solar system? A: ", tokens, max_tokens, add_bos=llama_cpp.c_bool(True))
+>>> llama_cpp.llama_free(ctx)
+```
+
+Check out the [examples folder](examples/low_level_api) for more examples of using the low-level API.


# Documentation

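As a small follow-on to the low-level tokenization example added in the hunk above (illustrative only, not part of this commit), the filled `tokens` array can be inspected before the context is freed; `llama_token_to_str` is assumed to be exposed by the bindings since they mirror `llama.h`:

```python
>>> # Illustrative sketch: inspect the result of llama_tokenize (run before llama_cpp.llama_free(ctx)).
>>> print(n_tokens)                 # number of token ids actually written
>>> print(list(tokens[:n_tokens]))  # the raw token ids as Python ints
>>> # Assumed helper mirroring llama.h: convert each id back to its text piece (returned as bytes).
>>> print(b"".join(llama_cpp.llama_token_to_str(ctx, t) for t in tokens[:n_tokens]))
```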