CHANGELOG.md (32 additions, 61 deletions)
@@ -9,26 +9,50 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [0.2.2]
 
-- Fix bug in pip install of v0.2.1 due to scikit-build-core removing all `.metal` files in the source distribution
+- Fix bug in pip install of v0.2.1 due to scikit-build-core removing all `.metal` files in the source distribution (see #701)
 
 ## [0.2.1]
 
-- Fix bug in pip install of v0.2.0 due to .git folder being included in the source distribution
+- Fix bug in pip install of v0.2.0 due to .git folder being included in the source distribution (see #701)
 
 ## [0.2.0]
 
-- Migrated to scikit-build-core for building llama.cpp from source
+- Migrated to scikit-build-core build system by @abetlen in #499
+- Use `numpy` views for `LogitsProcessor` and `StoppingCriteria` instead of python lists by @abetlen in #499
+- Drop support for end-of-life Python3.7 by @abetlen in #499
+- Convert low level `llama.cpp` constants to use basic python types instead of `ctypes` types by @abetlen in #499
 
-## [0.1.79]
+## [0.1.85]
+
+- Add `llama_cpp.__version__` attribute by @janvdp in #684
+- Fix low level api examples by @jbochi in #680
+
+## [0.1.84]
+
+- Update llama.cpp
+
+## [0.1.83]
+
+- Update llama.cpp
+
+## [0.1.82]
+
+- Update llama.cpp
+
+## [0.1.81]
 
-### Added
+- Update llama.cpp
+
+## [0.1.80]
+
+- Update llama.cpp
+
+## [0.1.79]
 
 - GGUF Support (breaking change requiring new model format)
 
 ## [0.1.78]
 
-### Added
-
 - Grammar based sampling via LlamaGrammar which can be passed to completions
 - Make n_gpu_layers == -1 offload all layers
 
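As an aside to the 0.1.78–0.2.x entries above, here is a minimal usage sketch (not part of the diff) of the features they mention: the `llama_cpp.__version__` attribute, `n_gpu_layers == -1` full offloading, and grammar-based sampling via `LlamaGrammar`. The model path and GBNF grammar are placeholders, and keyword names follow the current llama-cpp-python API, so they may differ slightly across the versions listed.

```python
# Illustrative sketch only; assumes a local GGUF model file at a placeholder path.
import llama_cpp
from llama_cpp import Llama, LlamaGrammar

print(llama_cpp.__version__)  # version attribute introduced in 0.1.85 (#684)

# n_gpu_layers=-1 offloads all layers to the GPU (behaviour added in 0.1.78)
llm = Llama(model_path="./models/example.gguf", n_ctx=2048, n_gpu_layers=-1)

# Grammar-based sampling via LlamaGrammar (0.1.78): constrain the completion
# with a tiny GBNF grammar that only allows "yes" or "no".
grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')
out = llm.create_completion("Is the sky blue? Answer: ", max_tokens=4, grammar=grammar)
print(out["choices"][0]["text"])
```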
@@ -47,152 +71,99 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [0.1.74]
 
-### Added
-
 - (server) OpenAI style error responses
 
 ## [0.1.73]
 
-### Added
-
 - (server) Add rope parameters to server settings
 
 ## [0.1.72]
 
-### Added
-
 - (llama.cpp) Update llama.cpp; added custom_rope for extended context lengths
 
 ## [0.1.71]
 
-### Added
-
 - (llama.cpp) Update llama.cpp
 
-### Fixed
-
 - (server) Fix several pydantic v2 migration bugs
 
 ## [0.1.70]
 
-### Fixed
-
 - (Llama.create_completion) Revert change so that `max_tokens` is not truncated to `context_size` in `create_completion`
 - (server) Fixed changed settings field names from pydantic v2 migration
 
 ## [0.1.69]
 
-### Added
-
 - (server) Streaming requests can now be interrupted prematurely when a concurrent request is made. This can be controlled with the `interrupt_requests` setting.
 - (server) Moved to fastapi v0.100.0 and pydantic v2
 - (docker) Added a new "simple" image that builds llama.cpp from source when started.
-
-## Fixed
-
 - (server) performance improvements by avoiding unnecessary memory allocations during sampling
 
 ## [0.1.68]
 
-### Added
-
 - (llama.cpp) Update llama.cpp
 
 ## [0.1.67]
 
-### Fixed
-
 - Fix performance bug in Llama model by pre-allocating memory for tokens and logits.
 - Fix bug in Llama model where the model was not freed after use.
 
 ## [0.1.66]
 
-### Added
-
 - (llama.cpp) New model API
 
-### Fixed
-
 - Performance issue during eval caused by looped np.concatenate call
 - State pickling issue when saving cache to disk
 
 ## [0.1.65]
 
-### Added
-
 - (llama.cpp) Fix struct misalignment bug
 
 ## [0.1.64]
 
-### Added
-
 - (llama.cpp) Update llama.cpp
 - Fix docs for seed. Set -1 for random.
 
 ## [0.1.63]
 
-### Added
-
 - (llama.cpp) Add full gpu utilisation in CUDA
 - (llama.cpp) Add get_vocab
 - (llama.cpp) Add low_vram parameter
 - (server) Add logit_bias parameter
 
 ## [0.1.62]
 
-### Fixed
-
 - Metal support working
 - Cache re-enabled
 
 ## [0.1.61]
 
-### Fixed
-
 - Fix broken pip installation
 
 ## [0.1.60]
 
-### NOTE
-
-- This release was deleted due to a bug with the packaging system that caused pip installations to fail.
-
-### Fixed
+NOTE: This release was deleted due to a bug with the packaging system that caused pip installations to fail.
 
 - Truncate max_tokens in create_completion so the requested tokens don't exceed the context size.
 - Temporarily disable cache for completion requests
 
 ## [v0.1.59]
 
-### Added
-
 - (llama.cpp) k-quants support
 - (server) mirostat sampling parameters to server
-
-### Fixed
-
 - Support both `.so` and `.dylib` for `libllama` on MacOS
 
 ## [v0.1.58]
 
-### Added
-
 - (llama.cpp) Metal Silicon support
 
 ## [v0.1.57]
 
-### Added
-
 - (llama.cpp) OpenLlama 3B support
 
 ## [v0.1.56]
 
-### Added
-
 - (misc) Added first version of the changelog
 - (server) Use async routes
 - (python-api) Use numpy for internal buffers to reduce memory usage and improve performance.
-
-### Fixed
-
 - (python-api) Performance bug in stop sequence check slowing down streaming.
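Several of the 0.1.59–0.1.70 entries above concern completion parameters (mirostat sampling, max_tokens truncation relative to the context size). Below is a hedged sketch of how such parameters are passed through the high-level API; parameter names follow the current llama-cpp-python signature of `create_completion`, and the model path is a placeholder.

```python
# Illustrative sketch only; assumes a local model file compatible with the installed version.
from llama_cpp import Llama

llm = Llama(model_path="./models/example.gguf", n_ctx=2048)
out = llm.create_completion(
    "Write one sentence about llamas.",
    max_tokens=64,      # kept well below n_ctx; see the 0.1.60 / 0.1.70 truncation notes
    mirostat_mode=2,    # mirostat v2 sampling (server parameters added in v0.1.59)
    mirostat_tau=5.0,
    mirostat_eta=0.1,
)
print(out["choices"][0]["text"])
```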