Commit be5b9b4
Merge branch 'main' into remove_gptq
2 parents: b16a606 + a51389c

File tree: 1 file changed (+1, −0 lines)

docs/ADVANCED-USERS.md — 1 addition, 0 deletions
```diff
@@ -382,6 +382,7 @@ embedding table (symmetric) | fp32, fp16, bf16 | 8b (group/channel), 4b (group/c
 linear operator (symmetric) | fp32, fp16, bf16 | 8b (group/channel) | n/a |
 linear operator (asymmetric) | n/a | 4b (group), a6w4dq | a8w4dq (group) |
+
 ## Model precision (dtype precision setting)
 On top of quantizing models with quantization schemes mentioned above, models can be converted
 to lower precision floating point representations to reduce the memory bandwidth requirement and
```
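The table in the hunk above lists symmetric 8-bit group/channel quantization among the supported schemes. As a rough illustration of what symmetric group-wise quantization does (a minimal pure-Python sketch, not torchchat's or torchao's actual API; all function names here are hypothetical):

```python
from typing import List, Tuple

def quantize_symmetric_8b(values: List[float], group_size: int) -> Tuple[List[int], List[float]]:
    """Hypothetical sketch: quantize values in groups of `group_size`;
    each group shares a single fp scale, and symmetry means the
    zero-point is fixed at 0."""
    qvals, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group) or 1.0
        scale = amax / 127.0  # map [-amax, amax] onto [-127, 127]
        scales.append(scale)
        qvals.extend(max(-127, min(127, round(v / scale))) for v in group)
    return qvals, scales

def dequantize(qvals: List[int], scales: List[float], group_size: int) -> List[float]:
    """Recover approximate fp values using each group's shared scale."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]

w = [0.5, -1.0, 0.25, 0.75]
q, s = quantize_symmetric_8b(w, group_size=2)
w_hat = dequantize(q, s, group_size=2)
```

Smaller groups track local weight ranges more closely (lower error) at the cost of storing more scales; per-channel quantization is the special case where the group spans one output channel.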
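The prose added in this section notes that models can also be converted to lower-precision floating-point formats to cut memory bandwidth. One such format, bfloat16, is simply the top 16 bits of a float32 encoding (sign, 8-bit exponent, 7-bit mantissa), which is why it halves storage while keeping float32's dynamic range. A standalone sketch of that bit-level relationship (illustrative only; a real conversion, e.g. in PyTorch, would round rather than truncate):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    # Pack as IEEE-754 float32, then keep the top 16 bits
    # (truncation; real converters typically round to nearest even).
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    # Re-extend to float32 by zero-filling the dropped mantissa bits.
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

x = 3.14159
x_bf16 = bfloat16_bits_to_float32(float32_to_bfloat16_bits(x))
# x_bf16 is 3.140625: same exponent as x, but only 7 mantissa bits survive
```

The ~2-3 decimal digits of precision that survive are often enough for inference, which is why bf16 is a common serving dtype.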
