[GGMLQuantizationType.F32]: "32-bit standard IEEE 754 single-precision floating-point number.", // src: https://en.wikipedia.org/wiki/Single-precision_floating-point_format
[GGMLQuantizationType.F16]: "16-bit standard IEEE 754 half-precision floating-point number.", // src: https://en.wikipedia.org/wiki/Half-precision_floating-point_format
[GGMLQuantizationType.Q4_0]:
	"4-bit round-to-nearest quantization (q). Each block has 32 weights. Weight formula: w = q * block_scale. Legacy quantization method (not used widely as of today)", // src: https://github.com/huggingface/huggingface.js/pull/615#discussion_r1557654249
[GGMLQuantizationType.Q4_1]:
	"4-bit round-to-nearest quantization (q). Each block has 32 weights. Weight formula: w = q * block_scale + block_minimum. Legacy quantization method (not used widely as of today)", // src: https://github.com/huggingface/huggingface.js/pull/615#discussion_r1557682290
[GGMLQuantizationType.Q5_0]:
	"5-bit round-to-nearest quantization (q). Each block has 32 weights. Weight formula: w = q * block_scale. Legacy quantization method (not used widely as of today)", // src: https://github.com/huggingface/huggingface.js/pull/615#discussion_r1557654249
[GGMLQuantizationType.Q5_1]:
	"5-bit round-to-nearest quantization (q). Each block has 32 weights. Weight formula: w = q * block_scale + block_minimum. Legacy quantization method (not used widely as of today)", // src: https://github.com/huggingface/huggingface.js/pull/615#discussion_r1557682290
[GGMLQuantizationType.Q8_0]:
	"8-bit round-to-nearest quantization (q). Each block has 32 weights. Weight formula: w = q * block_scale. Legacy quantization method (not used widely as of today)", // src: https://github.com/huggingface/huggingface.js/pull/615#discussion_r1557654249
[GGMLQuantizationType.Q8_1]:
	"8-bit round-to-nearest quantization (q). Each block has 32 weights. Weight formula: w = q * block_scale + block_minimum. Legacy quantization method (not used widely as of today)", // src: https://github.com/huggingface/huggingface.js/pull/615#discussion_r1557682290
[GGMLQuantizationType.Q2_K]: `2-bit quantization (q). Super-blocks with 16 blocks, each block has 16 weights. Weight formula: w = q * block_scale(4-bit) + block_min(4-bit), resulting in 2.5625 bits-per-weight.`, // src: https://github.com/ggerganov/llama.cpp/pull/1684#issue-1739619305
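// One way to account for the stated 2.5625 bpw (assuming a single shared f16
// scale per super-block): 256 weights * 2 bits (512) + 16 scales * 4 bits +
// 16 mins * 4 bits (128) + one f16 (16) = 656 bits / 256 weights = 2.5625 bpw.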
[GGMLQuantizationType.Q3_K]: `3-bit quantization (q). Super-blocks with 16 blocks, each block has 16 weights. Weight formula: w = q * block_scale(6-bit), resulting in 3.4375 bits-per-weight.`, // src: https://github.com/ggerganov/llama.cpp/pull/1684#issue-1739619305
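// Same accounting for the stated 3.4375 bpw: 256 weights * 3 bits (768) +
// 16 scales * 6 bits (96) + one f16 super-block scale (16) = 880 bits /
// 256 weights = 3.4375 bpw.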
[GGMLQuantizationType.Q4_K]: `4-bit quantization (q). Super-blocks with 8 blocks, each block has 32 weights. Weight formula: w = q * block_scale(6-bit) + block_min(6-bit), resulting in 4.5 bits-per-weight.`, // src: https://github.com/ggerganov/llama.cpp/pull/1684#issue-1739619305
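The K-quant entries describe a two-level scheme: the per-block scales and mins are themselves quantized (to 4 or 6 bits) and rescaled by super-block f16 values. A minimal TypeScript sketch of Q4_K-style dequantization, assuming a simplified unpacked layout; the field names below are illustrative, not ggml's actual packed struct members:

// One Q4_K-style super-block: 8 blocks of 32 weights, unpacked into plain arrays.
interface SuperBlockQ4K {
	superScale: number; // f16 scale shared by the whole super-block
	superMin: number; // f16 min shared by the whole super-block
	blockScales: number[]; // 8 per-block scales, each quantized to 6 bits (0..63)
	blockMins: number[]; // 8 per-block mins, each quantized to 6 bits (0..63)
	quants: number[]; // 8 * 32 = 256 weights, each quantized to 4 bits (0..15)
}

function dequantizeQ4K(sb: SuperBlockQ4K): number[] {
	const weights: number[] = [];
	for (let block = 0; block < 8; block++) {
		// Rescale the 6-bit block scale/min by the super-block's f16 values,
		// then apply the documented formula w = q * block_scale + block_min.
		const blockScale = sb.superScale * sb.blockScales[block];
		const blockMin = sb.superMin * sb.blockMins[block];
		for (let i = 0; i < 32; i++) {
			weights.push(sb.quants[block * 32 + i] * blockScale + blockMin);
		}
	}
	return weights;
}

The stated 4.5 bpw follows from this layout: 256 weights * 4 bits (1024) + 8 scales * 6 bits + 8 mins * 6 bits (96) + two f16 super-block values (32) = 1152 bits / 256 weights = 4.5 bpw.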