@@ -30,7 +30,48 @@ In addition to the KL divergence the following statistics are calculated with `-
* The root mean square of the change in token probabilities. If you were to assume that the quantization simply causes Gaussian noise on the token probabilities then this would be the standard deviation of said noise. The uncertainty on the value is calculated under the assumption that the change in token probabilities follows a Gaussian distribution. Related discussion: https://github.com/ggerganov/llama.cpp/discussions/2875 .
* Same top p: Percentage of how often both models assigned the highest probability to the same token. The uncertainty is calculated from the Gaussian approximation of the binomial distribution. Both statistics are illustrated in the sketch below.
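The following is a minimal sketch, not the actual `perplexity` implementation, of how these two statistics and their uncertainties could be computed with NumPy; the function and array names (`p_fp16`, `p_quant`, `top_fp16`, `top_quant`) are placeholders invented for the example:

```python
import numpy as np

def rms_delta_p(p_fp16, p_quant):
    """RMS of the change in token probabilities, with a Gaussian error estimate."""
    dp = p_quant - p_fp16                  # per-token change in probability
    rms = np.sqrt(np.mean(dp * dp))        # root mean square of the change
    # If the changes are treated as Gaussian noise, a standard error estimate
    # for the standard deviation of n samples is sigma / sqrt(2 * (n - 1)).
    unc = rms / np.sqrt(2.0 * (dp.size - 1))
    return rms, unc

def same_top_p(top_fp16, top_quant):
    """Fraction of positions where both models rank the same token highest."""
    same = np.mean(top_fp16 == top_quant)  # binomial success rate
    # Uncertainty from the Gaussian approximation of the binomial distribution.
    unc = np.sqrt(same * (1.0 - same) / top_fp16.size)
    return same, unc
```

Here `p_fp16`/`p_quant` would hold the probability each model assigns to the evaluated token at every position, and `top_fp16`/`top_quant` the id of each model's most likely token; multiplying the returned fractions by 100 gives the percentages reported in the tables.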
- ## Sample results
+ ## LLaMA 3 8b Scoreboard
+
+ Results are sorted by Kullback-Leibler divergence relative to FP16.
+ The "WT 2.7m" importance matrix was created using 2.7 million Wikitext tokens and can be found [here](https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices/blob/main/imatrix-llama_3-8b-f16-2.7m_tokens.dat).
+
+ | Quantization | imatrix | Model size [GiB] | PPL | ΔPPL | KLD | RMS Δp |
+ | --------------| ---------| ------------------| -------------------| ----------------------| ---------------------| ------------------|
+ | f16 | None | 14.97 | 6.7684 ± 0.04278 | - | - | - |
+ | q8_0 | None | 7.96 | 6.7687 ± 0.04277 | 0.005872 ± 0.001347 | 0.001391 ± 0.000007 | 1.210 ± 0.007 % |
+ | q6_K | None | 6.14 | 6.8007 ± 0.04312 | 0.037777 ± 0.002294 | 0.005669 ± 0.000046 | 2.343 ± 0.026 % |
+ | q5_K_M | None | 5.33 | 6.8308 ± 0.04330 | 0.067952 ± 0.003060 | 0.011093 ± 0.000086 | 3.173 ± 0.030 % |
+ | q5_K_S | None | 5.21 | 6.8877 ± 0.04378 | 0.124777 ± 0.003891 | 0.017177 ± 0.000135 | 3.947 ± 0.037 % |
+ | q5_1 | None | 5.65 | 6.8888 ± 0.04373 | 0.125879 ± 0.004015 | 0.018485 ± 0.000141 | 4.089 ± 0.039 % |
+ | q5_0 | None | 5.21 | 6.8988 ± 0.04373 | 0.135923 ± 0.004525 | 0.022964 ± 0.000170 | 4.631 ± 0.042 % |
+ | q4_K_M | WT 2.7m | 4.58 | 6.9164 ± 0.04390 | 0.153559 ± 0.005115 | 0.029126 ± 0.000256 | 5.270 ± 0.050 % |
+ | q4_K_M | None | 4.58 | 6.9593 ± 0.04415 | 0.196383 ± 0.005343 | 0.032032 ± 0.000248 | 5.531 ± 0.050 % |
+ | q4_K_S | WT 2.7m | 4.37 | 6.9393 ± 0.04396 | 0.176470 ± 0.005377 | 0.032768 ± 0.000266 | 5.630 ± 0.052 % |
+ | iq4_NL | WT 2.7m | 4.35 | 7.0114 ± 0.04468 | 0.248562 ± 0.005915 | 0.036482 ± 0.000286 | 5.965 ± 0.053 % |
+ | iq4_XS | WT 2.7m | 4.14 | 7.0091 ± 0.04459 | 0.246254 ± 0.005918 | 0.037087 ± 0.000292 | 6.009 ± 0.053 % |
+ | q4_K_S | None | 4.37 | 7.0545 ± 0.04481 | 0.291578 ± 0.006429 | 0.044040 ± 0.000320 | 6.511 ± 0.055 % |
+ | q4_1 | None | 4.78 | 7.2571 ± 0.04658 | 0.494238 ± 0.009036 | 0.072530 ± 0.000507 | 8.368 ± 0.062 % |
+ | q4_0 | None | 4.34 | 7.2927 ± 0.04665 | 0.529800 ± 0.009048 | 0.073598 ± 0.000486 | 8.395 ± 0.061 % |
+ | q3_K_L | WT 2.7m | 4.03 | 7.2330 ± 0.04666 | 0.470087 ± 0.009268 | 0.074345 ± 0.000530 | 8.577 ± 0.064 % |
+ | q3_K_M | WT 2.7m | 3.74 | 7.2941 ± 0.04699 | 0.531254 ± 0.010144 | 0.085849 ± 0.000596 | 9.236 ± 0.065 % |
+ | q3_K_L | None | 4.03 | 7.3483 ± 0.04729 | 0.585400 ± 0.010379 | 0.088558 ± 0.000611 | 9.333 ± 0.066 % |
+ | q3_K_M | None | 3.74 | 7.4524 ± 0.04789 | 0.689517 ± 0.011427 | 0.103797 ± 0.000675 | 10.111 ± 0.068 % |
+ | iq3_M | WT 2.7m | 3.53 | 7.5051 ± 0.04715 | 0.742584 ± 0.010752 | 0.104464 ± 0.000676 | 10.383 ± 0.066 % |
+ | iq3_S | WT 2.7m | 3.42 | 7.5693 ± 0.04794 | 0.806473 ± 0.011620 | 0.113201 ± 0.000719 | 10.669 ± 0.067 % |
+ | iq3_XS | WT 2.7m | 3.28 | 7.8058 ± 0.04967 | 1.042930 ± 0.013767 | 0.140704 ± 0.000846 | 11.979 ± 0.070 % |
+ | iq3_XXS | WT 2.7m | 3.05 | 8.0537 ± 0.05169 | 1.290849 ± 0.016815 | 0.187044 ± 0.001042 | 13.722 ± 0.073 % |
+ | q3_K_S | WT 2.7m | 3.41 | 8.4003 ± 0.05409 | 1.637409 ± 0.018650 | 0.208394 ± 0.001018 | 15.201 ± 0.070 % |
+ | q3_K_S | None | 3.41 | 8.6701 ± 0.05627 | 1.907244 ± 0.020902 | 0.236401 ± 0.001084 | 15.601 ± 0.069 % |
+ | iq2_M | WT 2.7m | 2.74 | 9.4260 ± 0.06254 | 2.663082 ± 0.028667 | 0.331202 ± 0.001611 | 18.368 ± 0.079 % |
+ | q2_K | WT 2.7m | 2.96 | 9.4737 ± 0.06303 | 2.710844 ± 0.029119 | 0.342129 ± 0.001565 | 18.996 ± 0.078 % |
+ | iq2_S | WT 2.7m | 2.56 | 10.6301 ± 0.07237 | 3.867287 ± 0.039162 | 0.446305 ± 0.001972 | 21.324 ± 0.082 % |
+ | q2_K | None | 2.96 | 10.6450 ± 0.07158 | 3.882171 ± 0.038471 | 0.457258 ± 0.001851 | 21.416 ± 0.078 % |
+ | iq2_XS | WT 2.7m | 2.43 | 11.8063 ± 0.08064 | 5.043388 ± 0.048007 | 0.556747 ± 0.002136 | 23.752 ± 0.082 % |
+ | iq2_XXS | WT 2.7m | 2.24 | 15.6064 ± 0.11301 | 8.843541 ± 0.081477 | 0.830947 ± 0.002749 | 28.363 ± 0.084 % |
+ | iq1_M | WT 2.7m | 2.01 | 28.6561 ± 0.21012 | 21.893176 ± 0.180729 | 1.413517 ± 0.003550 | 37.785 ± 0.084 % |
+ | iq1_S | WT 2.7m | 1.88 | 69.6303 ± 0.56051 | 62.867391 ± 0.535295 | 2.290167 ± 0.004882 | 45.826 ± 0.086 % |
+
+ ## LLaMA 2 vs. LLaMA 3 Quantization comparison
| Metric | L2 7b q2_K | L3 8b q2_K | L2 7b q4_K_M | L3 8b q4_K_M | L2 7b q6_K | L3 8b q6_K | L2 7b q8_0 | L3 8b q8_0 |
| -----------------| ---------------------| ----------------------| ---------------------| ---------------------| ---------------------| ---------------------| ---------------------| ---------------------|
@@ -50,10 +91,28 @@ In addition to the KL divergence the following statistics are calculated with `-
| RMS Δp | 9.762 ± 0.053 % | 21.393 ± 0.078 % | 3.252 ± 0.024 % | 5.429 ± 0.051 % | 1.339 ± 0.010 % | 2.096 ± 0.029 % | 0.618 ± 0.011 % | 0.867 ± 0.007 % |
| Same top p | 85.584 ± 0.086 % | 70.419 ± 0.120 % | 94.665 ± 0.055 % | 92.162 ± 0.071 % | 97.520 ± 0.038 % | 96.586 ± 0.048 % | 98.846 ± 0.026 % | 98.467 ± 0.032 % |
- <details>
- <summary>Old numbers</summary>
+ | Metric | L2 70b q2_K | L3 70b q2_K | L2 70b q4_K_M | L3 70b q4_K_M | L2 70b q6_K | L3 70b q6_K | L2 70b q8_0 | L3 70b q8_0 |
+ | -----------------| ---------------------| ---------------------| ---------------------| ---------------------| ---------------------| ---------------------| ---------------------| ---------------------|
+ | Mean PPL | 4.172530 ± 0.020805 | 5.902798 ± 0.035278 | 3.475398 ± 0.016580 | 3.193431 ± 0.016621 | 3.440612 ± 0.016372 | 3.052153 ± 0.015746 | 3.434686 ± 0.016346 | 3.039482 ± 0.015687 |
+ | Mean PPL ratio | 1.215161 ± 0.002103 | 1.942461 ± 0.007686 | 1.012136 ± 0.000413 | 1.050877 ± 0.001032 | 1.002006 ± 0.000193 | 1.004386 ± 0.000413 | 1.000280 ± 0.000119 | 1.000217 ± 0.000264 |
+ | Mean ΔPPL | 0.738805 ± 0.007888 | 2.863974 ± 0.025573 | 0.041672 ± 0.001433 | 0.154607 ± 0.003206 | 0.006887 ± 0.000664 | 0.013329 ± 0.001256 | 0.000961 ± 0.000408 | 0.000658 ± 0.000803 |
+ | PPL correlation | 93.80% | 75.67% | 99.63% | 98.21% | 99.92% | 99.68% | 99.97% | 99.87% |
+ | Mean KLD | 0.186386 ± 0.001134 | 0.674716 ± 0.003267 | 0.013168 ± 0.000095 | 0.055418 ± 0.000506 | 0.002736 ± 0.000018 | 0.009148 ± 0.000100 | 0.000878 ± 0.000006 | 0.003088 ± 0.000040 |
+ | Mean Δp | -5.417 ± 0.040 % | -17.236 ± 0.078 % | -0.350 ± 0.010 % | -1.678 ± 0.026 % | -0.076 ± 0.005 % | -0.202 ± 0.010 % | -0.005 ± 0.003 % | -0.007 ± 0.006 % |
+ | Maximum Δp | 95.064% | 95.799% | 80.018% | 91.140% | 28.193% | 63.263% | 25.395% | 50.187% |
+ | 99.9% Δp | 46.526% | 60.640% | 23.562% | 47.583% | 10.424% | 24.634% | 6.548% | 14.033% |
+ | 99.0% Δp | 21.251% | 26.948% | 10.161% | 18.666% | 5.339% | 10.273% | 3.337% | 6.323% |
+ | Median Δp | -0.447% | -3.780% | -0.004% | -0.022% | -0.001% | -0.002% | -0.000% | 0.000% |
+ | 1.0% Δp | -81.379% | -98.506% | -15.142% | -47.638% | -5.866% | -13.230% | -3.333% | -6.609% |
+ | 0.1% Δp | -97.547% | -99.873% | -37.914% | -82.914% | -13.351% | -30.683% | -6.096% | -15.564% |
+ | Minimum Δp | -99.965% | -99.993% | -81.378% | -98.505% | -46.213% | -82.746% | -34.335% | -63.634% |
+ | RMS Δp | 17.237 ± 0.077 % | 34.361 ± 0.094 % | 4.154 ± 0.032 % | 9.915 ± 0.067 % | 1.899 ± 0.015 % | 3.721 ± 0.030 % | 1.085 ± 0.007 % | 2.124 ± 0.018 % |
+ | Same top p | 85.001 ± 0.087 % | 71.991 ± 0.118 % | 95.632 ± 0.050 % | 92.881 ± 0.068 % | 97.651 ± 0.037 % | 96.538 ± 0.048 % | 98.502 ± 0.030 % | 97.825 ± 0.038 % |
+
+ ## Old Numbers
- ## Llama 2 70B Scorechart
+ <details>
+ <summary>Llama 2 70B Scorechart</summary>
| Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
| --------------| ------------------| ------------| ---------------|