
Commit 0b8d056

add LLaMA 3 8b scoreboard
1 parent 931d4aa commit 0b8d056


examples/perplexity/README.md

Lines changed: 63 additions & 4 deletions
@@ -30,7 +30,48 @@ In addition to the KL divergence the following statistics are calculated with `--kl-divergence`:
 * The root mean square of the change in token probabilities. If you were to assume that the quantization simply causes Gaussian noise on the token probabilities then this would be the standard deviation of said noise. The uncertainty on the value is calculated under the assumption that the change in token probabilities follows a Gaussian distribution. Related discussion: https://github.com/ggerganov/llama.cpp/discussions/2875 .
 * Same top p: Percentage of how often the token was assigned the highest probability by both models. The uncertainty is calculated from the Gaussian approximation of the binomial distribution.
 
-## Sample results
+## LLaMA 3 8b Scoreboard
+
+Results are sorted by Kullback-Leibler divergence relative to FP16.
+The "WT 2.7m" importance matrix was created using 2.7 million Wikitext tokens and can be found [here](https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices/blob/main/imatrix-llama_3-8b-f16-2.7m_tokens.dat).
+
+| Quantization | imatrix | Model size [GiB] | PPL | ΔPPL | KLD | RMS Δp |
+|--------------|---------|------------------|-------------------|----------------------|---------------------|------------------|
+| f16 | None | 14.97 | 6.7684 ± 0.04278 | - | - | - |
+| q8_0 | None | 7.96 | 6.7687 ± 0.04277 | 0.005872 ± 0.001347 | 0.001391 ± 0.000007 | 1.210 ± 0.007 % |
+| q6_K | None | 6.14 | 6.8007 ± 0.04312 | 0.037777 ± 0.002294 | 0.005669 ± 0.000046 | 2.343 ± 0.026 % |
+| q5_K_M | None | 5.33 | 6.8308 ± 0.04330 | 0.067952 ± 0.003060 | 0.011093 ± 0.000086 | 3.173 ± 0.030 % |
+| q5_K_S | None | 5.21 | 6.8877 ± 0.04378 | 0.124777 ± 0.003891 | 0.017177 ± 0.000135 | 3.947 ± 0.037 % |
+| q5_1 | None | 5.65 | 6.8888 ± 0.04373 | 0.125879 ± 0.004015 | 0.018485 ± 0.000141 | 4.089 ± 0.039 % |
+| q5_0 | None | 5.21 | 6.8988 ± 0.04373 | 0.135923 ± 0.004525 | 0.022964 ± 0.000170 | 4.631 ± 0.042 % |
+| q4_K_M | WT 2.7m | 4.58 | 6.9164 ± 0.04390 | 0.153559 ± 0.005115 | 0.029126 ± 0.000256 | 5.270 ± 0.050 % |
+| q4_K_M | None | 4.58 | 6.9593 ± 0.04415 | 0.196383 ± 0.005343 | 0.032032 ± 0.000248 | 5.531 ± 0.050 % |
+| q4_K_S | WT 2.7m | 4.37 | 6.9393 ± 0.04396 | 0.176470 ± 0.005377 | 0.032768 ± 0.000266 | 5.630 ± 0.052 % |
+| iq4_NL | WT 2.7m | 4.35 | 7.0114 ± 0.04468 | 0.248562 ± 0.005915 | 0.036482 ± 0.000286 | 5.965 ± 0.053 % |
+| iq4_XS | WT 2.7m | 4.14 | 7.0091 ± 0.04459 | 0.246254 ± 0.005918 | 0.037087 ± 0.000292 | 6.009 ± 0.053 % |
+| q4_K_S | None | 4.37 | 7.0545 ± 0.04481 | 0.291578 ± 0.006429 | 0.044040 ± 0.000320 | 6.511 ± 0.055 % |
+| q4_1 | None | 4.78 | 7.2571 ± 0.04658 | 0.494238 ± 0.009036 | 0.072530 ± 0.000507 | 8.368 ± 0.062 % |
+| q4_0 | None | 4.34 | 7.2927 ± 0.04665 | 0.529800 ± 0.009048 | 0.073598 ± 0.000486 | 8.395 ± 0.061 % |
+| q3_K_L | WT 2.7m | 4.03 | 7.2330 ± 0.04666 | 0.470087 ± 0.009268 | 0.074345 ± 0.000530 | 8.577 ± 0.064 % |
+| q3_K_M | WT 2.7m | 3.74 | 7.2941 ± 0.04699 | 0.531254 ± 0.010144 | 0.085849 ± 0.000596 | 9.236 ± 0.065 % |
+| q3_K_L | None | 4.03 | 7.3483 ± 0.04729 | 0.585400 ± 0.010379 | 0.088558 ± 0.000611 | 9.333 ± 0.066 % |
+| q3_K_M | None | 3.74 | 7.4524 ± 0.04789 | 0.689517 ± 0.011427 | 0.103797 ± 0.000675 | 10.111 ± 0.068 % |
+| iq3_M | WT 2.7m | 3.53 | 7.5051 ± 0.04715 | 0.742584 ± 0.010752 | 0.104464 ± 0.000676 | 10.383 ± 0.066 % |
+| iq3_S | WT 2.7m | 3.42 | 7.5693 ± 0.04794 | 0.806473 ± 0.011620 | 0.113201 ± 0.000719 | 10.669 ± 0.067 % |
+| iq3_XS | WT 2.7m | 3.28 | 7.8058 ± 0.04967 | 1.042930 ± 0.013767 | 0.140704 ± 0.000846 | 11.979 ± 0.070 % |
+| iq3_XXS | WT 2.7m | 3.05 | 8.0537 ± 0.05169 | 1.290849 ± 0.016815 | 0.187044 ± 0.001042 | 13.722 ± 0.073 % |
+| q3_K_S | WT 2.7m | 3.41 | 8.4003 ± 0.05409 | 1.637409 ± 0.018650 | 0.208394 ± 0.001018 | 15.201 ± 0.070 % |
+| q3_K_S | None | 3.41 | 8.6701 ± 0.05627 | 1.907244 ± 0.020902 | 0.236401 ± 0.001084 | 15.601 ± 0.069 % |
+| iq2_M | WT 2.7m | 2.74 | 9.4260 ± 0.06254 | 2.663082 ± 0.028667 | 0.331202 ± 0.001611 | 18.368 ± 0.079 % |
+| q2_K | WT 2.7m | 2.96 | 9.4737 ± 0.06303 | 2.710844 ± 0.029119 | 0.342129 ± 0.001565 | 18.996 ± 0.078 % |
+| iq2_S | WT 2.7m | 2.56 | 10.6301 ± 0.07237 | 3.867287 ± 0.039162 | 0.446305 ± 0.001972 | 21.324 ± 0.082 % |
+| q2_K | None | 2.96 | 10.6450 ± 0.07158 | 3.882171 ± 0.038471 | 0.457258 ± 0.001851 | 21.416 ± 0.078 % |
+| iq2_XS | WT 2.7m | 2.43 | 11.8063 ± 0.08064 | 5.043388 ± 0.048007 | 0.556747 ± 0.002136 | 23.752 ± 0.082 % |
+| iq2_XXS | WT 2.7m | 2.24 | 15.6064 ± 0.11301 | 8.843541 ± 0.081477 | 0.830947 ± 0.002749 | 28.363 ± 0.084 % |
+| iq1_M | WT 2.7m | 2.01 | 28.6561 ± 0.21012 | 21.893176 ± 0.180729 | 1.413517 ± 0.003550 | 37.785 ± 0.084 % |
+| iq1_S | WT 2.7m | 1.88 | 69.6303 ± 0.56051 | 62.867391 ± 0.535295 | 2.290167 ± 0.004882 | 45.826 ± 0.086 % |
+
+## LLaMA 2 vs. LLaMA 3 Quantization comparison
 
 | Metric | L2 7b q2_K | L3 8b q2_K | L2 7b q4_K_M | L3 8b q4_K_M | L2 7b q6_K | L3 8b q6_K | L2 7b q8_0 | L3 8b q8_0 |
 |-----------------|---------------------|----------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
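
All of the statistics in these tables are derived from the per-token probabilities of the FP16 model and the quantized model evaluated on the same text. Below is a minimal sketch of the definitions, assuming hypothetical NumPy arrays `P` and `Q` of shape `(n_tokens, n_vocab)` holding each model's softmaxed probabilities and `tokens` holding the evaluated token ids; this illustrates the formulas only and is not the llama.cpp implementation (which lives in the C++ perplexity example).

```python
import numpy as np

def scoreboard_stats(P, Q, tokens):
    """Sketch of the KLD, RMS Dp, and Same-top-p statistics.

    P, Q: hypothetical (n_tokens, n_vocab) arrays of per-position
    probability distributions from the FP16 and quantized model.
    tokens: the actual token id observed at each position.
    """
    n = len(tokens)

    # Mean KL divergence of the quantized distribution from the FP16
    # distribution, averaged over positions, with the standard error
    # of the mean as the uncertainty.
    kld = np.sum(P * (np.log(P) - np.log(Q)), axis=1)
    mean_kld = kld.mean()
    kld_err = kld.std(ddof=1) / np.sqrt(n)

    # Dp: change in the probability assigned to the observed token
    # (quantized minus FP16); RMS Dp is its root mean square.
    dp = Q[np.arange(n), tokens] - P[np.arange(n), tokens]
    rms_dp = np.sqrt(np.mean(dp ** 2))

    # Same top p: fraction of positions where both models assign the
    # highest probability to the same token; uncertainty from the
    # Gaussian approximation of the binomial distribution.
    same_top = float((P.argmax(axis=1) == Q.argmax(axis=1)).mean())
    same_top_err = np.sqrt(same_top * (1.0 - same_top) / n)

    return mean_kld, kld_err, rms_dp, same_top, same_top_err
```

Run over the same evaluation text for both models, these quantities correspond to the KLD, RMS Δp, and Same top p columns, with the ± values being the matching standard errors.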
@@ -50,10 +91,28 @@ In addition to the KL divergence the following statistics are calculated with `--kl-divergence`:
 | RMS Δp | 9.762 ± 0.053 % | 21.393 ± 0.078 % | 3.252 ± 0.024 % | 5.429 ± 0.051 % | 1.339 ± 0.010 % | 2.096 ± 0.029 % | 0.618 ± 0.011 % | 0.867 ± 0.007 % |
 | Same top p | 85.584 ± 0.086 % | 70.419 ± 0.120 % | 94.665 ± 0.055 % | 92.162 ± 0.071 % | 97.520 ± 0.038 % | 96.586 ± 0.048 % | 98.846 ± 0.026 % | 98.467 ± 0.032 % |
 
-<details>
-<summary>Old numbers</summary>
+| Metric | L2 70b q2_K | L3 70b q2_K | L2 70b q4_K_M | L3 70b q4_K_M | L2 70b q6_K | L3 70b q6_K | L2 70b q8_0 | L3 70b q8_0 |
+|-----------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
+| Mean PPL | 4.172530 ± 0.020805 | 5.902798 ± 0.035278 | 3.475398 ± 0.016580 | 3.193431 ± 0.016621 | 3.440612 ± 0.016372 | 3.052153 ± 0.015746 | 3.434686 ± 0.016346 | 3.039482 ± 0.015687 |
+| Mean PPL ratio | 1.215161 ± 0.002103 | 1.942461 ± 0.007686 | 1.012136 ± 0.000413 | 1.050877 ± 0.001032 | 1.002006 ± 0.000193 | 1.004386 ± 0.000413 | 1.000280 ± 0.000119 | 1.000217 ± 0.000264 |
+| Mean ΔPPL | 0.738805 ± 0.007888 | 2.863974 ± 0.025573 | 0.041672 ± 0.001433 | 0.154607 ± 0.003206 | 0.006887 ± 0.000664 | 0.013329 ± 0.001256 | 0.000961 ± 0.000408 | 0.000658 ± 0.000803 |
+| PPL correlation | 93.80% | 75.67% | 99.63% | 98.21% | 99.92% | 99.68% | 99.97% | 99.87% |
+| Mean KLD | 0.186386 ± 0.001134 | 0.674716 ± 0.003267 | 0.013168 ± 0.000095 | 0.055418 ± 0.000506 | 0.002736 ± 0.000018 | 0.009148 ± 0.000100 | 0.000878 ± 0.000006 | 0.003088 ± 0.000040 |
+| Mean Δp | -5.417 ± 0.040 % | -17.236 ± 0.078 % | -0.350 ± 0.010 % | -1.678 ± 0.026 % | -0.076 ± 0.005 % | -0.202 ± 0.010 % | -0.005 ± 0.003 % | -0.007 ± 0.006 % |
+| Maximum Δp | 95.064% | 95.799% | 80.018% | 91.140% | 28.193% | 63.263% | 25.395% | 50.187% |
+| 99.9% Δp | 46.526% | 60.640% | 23.562% | 47.583% | 10.424% | 24.634% | 6.548% | 14.033% |
+| 99.0% Δp | 21.251% | 26.948% | 10.161% | 18.666% | 5.339% | 10.273% | 3.337% | 6.323% |
+| Median Δp | -0.447% | -3.780% | -0.004% | -0.022% | -0.001% | -0.002% | -0.000% | 0.000% |
+| 1.0% Δp | -81.379% | -98.506% | -15.142% | -47.638% | -5.866% | -13.230% | -3.333% | -6.609% |
+| 0.1% Δp | -97.547% | -99.873% | -37.914% | -82.914% | -13.351% | -30.683% | -6.096% | -15.564% |
+| Minimum Δp | -99.965% | -99.993% | -81.378% | -98.505% | -46.213% | -82.746% | -34.335% | -63.634% |
+| RMS Δp | 17.237 ± 0.077 % | 34.361 ± 0.094 % | 4.154 ± 0.032 % | 9.915 ± 0.067 % | 1.899 ± 0.015 % | 3.721 ± 0.030 % | 1.085 ± 0.007 % | 2.124 ± 0.018 % |
+| Same top p | 85.001 ± 0.087 % | 71.991 ± 0.118 % | 95.632 ± 0.050 % | 92.881 ± 0.068 % | 97.651 ± 0.037 % | 96.538 ± 0.048 % | 98.502 ± 0.030 % | 97.825 ± 0.038 % |
+
+## Old Numbers
 
-## Llama 2 70B Scorechart
+<details>
+<summary>Llama 2 70B Scorechart</summary>
 
 | Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
 |--------------|------------------|------------|---------------|
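
The PPL columns in these tables use the standard definition of perplexity, the exponential of the mean negative log-likelihood over all evaluated tokens, and ΔPPL is the quantized model's PPL minus the FP16 PPL. A minimal sketch under the same assumptions as above, with hypothetical arrays `logp_f16` and `logp_q` of per-token natural log-probabilities (not llama.cpp API):

```python
import numpy as np

def ppl(logp):
    # Perplexity: exponential of the mean negative log-likelihood.
    return float(np.exp(-np.mean(logp)))

def delta_ppl(logp_f16, logp_q):
    # Delta-PPL as reported above: quantized PPL minus FP16 PPL.
    # The "Mean PPL ratio" rows are instead ppl(logp_q) / ppl(logp_f16).
    return ppl(logp_q) - ppl(logp_f16)
```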
