Commit bed31be

perplexity: add BF16 vs. FP16 results
1 parent 83330d8

1 file changed: examples/perplexity/README.md (+57, -1)

@@ -32,7 +32,13 @@ In addition to the KL divergence the following statistics are calculated with `-

## LLaMA 3 8b Scoreboard

-Results are sorted by Kullback-Leibler divergence relative to FP16.
+| Revision | f364eb6f |
+|:---------|:-------------------|
+| Backend | CUDA |
+| CPU | AMD Epyc 7742 |
+| GPU | 1x NVIDIA RTX 4090 |
+
+Results were generated using the CUDA backend and are sorted by Kullback-Leibler divergence relative to FP16.
The "WT" importance matrices were created using varying numbers of Uncyclotext tokens and can be found [here](https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices/blob/main/imatrix-llama_3-8b-f16-2.7m_tokens.dat).

| Quantization | imatrix | Model size [GiB] | PPL | ΔPPL | KLD | Mean Δp | RMS Δp |
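
For readers scanning the scoreboard columns: PPL is the perplexity of the quantized model over the evaluation text (exp of the mean negative token log-likelihood), and KLD is the mean Kullback-Leibler divergence between the FP16 model's per-token distribution and the quantized model's. The snippet below is only an illustrative sketch of those two definitions, using NumPy and synthetic softmax outputs; it is not the llama.cpp implementation, and all names in it are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-position logits of an FP16 reference model and a
# quantized model over a tiny vocabulary, plus the observed token ids.
n_pos, n_vocab = 5, 8
logits_fp16 = rng.normal(size=(n_pos, n_vocab))
logits_quant = logits_fp16 + 0.05 * rng.normal(size=(n_pos, n_vocab))
tokens = rng.integers(0, n_vocab, size=n_pos)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

p_fp16 = softmax(logits_fp16)    # reference distributions, one row per position
p_quant = softmax(logits_quant)  # quantized-model distributions

# PPL: exp of the mean negative log-likelihood of the observed tokens.
ppl_quant = np.exp(-np.mean(np.log(p_quant[np.arange(n_pos), tokens])))

# KLD: KL divergence of the quantized distribution from the FP16 one,
# averaged over token positions.
kld = np.mean(np.sum(p_fp16 * (np.log(p_fp16) - np.log(p_quant)), axis=-1))

print(f"PPL = {ppl_quant:.4f}, mean KLD = {kld:.6f}")
```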
@@ -89,6 +95,12 @@ K-quants score better on mean Δp than the legacy quants than e.g. KL divergence

## LLaMA 2 vs. LLaMA 3 Quantization comparison

+| Revision | f364eb6f |
+|:---------|:-------------------|
+| Backend | CUDA |
+| CPU | AMD Epyc 7742 |
+| GPU | 1x NVIDIA RTX 4090 |
+
| Metric | L2 7b q2_K | L3 8b q2_K | L2 7b q4_K_M | L3 8b q4_K_M | L2 7b q6_K | L3 8b q6_K | L2 7b q8_0 | L3 8b q8_0 |
|-----------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| Mean PPL | 5.794552 ± 0.032298 | 9.751568 ± 0.063312 | 5.877078 ± 0.032781 | 6.407115 ± 0.039119 | 5.808494 ± 0.032425 | 6.253382 ± 0.038078 | 5.798542 ± 0.032366 | 6.234284 ± 0.037878 |
@@ -107,6 +119,50 @@ K-quants score better on mean Δp than the legacy quants than e.g. KL divergence
| RMS Δp | 9.762 ± 0.053 % | 21.421 ± 0.079 % | 3.252 ± 0.024 % | 5.519 ± 0.050 % | 1.339 ± 0.010 % | 2.295 ± 0.019 % | 0.618 ± 0.011 % | 1.198 ± 0.007 % |
| Same top p | 85.584 ± 0.086 % | 71.138 ± 0.119 % | 94.665 ± 0.055 % | 91.901 ± 0.072 % | 97.520 ± 0.038 % | 96.031 ± 0.051 % | 98.846 ± 0.026 % | 97.674 ± 0.040 % |

+## LLaMA 3 BF16 vs. FP16 comparison
+
+| Revision | 83330d8c |
+|:---------|:--------------|
+| Backend | CPU |
+| CPU | AMD Epyc 7742 |
+| GPU | N/A |
+
+Results were calculated with LLaMA 3 8b BF16 as `--kl-divergence-base` and LLaMA 3 8b FP16 as the `--model` for comparison.
+
+| Metric | Value |
+|--------------------------------|--------------------------|
+| Mean PPL(Q) | 6.227711 ± 0.037833 |
+| Mean PPL(base) | 6.225194 ± 0.037771 |
+| Cor(ln(PPL(Q)), ln(PPL(base))) | 99.990% |
+| Mean ln(PPL(Q)/PPL(base)) | 0.000404 ± 0.000086 |
+| Mean PPL(Q)/PPL(base) | 1.000404 ± 0.000086 |
+| Mean PPL(Q)-PPL(base) | 0.002517 ± 0.000536 |
+| Mean KLD | 0.00002515 ± 0.00000020 |
+| Maximum KLD | 0.012206 |
+| 99.9% KLD | 0.000799 |
+| 99.0% KLD | 0.000222 |
+| 99.0% KLD | 0.000222 |
+| Median KLD | 0.000013 |
+| 10.0% KLD | -0.000002 |
+| 5.0% KLD | -0.000008 |
+| 1.0% KLD | -0.000023 |
+| Minimum KLD | -0.000059 |
+| Mean Δp | -0.0000745 ± 0.0003952 % |
+| Maximum Δp | 4.186% |
+| 99.9% Δp | 1.049% |
+| 99.0% Δp | 0.439% |
+| 95.0% Δp | 0.207% |
+| 90.0% Δp | 0.125% |
+| 75.0% Δp | 0.029% |
+| Median Δp | 0.000% |
+| 25.0% Δp | -0.030% |
+| 10.0% Δp | -0.126% |
+| 5.0% Δp | -0.207% |
+| 1.0% Δp | -0.434% |
+| 0.1% Δp | -1.016% |
+| Minimum Δp | -4.672% |
+| RMS Δp | 0.150 ± 0.001 % |
+| Same top p | 99.739 ± 0.013 % |

## Old Numbers

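As a rough guide to the Δp rows in the tables added here: Δp is commonly read as the change in the probability assigned to the observed token between the base model and the model under test, and "Same top p" as the fraction of positions where both models pick the same highest-probability token. Under those assumed definitions (not taken from the llama.cpp source), a minimal NumPy sketch with synthetic data:

```python
import numpy as np

def delta_p_stats(p_base, p_quant, tokens):
    """Illustrative Δp / top-token statistics for two models' softmax outputs.

    p_base, p_quant: arrays of shape (n_pos, n_vocab) with per-position token
    probabilities; tokens: the observed token id at each position.
    """
    pos = np.arange(len(tokens))
    # Δp: change in the probability assigned to the observed token, in percent.
    dp = 100.0 * (p_quant[pos, tokens] - p_base[pos, tokens])
    # Fraction of positions where both models agree on the top token.
    same_top = np.mean(p_quant.argmax(axis=-1) == p_base.argmax(axis=-1))
    return {
        "Mean Δp [%]": dp.mean(),
        "RMS Δp [%]": np.sqrt(np.mean(dp ** 2)),
        "Same top p [%]": 100.0 * same_top,
    }

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Tiny synthetic example: a base model and a slightly perturbed comparison model.
rng = np.random.default_rng(0)
n_pos, n_vocab = 6, 10
logits = rng.normal(size=(n_pos, n_vocab))
p_base = softmax(logits)
p_quant = softmax(logits + 0.1 * rng.normal(size=logits.shape))
tokens = rng.integers(0, n_vocab, size=n_pos)
print(delta_p_stats(p_base, p_quant, tokens))
```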