
Additional KL-divergence statistics #5081


Merged: ikawrakow merged 3 commits from ik/kl-divergence-2 into master on Jan 23, 2024

Conversation

ikawrakow (Contributor) commented:

I have added:

  • Probability for quantized model predicting the same top token as the base model
  • KL-divergence median
  • KL-divergence min/max
  • KL-divergence 1%, 5%, 95%, 99%

I'll keep it as a draft PR for a bit.
Please let me know what additional statistics or information you would like to see.

Here is an example of what you currently get:

chunk        PPL          ln(PPL(Q)/PPL(base))          KL-Divergence           Same top
   1        3.9734      -0.00651 ±    0.01162       0.01145 ±    0.00153    0.96078 ± 0.01218
   2        4.5052       0.00239 ±    0.00794       0.01388 ±    0.00164    0.95686 ± 0.00901
   3        5.3057       0.00175 ±    0.00625       0.01315 ±    0.00118    0.95163 ± 0.00776
   4        6.0247       0.00473 ±    0.00524       0.01236 ±    0.00092    0.95196 ± 0.00670
...
 638        5.7483       0.00903 ±    0.00039       0.01062 ±    0.00006    0.95254 ± 0.00053
 639        5.7520       0.00904 ±    0.00039       0.01062 ±    0.00006    0.95252 ± 0.00053
 640        5.7581       0.00905 ±    0.00039       0.01062 ±    0.00006    0.95254 ± 0.00053
 641        5.7486       0.00904 ±    0.00039       0.01062 ±    0.00006    0.95259 ± 0.00053
 642        5.7427       0.00901 ±    0.00039       0.01062 ±    0.00006    0.95261 ± 0.00053

===== KL-divergence statistics
Average:   0.010623 ±   0.000065
Median :   0.005517
Minimum:  -0.000008
Maximum:   2.462638
KLD_01 :   0.000007
KLD_99 :   0.088446
KLD_05 :   0.000046
KLD_95 :   0.034547

llama_print_timings:        load time =     699.00 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =   62910.38 ms / 328704 tokens (    0.19 ms per token,  5224.96 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   66734.41 ms / 328705 tokens
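
For readers who want to see how these quantities are obtained, here is a minimal sketch, assuming the base and quantized logits for one token position are available as plain vectors. It is an illustration only, not the actual perplexity.cpp implementation; the softmax_probs helper and the function names are made up for this example.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over raw logits (hypothetical helper, not from perplexity.cpp).
static std::vector<double> softmax_probs(const std::vector<float> & logits) {
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((double)logits[i] - (double)max_logit);
        sum += probs[i];
    }
    for (auto & p : probs) p /= sum;
    return probs;
}

// Per-token KL divergence D_KL(P_base || P_quant), plus a flag telling
// whether both models put the highest probability on the same token.
static double token_kl_divergence(const std::vector<float> & base_logits,
                                  const std::vector<float> & quant_logits,
                                  bool & same_top) {
    const auto p = softmax_probs(base_logits);   // base (unquantized) model
    const auto q = softmax_probs(quant_logits);  // quantized model
    double kld = 0.0;
    for (size_t i = 0; i < p.size(); ++i) {
        if (p[i] > 0.0) kld += p[i] * std::log(p[i] / q[i]);
    }
    same_top = (std::max_element(p.begin(), p.end()) - p.begin()) ==
               (std::max_element(q.begin(), q.end()) - q.begin());
    return kld;
}

Averaging the per-token KL divergence over all evaluated tokens gives the Average line above; sorting the per-token values gives the median, minimum/maximum, and percentile lines.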


std::sort(kld_values.begin(), kld_values.end());

printf("===== KL-divergence statistics\n");
A collaborator commented:
Suggested change
printf("===== KL-divergence statistics\n");
printf("===== KL-divergence statistics =====\n");

Was this intentional? I usually see prints like this.

Comment on lines 1763 to 1768
const int n_1percent = nearest_int(0.01f*kld_values.size());
printf("KLD_01 : %10.6f\n", kld_values[n_1percent]);
printf("KLD_99 : %10.6f\n", kld_values[kld_values.size()-1-n_1percent]);
const int n_5percent = nearest_int(0.05f*kld_values.size());
printf("KLD_05 : %10.6f\n", kld_values[n_5percent]);
printf("KLD_95 : %10.6f\n", kld_values[kld_values.size()-1-n_5percent]);
A collaborator commented:
This is not the correct way to calculate percentiles (due to integer rounding errors), but if the number of tokens is >> 100 it should be fine.
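
For reference, a common alternative is to interpolate linearly between adjacent ranks of the sorted values. The sketch below assumes kld_values is already sorted ascending, as in the snippet above; it is an illustration, not part of the PR.

#include <algorithm>
#include <cstddef>
#include <vector>

// Percentile with linear interpolation between adjacent ranks of an
// ascending-sorted vector. q is the quantile in [0, 1].
static float percentile(const std::vector<float> & sorted, float q) {
    if (sorted.empty()) return 0.0f;
    const float  pos  = q * (sorted.size() - 1);
    const size_t lo   = (size_t) pos;
    const size_t hi   = std::min(lo + 1, sorted.size() - 1);
    const float  frac = pos - (float) lo;
    return sorted[lo] * (1.0f - frac) + sorted[hi] * frac;
}

// Example usage with the sorted kld_values from above:
// printf("KLD_01 : %10.6f\n", percentile(kld_values, 0.01f));
// printf("KLD_99 : %10.6f\n", percentile(kld_values, 0.99f));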

@ikawrakow ikawrakow marked this pull request as ready for review January 23, 2024 13:16
@ikawrakow ikawrakow merged commit 44879ee into master Jan 23, 2024
@ikawrakow ikawrakow deleted the ik/kl-divergence-2 branch January 23, 2024 13:17
kalomaze (Contributor) commented on Jan 23, 2024:

I am unable to get it to print anything. It simply exits without evaluating.

perplexity.exe -m "C:\Users\Kalo\Downloads\Toppy7b\ggml-model-Q8_0.gguf" --kl-divergence-base "C:\Users\Kalo\Downloads\calibration\8k_kldiv_toppyfp16_v2.dat" --kl-divergence


Regenerating it with or without GPU layers did not resolve the issue.

Removing --kl-divergence causes:

the data file you provided tokenizes to only 1 tokens

This issue may be Windows specific.

EDIT: On WSL I get a segfault.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
* perplexity: add top-token probability

* perplexity: add additional KL-divergence statistics

* perplexity: a better organized KL-divergence statistics output

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* perplexity: add top-token probability

* perplexity: add additional KL-divergence statistics

* perplexity: a better organized KL-divergence statistics output

---------

Co-authored-by: Iwan Kawrakow <[email protected]>