
perplexity : fix integer overflow #9783


Merged 2 commits into master on Oct 9, 2024

Conversation

ggerganov (Member):

fix #9779

There was an integer overflow in i*n_vocab:

frame #1: 0x0000000100403db8 llama-perplexity`process_logits(int, float const*, int const*, int, std::__1::vector<std::__1::thread, std::__1::allocator<std::__1::thread>>&, double&, double&, float*, float*)::$_0::operator()(this=0x0000600013e88068) const at perplexity.cpp:172:49
   169 	               break;
   170 	           }
   171 	           lock.unlock();
-> 172 	           const results_log_softmax results = log_softmax(n_vocab, logits + i*n_vocab, tokens[i+1]);
   173 	           const double v = -results.log_softmax;
   174 	           local_nll += v;
   175 	           local_nll2 += v*v;

(lldb) print i
(int) 14242
(lldb) print n_vocab
(int) 152064
(lldb) print i*n_vocab
(int) -2129271808
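For context, here is a minimal sketch (hypothetical code, not the actual patch) of why the product wraps and how widening one operand of the multiplication avoids it:

```cpp
// Sketch, not the actual patch: with 32-bit int, the product
// 14242 * 152064 = 2165695488 exceeds INT_MAX (2147483647), so
// i*n_vocab overflows (undefined behavior for signed int; the wrapped
// value observed in the debugger was -2129271808).
#include <cstdint>
#include <cstdio>

int main() {
    const int i       = 14242;
    const int n_vocab = 152064;

    // Fixed form: cast one operand so the multiplication is done in
    // 64 bits before it is used as a pointer offset, e.g.
    //     logits + (int64_t) i * n_vocab
    const int64_t offset = (int64_t) i * n_vocab;

    printf("offset = %lld\n", (long long) offset); // prints 2165695488
    return 0;
}
```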

@ggerganov force-pushed the gg/perplexity-fix-int-overflow branch from 3229586 to 22cc760 on October 8, 2024 06:23
Comment on lines 1720 to 1723
int64_t n_vocab;
int64_t n_chunk;
in.read((char *)&n_vocab, sizeof(n_vocab));
in.read((char *)&n_chunk, sizeof(n_chunk));
ggerganov (Member Author):

Using int64_t here for n_chunk was incorrect. Pushing a fix.
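The subtlety is that in.read consumes exactly sizeof(x) bytes, so the in-memory type must match the on-disk layout. A minimal sketch, assuming the header fields were written as 32-bit ints (the names read_header, v, c are illustrative, not from the patch):

```cpp
// Sketch under an assumed on-disk layout where both header fields were
// written as 32-bit ints; reading into an int64_t would consume 8 bytes
// per field and desynchronize every field that follows.
#include <cstdint>
#include <fstream>

static bool read_header(std::ifstream & in, int & n_vocab, int & n_chunk) {
    int32_t v = 0, c = 0;
    in.read((char *) &v, sizeof(v)); // exactly 4 bytes, matching the writer
    in.read((char *) &c, sizeof(c)); // exactly 4 bytes
    if (!in) {
        return false;
    }
    n_vocab = v;
    n_chunk = c;
    return true;
}
```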

Comment on lines +696 to +697
for (int i = 0; i < (int) batch.n_tokens; i += n_batch) {
const int n_tokens = std::min<int>(n_batch, batch.n_tokens - i);
Collaborator:

IIRC the C standard only guarantees at least 16 bits for int.

Collaborator:

According to Wikipedia:

The standard integer size is platform-dependent. In C, it is denoted by int and required to be at least 16 bits. Windows and Unix systems have 32-bit ints on both 32-bit and 64-bit architectures.

Collaborator:

According to cppreference:

int — basic integer type. The keyword int may be omitted if any of the modifiers listed below are used. If no length modifiers are present, it's guaranteed to have a width of at least 16 bits. However, on 32/64 bit systems it is almost exclusively guaranteed to have width of at least 32 bits (see below).

ggerganov (Member Author):

I doubt this is a concern. We can safely assume that int will always be 32-bit.
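If one wanted to make that assumption explicit, a compile-time guard (hypothetical, not part of this patch) would fail the build on a platform with a narrower int instead of overflowing at runtime:

```cpp
// Sketch, not part of the patch: turn the "int is at least 32 bits"
// assumption into a compile-time error on platforms where it fails.
#include <climits>

static_assert(sizeof(int) * CHAR_BIT >= 32,
              "llama-perplexity assumes int is at least 32 bits");
```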

@ggerganov merged commit e702206 into master on Oct 9, 2024 (58 checks passed).
@ggerganov deleted the gg/perplexity-fix-int-overflow branch on October 9, 2024 14:00.
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request on Oct 29, 2024; arthw pushed commits to arthw/llama.cpp that referenced it on Nov 15 and Nov 18, 2024. Each carries the same message:

* perplexity : fix integer overflow

ggml-ci

* perplexity : keep n_vocab as int and make appropriate casts

ggml-ci
Successfully merging this pull request may close these issues:

Bug: llama-perplexity segfaults (#9779)

2 participants