
sampling : avoid expensive softmax during greedy sampling #9605


Merged
4 commits merged into master on Sep 24, 2024

Conversation

ggerganov (Member)

fix #9530

When the temperature is non-positive, we can simply greedily sample the token with the highest logit. But in some cases the probabilities of the secondary tokens are also required (e.g. llama-server to display candidate probs, llama-speculative to perform stochastic speculative sampling). In such cases, we first filter the top sparams.n_probs tokens via a top-k sampler and then apply softmax only to them, which avoids sorting the full vocabulary.
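For illustration, here is a minimal C++ sketch of the two paths described above. This is not the actual llama.cpp sampler code; the names `token_prob`, `sample_greedy`, and `top_k_softmax` are made up for this example:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct token_prob {
    int   id;    // token id
    float logit; // raw logit from the model
    float prob;  // normalized probability (filled in by top_k_softmax)
};

// Greedy path: with temperature <= 0 only the argmax matters,
// so a single linear scan suffices and no softmax is computed.
static int sample_greedy(const std::vector<float> & logits) {
    return (int) std::distance(logits.begin(),
            std::max_element(logits.begin(), logits.end()));
}

// When the top n_probs candidates are still needed: partially sort only the
// k best logits to the front, then apply softmax over just those k entries,
// instead of sorting and normalizing the full vocabulary.
static std::vector<token_prob> top_k_softmax(const std::vector<float> & logits, size_t k) {
    std::vector<token_prob> cand(logits.size());
    for (size_t i = 0; i < logits.size(); ++i) {
        cand[i] = { (int) i, logits[i], 0.0f };
    }
    k = std::min(k, cand.size());
    // partial sort: only the k largest logits end up ordered at the front
    std::partial_sort(cand.begin(), cand.begin() + k, cand.end(),
            [](const token_prob & a, const token_prob & b) { return a.logit > b.logit; });
    cand.resize(k);
    if (cand.empty()) {
        return cand;
    }
    const float max_logit = cand[0].logit; // subtract max for numerical stability
    float sum = 0.0f;
    for (auto & c : cand) {
        c.prob = std::exp(c.logit - max_logit);
        sum += c.prob;
    }
    for (auto & c : cand) {
        c.prob /= sum;
    }
    return cand;
}
```

The greedy pick stays a single O(n) scan over the vocabulary, while the candidate-probs path drops from a full O(n log n) sort plus O(n) softmax to an O(n log k) partial sort plus an O(k) softmax.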

Also add perf timings to test-sampling to keep track of the performance of the samplers.
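As a rough idea of what such a timing harness can look like (a hypothetical sketch, not the code actually added to `tests/test-sampling.cpp`):

```cpp
#include <chrono>
#include <cstdio>

// Average wall-clock microseconds per call of an arbitrary callable.
template <typename F>
static double avg_time_us(F && fn, int n_iter) {
    const auto t_start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < n_iter; ++i) {
        fn();
    }
    const auto t_end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::micro>(t_end - t_start).count() / n_iter;
}

int main() {
    volatile float sink = 0.0f; // prevent the loop from being optimized away
    const double us = avg_time_us([&] {
        // stand-in workload; in the real test this would be a sampler invocation
        for (int i = 0; i < 1024; ++i) {
            sink = sink + 1.0f;
        }
    }, 1000);
    printf("workload: %.3f us/iter\n", us);
    return 0;
}
```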

@github-actions bot added the testing (Everything test related) and examples labels on Sep 23, 2024
@ggerganov merged commit b0f2736 into master on Sep 24, 2024
1 check passed
@ggerganov deleted the gg/sampling-faster-greedy branch on September 24, 2024 at 06:03
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
sampling : avoid expensive softmax during greedy sampling (#9605)

* sampling : avoid expensive softmax during greedy sampling

ggml-ci

* speculative : fix default RNG seed + set sparams.n_probs

* Update tests/test-sampling.cpp

Co-authored-by: slaren <[email protected]>

* sampling : add clarifying comment [no ci]

---------

Co-authored-by: slaren <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Labels
examples, testing (Everything test related)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Lower performance in pre-built binary llama-server, Since llama-b3681-bin-win-cuda-cu12.2.0-x64
2 participants