sampling: fix top_k <= 0 #5388

Merged

Conversation

JohannesGaessler
Collaborator

Fixes abetlen/llama-cpp-python#1154.
The issue, I assume, is that top_k sampling is called directly rather than via the sampling queue.
top_k values <= 0 are then rounded up to min_keep instead of being set to candidates.size.
This PR adds a corresponding check and two more related test cases.
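For context, a minimal sketch of the clamping logic in question (`effective_top_k` is a hypothetical helper distilling what the top_k sampler does to `k`; it is not the verbatim llama.cpp code). Without the `k <= 0` check, a requested `top_k` of 0 was rounded up to `min_keep`, so only `min_keep` candidates survived and sampling became effectively deterministic:

```cpp
#include <algorithm>
#include <cstdio>

// Hypothetical helper (assumption, not the actual llama.cpp function):
// given the requested top_k, the number of candidates, and min_keep,
// compute how many candidates survive truncation.
static int effective_top_k(int k, int n_candidates, int min_keep) {
    // The fix: k <= 0 is defined as "top-k disabled", i.e. keep all candidates.
    // Without this check, the std::max below rounded k <= 0 up to min_keep.
    if (k <= 0) {
        k = n_candidates;
    }
    k = std::max(k, min_keep);
    k = std::min(k, n_candidates);
    return k;
}

int main() {
    // With a 32000-token vocabulary and min_keep = 1:
    printf("top_k =  0 -> keep %d\n", effective_top_k( 0, 32000, 1)); // 32000 with the check, 1 without
    printf("top_k = -1 -> keep %d\n", effective_top_k(-1, 32000, 1)); // 32000 with the check, 1 without
    printf("top_k = 40 -> keep %d\n", effective_top_k(40, 32000, 1)); // 40 either way
}
```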

@Green-Sky
Collaborator


Does this fix the server behavior when you set the "Show Probabilities" option?

@JohannesGaessler
Collaborator Author

I don't know which issue you're talking about, so I can't say.

@Green-Sky
Collaborator

No, it does not. Ignore me :)

@neerajprad

I still see the same issue on the latest version (0.2.44), where sampling is deterministic for K=0.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* sampling: fix top_k <= 0

* Update llama.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* sampling: fix top_k <= 0

* Update llama.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

Successfully merging this pull request may close these issues.

[0.2.38] Models always produce the same output when setting top_k to 0, still using min_p