arg : no n_predict = -2 for examples except for main and infill #12364

Merged
merged 1 commit into from
Mar 13, 2025

Conversation

ngxson
Collaborator

@ngxson ngxson commented Mar 13, 2025

Supersedes #12347 and #12323

Close #12264

I checked the code base, and it turns out that n_predict = -2 is only supported in main.cpp and infill.cpp

For the server, --no-context-shift does the same thing, so it doesn't make sense to add n_predict == -2 support there (doing so turns out to be quite messy)
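
To illustrate the stopping rules being compared, here is a minimal sketch. It is not llama.cpp code: `should_stop`, `stop_reason`, and all parameter names are hypothetical. It assumes n_predict = -1 means "no limit", n_predict = -2 means "until the context is filled", and that disabling context shift stops generation once the context is full, which is the equivalence this description claims.

```cpp
#include <cassert>

// Hypothetical illustration only; none of these names exist in llama.cpp.
enum class stop_reason { none, limit_reached, context_full };

// n_predict   : requested token budget (assumed: -1 = no limit, -2 = until context filled)
// n_generated : tokens generated so far
// n_past      : tokens currently held in the context
// n_ctx       : context size
// ctx_shift   : false models running with --no-context-shift
stop_reason should_stop(int n_predict, int n_generated, int n_past, int n_ctx, bool ctx_shift) {
    assert(n_ctx > 0);

    // A non-negative n_predict is an explicit token budget.
    if (n_predict >= 0 && n_generated >= n_predict) {
        return stop_reason::limit_reached;
    }

    // Once the context is full, stop if either mechanism asks for it.
    const bool ctx_full = n_past >= n_ctx;
    if (ctx_full && (n_predict == -2 || !ctx_shift)) {
        return stop_reason::context_full;
    }

    // Otherwise keep generating (with context shift, old tokens would be discarded).
    return stop_reason::none;
}
```

Under these assumptions, `should_stop(-2, 10, 512, 512, true)` and `should_stop(-1, 10, 512, 512, false)` both report `stop_reason::context_full`, which is the sense in which the two options are treated as interchangeable here.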

@ngxson ngxson requested a review from ggerganov March 13, 2025 10:29
Member

@ggerganov ggerganov left a comment

We should also remove -2 from the main and infill examples and use --no-context-shift there too, but we can do that later.

@ngxson ngxson merged commit be7c303 into ggml-org:master Mar 13, 2025
47 checks passed
jpohhhh pushed a commit to Telosnex/llama.cpp that referenced this pull request Mar 14, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
@Martin-Laclaustra

Please reconsider: there is a real need for n_predict = -2 in the server example, and --no-context-shift is not equivalent to stopping at the end of the context:

#12264 (comment)

Development

Successfully merging this pull request may close these issues.

Eval bug: server API endpoint not respecting n_predict with -2 (until context filled)
3 participants