
quantize : make output filename optional again #2823


Merged: 2 commits merged into ggml-org:master on Aug 28, 2023

Conversation

cebtenzzre (Collaborator)

Commit 5c7b0e7 removed the ability to simply call quantize with an input filename and a quantization type, like this:

./quantize some_ggml_model.bin Q4_0

Currently, the output filename may only be omitted if you also pass a value for nthreads. In other words, the tool works as long as you pass at least three positional arguments, even though two of the four are optional.

This was presumably unintentional. Changing the minimum argument count to two makes the behavior of quantize match the usage string:

usage: ./quantize [--help] [--allow-requantize] [--leave-output-tensor] model-f32.bin [model-quant.bin] type [nthreads]
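
For context, here is a minimal self-contained sketch (simplified, not the upstream source; the exact default name pattern is an assumption) of how the output filename can be derived from the input path and the quantization type when the output argument is omitted:

// sketch only: synthesize a default output name such as
// "<input dir>/ggml-model-<TYPE>.gguf" when no explicit output filename is given
#include <iostream>
#include <string>

static std::string default_output_name(const std::string & fname_inp, const std::string & ftype_str) {
    std::string fpath;
    const size_t pos = fname_inp.find_last_of('/');  // directory part of the input path
    if (pos != std::string::npos) {
        fpath = fname_inp.substr(0, pos + 1);
    }
    return fpath + "ggml-model-" + ftype_str + ".gguf";  // illustrative naming pattern
}

int main() {
    // e.g. ./quantize models/some_ggml_model.bin Q4_0
    std::cout << default_output_name("models/some_ggml_model.bin", "Q4_0") << "\n";
    // prints: models/ggml-model-Q4_0.gguf
    return 0;
}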

cebtenzzre requested a review from KerfuffleV2 on August 27, 2023 05:46
@ghost commented on Aug 27, 2023

Hi,

Unrelated to this PR specifically: I'm getting an error from quantize while trying to convert to GGUF v2. wizardLM & llama2chat converted fine; it seems that Vicuna/Pygmalion-Vicuna has a q6_k tensor, but conversion is disabled for it:

llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llama_model_quantize_internal: meta size = 773344 bytes
[   1/ 291]                    token_embd.weight - [ 4096, 32000,     1,     1], type =   q4_0, size =   70.312 MB
[   2/ 291]                   output_norm.weight - [ 4096,     1,     1,     1], type =    f32, size =    0.016 MB
[   3/ 291]                        output.weight - [ 4096, 32000,     1,     1], type =   q6_K, llama_model_quantize: failed to quantize: requantizing from type q6_K is disabled
main: failed to quantize model from '/data/data/com.termux/files/home/Pygmalion-Vicuna.gguf'

Here's my attempt: ./quantize ~/Pygmalion-Vicuna.gguf 2 3 --allow-requantization

I understand if it cannot be enabled, but figured I'd ask anyway.

Thank you.

@ggerganov (Member)

Try ./quantize --allow-requantize ~/Pygmalion-Vicuna.gguf 2 3 instead
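
The order matters because quantize only treats leading "--" arguments as options; once it reaches the first positional argument, nothing after it is parsed as a flag. A rough sketch of that pattern (simplified, not the exact upstream loop):

// sketch only: leading-flag parsing, where options placed after the first
// positional argument are never recognized
#include <cstdio>
#include <cstring>

int main(int argc, char ** argv) {
    bool allow_requantize = false;

    int arg_idx = 1;
    for (; arg_idx < argc && strncmp(argv[arg_idx], "--", 2) == 0; arg_idx++) {
        if (strcmp(argv[arg_idx], "--allow-requantize") == 0) {
            allow_requantize = true;
        } else {
            fprintf(stderr, "unknown option: %s\n", argv[arg_idx]);
            return 1;
        }
    }

    // everything from argv[arg_idx] onward is positional (model path, type, nthreads),
    // so a flag appended after the model path is treated as a positional argument instead
    printf("allow_requantize = %d, first positional = %s\n",
           allow_requantize, arg_idx < argc ? argv[arg_idx] : "(none)");
    return 0;
}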

@slaren (Member) commented on Aug 27, 2023

While fixing this, can we also fix the destination path when using \ as the directory separator under Windows? Otherwise, it writes the file to the current directory.

diff --git a/examples/quantize/quantize.cpp b/examples/quantize/quantize.cpp
index 0e3a26b..df9a214 100644
--- a/examples/quantize/quantize.cpp
+++ b/examples/quantize/quantize.cpp
@@ -114,7 +114,7 @@ int main(int argc, char ** argv) {
     std::string ftype_str;
     if (try_parse_ftype(argv[arg_idx], params.ftype, ftype_str)) {
         std::string fpath;
-        const size_t pos = fname_inp.find_last_of('/');
+        const size_t pos = fname_inp.find_last_of("/\\");
         if (pos != std::string::npos) {
             fpath = fname_inp.substr(0, pos + 1);
         }
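
For illustration (a hedged sketch with a hypothetical path), the effect of matching both separators: with only '/' the directory prefix comes back empty on a Windows-style path, so the output file lands in the current working directory instead of next to the input model.

// sketch only: splitting off the directory prefix with one vs. both separators
#include <iostream>
#include <string>

static std::string dir_of(const std::string & fname, const std::string & seps) {
    const size_t pos = fname.find_last_of(seps);
    return pos == std::string::npos ? std::string() : fname.substr(0, pos + 1);
}

int main() {
    const std::string win_path = "C:\\models\\some-model.gguf";  // hypothetical input path
    std::cout << "'/' only  -> '" << dir_of(win_path, "/")   << "'\n";  // '' (empty, output goes to CWD)
    std::cout << "'/' or '\\' -> '" << dir_of(win_path, "/\\") << "'\n";  // 'C:\models\'
    return 0;
}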

@ghost commented on Aug 27, 2023

Try ./quantize --allow-requantize ~/Pygmalion-Vicuna.gguf 2 3 instead

I successfully converted, thank you.

@KerfuffleV2 (Collaborator) left a comment

Thanks for fixing this, I think you're right about it being an oversight. I reaaaallly hate the way the quantize tool handles arguments in general. I think it's better to be explicit than have weird magical behavior with positional arguments.

@KerfuffleV2 (Collaborator)

it seems that Vicuna/Pygmalion-Vicuna has a q6_k tensor, but conversion is disabled for it

It was probably quantized with --leave-output-tensor. Quantizing to q4_0 from the q6_k tensor there is probably going to noticeably hurt quality (fine if you were just testing the arguments, but I thought I'd mention that).

@ggerganov ggerganov merged commit ebcee20 into ggml-org:master Aug 28, 2023
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
* quantize : make output filename optional again

* quantize : fix path parsing on Windows

suggested by @slaren