
Help: Where/how dequantize happens and how to create new quantization formats? #1796

Answered by KerfuffleV2
emin63 asked this question in Q&A

You'll probably get a better answer from someone else, but:

For the most part, the packing (quantizing) part happens when the model file is created. Look at examples/quantize for the tool used to quantize models. Each tensor saved in the model has an ftype, which is the tensor's data type: it could be GGML_TYPE_F16, GGML_TYPE_Q5_0, and so on.
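
To give a concrete picture of what "packing" means, here's a simplified sketch of block-wise quantization in the spirit of ggml's Q8_0 format: split the tensor into blocks of 32 floats and store one scale plus 32 int8 values per block. The names here (`block_q8_0_ish`, `quantize_block`, `QK`) are illustrative, not the real API; the actual implementations live in ggml.c and differ in detail (for example, the real block stores the scale as fp16 and there are SIMD paths).

```c
#include <math.h>
#include <stdint.h>

#define QK 32  // elements per block, matching ggml's QK8_0

// Simplified stand-in for ggml's block_q8_0 (the real one stores d as fp16).
typedef struct {
    float  d;       // scale: the block's max |x| maps to 127
    int8_t qs[QK];  // quantized values
} block_q8_0_ish;

// Quantize one block of QK floats into the packed representation.
static void quantize_block(const float *x, block_q8_0_ish *out) {
    float amax = 0.0f;
    for (int i = 0; i < QK; i++) {
        float ax = fabsf(x[i]);
        if (ax > amax) amax = ax;
    }
    out->d = amax / 127.0f;
    const float id = out->d != 0.0f ? 1.0f / out->d : 0.0f;
    for (int i = 0; i < QK; i++) {
        out->qs[i] = (int8_t) roundf(x[i] * id);
    }
}
```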

When the file is loaded, the tensors are created with the type that was saved (generally). Getting ready for inference involves building a graph of the various operations that will be performed on the tensors. Some operations support working on quantized tensors and some don't, so doing something like trying to perform an operation with the wrong type o…
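
Dequantizing is just the inverse of the sketch above: multiply each quantized value by the block's scale. Continuing the same illustrative types (again, not the real ggml code):

```c
// Dequantize one block back to floats: y[i] ≈ d * qs[i].
// Reuses block_q8_0_ish and QK from the sketch above.
static void dequantize_block(const block_q8_0_ish *in, float *y) {
    for (int i = 0; i < QK; i++) {
        y[i] = in->d * (float) in->qs[i];
    }
}
```

As far as I can tell, ggml keeps a per-type table of function pointers in ggml.c (quantize row, dequantize row, quantized dot product; the exact name of the table varies by version, so check the current source), and ops that need float data call the dequantize function per row, or use the quantized dot product directly. So adding a new quantization format mostly means picking a block layout, writing the quantize/dequantize pair (and ideally a fast dot product over the packed data), and hooking them into that table.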

Answer selected by emin63