
Support 16 channel TAEs (taesd3 and taef1) #527

Merged
merged 1 commit into leejet:master on Dec 28, 2024

Conversation

@stduhpf (Contributor) commented Dec 21, 2024

No description provided.

@stduhpf changed the title from Support 16 channel TAEs (taesd and taef1) to Support 16 channel TAEs (taesd3 and taef1) on Dec 21, 2024
@leejet merged commit d50473d into leejet:master on Dec 28, 2024
9 checks passed
@leejet (Owner) commented Dec 28, 2024

Thank you for your contribution.

stduhpf added a commit to stduhpf/stable-diffusion.cpp that referenced this pull request Dec 28, 2024
@stduhpf deleted the tae16 branch on January 1, 2025
@LostRuins (Contributor) commented

TAE is truly amazing. By storing the weights as f8 e4m3, each TAE compresses down to about 2.5 MB, so I'm able to pack the TAEs for sd1.5, sdxl, sd3 and flux together in under 10 MB total, and they're usable enough to replace the usual VAEs.


taesd_flux_f8e4m3.zip (under 2 MB!)

It does get a little wonky though. From a bit of testing, all of the values in the TAE weight tensors I've looked at so far lie within the -5.0 to 5.0 range. Perhaps we could add compatibility for the fp8 e3m4 format, since it would provide better precision within this range than e4m3? Thoughts?
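Not from the thread, but to illustrate the precision argument: for a normalized fp8 value near x, the gap to the next representable value is roughly 2^(floor(log2 |x|) − mantissa bits), so near ±5.0 an e3m4 encoding steps in increments half as large as e4m3. A quick Python sketch:

```python
import math

# rough sketch: spacing between adjacent representable values for a
# normalized fp8 number near x, given its number of mantissa bits
def fp8_step(x, mantissa_bits):
    e = math.floor(math.log2(abs(x)))
    return 2.0 ** (e - mantissa_bits)

print(fp8_step(5.0, 3))  # e4m3: 0.5
print(fp8_step(5.0, 4))  # e3m4: 0.25, i.e. twice the precision in the +/-5 range
```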

@stduhpf (Contributor, Author) commented Jan 8, 2025

> Perhaps we could add compatibility for the fp8 e3m4 format, since it would provide better precision within this range than e4m3? Thoughts?

I can give it a shot, but wouldn't q8_0 be better anyway?

@LostRuins (Contributor) commented

> Perhaps we could add compatibility for the fp8 e3m4 format, since it would provide better precision within this range than e4m3? Thoughts?

> I can give it a shot, but wouldn't q8_0 be better anyway?

I did consider that. Unfortunately that is not possible as far as I know, because the majority of the weight tensors have the shape [3, 3, 64, 64], while q8_0 (and in fact all GGUF block quants) requires a minimum block size of 32.
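For context, a rough sketch of that constraint, assuming ggml's convention that block quantization packs values along the innermost dimension (listed first in a ggml-style shape), which must be a multiple of the block size (32 for q8_0):

```python
QK8_0 = 32  # q8_0 block size in ggml

def q8_0_compatible(ne):
    # ne: ggml-style shape, innermost dimension first
    return ne[0] % QK8_0 == 0

print(q8_0_compatible((3, 3, 64, 64)))  # False: 3x3 conv kernels don't fill a block
print(q8_0_compatible((64, 64)))        # True: a 64-wide row packs into two blocks
```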

@stduhpf (Contributor, Author) commented Jan 8, 2025

Ah, right, that would be a problem.

@stduhpf (Contributor, Author) commented Jan 8, 2025

I can't find any fp8 e3m4 standard... Should I just make up something like "e3m4 fn", using the same kind of format as fp8 e4m3 fn?

Edit: I just tried it, and it matches the table here: https://paperswithcode.com/paper/efficient-post-training-quantization-with-fp8/review/#arxiv-table-container
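A minimal decoding sketch for such a made-up "e3m4 fn" layout, assuming an exponent bias of 3 and e4m3fn-style special values (no infinities, only the all-ones pattern reserved for NaN); this is only an illustration of the encoding, not the #559 implementation:

```python
def decode_e3m4fn(b: int) -> float:
    """Decode one hypothetical 'e3m4 fn' byte: 1 sign, 3 exponent (bias 3),
    4 mantissa bits; no infinities, only 0x7F / 0xFF decode as NaN."""
    sign = -1.0 if b & 0x80 else 1.0
    exp = (b >> 4) & 0x07
    mant = b & 0x0F
    if exp == 0x07 and mant == 0x0F:
        return float("nan")
    if exp == 0:  # subnormal: 2^(1 - bias) * mant / 16
        return sign * (mant / 16.0) * 2.0 ** -2
    return sign * (1.0 + mant / 16.0) * 2.0 ** (exp - 3)

print(decode_e3m4fn(0x7E))  # 30.0      (largest finite magnitude)
print(decode_e3m4fn(0x01))  # 0.015625  (smallest positive subnormal)
```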

@stduhpf (Contributor, Author) commented Jan 8, 2025

@LostRuins I'm pretty sure this should work: #559
That said, I have no idea how to even convert the models to e3m4 to test it.

Edit: also, depending on how the weight values are distributed, we might as well make up something like "fp8 e2m5", which would have a range between -7.75 and 7.75 (but a rather large smallest positive subnormal of 0.03125).
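For reference, those e2m5 numbers follow from an exponent bias of 1 and the same fn-style convention of reserving the top mantissa code of the top exponent for NaN; a quick check (assumptions mine):

```python
bias = 1
# largest finite value: top exponent field (3), mantissa one code below the NaN pattern
max_val = (1 + 30 / 32) * 2 ** (3 - bias)   # 7.75
# smallest positive subnormal: exponent field 0, mantissa 1
min_subnormal = (1 / 32) * 2 ** (1 - bias)  # 0.03125
print(max_val, min_subnormal)
```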

@stduhpf mentioned this pull request on Jan 9, 2025
@LostRuins (Contributor) commented

Hm, previously I converted it using a PyTorch script, since torch.float8_e4m3fn is a supported type: https://pytorch.org/docs/stable/tensors.html

However, e3m4 is not. So that might be a little more inconvenient, and perhaps not such a good idea; I was not aware that the format isn't natively supported by torch. Ideally, it should be a valid, standards-adhering safetensors file.
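For reference, the kind of conversion described is only a few lines in PyTorch; a rough sketch with placeholder file names (assumes a PyTorch build with float8 support and the safetensors package):

```python
import torch
from safetensors.torch import load_file, save_file

# placeholder file names, not taken from the PR
state = load_file("taesd.safetensors")
state_f8 = {name: t.to(torch.float8_e4m3fn) for name, t in state.items()}
save_file(state_f8, "taesd_f8e4m3.safetensors")
```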

@stduhpf (Contributor, Author) commented Jan 9, 2025

Another format that would be better than standard e4m3(fn) is e4m3fnuz (from ONNX). At least it's a bit better documented than e3m4, but it's still not compatible with safetensors as far as I know.
