Releases: leejet/stable-diffusion.cpp
master-5c561ea
feat: do not convert more flux tensors
master-1bdc767
feat: force using f32 for some layers
master-c837c5d
style: format code
master-64d231f
feat: add flux support (#356)

* add flux support
* avoid build failures in non-CUDA environments
* fix schnell support
* add k quants support
* add support for applying lora to quantized tensors
* add inplace conversion support for f8_e4m3 (#359), done the same way as for bf16: just as bf16 converts losslessly to fp32, f8_e4m3 converts losslessly to fp16
* add xlabs flux comfy converted lora support
* update docs

Co-authored-by: Erik Scholz <[email protected]>
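The lossless f8_e4m3 → fp16 claim above can be illustrated with a small bit-level sketch. This is not the repository's actual conversion routine, just an assumption-labeled demonstration of why the widening is exact: E4M3 has 1 sign, 4 exponent (bias 7), and 3 mantissa bits, and every representable E4M3 value (including subnormals and its single NaN encoding) fits exactly into IEEE fp16, analogously to how bf16 widens to fp32 by zero-extension.

```python
import struct

def f8_e4m3_to_f16_bits(b: int) -> int:
    """Hypothetical converter: OCP FP8 E4M3 byte -> IEEE half-precision bits.

    Unlike bf16->fp32, this is not a pure shift: the exponent must be
    rebiased (7 -> 15), but the mapping is still exact for every input.
    """
    sign = (b >> 7) & 1
    exp = (b >> 3) & 0xF
    man = b & 0x7

    if exp == 0xF and man == 0x7:      # E4M3 NaN (the format has no infinities)
        return (sign << 15) | 0x7E00
    if exp == 0:
        if man == 0:                   # signed zero
            return sign << 15
        # E4M3 subnormal: value = man * 2^-9; normalize, since fp16's
        # smallest normal (2^-14) is far below it.
        k = man.bit_length() - 1       # position of the leading 1 bit (0..2)
        f16_exp = k + 6                # (k - 9) + 15
        f16_man = (man - (1 << k)) << (10 - k)
        return (sign << 15) | (f16_exp << 10) | f16_man
    # normal: rebias exponent 7 -> 15, widen mantissa from 3 to 10 bits
    return (sign << 15) | ((exp + 8) << 10) | (man << 7)

def f16_bits_to_float(bits: int) -> float:
    """Decode fp16 bits to a Python float for inspection."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]
```

For example, the E4M3 byte 0x38 (exponent field 7, mantissa 0) encodes 1.0 and maps to the fp16 bit pattern 0x3C00.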
master-697d000
feat: add SYCL Backend Support for Intel GPUs (#330)

* update ggml and add SYCL CMake option
* hacky CMakeLists.txt for updating ggml in cpu backend
* rebase and clean code
* add sycl in README
* rebase ggml commit
* refine README
* update ggml for supporting sycl tsembd op

Signed-off-by: zhentaoyu <[email protected]>
master-3d854f7
sync: update ggml submodule url
master-73c2176
feat: add sd3 support (#298)
master-4a6e36e
sync: update ggml
master-9c51d87
chore: fix cuda CI (#286)
master-e1384de
perf: make crc32 100x faster on x86-64 (#278) This change makes checkpoints load significantly faster by optimizing pkzip's cyclic redundancy check. The code was developed by Intel, Google, and Mozilla; see Chromium's zlib codebase for further details.
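The Chromium zlib speedup referenced above relies on hardware carry-less multiply (PCLMULQDQ) on x86-64, which is not reproducible in a short portable example. As an assumption-labeled sketch of the broader idea (folding multiple input bytes per loop iteration instead of one), here is a slicing-by-4 variant of the same pkzip CRC-32 (reflected polynomial 0xEDB88320), checked against `zlib.crc32`:

```python
import zlib

POLY = 0xEDB88320  # reflected pkzip/zlib CRC-32 polynomial

# T[0] is the classic one-byte-at-a-time lookup table.
T = [[0] * 256 for _ in range(4)]
for i in range(256):
    c = i
    for _ in range(8):
        c = (c >> 1) ^ POLY if c & 1 else c >> 1
    T[0][i] = c
# T[1..3] extend each entry so four input bytes fold in one step.
for i in range(256):
    for k in range(1, 4):
        T[k][i] = (T[k - 1][i] >> 8) ^ T[0][T[k - 1][i] & 0xFF]

def crc32_slice4(data: bytes, crc: int = 0) -> int:
    """Sketch of slicing-by-4 CRC-32; API mirrors zlib.crc32(data, crc)."""
    crc ^= 0xFFFFFFFF
    n = len(data) - len(data) % 4
    for i in range(0, n, 4):
        # XOR four input bytes into the register, then fold via four tables.
        crc ^= data[i] | data[i + 1] << 8 | data[i + 2] << 16 | data[i + 3] << 24
        crc = (T[3][crc & 0xFF] ^ T[2][(crc >> 8) & 0xFF]
               ^ T[1][(crc >> 16) & 0xFF] ^ T[0][crc >> 24])
    for b in data[n:]:                       # leftover tail, one byte at a time
        crc = (crc >> 8) ^ T[0][(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF
```

The standard CRC-32 check value confirms correctness: `crc32_slice4(b"123456789")` yields 0xCBF43926, matching `zlib.crc32`. The production version goes further, folding 64-byte blocks with carry-less multiplication.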