Releases: leejet/stable-diffusion.cpp

master-5c561ea

25 Aug 09:14
feat: do not convert more flux tensors

master-1bdc767

25 Aug 07:05
feat: force using f32 for some layers

master-c837c5d

24 Aug 17:41
style: format code

master-64d231f

24 Aug 07:42
64d231f
feat: add flux support (#356)

* add flux support

* avoid build failures in non-CUDA environments

* fix schnell support

* add k quants support

* add support for applying lora to quantized tensors

* add inplace conversion support for f8_e4m3 (#359)

This is done the same way as for bf16: just as bf16 converts
losslessly to fp32, f8_e4m3 converts losslessly to fp16.

* add xlabs flux comfy converted lora support

* update docs

---------

Co-authored-by: Erik Scholz <[email protected]>
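
The f8_e4m3 note in this release says the format widens to fp16 without loss. That works because fp16 has more exponent bits (5 vs 4) and more mantissa bits (10 vs 3), so every f8_e4m3 value has an exact fp16 encoding. The following is a minimal sketch of such a conversion, not the project's actual code. The function names are hypothetical, and the bit layout is an assumption (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, NaN as S.1111.111), since the release notes do not specify the variant.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not the project's code): widen one f8_e4m3 byte to fp16 bits.
// Assumed layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits,
// no infinities, NaN encoded as S.1111.111.
uint16_t f8_e4m3_to_f16_bits(uint8_t f8) {
    const uint32_t sign = (f8 & 0x80u) << 8;  // move the sign to bit 15
    uint32_t exp = (f8 >> 3) & 0x0Fu;
    uint32_t man = f8 & 0x07u;

    if (exp == 0x0Fu && man == 0x07u) return static_cast<uint16_t>(sign | 0x7E00u);  // NaN
    if (exp == 0 && man == 0) return static_cast<uint16_t>(sign);                    // signed zero

    if (exp == 0) {
        // f8 subnormal: value = man * 2^-9. Each such value is a normal fp16
        // number, so shift until the leading 1 reaches bit 3.
        int shift = 0;
        while ((man & 0x08u) == 0) { man <<= 1; ++shift; }
        man &= 0x07u;      // drop the now-implicit leading 1
        exp = 9u - shift;  // fp16 biased exponent: 15 - 6 - shift
    } else {
        exp += 8u;         // rebias the exponent: 15 - 7
    }
    // The 3 mantissa bits become the top 3 of fp16's 10 mantissa bits.
    return static_cast<uint16_t>(sign | (exp << 10) | (man << 7));
}

// Reference decoders, used only by the exhaustive check below.
float f8_e4m3_to_float(uint8_t f8) {
    const float sign = (f8 & 0x80u) ? -1.0f : 1.0f;
    const int exp = (f8 >> 3) & 0x0F;
    const int man = f8 & 0x07;
    if (exp == 0x0F && man == 0x07) return NAN;
    if (exp == 0) return sign * std::ldexp(static_cast<float>(man), -9);
    return sign * std::ldexp(1.0f + man / 8.0f, exp - 7);
}

float f16_bits_to_float(uint16_t h) {
    const float sign = (h & 0x8000u) ? -1.0f : 1.0f;
    const int exp = (h >> 10) & 0x1F;
    const int man = h & 0x3FF;
    if (exp == 0x1F) return man ? NAN : sign * INFINITY;
    if (exp == 0) return sign * std::ldexp(static_cast<float>(man), -24);
    return sign * std::ldexp(1.0f + man / 1024.0f, exp - 15);
}

// Checks that all 256 f8 codes keep their exact value after widening.
void check_f8_to_f16_lossless() {
    for (int c = 0; c < 256; ++c) {
        const float want = f8_e4m3_to_float(static_cast<uint8_t>(c));
        const float got  = f16_bits_to_float(f8_e4m3_to_f16_bits(static_cast<uint8_t>(c)));
        assert(std::isnan(want) ? std::isnan(got) : got == want);
    }
}
```

Because the input space is only 256 codes, the exhaustive check is cheap. A real loader could equally precompute a 256-entry lookup table from the same function.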

master-697d000

10 Aug 06:56
697d000
feat: add SYCL Backend Support for Intel GPUs (#330)

* update ggml and add SYCL CMake option

Signed-off-by: zhentaoyu <[email protected]>

* hacky CMakeLists.txt for updating ggml in cpu backend

Signed-off-by: zhentaoyu <[email protected]>

* rebase and clean code

Signed-off-by: zhentaoyu <[email protected]>

* add sycl in README

Signed-off-by: zhentaoyu <[email protected]>

* rebase ggml commit

Signed-off-by: zhentaoyu <[email protected]>

* refine README

Signed-off-by: zhentaoyu <[email protected]>

* update ggml for supporting sycl tsembd op

Signed-off-by: zhentaoyu <[email protected]>

---------

Signed-off-by: zhentaoyu <[email protected]>

master-3d854f7

03 Aug 05:01
sync: update ggml submodule url

master-73c2176

28 Jul 07:59
73c2176
feat: add sd3 support (#298)

master-4a6e36e

28 Jul 11:45
sync: update ggml

master-9c51d87

12 Jun 15:33
9c51d87
chore: fix cuda CI (#286)

master-e1384de

01 Jun 05:12
e1384de
perf: make crc32 100x faster on x86-64 (#278)

This change makes checkpoints load significantly faster by optimizing
pkzip's cyclic redundancy check. The code was developed by Intel,
Google, and Mozilla. See Chromium's zlib codebase for further details.
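
For context on what is being computed, here is a minimal sketch of the standard byte-at-a-time, table-driven CRC-32 used by the zip format (reflected polynomial 0xEDB88320). This is only the simple baseline, not the optimized routine this release adopts, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Build the 256-entry lookup table for the reflected polynomial 0xEDB88320.
static std::array<uint32_t, 256> make_crc32_table() {
    std::array<uint32_t, 256> table{};
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        table[i] = c;
    }
    return table;
}

// Hypothetical baseline: one table lookup per input byte.
uint32_t crc32_bytewise(const void* data, size_t len) {
    static const std::array<uint32_t, 256> table = make_crc32_table();
    const unsigned char* p = static_cast<const unsigned char*>(data);
    uint32_t crc = 0xFFFFFFFFu;  // standard initial value
    for (size_t i = 0; i < len; ++i) {
        crc = table[(crc ^ p[i]) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;    // standard final XOR
}

// The well-known check value for the ASCII string "123456789".
void crc32_self_check() {
    assert(crc32_bytewise("123456789", 9) == 0xCBF43926u);
}
```

The loop handles one byte per iteration and each step depends on the previous one, so this form is slow on large checkpoint files. Faster implementations process many bytes per step.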