Releases: xlite-dev/LeetCUDA
🎉FA2 MMA Split KV/Q
What's Changed
- [FlashAttention] Update flash-attention-mma 0.0.1 🎉 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/159
- [FA2] Release flash-attn-mma split-kv/q🎉 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/160
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.6...v2.6.7
🎉flash-attention-mma 0.0.1
What's Changed
- [HGEMM] CuTe HGEMM debug Makefile target by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/154
- [Softmax] Update Online Softmax bindings by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/155
- [FlashAttention] Refactor toy-flash-attn codes part-1 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/156
- [Bug] Fix typo by @wjj19950828 in https://github.com/DefTruth/CUDA-Learn-Notes/pull/157
- [FlashAttention] Release flash-attention-mma 0.0.1 🎉 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/158
New Contributors
- @wjj19950828 made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/157
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.5...v2.6.6
⚡️⚡️toy-hgemm library
What's Changed
- [HGEMM] Update RTX 3080 Laptop perf by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/148
- [HGEMM] Update toy-hgemm library 0.1.0 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/149
- [HGEMM] Update toy-hgemm library 0.1.0 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/150
- [HGEMM] Update toy-hgemm library 0.1.0 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/152
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.4...v2.6.5
toy-hgemm library
What's Changed
- [HGEMM] Release toy-hgemm library 0.1.0 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/146
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.3...v2.6.4
toy-hgemm library
What's Changed
- [HGEMM] Release toy-hgemm library 0.1.0 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/145
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.2...v2.6.3
CuTe HGEMM Block Swizzle
What's Changed
- [HGEMM] Transpose matrix B from row-major to column-major by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/135
- [HGEMM] refactor HGEMM cpp benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/136
- [HGEMM] Update HGEMM L20/4090 Bench by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/137
- [HGEMM] fix cublas hgemm handle error by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/138
- [HGEMM] Add MMA HGEMM NN C++ benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/139
- [HGEMM] CuTe HGEMM with Thread Block Swizzle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/140
- [HGEMM] Clear tensor cache to avoid OOM by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/141
- [HGEMM] Add gc.collect to HGEMM bench script by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/142
- [HGEMM] Add show_memory option to bench by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/143
- [HGEMM] manually init/destroy cublas handle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/144
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.1...v2.6.2
v2.6.1 CuTe HGEMM
What's Changed
- [HGEMM] Add large MNK block swizzle policy by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/132
- [HGEMM] Add CuTe HGEMM with SMEM Swizzle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/134
- Update embedding.cu by @TheManWhoIsStupid in https://github.com/DefTruth/CUDA-Learn-Notes/pull/133
New Contributors
- @TheManWhoIsStupid made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/133
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6...v2.6.1
v2.6 Refactor 7/N
What's Changed
- [HGEMM] Update NVIDIA L20/4090 Perf plots by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/126
- [Blog] Illustrated DeepSpeed-Ulysses & Megatron-LM TP/SP by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/127
- [README] Add contents lists by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/128
- [README] Update README by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/129
- [README] Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/130
- Bump up to v2.6 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/131
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.5...v2.6
v2.5
What's Changed
- [HGEMM] Update HGEMM README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/120
- [HGEMM] Add plot tflops function by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/121
- [HGEMM] Add NVIDIA RTX 3090 Laptop perf plot by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/122
- [PERF] Update HGEMM benchmark scripts by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/123
- [HGEMM] Add HGEMM L20/4090 benchmark figures by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/124
- Bump up to v2.5 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/125
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.18...v2.5
v2.4.18
What's Changed
- Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/115
- [HGEMM] Update HGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/116
- [HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/117
- [README] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/118
- [HGEMM] Add NVIDIA RTX 4090 benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/119
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.17...v2.4.18