v2.4.5 HGEMM Double Buffers
What's Changed
- [FlashAttention] Refactor FlashAttention PyTorch bindings by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/55
- [SGEMM] Test bank-conflict-free access with smem offset by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/56
- [HGEMM] HGEMM kernel with double buffers by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/57
- [Docs] Add docs for HGEMM/SGEMM double buffers by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/58
- [HGEMM] Add PyTorch HGEMM profile by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/59
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.4...v2.4.5