v2.4.4 Pack HGEMM
What's Changed
- [SGEMM] Add naive sgemm kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/51
- [SGEMM] bank conflicts free & double buffers by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/52
- [Misc][Benchmark] optimize benchmarks by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/53
- [HGEMM] Pack sliced_k f16x4/f16x8 HGEMM by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/54
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.3...v2.4.4