CuTe HGEMM Block Swizzle
What's Changed
- [HGEMM] Transpose matrix B from row-major to column-major by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/135
- [HGEMM] refactor HGEMM cpp benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/136
- [HGEMM] Update HGEMM L20/4090 Bench by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/137
- [HGEMM] Fix cuBLAS HGEMM handle error by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/138
- [HGEMM] Add MMA HGEMM NN C++ benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/139
- [HGEMM] CuTe HGEMM with Thread Block Swizzle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/140
- [HGEMM] Clear tensor cache to avoid OOM by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/141
- [HGEMM] Add gc.collect to HGEMM bench script by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/142
- [HGEMM] Add show_memory option to bench by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/143
- [HGEMM] Manually init/destroy cuBLAS handle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/144
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.1...v2.6.2
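
For readers unfamiliar with the thread block swizzle named in the release title, here is a rough Python sketch of the general idea: remap a linear block ID to an output tile so that consecutively scheduled blocks share input tiles and reuse them from L2 cache. It shows one common grouped ordering and is not necessarily the scheme used in the PR above. The function name `swizzled_tile` and the parameters `tiles_m`, `tiles_n`, and `group_m` are hypothetical and chosen only for illustration.

```python
def swizzled_tile(block_id, tiles_m, tiles_n, group_m):
    """Map a linear block ID to a (tile_m, tile_n) output tile.

    Hypothetical illustration: tiles are visited in bands of `group_m`
    rows, walking down each column of a band before moving to the next
    column, instead of walking a full row of tiles at a time.
    """
    blocks_per_group = group_m * tiles_n
    group_id = block_id // blocks_per_group
    first_m = group_id * group_m
    # The last band may have fewer than group_m rows.
    rows_in_group = min(tiles_m - first_m, group_m)
    local = block_id % blocks_per_group
    tile_m = first_m + local % rows_in_group
    tile_n = local // rows_in_group
    return tile_m, tile_n


if __name__ == "__main__":
    # Print the visiting order for a 4x4 grid of tiles, bands of 2 rows.
    for block_id in range(16):
        print(block_id, swizzled_tile(block_id, 4, 4, 2))
```

With a plain row-major order, the first four blocks of a 4x4 tile grid touch one row tile of A and four column tiles of B. With the grouped order above they touch two of each, so fewer distinct input tiles are in flight at the same time.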