v2.4.6 HGEMM Copy Async
What's Changed
- [Softmax] Add online softmax according to Nvidia Paper by @bear-zd in https://github.com/DefTruth/CUDA-Learn-Notes/pull/60
- [HGEMM][Async] support K16/32 pack+cp.async+dbuf by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/62
- [Softmax][Bugfix] fixed softmax compile error by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/63
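
For readers unfamiliar with the online softmax mentioned above, here is a minimal Python sketch of the general idea: track a running maximum and a running normalizer in one pass, rescaling the normalizer whenever the maximum changes. It is an illustration only, not the CUDA kernel from the PR; the function names `online_softmax` and `safe_softmax` are hypothetical, and inputs are assumed to be finite floats.

```python
import math


def safe_softmax(xs):
    """Reference version: one pass for the max, one for the sum, one to normalize."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def online_softmax(xs):
    """Fuse the max and sum passes into one loop (assumes finite inputs)."""
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new maximum,
        # then add the contribution of the current element.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
    # Final pass: normalize each element with the fused statistics.
    return [math.exp(x - running_max) / running_sum for x in xs]
```

Both functions should agree up to floating-point rounding; the online variant reads the input once less, which is what makes it attractive for memory-bound GPU kernels.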
New Contributors
- @bear-zd made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/60
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.5...v2.4.6