FA2 QKV SMEM Swizzle✔️
What's Changed
- [Doc] Refactor README for better readability✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/186
- [FA2] shared-kv + fully smem swizzle✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/187
- [FA2] tiling-qk + fully smem swizzle✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/188
- [FA2] shared-qkv + fully swizzle q/qk/qkv✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/189
- [FA2] hotfix launch setting -> shared-kv d=256✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/190
- [FA2] support shared-qkv + O s2g kernel✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/191
- [FA2] Update RTX 3080 Laptop performance✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/192
- [Doc] Refactor README for better readability✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/193
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.12...v2.6.13