LeetCUDA v3.0.6
What's Changed
- misc: update merge_attn_states unit tests by @DefTruth in #281
- misc: update merge_attn_states docs by @DefTruth in #282
- misc: update merge_attn_states docs by @DefTruth in #283
- feat: remove merge_attn_states kernel helper func by @DefTruth in #284
- misc: remove static flag for to/from_float by @DefTruth in #285
- misc: add new Zhihu tech blog link by @DefTruth in #287
- misc: add debug flag for ncu profile by @DefTruth in #288
- bugfix: corrected theta calculation in RoPE CUDA kernel by @jiaau in #290
- docs: Add my ring-attention Zhihu blog by @DefTruth in #291
- Add simple CuTe mat-transpose implementations by @botbw in #292
- Update README.md by @DefTruth in #296
- Update README.md by @DefTruth in #297
- Update README.md by @DefTruth in #298
- Rename to LeetCUDA by @DefTruth in #299
New Contributors
Full Changelog: v3.0.5...v3.0.6