Releases · xlite-dev/LeetCUDA
v2.4.6 HGEMM Copy Async
What's Changed
- [Softmax] Add online softmax according to the NVIDIA paper (see the sketch below) by @bear-zd in https://github.com/DefTruth/CUDA-Learn-Notes/pull/60
- [HGEMM][Async] support K16/32 pack+cp.async+dbuf by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/62
- [Softmax][Bugfix] fixed softmax compile error by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/63
New Contributors
- @bear-zd made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/60
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.5...v2.4.6
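PR #60 implements online softmax from the NVIDIA paper, whose point is that the row maximum and the normalizing sum can be built in a single pass by rescaling the running denominator whenever a larger element appears. A rough illustration of that recurrence (one thread per row, f32; names and launch shape are invented for the sketch, this is not the repo's kernel):

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical one-thread-per-row illustration of the online softmax
// recurrence: keep a running max m and running denominator d, rescaling d by
// exp(m - m_new) whenever a larger element appears, so max and sum are
// obtained in a single pass over the row.
__global__ void online_softmax_rowwise(const float* __restrict__ x,
                                       float* __restrict__ y,
                                       int rows, int cols) {
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= rows) return;
  const float* xr = x + row * cols;

  float m = -INFINITY, d = 0.0f;
  for (int j = 0; j < cols; ++j) {          // single pass builds (m, d)
    float v = xr[j];
    float m_new = fmaxf(m, v);
    d = d * expf(m - m_new) + expf(v - m_new);
    m = m_new;
  }
  for (int j = 0; j < cols; ++j)            // normalize with the final (m, d)
    y[row * cols + j] = expf(xr[j] - m) / d;
}
```

PR #62's cp.async + double-buffer HGEMM is the asynchronous variant of the staging pattern sketched under v2.4.5 below.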
v2.4.5 HGEMM Double Buffers
What's Changed
- [FlashAttention] Refactor FlashAttention PyTorch bindings by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/55
- [SGEMM] test bank-conflict-free smem access with offset by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/56
- [HGEMM] HGEMM kernel with double buffers (see the sketch below) by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/57
- [Docs] Add docs for HGEMM/SGEMM double buffers by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/58
- [HGEMM] Add PyTorch HGEMM profile by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/59
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.4...v2.4.5
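PRs #56-#58 revolve around two ideas: offsetting/padding the shared-memory tile so consecutive rows land in different banks, and keeping two tile buffers so the next K-slice is loaded while the current one is consumed. A minimal sketch of both as a plain f32 tiled GEMM (tile size and names are invented; assumes M, N, K are multiples of TILE and a `dim3 block(TILE, TILE)`, `dim3 grid(N/TILE, M/TILE)` launch; this is not the repo's HGEMM kernel):

```cuda
#include <cuda_runtime.h>

#define TILE 16   // illustrative tile size, not the repo's

// Simplified double-buffered tiled SGEMM (C = A * B, row-major). Two shared
// memory buffers ping-pong: the loads for step t+1 are issued before the math
// on step t, so the two can overlap.
__global__ void sgemm_tiled_dbuf(const float* __restrict__ A,
                                 const float* __restrict__ B,
                                 float* __restrict__ C,
                                 int M, int N, int K) {
  // +1 column of padding shifts successive rows onto different banks
  // (the shared-memory offset trick benchmarked in PR #56).
  __shared__ float sA[2][TILE][TILE + 1];
  __shared__ float sB[2][TILE][TILE + 1];

  int row = blockIdx.y * TILE + threadIdx.y;
  int col = blockIdx.x * TILE + threadIdx.x;
  float acc = 0.0f;

  int buf = 0;
  // Preload the first K-tile into buffer 0.
  sA[buf][threadIdx.y][threadIdx.x] = A[row * K + threadIdx.x];
  sB[buf][threadIdx.y][threadIdx.x] = B[threadIdx.y * N + col];
  __syncthreads();

  for (int t = 1; t < K / TILE; ++t) {
    int nxt = buf ^ 1;
    // Issue loads for the next tile into the spare buffer.
    sA[nxt][threadIdx.y][threadIdx.x] = A[row * K + t * TILE + threadIdx.x];
    sB[nxt][threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
    // Consume the current buffer.
    #pragma unroll
    for (int k = 0; k < TILE; ++k)
      acc += sA[buf][threadIdx.y][k] * sB[buf][k][threadIdx.x];
    __syncthreads();     // next buffer is now fully written and visible
    buf = nxt;
  }
  // Tail: consume the last loaded buffer.
  #pragma unroll
  for (int k = 0; k < TILE; ++k)
    acc += sA[buf][threadIdx.y][k] * sB[buf][k][threadIdx.x];

  C[row * N + col] = acc;
}
```

On sm_80+, the two global loads inside the loop are the natural candidates for cp.async (v2.4.6), which moves data global→shared without staging it in registers first.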
v2.4.4 Pack HGEMM
What's Changed
- [SGEMM] Add naive sgemm kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/51
- [SGEMM] bank-conflict-free & double buffers by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/52
- [Misc][Benchmark] optimize benchmarks by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/53
- [HGEMM] Pack sliced_k f16x4/fp16x8 HGEMM by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/54
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.3...v2.4.4
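PR #51 adds a naive SGEMM as the baseline the packed and sliced-K variants are measured against. For reference, the naive version is just one thread per output element with the K-loop reading straight from global memory (a sketch, not the repo's code):

```cuda
#include <cuda_runtime.h>

// Naive SGEMM baseline: one thread per C element, no tiling, no shared memory.
// Each A element is re-read by every thread computing the same row, and each
// B element by every thread computing the same column -- exactly the traffic
// the tiled/double-buffered versions avoid.
__global__ void sgemm_naive(const float* __restrict__ A,
                            const float* __restrict__ B,
                            float* __restrict__ C,
                            int M, int N, int K) {
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= M || col >= N) return;
  float acc = 0.0f;
  for (int k = 0; k < K; ++k)
    acc += A[row * K + k] * B[k * N + col];
  C[row * N + col] = acc;
}
```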
v2.4.3 Pack Softmax
What's Changed
- [LayerNorm][FP16] support fp16x8_pack_f32 kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/48
- [Softmax][FP16] Pack f16x8 softmax kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/49
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.2...v2.4.3
v2.4.2 Pack RMSNorm
What's Changed
- [RMSNorm][FP16] Pack f16x8 rmsnorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/47
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.1...v2.4.2
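As a reminder of what the packed kernel computes: RMSNorm needs only a per-row mean of squares, y = x * g / sqrt(mean(x^2) + eps), with no mean subtraction as in LayerNorm. A plain f32, one-block-per-row sketch with a shared-memory reduction (block size and names are assumptions; the release's kernel is the f16x8-packed variant of this):

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define NUM_THREADS 256   // illustrative block size

// One block per row: y = x / sqrt(mean(x^2) + eps) * g
__global__ void rmsnorm_f32(const float* __restrict__ x,
                            const float* __restrict__ g,
                            float* __restrict__ y,
                            int cols, float eps) {
  __shared__ float s_sum[NUM_THREADS];
  const float* xr = x + blockIdx.x * cols;
  float* yr = y + blockIdx.x * cols;

  // Each thread accumulates a partial sum of squares over a strided slice.
  float part = 0.0f;
  for (int j = threadIdx.x; j < cols; j += NUM_THREADS)
    part += xr[j] * xr[j];
  s_sum[threadIdx.x] = part;
  __syncthreads();

  // Tree reduction in shared memory.
  for (int stride = NUM_THREADS / 2; stride > 0; stride >>= 1) {
    if (threadIdx.x < stride) s_sum[threadIdx.x] += s_sum[threadIdx.x + stride];
    __syncthreads();
  }
  float rms_inv = rsqrtf(s_sum[0] / cols + eps);

  for (int j = threadIdx.x; j < cols; j += NUM_THREADS)
    yr[j] = xr[j] * rms_inv * g[j];
}
```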
v2.4.1 Pack LayerNorm
What's Changed
- [Nsight] Add nsys/ncu usage, ptx/sass by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/44
- [DotProd][FP16] support f16x8_pack kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/45
- [LayerNorm][FP16] Add pack support for f16x8 LD/ST by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/46
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4...v2.4.1
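The f16x8 "pack" idea behind PR #45 (and the other pack kernels in this series) is to move eight halfs per thread with a single 128-bit load and then operate on them as half2 pairs. A hedged sketch of a packed dot product, accumulating in f32 and reducing across the warp with shuffles (assumes 16-byte-aligned inputs, n a multiple of 8, and a single-warp block; none of this is taken from the repo):

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Illustrative f16x8 packed dot product: each thread loads 8 halfs of a and b
// with one 128-bit access (viewed as float4), multiplies them as half2 pairs
// converted to f32, then the warp combines partial sums with shuffles.
__global__ void dot_f16x8_pack(const half* __restrict__ a,
                               const half* __restrict__ b,
                               float* __restrict__ out, int n) {
  float acc = 0.0f;
  for (int i = threadIdx.x * 8; i < n; i += blockDim.x * 8) {
    float4 pa = *reinterpret_cast<const float4*>(a + i);   // 8 halfs of a
    float4 pb = *reinterpret_cast<const float4*>(b + i);   // 8 halfs of b
    const half2* ha = reinterpret_cast<const half2*>(&pa);
    const half2* hb = reinterpret_cast<const half2*>(&pb);
    #pragma unroll
    for (int k = 0; k < 4; ++k) {
      float2 fa = __half22float2(ha[k]);
      float2 fb = __half22float2(hb[k]);
      acc += fa.x * fb.x + fa.y * fb.y;
    }
  }
  // Warp-level sum via shuffle (single-warp block assumed).
  for (int offset = 16; offset > 0; offset >>= 1)
    acc += __shfl_xor_sync(0xffffffff, acc, offset);
  if (threadIdx.x == 0) *out = acc;
}
```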
v2.4 Pack Reduce LDST
What's Changed
- [Reduce][Kernel] Pack f16/bf16x8 & fp8/i8x16 LD/ST by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/43
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.3.1...v2.4
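PR #43 generalizes the packed load/store path: one 16-byte (128-bit) access covers 8 f16/bf16 values or 16 fp8/int8 values. The macro below mirrors the LDST128BITS naming mentioned in v2.3.1 but is rewritten here as a sketch; the tail for n not divisible by 8 is left to a scalar path and omitted, and 16-byte alignment of src/dst is assumed:

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Sketch of a 128-bit packed load/store: viewing a 16-byte-aligned group of
// elements as a float4 turns 8 half (or bf16) accesses -- or 16 fp8/int8
// accesses -- into a single LDG.128 / STG.128.
#define LDST128BITS(value) (reinterpret_cast<float4*>(&(value))[0])

__global__ void copy_f16x8_pack(half* __restrict__ src,
                                half* __restrict__ dst, int n) {
  int idx = (blockIdx.x * blockDim.x + threadIdx.x) * 8;  // 8 halfs per thread
  if (idx + 7 < n)
    LDST128BITS(dst[idx]) = LDST128BITS(src[idx]);
}
```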
v2.3.1 f16x8 Pack Elementwise
What's Changed
- [FA2][Half] Add FA2 f16_mma_m16n8k16 kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/35
- [Refactor][7/N] CUDA Learn Notes refactor Part-7 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/36
- Clamped input range in Sigmoid kernel to prevent overflow by @Phoenix8215 in https://github.com/DefTruth/CUDA-Learn-Notes/pull/37
- [Sigmoid][F16] Add f16x8_pack kernel, boost ~1.5x by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/39
- [Elementwise][Half] support f16x8_pack kernel, boost 1.1x by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/40
- [FlashAttention] replace FLOAT4 with LDST128BITS macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/41
- [RELU][FP16] Add f16x8_pack kernel, boost 2.1x by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/42
New Contributors
- @Phoenix8215 made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/37
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.3...v2.3.1
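On the PR #37 fix: in half precision, exp(-x) exceeds the fp16 range (max 65504, roughly e^11.09) once -x grows past about 11, so clamping the input before the exponential keeps the sigmoid finite. A small illustration, with an assumed clamp bound that is not necessarily the repo's value:

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Illustrative half-precision sigmoid with input clamping: the bound below is
// an assumption for the sketch, chosen so hexp never leaves the fp16 range.
#define SIGMOID_MAX_EXP 10.0f

__global__ void sigmoid_f16_clamped(const half* __restrict__ x,
                                    half* __restrict__ y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  float vf = __half2float(x[i]);
  vf = fminf(fmaxf(vf, -SIGMOID_MAX_EXP), SIGMOID_MAX_EXP);  // clamp before exp
  half v = __float2half(vf);
  const half one = __float2half(1.0f);
  y[i] = __hdiv(one, __hadd(one, hexp(__hneg(v))));          // 1 / (1 + e^-v)
}
```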
v2.3 Refactor 6/N
What's Changed
- [Refactor][6/N] CUDA Learn Notes refactor Part-6 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/17
- [Refactor][5/N] CUDA Learn Notes refactor Part-6 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/18
- [LayerNorm][Half] support fp16x8 packed LayerNorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/19
- [Reduce][Half] add HALF2 & BFLOAT2 macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/21
- [RMSNorm][Half] support fp16x8 packed RMSNorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/22
- [Bugfix][Kernel] fixed block count calculation errors in some kernels by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/23
- [Elementwise][Half] support fp16x8 packed Elementwise by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/24
- [Elementwise][Half] support fp16x8 packed Elementwise by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/25
- [RELU][Half] support fp16x8 RELU kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/26
- [RMSNorm] support f16x8_f32 RMSNorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/28
- [RMSNorm][Kernel] Add FLOAT2/HALF2_VARIANCE macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/29
- [LayerNorm][Kernel] Add HALF2 SUM/SUB/VAR macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/30
- [HGEMM] Add sliced_k & t_8x8_sliced_k_f16x4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/31
- [HGEMV][Half] support hgemv k32/k128/f16 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/32
- [FlashAttention] Refactor flash_attn_1_fwd_f32 kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/33
- Bump up to v2.3 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/34
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.2...v2.3
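For the HGEMV kernels from PR #32, a common mapping is one warp per output row: lanes stride over K, accumulate in f32, and combine partials with the same warp shuffles as in the dot-product sketch further up. A hedged sketch (launch shape and names are assumptions, not the repo's k32/k128 variants):

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Illustrative warp-per-row HGEMV (y = A * x, A is M x K, row-major):
// each warp owns one output row; accumulation is done in f32 for safety.
// Intended launch: dim3 block(32, WARPS_PER_BLOCK), grid covering M rows.
__global__ void hgemv_warp_per_row(const half* __restrict__ A,
                                   const half* __restrict__ x,
                                   half* __restrict__ y, int M, int K) {
  int row  = blockIdx.x * blockDim.y + threadIdx.y;  // one warp per row
  int lane = threadIdx.x;                            // 0..31
  if (row >= M) return;

  float acc = 0.0f;
  for (int k = lane; k < K; k += 32)
    acc += __half2float(A[row * K + k]) * __half2float(x[k]);

  for (int offset = 16; offset > 0; offset >>= 1)     // warp reduction
    acc += __shfl_xor_sync(0xffffffff, acc, offset);

  if (lane == 0) y[row] = __float2half(acc);
}
```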
v2.2 Refactor 5/N
What's Changed
- [Refactor][5/N] CUDA Learn Notes refactor Part-5 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/15
- Bump up to v2.2 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/16
Full Changelog: DefTruth/CUDA-Learn-Notes@2.1...v2.2