xlite-dev / LeetCUDA Public

Fork 515
Star 4.8k

Releases: xlite-dev/LeetCUDA

v3.0.11

11 Jun 05:57

DefTruth

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified

v3.0.11 Latest

What's Changed

feat: add cute hgemv implement by @kitecats in #331
Update README.md by @DefTruth in #333
feat: add a cute bank-free mat transpose vectorize implement by @kitecats in #334
bugfix: fix layernorm & rmsnorm f16 overflow by @hebangwen in #335
Bugfix: fix a compilation error by @lixiaoquan in #336

New Contributors

@hebangwen made their first contribution in #335
@lixiaoquan made their first contribution in #336

Full Changelog: v3.0.10...v3.0.11

Contributors

lixiaoquan, DefTruth, and 2 other contributors

v3.0.10

03 Jun 02:13

DefTruth

What's Changed

Update README.md by @DefTruth in #322
Update README.md by @DefTruth in #323
Fix: missing source by @botbw in #325
Use 128-bit data loading by @kitecats in #326
Create FUNDING.yml by @DefTruth in #327
Add open-collective badge by @DefTruth in #328
Update open-collective contributors badge by @DefTruth in #329

New Contributors

@kitecats made their first contribution in #326

Full Changelog: v3.0.9...v3.0.10

Contributors

DefTruth, botbw, and kitecats

v3.0.9

12 May 01:53

DefTruth

What's Changed

feat: add some torch.distributed examples by @DefTruth in #313
feat: add some torch.distributed examples by @DefTruth in #315
feat: add a naive CuTe flash-attn by @botbw in #314
fix(kernels): correct typo in LayerNorm kernel at lines 73, 110, 346, 443 by @nxdxml in #317
misc: manually update submodules by @DefTruth in #318
chore: add naive cute flash-attn index by @DefTruth in #319
add triton merge_attn_states zhihu blog by @DefTruth in #320

New Contributors

@nxdxml made their first contribution in #317

Full Changelog: v3.0.8...v3.0.9

Contributors

DefTruth, botbw, and nxdxml

v3.0.8

06 May 06:23

DefTruth

What's Changed

Update README.md by @DefTruth in #308
misc: add triton vector add zhihu blog by @DefTruth in #310
Update README.md by @DefTruth in #311

Full Changelog: v3.0.7...v3.0.8

Contributors

DefTruth

LeetCUDA v3.0.7

28 Apr 06:02

DefTruth

What's Changed

Update mat-transpose/README.md by @DefTruth in #300
feat: add triton fused-softmax by @DefTruth in #301
misc: add pre-commit & format by @DefTruth in #302
misc: add developer guide by @DefTruth in #303
misc: add developer guide by @DefTruth in #304
misc: fix typo by @DefTruth in #305
Update CONTRIBUTE.md by @DefTruth in #306
feat: update pre-commit max-length=80 by @DefTruth in #307

Full Changelog: v3.0.6...v3.0.7

Contributors

DefTruth

LeetCUDA v3.0.6

26 Apr 06:51

DefTruth

What's Changed

misc: update merge_attn_states unit tests by @DefTruth in #281
misc: update merge_attn_states docs by @DefTruth in #282
misc: update merge_attn_states docs by @DefTruth in #283
feat: remove merge_attn_states kernel help func by @DefTruth in #284
misc: remove static flag for to/from_float by @DefTruth in #285
misc: add new zhihu tech blog link by @DefTruth in #287
misc: add debug flag for ncu profile by @DefTruth in #288
bugfix: corrected theta calculation in RoPE CUDA kernel by @jiaau in #290
docs: Add my ring-attention zhihu blog by @DefTruth in #291
Add simple CuTe mat-transpose implementations by @botbw in #292
Update README.md by @DefTruth in #296
Update README.md by @DefTruth in #297
Update README.md by @DefTruth in #298
Rename to LeetCUDA by @DefTruth in #299

New Contributors

@jiaau made their first contribution in #290
@botbw made their first contribution in #292

Full Changelog: v3.0.5...v3.0.6

Contributors

DefTruth, botbw, and jiaau

v3.0.5

09 Apr 15:15

DefTruth

What's Changed

[Misc] Automated submodule update by @DefTruth in #261
Update README.md by @tpoisonooo in #264
Update README.md by @DefTruth in #265
bugfix: only export per token softmax kernels by @DefTruth in #266
misc: update vllm latest slides by @DefTruth in #267
feat: add triton vector_add kernel by @DefTruth in #268
feat: add triton merge_attn_states kernel by @DefTruth in #269
feat: add cuda merge_attn_states kernel by @DefTruth in #270
feat: update cuda merge_attn_states kernel by @DefTruth in #271
misc: dispatch CUDA merge_attn_states by @DefTruth in #273
misc: add triton kernel index by @DefTruth in #274
Fix mistake on mat trans 2d when init grid. by @bear-zd in #275
misc: update cuda merge_attn_states kernel by @DefTruth in #276
kernel: optimize merge_attn_states CUDA kernel dispatch by @DefTruth in #278
feat: optimize merge_attn_states thread block dispatch by @DefTruth in #279

New Contributors

@tpoisonooo made their first contribution in #264

Full Changelog: v3.0.4...v3.0.5

Contributors

tpoisonooo, DefTruth, and bear-zd

v3.0.4

15 Mar 03:14

DefTruth

What's Changed

[Docs] Add vLLM + DeepSeek-R1 671B deploy blog by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/259

Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.3...v3.0.4

Contributors

DefTruth

v3.0.3

04 Mar 04:14

DefTruth

What's Changed

[Misc] Automated submodule update by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/257

Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.2...v3.0.3

Contributors

DefTruth

v3.0.2

24 Feb 01:30

DefTruth

What's Changed

Fix typo in block_all_reduce.cu by @wplf in https://github.com/DefTruth/CUDA-Learn-Notes/pull/247
fix typo about enougth by @wplf in https://github.com/DefTruth/CUDA-Learn-Notes/pull/248
[FFPA] Add FFPA tech zhihu blog by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/252
[FFPA] Update FFPA(Split-D) blog title by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/253
[Misc] Automated submodule update by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/254

New Contributors

@wplf made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/247

Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.1...v3.0.2

Contributors

DefTruth and wplf
