Pull requests: NVIDIA/TensorRT-LLM

test: strictly constrain disaggregated serving llama4 to H200
#5085 opened Jun 10, 2025 by StanleySun639

test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test
#5083 opened Jun 10, 2025 by ruodil

fix: fix cuda graph max batch size for spec decoding cases
#5076 opened Jun 10, 2025 by lfr-0531

[fix]: Fall back to HMAC to Avoid IPC Serialization Churn
#5074 opened Jun 10, 2025 by yibinl-nvidia (Draft)

fix: remove duplicate trust_remote_code from serve command
#5072 opened Jun 10, 2025 by yechank-nvidia

test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨
#5066 opened Jun 10, 2025 by venkywonka (4 tasks done)

[https://nvbugspro.nvidia.com/bug/5332927][fix] Fix the bug in the routing unit test
#5065 opened Jun 10, 2025 by ChristinaZ

bugfix [AutoDeploy]: Correct usage of pytorch_config in autodeploy integration of trtllm-bench
#5059 opened Jun 10, 2025 by suyoggupta

[https://nvbugs/5277592][fix] fix cuda graph padding for spec decoding (only for 0.20)
#5058 opened Jun 10, 2025 by lfr-0531

test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (pyt, fp8)
#5057 opened Jun 10, 2025 by venkywonka (Draft)