bug: Something isn't working.
CI: Automated tests, build checks, GitHub Actions, system stability & efficiency.
Community Engagement: Help/insights needed from the community.
Community want to contribute: PRs initiated by the community.
Customized Kernels: Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
dependencies: Pull requests that update a dependency file.
Disaggregated Serving: Deploying TRTLLM with separated, distributed components (params, kv-cache, compute). Arch & perf.
Documentation: TRTLLM's textual/illustrative materials (API refs, guides, tutorials). Improvement & clarity.
duplicate: This issue or pull request already exists.
Ease of Use: Improvements to, or complaints about, TRTLLM's ease of use.
feature request: New feature or request, including new model, dtype, or functionality support.
functionality issue
Generic Runtime: General operational aspects of TRTLLM execution not covered by other categories.
help wanted: Extra attention is needed.
Installation: Setting up and building TRTLLM (compilation, pip install, dependencies, env config, CMake).
Investigating
KV-Cache Management: KV-cache management for efficient LLM inference.
LLM API/Workflow: High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows.
Lora/P-tuning: Parameter-Efficient Fine-Tuning (PEFT) like LoRA/P-tuning in TRTLLM; adapter use & perf.
Low Precision: Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).
Memory: Memory utilization in TRTLLM (leak/OOM handling, footprint optimization, memory profiling).
Merged
need more info: Further information is required from the requester before developers can help.
new model: Request to add a new model.
not a bug: A known limitation, not a bug.
OpenAI API: trtllm-serve's OpenAI-compatible API (endpoint behavior, req/resp formats, feature parity).
Performance Config Help
Performance: TRTLLM model inference speed, throughput, and efficiency (latency, benchmarks, regressions, optimizations).