43 labels
Customized Kernels
Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
Disaggregated Serving
Deploying TRTLLM with separated, distributed components (params, kv-cache, compute). Arch & perf.
Documentation
TRTLLM's textual/illustrative materials: API refs, guides, tutorials. Improvement & clarity.
feature request
New feature or request, including support for new models, dtypes, or functionality.
Generic Runtime
General operational aspects of TRTLLM execution not covered by other categories.
Installation
Setting up and building TRTLLM: compilation, pip install, dependencies, env config, CMake.
LLM API/Workflow
High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows; see the first sketch after this list.
Lora/P-tuning
Parameter-Efficient Fine-Tuning (PEFT) like LoRA/P-tuning in TRTLLM: adapter use & perf.
Low Precision
Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).
Memory
Memory utilization in TRTLLM: leak/OOM handling, footprint optimization, memory profiling.
OpenAI API
trtllm-serve's OpenAI-compatible API: endpoint behavior, req/resp formats, feature parity; see the second sketch after this list.
Performance
TRTLLM model inference speed, throughput, and efficiency: latency, benchmarks, regressions, optimizations.
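
For issues filed under LLM API/Workflow, a short self-contained repro using the high-level Python LLM API is usually the most useful attachment. A minimal sketch follows; the model name is a placeholder, and the exact import surface can vary across TRTLLM versions:

    from tensorrt_llm import LLM, SamplingParams

    # Placeholder checkpoint; substitute the model the issue concerns.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    prompts = ["Hello, my name is"]
    sampling = SamplingParams(max_tokens=32, temperature=0.8)

    # generate() returns one result per prompt.
    for output in llm.generate(prompts, sampling):
        print(output.outputs[0].text)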
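
Similarly, OpenAI API issues are easiest to triage with a standalone client call against a running trtllm-serve instance. A minimal sketch using the openai Python client; the localhost:8000 endpoint, API key, and model id are assumptions for illustration, so adjust them to the deployment in question:

    from openai import OpenAI

    # trtllm-serve exposes OpenAI-compatible routes under /v1;
    # host/port here are assumed, not guaranteed defaults.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

    resp = client.chat.completions.create(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model id
        messages=[{"role": "user", "content": "Say hello."}],
        max_tokens=32,
    )
    print(resp.choices[0].message.content)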