43 labels
Customized Kernels
Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
Disaggregated Serving
Deploying TRTLLM with separated, distributed components (params, kv-cache, compute). Arch & perf.
Documentation
TRTLLM's textual/illustrative materials: API refs, guides, tutorials. Improvement & clarity.
feature request
New feature or request, including support for new models, dtypes, or functionality.
Generic Runtime
General operational aspects of TRTLLM execution not covered by other categories.
Installation
Setting up and building TRTLLM: compilation, pip install, dependencies, env config, CMake.
LLM API/Workflow
High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows; see the first sketch after this list.
Lora/P-tuning
Parameter-Efficient Fine-Tuning (PEFT) like LoRA/P-tuning in TRTLLM: adapter use & perf.
Low Precision
Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).
Memory
Memory utilization in TRTLLM: leak/OOM handling, footprint optimization, memory profiling.
OpenAI API
trtllm-serve's OpenAI-compatible API: endpoint behavior, req/resp formats, feature parity; see the second sketch after this list.
Performance
TRTLLM model inference speed, throughput, and efficiency: latency, benchmarks, regressions, optimizations.
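
For issues filed under LLM API/Workflow, a short self-contained repro using the high-level Python LLM API is usually the most useful attachment. A minimal sketch follows; the model name is a placeholder, and the exact import surface can vary across TRTLLM versions:

    from tensorrt_llm import LLM, SamplingParams

    # Placeholder checkpoint; substitute the model the issue concerns.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    prompts = ["Hello, my name is"]
    sampling = SamplingParams(max_tokens=32, temperature=0.8)

    # generate() returns one result per prompt.
    for output in llm.generate(prompts, sampling):
        print(output.outputs[0].text)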
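
Similarly, OpenAI API issues are easiest to triage with a standalone client call against a running trtllm-serve instance. A minimal sketch using the openai Python client; the localhost:8000 endpoint, API key, and model id are assumptions for illustration, so adjust them to the deployment in question:

    from openai import OpenAI

    # trtllm-serve exposes OpenAI-compatible routes under /v1;
    # host/port here are assumed, not guaranteed defaults.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

    resp = client.chat.completions.create(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model id
        messages=[{"role": "user", "content": "Say hello."}],
        max_tokens=32,
    )
    print(resp.choices[0].message.content)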