
build: DIS-148 use the tensorrt_llm public wheel from pypi by default in container build #1525

Merged · 5 commits · Jun 16, 2025
26 changes: 24 additions & 2 deletions container/build.sh
@@ -88,14 +88,15 @@ TENSORRTLLM_PIP_WHEEL_DIR="/tmp/trtllm_wheel/"
# TensorRT-LLM commit to use for building the trtllm wheel if not provided.
# Important Note: This commit is not used in our CI pipeline. See the CI
# variables to learn how to run a pipeline with a specific commit.
TRTLLM_COMMIT="137fe35539ea182f1495f5021bfda97c729e50c3"
DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT="137fe35539ea182f1495f5021bfda97c729e50c3"
TRTLLM_COMMIT=""

# TensorRT-LLM PyPI index URL
TENSORRTLLM_INDEX_URL="https://pypi.python.org/simple"
DEFAULT_TENSORRTLLM_PIP_WHEEL="tensorrt-llm==0.21.0rc0"
TENSORRTLLM_PIP_WHEEL=""



VLLM_BASE_IMAGE="nvcr.io/nvidia/cuda-dl-base"
# FIXME: NCCL will hang with 25.03, so use 25.01 for now
# Please check https://github.com/ai-dynamo/dynamo/pull/1065
@@ -155,6 +156,13 @@ get_options() {
missing_requirement "$1"
fi
;;
--use-default-experimental-tensorrtllm-commit)
if [ -n "$2" ] && [[ "$2" != --* ]]; then
echo "ERROR: --use-default-experimental-tensorrtllm-commit does not take any argument"
exit 1
fi
USE_DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT=true
;;
--tensorrtllm-pip-wheel)
if [ "$2" ]; then
TENSORRTLLM_PIP_WHEEL=$2
@@ -341,6 +349,7 @@ show_help() {
echo " [--framework framework one of ${!FRAMEWORKS[*]}]"
echo " [--tensorrtllm-pip-wheel-dir path to tensorrtllm pip wheel directory]"
echo " [--tensorrtllm-commit tensorrtllm commit to use for building the trtllm wheel if the wheel is not provided]"
echo " [--use-default-experimental-tensorrtllm-commit] Use the default experimental commit (${DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT}) to build TensorRT-LLM. This is a flag (no argument). Do not combine with --tensorrtllm-commit or --tensorrtllm-pip-wheel."
echo " [--tensorrtllm-pip-wheel tensorrtllm pip wheel on artifactory]"
echo " [--tensorrtllm-index-url tensorrtllm PyPI index URL if providing the wheel from artifactory]"
echo " [--build-arg additional build args to pass to docker build]"
@@ -470,6 +479,19 @@ check_wheel_file() {
}

if [[ $FRAMEWORK == "TENSORRTLLM" ]]; then
if [ "$USE_DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT" = true ]; then
if [ -n "$TRTLLM_COMMIT" ] || [ -n "$TENSORRTLLM_PIP_WHEEL" ]; then
echo "ERROR: When using --use-default-experimental-trtllm-commit, do not set --tensorrtllm-commit or --tensorrtllm-pip-wheel."
exit 1
fi
TRTLLM_COMMIT="$DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT"
fi

# If the user set neither the wheel nor the commit, use the default tensorrt_llm pip wheel
if [ -z "$TENSORRTLLM_PIP_WHEEL" ] && [ -z "$TRTLLM_COMMIT" ]; then
TENSORRTLLM_PIP_WHEEL="$DEFAULT_TENSORRTLLM_PIP_WHEEL"
fi

if [ -z "${TENSORRTLLM_PIP_WHEEL}" ]; then
# Use option 1
if [ ! -d "${TENSORRTLLM_PIP_WHEEL_DIR}" ]; then
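With these changes, `build.sh` supports three ways of selecting the TensorRT-LLM source. The invocations below are a hedged sketch assembled from the defaults and flags in this diff; `<commit-sha>` is a placeholder, not a value from this PR:

```bash
# Default: neither a wheel nor a commit is given, so the public PyPI wheel
# (tensorrt-llm==0.21.0rc0) is installed from https://pypi.python.org/simple.
./container/build.sh --framework tensorrtllm

# Build the trtllm wheel from an explicit TensorRT-LLM commit instead of PyPI.
./container/build.sh --framework tensorrtllm --tensorrtllm-commit <commit-sha>

# Use the pinned experimental commit. This is a flag with no argument and is
# rejected if combined with --tensorrtllm-commit or --tensorrtllm-pip-wheel.
./container/build.sh --framework tensorrtllm --use-default-experimental-tensorrtllm-commit
```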
12 changes: 12 additions & 0 deletions examples/tensorrt_llm/README.md
@@ -62,6 +62,11 @@ apt-get update && apt-get -y install git git-lfs

# On an ARM machine:
./container/build.sh --framework tensorrtllm --platform linux/arm64

# Build the container with the default experimental TensorRT-LLM commit
# WARNING: This is for experimental feature testing only.
# The container should not be used in a production environment.
./container/build.sh --framework tensorrtllm --use-default-experimental-tensorrtllm-commit
```

> [!NOTE]
@@ -136,6 +141,10 @@ dynamo serve graphs.agg:Frontend -f configs/deepseek_r1/mtp/mtp_agg.yaml
```

Notes:
- MTP is only available within the container built with the experimental TensorRT-LLM commit. Please add `--use-default-experimental-tensorrtllm-commit` to the arguments of the `build.sh` script.

Example: `./container/build.sh --framework tensorrtllm --use-default-experimental-tensorrtllm-commit`

- There is a noticeable latency for the first two inference requests. Please send warm-up requests before starting the benchmark.
- MTP performance may vary depending on the acceptance rate of predicted tokens, which is dependent on the dataset or queries used while benchmarking. Additionally, `ignore_eos` should generally be omitted or set to `false` when using MTP to avoid speculating garbage outputs and getting unrealistic acceptance rates.

@@ -275,6 +284,9 @@ dynamo serve components.prefill_worker:TensorRTLLMPrefillWorker -f configs/deeps
```

Notes:
- MTP is only available within the container built with the experimental TensorRT-LLM commit. Please add `--use-default-experimental-tensorrtllm-commit` to the arguments of the `build.sh` script.

Example: `./container/build.sh --framework tensorrtllm --use-default-experimental-tensorrtllm-commit`
- There is a noticeable latency for the first two inference requests. Please send warm-up requests before starting the benchmark.
- MTP performance may vary depending on the acceptance rate of predicted tokens, which is dependent on the dataset or queries used while benchmarking. Additionally, `ignore_eos` should generally be omitted or set to `false` when using MTP to avoid speculating garbage outputs and getting unrealistic acceptance rates.
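
As an illustration of the warm-up guidance above, here is a hypothetical warm-up request. It assumes the `dynamo serve` frontend exposes an OpenAI-compatible endpoint on `localhost:8000` and uses a placeholder model name; adjust both to match the actual deployment. `ignore_eos` is intentionally left out of the payload, in line with the note above.

```bash
# Hypothetical warm-up request; verify host, port, model name, and payload
# against your deployment before benchmarking.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16
      }'
```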
