feat: Expose the task flag from dynamo-run cli for trtllm #1529

tanmayv25 · 2025-06-14T00:13:31Z

Overview:

Allow passing task flag from dynamo-run cli.
Tested the instructions locally.
Follow-up work from: #1355

Details:

There is planned future work for describing the workflow via --next argument and self-discovery for remote prefill worker.
Currently, I have described the limitation in the instructions.

Graham had ideas to express component graph in dynamo run which is somewhat more work. We are having more conversations around how to better synchronize dynamo-run and dynamo-serve here.

Summary by CodeRabbit

New Features
- Added support for specifying a task mode (prefill_and_decode, decode, or prefill) when running the TensorRT-LLM engine, enabling more flexible serving configurations.
Documentation
- Expanded instructions for using the TensorRT-LLM engine in multi-GPU, multinode, and disaggregated serving setups, including detailed examples and important usage notes.
Configuration
- Introduced a new disable_overlap_scheduler option in the sample configuration file for advanced engine tuning.

coderabbitai · 2025-06-14T00:26:07Z

Walkthrough

This update introduces a new --task CLI flag for the dynamo-run tool when using the TensorRT-LLM (TRTLLM) engine, enabling disaggregated serving modes ("prefill", "decode", or "prefill_and_decode"). Documentation is expanded to cover multi-GPU, multinode, and disaggregated serving configurations. A configuration YAML disables the overlap scheduler.

Changes

File(s)	Change Summary
docs/guides/dynamo_run.md	Expanded documentation for TRTLLM engine: added multi-GPU, multinode, and disaggregated serving instructions.
launch/dynamo-run/src/flags.rs	Added optional `task` field to `Flags` struct for TRTLLM, enabling task selection via CLI.
launch/dynamo-run/src/main.rs	Updated usage string to document new `--task` argument and its accepted values.
launch/dynamo-run/src/subprocess.rs	Appends `--task` and its value to TRTLLM subprocess arguments if specified in flags.
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml	Added `disable_overlap_scheduler: true` to TRTLLM config YAML.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant Subprocess
    participant TRTLLM Engine

    User->>CLI: Run dynamo-run with --task argument
    CLI->>Subprocess: Build command-line args (includes --task if set)
    Subprocess->>TRTLLM Engine: Launch with specified task (prefill/decode/prefill_and_decode)
    TRTLLM Engine-->>Subprocess: Executes task-specific logic
    Subprocess-->>CLI: Report result
    CLI-->>User: Output / status

Possibly related PRs

feat: Enable disagg support in trtllm standalone script #1355: Adds CLI flag support and documentation for disaggregated serving mode in dynamo-run, complementing this PR's integration of disaggregation logic for TRTLLM.
docs: Add multi-node TRTLLM worker example (Deepseek R1) #1511: Provides multi-node example documentation and scripts for TRTLLM, related to this PR's expanded multi-node and disaggregated usage instructions.

Poem

In the warren of code, a new flag appears,
"Task" lets the engines split duties with cheers!
Prefill or decode, now each has its say,
On multi-GPU fields, the rabbits will play.
With docs freshly sprouted and configs anew,
Disaggregated dreams are coming true!
🐇✨

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1)

23-23: Setting looks good but consider documenting the rationale inline

disable_overlap_scheduler: true is reasonable for disaggregated runs but it is not obvious why it should always be disabled.
A short inline YAML comment (one-liner) can help future readers understand why overlap scheduling hurts prefill/decoder separation.

docs/guides/dynamo_run.md (1)

458-486: Minor Markdown polish

Line 462 – missing comma after “to use”, and “the” before “environment variable”.

Code block at 471 should declare its language (e.g. ```bash) for correct
syntax highlighting.

The bolded Multi-GPU/etc headers (MD036) could be converted to level-3
headings for consistency with the rest of the guide.

Pure documentation, but quick fixes improve readability and linter hygiene.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable CUDA_VISIBLE_DEVICES`. **Mul...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task. ``` # Ingress dyna...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

🪛 markdownlint-cli2 (0.17.2)

458-458: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)

471-471: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 219cfa1 and 5bf1ff7.

📒 Files selected for processing (5)

docs/guides/dynamo_run.md (1 hunks)
launch/dynamo-run/src/flags.rs (1 hunks)
launch/dynamo-run/src/main.rs (1 hunks)
launch/dynamo-run/src/subprocess.rs (1 hunks)
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1 hunks)

🧰 Additional context used

🪛 LanguageTool

docs/guides/dynamo_run.md

[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable CUDA_VISIBLE_DEVICES`. **Mul...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task. ``` # Ingress dyna...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

🪛 markdownlint-cli2 (0.17.2)

docs/guides/dynamo_run.md

458-458: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)

471-471: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (5)

GitHub Check: Mirror Repository to GitLab
GitHub Check: pre-merge-rust (lib/runtime/examples)
GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: pre-merge-rust (.)
GitHub Check: Build and Test - vllm

coderabbitai · 2025-06-14T00:26:10Z

launch/dynamo-run/src/main.rs

@@ -35,7 +35,7 @@ Example:
 - OR: ./dynamo-run /data/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
 "#;

-const USAGE: &str = "USAGE: dynamo-run in=[http|text|dyn://<path>|batch:<folder>] out=ENGINE_LIST|dyn [--http-port 8080] [--model-path <path>] [--model-name <served-model-name>] [--model-config <hf-repo>] [--tensor-parallel-size=1] [--context-length=N] [--kv-cache-block-size=16] [--num-nodes=1] [--node-rank=0] [--leader-addr=127.0.0.1:9876] [--base-gpu-id=0] [--extra-engine-args=args.json] [--router-mode random|round-robin|kv] [--kv-overlap-score-weight=2.0] [--kv-gpu-cache-usage-weight=1.0] [--kv-waiting-requests-weight=1.0] [--verbosity (-v|-vv)]";
+const USAGE: &str = "USAGE: dynamo-run in=[http|text|dyn://<path>|batch:<folder>] out=ENGINE_LIST|dyn [--http-port 8080] [--model-path <path>] [--model-name <served-model-name>] [--model-config <hf-repo>] [--task prefill_and_decode|decode|prefill] [--tensor-parallel-size=1] [--context-length=N] [--kv-cache-block-size=16] [--num-nodes=1] [--node-rank=0] [--leader-addr=127.0.0.1:9876] [--base-gpu-id=0] [--extra-engine-args=args.json] [--router-mode random|round-robin|kv] [--kv-overlap-score-weight=2.0] [--kv-gpu-cache-usage-weight=1.0] [--kv-waiting-requests-weight=1.0] [--verbosity (-v|-vv)]";


🛠️ Refactor suggestion

Manual USAGE string is drifting again – delegate to clap

Every time a flag is added (--task in this PR) we must remember to update the hard-coded USAGE string. We’ve already missed this at least once in the past.

Instead, rely on Flags::command().render_usage() (or clap_complete) so the help/usage text is always in sync with the parser:

-const USAGE: &str = "USAGE: dynamo-run ..."; +lazy_static::lazy_static! { + static ref USAGE: String = { + use clap::CommandFactory; + dynamo_run::Flags::command().render_usage().to_string() + }; +}

This removes manual duplication and prevents future inconsistencies.

🤖 Prompt for AI Agents

In launch/dynamo-run/src/main.rs at line 38, the USAGE string is manually maintained and can easily become outdated when flags change. Remove the hard-coded USAGE string and instead generate the usage text dynamically by calling Flags::command().render_usage() from the clap crate. This ensures the usage information always matches the current command-line argument definitions and eliminates manual duplication.

coderabbitai · 2025-06-14T00:26:10Z

launch/dynamo-run/src/subprocess.rs

+    //
    // The worker node will only publish events and metrics if the router mode is KV
    if flags.router_mode == RouterMode::KV {
        args.push("--publish-events-and-metrics".to_string());
    }
+    // Specific task for TRTLLM engine
+    if flags.task.is_some() {
+        args.push("--task".to_string());
+        args.push(flags.task.clone().unwrap());
+    }



💡 Verification agent

🧩 Analysis chain

Safer & engine-specific propagation of --task

unwrap() is unnecessary – use pattern-matching and avoid the extra clone:

-if flags.task.is_some() { - args.push("--task".to_string()); - args.push(flags.task.clone().unwrap()); -} +if let Some(task) = &flags.task { + args.push("--task".to_string()); + args.push(task.to_string()); +}

--task is silently forwarded even when the selected engine is not TRTLLM.
The other Python shims (sglang_inc.py, vllm_inc.py) will fail on an
unknown flag.

Guard the push inside the TRTLLM code path (or assert in Flags that
--task may only be supplied with out=trtllm).

🏁 Script executed:

#!/bin/bash set -euxo pipefail # Locate the definition of Flags to inspect its fields rg -n "struct Flags" -C5 # Inspect any 'task' field within Flags rg -n "task" -C2 # Check for any references to TRTLLM in the codebase rg -n "trtllm" -C2 # Look for how the engine/output flag is defined (e.g., 'out') rg -n "out" -C2

Length of output: 70880

🏁 Script executed:

#!/bin/bash set -euxo pipefail # Display the Flags struct definition to inspect its fields (incl. task, router_mode, out, engine, etc.) rg -n "pub struct Flags" -A 30 launch/dynamo-run/src/flags.rs # Locate any line where `task` is defined in Flags rg -n "task" -C2 launch/dynamo-run/src/flags.rs # Locate router_mode definition in Flags rg -n "router_mode" -C2 launch/dynamo-run/src/flags.rs # Show the first 100 lines of subprocess.rs to confirm usage context sed -n '1,100p' launch/dynamo-run/src/subprocess.rs

Length of output: 5594

🏁 Script executed:

#!/bin/bash set -euxo pipefail # Show the beginning of the TRTLLM launcher to see how start() is invoked sed -n '1,200p' launch/dynamo-run/src/trtllm.rs # Show the beginning of the sglang launcher sed -n '1,200p' launch/dynamo-run/src/sglang.rs # Show the beginning of the vllm launcher sed -n '1,200p' launch/dynamo-run/src/vllm.rs

Length of output: 143

🏁 Script executed:

#!/bin/bash set -euxo pipefail # Show full Flags struct, including any engine/subcommand fields rg -n "pub struct Flags" -A200 launch/dynamo-run/src/flags.rs # List top-level Rust files in launch/dynamo-run/src find launch/dynamo-run/src -maxdepth 1 -type f -name "*.rs" # List Rust files in launch/dynamo-run/src/subprocess directory (engine modules) find launch/dynamo-run/src/subprocess -type f -name "*.rs"

Length of output: 9111

🏁 Script executed:

#!/bin/bash set -euxo pipefail # Show main.rs to understand engine subcommands and how Flags is used sed -n '1,200p' launch/dynamo-run/src/main.rs

Length of output: 4635

🏁 Script executed:

#!/bin/bash set -euxo pipefail # Show script constant and start function in TRTLLM module sed -n '1,50p' launch/dynamo-run/src/subprocess/trtllm.rs # Show script constant and start function in SGLANG module sed -n '1,50p' launch/dynamo-run/src/subprocess/sglang.rs # Show script constant and start function in VLLM module sed -n '1,50p' launch/dynamo-run/src/subprocess/vllm.rs

Length of output: 2567

🏁 Script executed:

#!/bin/bash set -euxo pipefail # List Python include scripts find launch/dynamo-run/src/subprocess -maxdepth 1 -type f -name "*.py" # Inspect TRTLLM Python shim for --task support grep -R "add_argument.*task" -n launch/dynamo-run/src/subprocess/trtllm_inc.py || true sed -n '1,200p' launch/dynamo-run/src/subprocess/trtllm_inc.py # Inspect SGLANG Python shim for --task handling grep -R "add_argument.*task" -n launch/dynamo-run/src/subprocess/sglang_inc.py || true sed -n '1,200p' launch/dynamo-run/src/subprocess/sglang_inc.py # Inspect VLLM Python shim for --task handling grep -R "add_argument.*task" -n launch/dynamo-run/src/subprocess/vllm_inc.py || true sed -n '1,200p' launch/dynamo-run/src/subprocess/vllm_inc.py

Length of output: 20737

Restrict and simplify --task propagation

Use pattern‐matching to eliminate the panic and extra clone:

--- a/launch/dynamo-run/src/subprocess.rs +++ b/launch/dynamo-run/src/subprocess.rs // TRTLLM only -// Specific task for TRTLLM engine -if flags.task.is_some() { - args.push("--task".to_string()); - args.push(flags.task.clone().unwrap()); -} +// Specific task for TRTLLM engine +use crate::subprocess::trtllm; + +if py_script == trtllm::PY { + if let Some(task) = &flags.task { + args.push("--task".to_string()); + args.push(task.clone()); + } +} else if flags.task.is_some() { + anyhow::bail!("`--task` is only supported with the TRTLLM engine"); +}

This removes unwrap(), clones only once, and avoids forwarding --task to SGLANG/VLLM (whose Python shims don’t accept it).

Locations to update:

launch/dynamo-run/src/subprocess.rs (around the // Specific task for TRTLLM engine block)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

//

// The worker node will only publish events and metrics if the router mode is KV

if flags.router_mode == RouterMode::KV {

args.push("--publish-events-and-metrics".to_string());

}

// Specific task for TRTLLM engine

if flags.task.is_some() {

args.push("--task".to_string());

args.push(flags.task.clone().unwrap());

}

//

// The worker node will only publish events and metrics if the router mode is KV

if flags.router_mode == RouterMode::KV {

args.push("--publish-events-and-metrics".to_string());

}

// Specific task for TRTLLM engine

use crate::subprocess::trtllm;

if py_script == trtllm::PY {

if let Some(task) = &flags.task {

args.push("--task".to_string());

args.push(task.clone());

}

} else if flags.task.is_some() {

anyhow::bail!("`--task` is only supported with the TRTLLM engine");

}

🤖 Prompt for AI Agents

In launch/dynamo-run/src/subprocess.rs around lines 56 to 66, the code currently uses unwrap() and clone() to forward the --task flag unconditionally, which can cause panics and forwards the flag to engines that do not support it. Refactor this by using pattern matching on flags.task to safely extract the value without unwrap or extra clone, and only push the --task argument and its value if the selected engine is TRTLLM. This ensures safer handling and prevents passing unsupported flags to other engines.

coderabbitai · 2025-06-14T00:26:10Z

launch/dynamo-run/src/flags.rs

+    /// TRTLLM only
+    ///
+    /// The task to perform with the model. Can be used to execute engines in disaggregated mode.
+    ///
+    /// Supported tasks:
+    /// - prefill_and_decode: Prefill and decode (default)
+    /// - decode: Decode
+    /// - prefill: Prefill
+    #[arg(long)]
+    pub task: Option<String>,
+


🛠️ Refactor suggestion

Validate --task at parse-time with a ValueEnum

Accepting an arbitrary String allows typos that only surface deep inside
TensorRT-LLM. Convert it to a typed enum so clap handles validation and
auto-documents the choices:

use clap::ValueEnum @@ -#[arg(long)] -pub task: Option<String>, +#[derive(ValueEnum, Clone, Debug, PartialEq, Eq)] +pub enum TrtllmTask { + #[value(name = "prefill_and_decode")] + PrefillAndDecode, + Decode, + Prefill, +} + +#[arg(long, value_enum)] +pub task: Option<TrtllmTask>,

Remember to update call-sites (subprocess.rs, Flags::as_vec) to convert
TrtllmTask to its lowercase string with task.to_string().

This prevents invalid values and makes --help self-documenting.

🤖 Prompt for AI Agents

In launch/dynamo-run/src/flags.rs around lines 49 to 59, the task field is currently an Option<String>, which allows invalid values and lacks validation. Change the task field to use a custom enum implementing clap's ValueEnum trait to enforce valid choices at parse-time. Update all call-sites such as subprocess.rs and Flags::as_vec to convert the enum variant to a lowercase string using task.to_string() to maintain compatibility. This will ensure only supported tasks are accepted and improve help documentation.

coderabbitai · 2025-06-14T00:31:30Z

Walkthrough

This update introduces a new optional --task flag to the dynamo-run CLI, specifically for use with the TensorRT-LLM (trtllm) engine. Documentation is expanded to describe multi-GPU, multinode, and disaggregated serving configurations. The subprocess logic and configuration files are updated to support and document these new deployment patterns.

Changes

File(s)	Summary
docs/guides/dynamo_run.md	Extended documentation for trtllm engine: added multi-GPU, multinode, and disaggregated serving instructions.
launch/dynamo-run/src/flags.rs	Added optional `task` field to `Flags` struct for specifying trtllm engine tasks.
launch/dynamo-run/src/main.rs	Updated usage string to document new `--task` argument and its possible values.
launch/dynamo-run/src/subprocess.rs	Updated subprocess argument construction to include `--task` if specified.
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml	Added `disable_overlap_scheduler: true` option to trtllm sample YAML config.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI (dynamo-run)
    participant Subprocess (trtllm engine)

    User->>CLI (dynamo-run): Run with --task <value>
    CLI (dynamo-run)->>Subprocess (trtllm engine): Launch subprocess with --task <value>
    Subprocess (trtllm engine)->>Subprocess (trtllm engine): Execute specified task (prefill, decode, etc.)

Possibly related PRs

feat: Publish events and metrics when using kv routing #1262: Both PRs modify subprocess argument handling in subprocess.rs, though for different flags (--task vs. --publish-events-and-metrics).
feat: Enable disagg support in trtllm standalone script #1355: Closely related, as it implements the disaggregated serving logic in the trtllm script, which this PR enables and documents.

Poem

A flag for every trtllm task,
Now dynamo-run wears a brand new mask!
Docs for GPUs, nodes, and more,
Disaggregated serving at your core.
With YAML tweaks and subprocess flair,
The rabbits cheer this thorough care! 🐇

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

launch/dynamo-run/src/main.rs (1)

38-38: Keep top-level docs in sync with the new --task flag

USAGE now advertises --task …, but the first Usage block in docs/guides/dynamo_run.md (line 11) still lacks it. Users will see two conflicting sources of truth.

Add the flag to the docs or point to --help instead of duplicating the list.
launch/dynamo-run/src/subprocess.rs (1)
61-65: Prefer if let and avoid the unnecessary unwrap()

The current pattern clones then unwraps the Option. Slightly clearer/safer:
-if flags.task.is_some() {
-    args.push("--task".to_string());
-    args.push(flags.task.clone().unwrap());
-}
+if let Some(task) = &flags.task {
+    args.push("--task".to_string());
+    args.push(task.clone());
+}
No functional change, but eliminates the unwrap and the redundant is_some check.
docs/guides/dynamo_run.md (1)
458-486: Minor docs polish: punctuation, heading style, fenced-code language

Lint warnings:

“Multi-GPU” line should be a heading (### Multi-GPU) rather than bold text (MD036).

Missing commas/articles in sentences 462 & 469.

Specify a language for the fenced block starting line 471 (MD040).

Proposed patch:
-**Multi-GPU**
-
-Pass `--tensor-parallel-size <NUM-GPUS>` to `dynamo-run`.
+### Multi-GPU
+
+Pass `--tensor-parallel-size <NUM-GPUS>` to `dynamo-run`.

-To specify which GPUs to use set environment variable `CUDA_VISIBLE_DEVICES`.
+To specify which GPUs to use, set the environment variable `CUDA_VISIBLE_DEVICES`.

-**Disaggregated Serving:**
-The trtllm engine can be configured to run in disaggregated way using `--task`.
+### Disaggregated Serving
+The trtllm engine can be configured to run in a disaggregated way using `--task`.

 ```bash
 # Ingress
🧰 Tools

🪛 LanguageTool

[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable CUDA_VISIBLE_DEVICES`. **Mul...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task. ``` # Ingress dyna...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

🪛 markdownlint-cli2 (0.17.2)

458-458: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)

471-471: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 219cfa1 and 5bf1ff7.

📒 Files selected for processing (5)

docs/guides/dynamo_run.md (1 hunks)
launch/dynamo-run/src/flags.rs (1 hunks)
launch/dynamo-run/src/main.rs (1 hunks)
launch/dynamo-run/src/subprocess.rs (1 hunks)
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1 hunks)

🧰 Additional context used

🪛 LanguageTool

docs/guides/dynamo_run.md

[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable CUDA_VISIBLE_DEVICES`. **Mul...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task. ``` # Ingress dyna...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

🪛 markdownlint-cli2 (0.17.2)

docs/guides/dynamo_run.md

458-458: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)

471-471: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: pre-merge-rust (.)
GitHub Check: Build and Test - vllm

🔇 Additional comments (1)

launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1)

23-25:
❓ Verification inconclusive

Confirm TRT-LLM really supports disable_overlap_scheduler

Nice to see the flag exposed, but the trtllm Python entry points we ship (trtllm_inc.py) do not appear to parse this key today. If the engine silently ignores unknown YAML keys, users may assume the scheduler is disabled when it is not.

Verify the current TensorRT-LLM release recognises this option.

If it is not yet upstream, guard it with a comment indicating the minimum version required.

I’ll check the codebase for any references to disable_overlap_scheduler or its handling in trtllm_inc.py.

🏁 Script executed:
#!/bin/bash
# 1. Find any occurrences of the flag in code
rg "disable_overlap_scheduler" -n

# 2. Find any overlap_scheduler logic
rg "overlap_scheduler" -n

# 3. Locate the Python entrypoint and inspect scheduler handling
rg --files-with-matches "trtllm_inc.py" -n | xargs -I{} sh -c 'echo "----- {} -----"; sed -n "1,200p" {}'
Length of output: 13212
Verify TRT-LLM support for disable_overlap_scheduler

I searched the code and didn’t find any parsing of this key in our Python entrypoint:

launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml:23 (defines disable_overlap_scheduler)

launch/dynamo-run/src/subprocess/trtllm_inc.py (no references to disable_overlap_scheduler or enable_overlap_scheduler)

Examples show the flag in YAML but only as comments or user-provided fields

Please confirm whether the TRT-LLM release you’re using actually recognizes disable_overlap_scheduler. If it doesn’t, guard this setting with a comment indicating the minimum TRT-LLM version required, for example:
# Requires TRT-LLM ≥ X.Y.Z to honor this flag
disable_overlap_scheduler: true

coderabbitai · 2025-06-14T00:31:33Z

launch/dynamo-run/src/flags.rs

+    /// TRTLLM only
+    ///
+    /// The task to perform with the model. Can be used to execute engines in disaggregated mode.
+    ///
+    /// Supported tasks:
+    /// - prefill_and_decode: Prefill and decode (default)
+    /// - decode: Decode
+    /// - prefill: Prefill
+    #[arg(long)]
+    pub task: Option<String>,


🛠️ Refactor suggestion

Use an enum instead of a raw String for task

Accepting an arbitrary string postpones validation until the Python side fails. Clap already supports ValueEnum; we can exhaustively validate at parse-time and get --help text for free.

-#[arg(long)] -pub task: Option<String>, +#[derive(ValueEnum, Clone, Debug)] +pub enum TrtTask { + #[value(name = "prefill_and_decode")] + PrefillAndDecode, + Decode, + Prefill, +} + +#[arg(long)] +pub task: Option<TrtTask>,

Downstream (subprocess.rs) you’d convert with task.to_string().

This removes a whole class of “unknown task” runtime errors.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In launch/dynamo-run/src/flags.rs around lines 49 to 58, replace the raw String type for the `task` field with a custom enum that derives `ValueEnum` from Clap. Define the enum with variants for each supported task ("prefill_and_decode", "decode", "prefill") and update the `task` field to use `Option<YourEnum>`. This enables compile-time validation of input values and automatic help text generation. Downstream code that uses `task` should convert the enum variant to a string with `to_string()` as needed.

Expose the task flag from dynamo-run cli for trtllm

5bf1ff7

tanmayv25 requested review from ryanolson, grahamking, paulhendricks, biswapanda, tmonty12, GuanLuo, rmccorm4, alec-flowers, kkranen, oandreeva-nv, jthomson04, PeaBrane and a team as code owners June 14, 2025 00:13

pull-request-size bot added the size/M label Jun 14, 2025

copy-pr-bot bot temporarily deployed to GITLAB June 14, 2025 00:13 Inactive

github-actions bot added the feat label Jun 14, 2025

copy-pr-bot bot temporarily deployed to GITLAB June 14, 2025 00:19 Inactive

coderabbitai bot reviewed Jun 14, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Expose the task flag from dynamo-run cli for trtllm #1529

feat: Expose the task flag from dynamo-run cli for trtllm #1529

Uh oh!

tanmayv25 commented Jun 14, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jun 14, 2025

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jun 14, 2025

Uh oh!

coderabbitai bot Jun 14, 2025

Uh oh!

coderabbitai bot Jun 14, 2025

Uh oh!

coderabbitai bot commented Jun 14, 2025

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jun 14, 2025

Uh oh!

Uh oh!

feat: Expose the task flag from dynamo-run cli for trtllm #1529

Are you sure you want to change the base?

feat: Expose the task flag from dynamo-run cli for trtllm #1529

Uh oh!

Conversation

tanmayv25 commented Jun 14, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jun 14, 2025

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Jun 14, 2025

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tanmayv25 commented Jun 14, 2025 •

edited by coderabbitai bot

Loading