Skip to content

feat: Expose the task flag from dynamo-run cli for trtllm #1529

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tanmayv25
Copy link
Contributor

@tanmayv25 tanmayv25 commented Jun 14, 2025

Overview:

Allow passing task flag from dynamo-run cli.
Tested the instructions locally.
Follow-up work from: #1355

Details:

There is planned future work for describing the workflow via --next argument and self-discovery for remote prefill worker.
Currently, I have described the limitation in the instructions.

Graham had ideas to express component graph in dynamo run which is somewhat more work. We are having more conversations around how to better synchronize dynamo-run and dynamo-serve here.

Summary by CodeRabbit

  • New Features
    • Added support for specifying a task mode (prefill_and_decode, decode, or prefill) when running the TensorRT-LLM engine, enabling more flexible serving configurations.
  • Documentation
    • Expanded instructions for using the TensorRT-LLM engine in multi-GPU, multinode, and disaggregated serving setups, including detailed examples and important usage notes.
  • Configuration
    • Introduced a new disable_overlap_scheduler option in the sample configuration file for advanced engine tuning.

Copy link
Contributor

coderabbitai bot commented Jun 14, 2025

Walkthrough

This update introduces a new --task CLI flag for the dynamo-run tool when using the TensorRT-LLM (TRTLLM) engine, enabling disaggregated serving modes ("prefill", "decode", or "prefill_and_decode"). Documentation is expanded to cover multi-GPU, multinode, and disaggregated serving configurations. A configuration YAML disables the overlap scheduler.

Changes

File(s) Change Summary
docs/guides/dynamo_run.md Expanded documentation for TRTLLM engine: added multi-GPU, multinode, and disaggregated serving instructions.
launch/dynamo-run/src/flags.rs Added optional task field to Flags struct for TRTLLM, enabling task selection via CLI.
launch/dynamo-run/src/main.rs Updated usage string to document new --task argument and its accepted values.
launch/dynamo-run/src/subprocess.rs Appends --task and its value to TRTLLM subprocess arguments if specified in flags.
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml Added disable_overlap_scheduler: true to TRTLLM config YAML.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant Subprocess
    participant TRTLLM Engine

    User->>CLI: Run dynamo-run with --task argument
    CLI->>Subprocess: Build command-line args (includes --task if set)
    Subprocess->>TRTLLM Engine: Launch with specified task (prefill/decode/prefill_and_decode)
    TRTLLM Engine-->>Subprocess: Executes task-specific logic
    Subprocess-->>CLI: Report result
    CLI-->>User: Output / status
Loading

Possibly related PRs

Poem

In the warren of code, a new flag appears,
"Task" lets the engines split duties with cheers!
Prefill or decode, now each has its say,
On multi-GPU fields, the rabbits will play.
With docs freshly sprouted and configs anew,
Disaggregated dreams are coming true!
🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1)

23-23: Setting looks good but consider documenting the rationale inline

disable_overlap_scheduler: true is reasonable for disaggregated runs but it is not obvious why it should always be disabled.
A short inline YAML comment (one-liner) can help future readers understand why overlap scheduling hurts prefill/decoder separation.

docs/guides/dynamo_run.md (1)

458-486: Minor Markdown polish

  • Line 462 – missing comma after “to use”, and “the” before “environment variable”.
  • Code block at 471 should declare its language (e.g. ```bash) for correct
    syntax highlighting.
  • The bolded Multi-GPU/etc headers (MD036) could be converted to level-3
    headings for consistency with the rest of the guide.

Pure documentation, but quick fixes improve readability and linter hygiene.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable CUDA_VISIBLE_DEVICES`. **Mul...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task. ``` # Ingress dyna...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

🪛 markdownlint-cli2 (0.17.2)

458-458: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)


471-471: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 219cfa1 and 5bf1ff7.

📒 Files selected for processing (5)
  • docs/guides/dynamo_run.md (1 hunks)
  • launch/dynamo-run/src/flags.rs (1 hunks)
  • launch/dynamo-run/src/main.rs (1 hunks)
  • launch/dynamo-run/src/subprocess.rs (1 hunks)
  • launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/guides/dynamo_run.md

[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable CUDA_VISIBLE_DEVICES`. **Mul...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task. ``` # Ingress dyna...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_run.md

458-458: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)


471-471: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - vllm

@@ -35,7 +35,7 @@ Example:
- OR: ./dynamo-run /data/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
"#;

const USAGE: &str = "USAGE: dynamo-run in=[http|text|dyn://<path>|batch:<folder>] out=ENGINE_LIST|dyn [--http-port 8080] [--model-path <path>] [--model-name <served-model-name>] [--model-config <hf-repo>] [--tensor-parallel-size=1] [--context-length=N] [--kv-cache-block-size=16] [--num-nodes=1] [--node-rank=0] [--leader-addr=127.0.0.1:9876] [--base-gpu-id=0] [--extra-engine-args=args.json] [--router-mode random|round-robin|kv] [--kv-overlap-score-weight=2.0] [--kv-gpu-cache-usage-weight=1.0] [--kv-waiting-requests-weight=1.0] [--verbosity (-v|-vv)]";
const USAGE: &str = "USAGE: dynamo-run in=[http|text|dyn://<path>|batch:<folder>] out=ENGINE_LIST|dyn [--http-port 8080] [--model-path <path>] [--model-name <served-model-name>] [--model-config <hf-repo>] [--task prefill_and_decode|decode|prefill] [--tensor-parallel-size=1] [--context-length=N] [--kv-cache-block-size=16] [--num-nodes=1] [--node-rank=0] [--leader-addr=127.0.0.1:9876] [--base-gpu-id=0] [--extra-engine-args=args.json] [--router-mode random|round-robin|kv] [--kv-overlap-score-weight=2.0] [--kv-gpu-cache-usage-weight=1.0] [--kv-waiting-requests-weight=1.0] [--verbosity (-v|-vv)]";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Manual USAGE string is drifting again – delegate to clap

Every time a flag is added (--task in this PR) we must remember to update the hard-coded USAGE string. We’ve already missed this at least once in the past.

Instead, rely on Flags::command().render_usage() (or clap_complete) so the help/usage text is always in sync with the parser:

-const USAGE: &str = "USAGE: dynamo-run ...";
+lazy_static::lazy_static! {
+    static ref USAGE: String = {
+        use clap::CommandFactory;
+        dynamo_run::Flags::command().render_usage().to_string()
+    };
+}

This removes manual duplication and prevents future inconsistencies.

🤖 Prompt for AI Agents
In launch/dynamo-run/src/main.rs at line 38, the USAGE string is manually
maintained and can easily become outdated when flags change. Remove the
hard-coded USAGE string and instead generate the usage text dynamically by
calling Flags::command().render_usage() from the clap crate. This ensures the
usage information always matches the current command-line argument definitions
and eliminates manual duplication.

Comment on lines +56 to 66
//
// The worker node will only publish events and metrics if the router mode is KV
if flags.router_mode == RouterMode::KV {
args.push("--publish-events-and-metrics".to_string());
}
// Specific task for TRTLLM engine
if flags.task.is_some() {
args.push("--task".to_string());
args.push(flags.task.clone().unwrap());
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Safer & engine-specific propagation of --task

  1. unwrap() is unnecessary – use pattern-matching and avoid the extra clone:
-if flags.task.is_some() {
-    args.push("--task".to_string());
-    args.push(flags.task.clone().unwrap());
-}
+if let Some(task) = &flags.task {
+    args.push("--task".to_string());
+    args.push(task.to_string());
+}
  1. --task is silently forwarded even when the selected engine is not TRTLLM.
    The other Python shims (sglang_inc.py, vllm_inc.py) will fail on an
    unknown flag.

    Guard the push inside the TRTLLM code path (or assert in Flags that
    --task may only be supplied with out=trtllm).


🏁 Script executed:

#!/bin/bash
set -euxo pipefail

# Locate the definition of Flags to inspect its fields
rg -n "struct Flags" -C5

# Inspect any 'task' field within Flags
rg -n "task" -C2

# Check for any references to TRTLLM in the codebase
rg -n "trtllm" -C2

# Look for how the engine/output flag is defined (e.g., 'out')
rg -n "out" -C2

Length of output: 70880


🏁 Script executed:

#!/bin/bash
set -euxo pipefail

# Display the Flags struct definition to inspect its fields (incl. task, router_mode, out, engine, etc.)
rg -n "pub struct Flags" -A 30 launch/dynamo-run/src/flags.rs

# Locate any line where `task` is defined in Flags
rg -n "task" -C2 launch/dynamo-run/src/flags.rs

# Locate router_mode definition in Flags
rg -n "router_mode" -C2 launch/dynamo-run/src/flags.rs

# Show the first 100 lines of subprocess.rs to confirm usage context
sed -n '1,100p' launch/dynamo-run/src/subprocess.rs

Length of output: 5594


🏁 Script executed:

#!/bin/bash
set -euxo pipefail

# Show the beginning of the TRTLLM launcher to see how start() is invoked
sed -n '1,200p' launch/dynamo-run/src/trtllm.rs

# Show the beginning of the sglang launcher
sed -n '1,200p' launch/dynamo-run/src/sglang.rs

# Show the beginning of the vllm launcher
sed -n '1,200p' launch/dynamo-run/src/vllm.rs

Length of output: 143


🏁 Script executed:

#!/bin/bash
set -euxo pipefail

# Show full Flags struct, including any engine/subcommand fields
rg -n "pub struct Flags" -A200 launch/dynamo-run/src/flags.rs

# List top-level Rust files in launch/dynamo-run/src
find launch/dynamo-run/src -maxdepth 1 -type f -name "*.rs"

# List Rust files in launch/dynamo-run/src/subprocess directory (engine modules)
find launch/dynamo-run/src/subprocess -type f -name "*.rs"

Length of output: 9111


🏁 Script executed:

#!/bin/bash
set -euxo pipefail

# Show main.rs to understand engine subcommands and how Flags is used
sed -n '1,200p' launch/dynamo-run/src/main.rs

Length of output: 4635


🏁 Script executed:

#!/bin/bash
set -euxo pipefail

# Show script constant and start function in TRTLLM module
sed -n '1,50p' launch/dynamo-run/src/subprocess/trtllm.rs

# Show script constant and start function in SGLANG module
sed -n '1,50p' launch/dynamo-run/src/subprocess/sglang.rs

# Show script constant and start function in VLLM module
sed -n '1,50p' launch/dynamo-run/src/subprocess/vllm.rs

Length of output: 2567


🏁 Script executed:

#!/bin/bash
set -euxo pipefail

# List Python include scripts
find launch/dynamo-run/src/subprocess -maxdepth 1 -type f -name "*.py"

# Inspect TRTLLM Python shim for --task support
grep -R "add_argument.*task" -n launch/dynamo-run/src/subprocess/trtllm_inc.py || true
sed -n '1,200p' launch/dynamo-run/src/subprocess/trtllm_inc.py

# Inspect SGLANG Python shim for --task handling
grep -R "add_argument.*task" -n launch/dynamo-run/src/subprocess/sglang_inc.py || true
sed -n '1,200p' launch/dynamo-run/src/subprocess/sglang_inc.py

# Inspect VLLM Python shim for --task handling
grep -R "add_argument.*task" -n launch/dynamo-run/src/subprocess/vllm_inc.py || true
sed -n '1,200p' launch/dynamo-run/src/subprocess/vllm_inc.py

Length of output: 20737


Restrict and simplify --task propagation

  • Use pattern‐matching to eliminate the panic and extra clone:

    --- a/launch/dynamo-run/src/subprocess.rs
    +++ b/launch/dynamo-run/src/subprocess.rs
     // TRTLLM only
    -// Specific task for TRTLLM engine
    -if flags.task.is_some() {
    -    args.push("--task".to_string());
    -    args.push(flags.task.clone().unwrap());
    -}
    +// Specific task for TRTLLM engine
    +use crate::subprocess::trtllm;
    +
    +if py_script == trtllm::PY {
    +    if let Some(task) = &flags.task {
    +        args.push("--task".to_string());
    +        args.push(task.clone());
    +    }
    +} else if flags.task.is_some() {
    +    anyhow::bail!("`--task` is only supported with the TRTLLM engine");
    +}
  • This removes unwrap(), clones only once, and avoids forwarding --task to SGLANG/VLLM (whose Python shims don’t accept it).

Locations to update:

  • launch/dynamo-run/src/subprocess.rs (around the // Specific task for TRTLLM engine block)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
//
// The worker node will only publish events and metrics if the router mode is KV
if flags.router_mode == RouterMode::KV {
args.push("--publish-events-and-metrics".to_string());
}
// Specific task for TRTLLM engine
if flags.task.is_some() {
args.push("--task".to_string());
args.push(flags.task.clone().unwrap());
}
//
// The worker node will only publish events and metrics if the router mode is KV
if flags.router_mode == RouterMode::KV {
args.push("--publish-events-and-metrics".to_string());
}
// Specific task for TRTLLM engine
use crate::subprocess::trtllm;
if py_script == trtllm::PY {
if let Some(task) = &flags.task {
args.push("--task".to_string());
args.push(task.clone());
}
} else if flags.task.is_some() {
anyhow::bail!("`--task` is only supported with the TRTLLM engine");
}
🤖 Prompt for AI Agents
In launch/dynamo-run/src/subprocess.rs around lines 56 to 66, the code currently
uses unwrap() and clone() to forward the --task flag unconditionally, which can
cause panics and forwards the flag to engines that do not support it. Refactor
this by using pattern matching on flags.task to safely extract the value without
unwrap or extra clone, and only push the --task argument and its value if the
selected engine is TRTLLM. This ensures safer handling and prevents passing
unsupported flags to other engines.

Comment on lines +49 to +59
/// TRTLLM only
///
/// The task to perform with the model. Can be used to execute engines in disaggregated mode.
///
/// Supported tasks:
/// - prefill_and_decode: Prefill and decode (default)
/// - decode: Decode
/// - prefill: Prefill
#[arg(long)]
pub task: Option<String>,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Validate --task at parse-time with a ValueEnum

Accepting an arbitrary String allows typos that only surface deep inside
TensorRT-LLM. Convert it to a typed enum so clap handles validation and
auto-documents the choices:

 use clap::ValueEnum
@@
-#[arg(long)]
-pub task: Option<String>,
+#[derive(ValueEnum, Clone, Debug, PartialEq, Eq)]
+pub enum TrtllmTask {
+    #[value(name = "prefill_and_decode")]
+    PrefillAndDecode,
+    Decode,
+    Prefill,
+}
+
+#[arg(long, value_enum)]
+pub task: Option<TrtllmTask>,

Remember to update call-sites (subprocess.rs, Flags::as_vec) to convert
TrtllmTask to its lowercase string with task.to_string().

This prevents invalid values and makes --help self-documenting.

🤖 Prompt for AI Agents
In launch/dynamo-run/src/flags.rs around lines 49 to 59, the task field is
currently an Option<String>, which allows invalid values and lacks validation.
Change the task field to use a custom enum implementing clap's ValueEnum trait
to enforce valid choices at parse-time. Update all call-sites such as
subprocess.rs and Flags::as_vec to convert the enum variant to a lowercase
string using task.to_string() to maintain compatibility. This will ensure only
supported tasks are accepted and improve help documentation.

Copy link
Contributor

coderabbitai bot commented Jun 14, 2025

Walkthrough

This update introduces a new optional --task flag to the dynamo-run CLI, specifically for use with the TensorRT-LLM (trtllm) engine. Documentation is expanded to describe multi-GPU, multinode, and disaggregated serving configurations. The subprocess logic and configuration files are updated to support and document these new deployment patterns.

Changes

File(s) Summary
docs/guides/dynamo_run.md Extended documentation for trtllm engine: added multi-GPU, multinode, and disaggregated serving instructions.
launch/dynamo-run/src/flags.rs Added optional task field to Flags struct for specifying trtllm engine tasks.
launch/dynamo-run/src/main.rs Updated usage string to document new --task argument and its possible values.
launch/dynamo-run/src/subprocess.rs Updated subprocess argument construction to include --task if specified.
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml Added disable_overlap_scheduler: true option to trtllm sample YAML config.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI (dynamo-run)
    participant Subprocess (trtllm engine)

    User->>CLI (dynamo-run): Run with --task <value>
    CLI (dynamo-run)->>Subprocess (trtllm engine): Launch subprocess with --task <value>
    Subprocess (trtllm engine)->>Subprocess (trtllm engine): Execute specified task (prefill, decode, etc.)
Loading

Possibly related PRs

Poem

A flag for every trtllm task,
Now dynamo-run wears a brand new mask!
Docs for GPUs, nodes, and more,
Disaggregated serving at your core.
With YAML tweaks and subprocess flair,
The rabbits cheer this thorough care! 🐇


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
launch/dynamo-run/src/main.rs (1)

38-38: Keep top-level docs in sync with the new --task flag

USAGE now advertises --task …, but the first Usage block in docs/guides/dynamo_run.md (line 11) still lacks it. Users will see two conflicting sources of truth.

Add the flag to the docs or point to --help instead of duplicating the list.

launch/dynamo-run/src/subprocess.rs (1)

61-65: Prefer if let and avoid the unnecessary unwrap()

The current pattern clones then unwraps the Option. Slightly clearer/safer:

-if flags.task.is_some() {
-    args.push("--task".to_string());
-    args.push(flags.task.clone().unwrap());
-}
+if let Some(task) = &flags.task {
+    args.push("--task".to_string());
+    args.push(task.clone());
+}

No functional change, but eliminates the unwrap and the redundant is_some check.

docs/guides/dynamo_run.md (1)

458-486: Minor docs polish: punctuation, heading style, fenced-code language

Lint warnings:

  1. “Multi-GPU” line should be a heading (### Multi-GPU) rather than bold text (MD036).
  2. Missing commas/articles in sentences 462 & 469.
  3. Specify a language for the fenced block starting line 471 (MD040).

Proposed patch:

-**Multi-GPU**
-
-Pass `--tensor-parallel-size <NUM-GPUS>` to `dynamo-run`.
+### Multi-GPU
+
+Pass `--tensor-parallel-size <NUM-GPUS>` to `dynamo-run`.

-To specify which GPUs to use set environment variable `CUDA_VISIBLE_DEVICES`.
+To specify which GPUs to use, set the environment variable `CUDA_VISIBLE_DEVICES`.

-**Disaggregated Serving:**
-The trtllm engine can be configured to run in disaggregated way using `--task`.
+### Disaggregated Serving
+The trtllm engine can be configured to run in a disaggregated way using `--task`.

 ```bash
 # Ingress
🧰 Tools
🪛 LanguageTool

[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable CUDA_VISIBLE_DEVICES`. **Mul...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task. ``` # Ingress dyna...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

🪛 markdownlint-cli2 (0.17.2)

458-458: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)


471-471: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 219cfa1 and 5bf1ff7.

📒 Files selected for processing (5)
  • docs/guides/dynamo_run.md (1 hunks)
  • launch/dynamo-run/src/flags.rs (1 hunks)
  • launch/dynamo-run/src/main.rs (1 hunks)
  • launch/dynamo-run/src/subprocess.rs (1 hunks)
  • launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/guides/dynamo_run.md

[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable CUDA_VISIBLE_DEVICES`. **Mul...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task. ``` # Ingress dyna...

(AI_EN_LECTOR_MISSING_DETERMINER_A)

🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_run.md

458-458: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)


471-471: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (1)
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1)

23-25:

❓ Verification inconclusive

Confirm TRT-LLM really supports disable_overlap_scheduler

Nice to see the flag exposed, but the trtllm Python entry points we ship (trtllm_inc.py) do not appear to parse this key today. If the engine silently ignores unknown YAML keys, users may assume the scheduler is disabled when it is not.

  1. Verify the current TensorRT-LLM release recognises this option.
  2. If it is not yet upstream, guard it with a comment indicating the minimum version required.

I’ll check the codebase for any references to disable_overlap_scheduler or its handling in trtllm_inc.py.


🏁 Script executed:

#!/bin/bash
# 1. Find any occurrences of the flag in code
rg "disable_overlap_scheduler" -n

# 2. Find any overlap_scheduler logic
rg "overlap_scheduler" -n

# 3. Locate the Python entrypoint and inspect scheduler handling
rg --files-with-matches "trtllm_inc.py" -n | xargs -I{} sh -c 'echo "----- {} -----"; sed -n "1,200p" {}'

Length of output: 13212


Verify TRT-LLM support for disable_overlap_scheduler

I searched the code and didn’t find any parsing of this key in our Python entrypoint:

  • launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml:23 (defines disable_overlap_scheduler)
  • launch/dynamo-run/src/subprocess/trtllm_inc.py (no references to disable_overlap_scheduler or enable_overlap_scheduler)
  • Examples show the flag in YAML but only as comments or user-provided fields

Please confirm whether the TRT-LLM release you’re using actually recognizes disable_overlap_scheduler. If it doesn’t, guard this setting with a comment indicating the minimum TRT-LLM version required, for example:

# Requires TRT-LLM ≥ X.Y.Z to honor this flag
disable_overlap_scheduler: true

Comment on lines +49 to +58
/// TRTLLM only
///
/// The task to perform with the model. Can be used to execute engines in disaggregated mode.
///
/// Supported tasks:
/// - prefill_and_decode: Prefill and decode (default)
/// - decode: Decode
/// - prefill: Prefill
#[arg(long)]
pub task: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use an enum instead of a raw String for task

Accepting an arbitrary string postpones validation until the Python side fails. Clap already supports ValueEnum; we can exhaustively validate at parse-time and get --help text for free.

-#[arg(long)]
-pub task: Option<String>,
+#[derive(ValueEnum, Clone, Debug)]
+pub enum TrtTask {
+    #[value(name = "prefill_and_decode")]
+    PrefillAndDecode,
+    Decode,
+    Prefill,
+}
+
+#[arg(long)]
+pub task: Option<TrtTask>,

Downstream (subprocess.rs) you’d convert with task.to_string().

This removes a whole class of “unknown task” runtime errors.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In launch/dynamo-run/src/flags.rs around lines 49 to 58, replace the raw String
type for the `task` field with a custom enum that derives `ValueEnum` from Clap.
Define the enum with variants for each supported task ("prefill_and_decode",
"decode", "prefill") and update the `task` field to use `Option<YourEnum>`. This
enables compile-time validation of input values and automatic help text
generation. Downstream code that uses `task` should convert the enum variant to
a string with `to_string()` as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant