-
Notifications
You must be signed in to change notification settings - Fork 419
feat: Expose the task flag from dynamo-run cli for trtllm #1529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis update introduces a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant Subprocess
participant TRTLLM Engine
User->>CLI: Run dynamo-run with --task argument
CLI->>Subprocess: Build command-line args (includes --task if set)
Subprocess->>TRTLLM Engine: Launch with specified task (prefill/decode/prefill_and_decode)
TRTLLM Engine-->>Subprocess: Executes task-specific logic
Subprocess-->>CLI: Report result
CLI-->>User: Output / status
Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1)
23-23
: Setting looks good but consider documenting the rationale inline
disable_overlap_scheduler: true
is reasonable for disaggregated runs but it is not obvious why it should always be disabled.
A short inline YAML comment (one-liner) can help future readers understand why overlap scheduling hurts prefill/decoder separation.docs/guides/dynamo_run.md (1)
458-486
: Minor Markdown polish
- Line 462 – missing comma after “to use”, and “the” before “environment variable”.
- Code block at 471 should declare its language (e.g. ```bash) for correct
syntax highlighting.- The bolded Multi-GPU/etc headers (MD036) could be converted to level-3
headings for consistency with the rest of the guide.Pure documentation, but quick fixes improve readability and linter hygiene.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run
. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable
CUDA_VISIBLE_DEVICES`. **Mul...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using--task
. ``` # Ingress dyna...(AI_EN_LECTOR_MISSING_DETERMINER_A)
🪛 markdownlint-cli2 (0.17.2)
458-458: Emphasis used instead of a heading
null(MD036, no-emphasis-as-heading)
471-471: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/guides/dynamo_run.md
(1 hunks)launch/dynamo-run/src/flags.rs
(1 hunks)launch/dynamo-run/src/main.rs
(1 hunks)launch/dynamo-run/src/subprocess.rs
(1 hunks)launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/guides/dynamo_run.md
[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run
. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable
CUDA_VISIBLE_DEVICES`. **Mul...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task
. ``` # Ingress dyna...
(AI_EN_LECTOR_MISSING_DETERMINER_A)
🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_run.md
458-458: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
471-471: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - vllm
@@ -35,7 +35,7 @@ Example: | |||
- OR: ./dynamo-run /data/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf | |||
"#; | |||
|
|||
const USAGE: &str = "USAGE: dynamo-run in=[http|text|dyn://<path>|batch:<folder>] out=ENGINE_LIST|dyn [--http-port 8080] [--model-path <path>] [--model-name <served-model-name>] [--model-config <hf-repo>] [--tensor-parallel-size=1] [--context-length=N] [--kv-cache-block-size=16] [--num-nodes=1] [--node-rank=0] [--leader-addr=127.0.0.1:9876] [--base-gpu-id=0] [--extra-engine-args=args.json] [--router-mode random|round-robin|kv] [--kv-overlap-score-weight=2.0] [--kv-gpu-cache-usage-weight=1.0] [--kv-waiting-requests-weight=1.0] [--verbosity (-v|-vv)]"; | |||
const USAGE: &str = "USAGE: dynamo-run in=[http|text|dyn://<path>|batch:<folder>] out=ENGINE_LIST|dyn [--http-port 8080] [--model-path <path>] [--model-name <served-model-name>] [--model-config <hf-repo>] [--task prefill_and_decode|decode|prefill] [--tensor-parallel-size=1] [--context-length=N] [--kv-cache-block-size=16] [--num-nodes=1] [--node-rank=0] [--leader-addr=127.0.0.1:9876] [--base-gpu-id=0] [--extra-engine-args=args.json] [--router-mode random|round-robin|kv] [--kv-overlap-score-weight=2.0] [--kv-gpu-cache-usage-weight=1.0] [--kv-waiting-requests-weight=1.0] [--verbosity (-v|-vv)]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Manual USAGE
string is drifting again – delegate to clap
Every time a flag is added (--task
in this PR) we must remember to update the hard-coded USAGE
string. We’ve already missed this at least once in the past.
Instead, rely on Flags::command().render_usage()
(or clap_complete
) so the help/usage text is always in sync with the parser:
-const USAGE: &str = "USAGE: dynamo-run ...";
+lazy_static::lazy_static! {
+ static ref USAGE: String = {
+ use clap::CommandFactory;
+ dynamo_run::Flags::command().render_usage().to_string()
+ };
+}
This removes manual duplication and prevents future inconsistencies.
🤖 Prompt for AI Agents
In launch/dynamo-run/src/main.rs at line 38, the USAGE string is manually
maintained and can easily become outdated when flags change. Remove the
hard-coded USAGE string and instead generate the usage text dynamically by
calling Flags::command().render_usage() from the clap crate. This ensures the
usage information always matches the current command-line argument definitions
and eliminates manual duplication.
// | ||
// The worker node will only publish events and metrics if the router mode is KV | ||
if flags.router_mode == RouterMode::KV { | ||
args.push("--publish-events-and-metrics".to_string()); | ||
} | ||
// Specific task for TRTLLM engine | ||
if flags.task.is_some() { | ||
args.push("--task".to_string()); | ||
args.push(flags.task.clone().unwrap()); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Safer & engine-specific propagation of --task
unwrap()
is unnecessary – use pattern-matching and avoid the extra clone:
-if flags.task.is_some() {
- args.push("--task".to_string());
- args.push(flags.task.clone().unwrap());
-}
+if let Some(task) = &flags.task {
+ args.push("--task".to_string());
+ args.push(task.to_string());
+}
-
--task
is silently forwarded even when the selected engine is not TRTLLM.
The other Python shims (sglang_inc.py
,vllm_inc.py
) will fail on an
unknown flag.Guard the push inside the TRTLLM code path (or assert in
Flags
that
--task
may only be supplied without=trtllm
).
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
# Locate the definition of Flags to inspect its fields
rg -n "struct Flags" -C5
# Inspect any 'task' field within Flags
rg -n "task" -C2
# Check for any references to TRTLLM in the codebase
rg -n "trtllm" -C2
# Look for how the engine/output flag is defined (e.g., 'out')
rg -n "out" -C2
Length of output: 70880
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
# Display the Flags struct definition to inspect its fields (incl. task, router_mode, out, engine, etc.)
rg -n "pub struct Flags" -A 30 launch/dynamo-run/src/flags.rs
# Locate any line where `task` is defined in Flags
rg -n "task" -C2 launch/dynamo-run/src/flags.rs
# Locate router_mode definition in Flags
rg -n "router_mode" -C2 launch/dynamo-run/src/flags.rs
# Show the first 100 lines of subprocess.rs to confirm usage context
sed -n '1,100p' launch/dynamo-run/src/subprocess.rs
Length of output: 5594
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
# Show the beginning of the TRTLLM launcher to see how start() is invoked
sed -n '1,200p' launch/dynamo-run/src/trtllm.rs
# Show the beginning of the sglang launcher
sed -n '1,200p' launch/dynamo-run/src/sglang.rs
# Show the beginning of the vllm launcher
sed -n '1,200p' launch/dynamo-run/src/vllm.rs
Length of output: 143
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
# Show full Flags struct, including any engine/subcommand fields
rg -n "pub struct Flags" -A200 launch/dynamo-run/src/flags.rs
# List top-level Rust files in launch/dynamo-run/src
find launch/dynamo-run/src -maxdepth 1 -type f -name "*.rs"
# List Rust files in launch/dynamo-run/src/subprocess directory (engine modules)
find launch/dynamo-run/src/subprocess -type f -name "*.rs"
Length of output: 9111
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
# Show main.rs to understand engine subcommands and how Flags is used
sed -n '1,200p' launch/dynamo-run/src/main.rs
Length of output: 4635
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
# Show script constant and start function in TRTLLM module
sed -n '1,50p' launch/dynamo-run/src/subprocess/trtllm.rs
# Show script constant and start function in SGLANG module
sed -n '1,50p' launch/dynamo-run/src/subprocess/sglang.rs
# Show script constant and start function in VLLM module
sed -n '1,50p' launch/dynamo-run/src/subprocess/vllm.rs
Length of output: 2567
🏁 Script executed:
#!/bin/bash
set -euxo pipefail
# List Python include scripts
find launch/dynamo-run/src/subprocess -maxdepth 1 -type f -name "*.py"
# Inspect TRTLLM Python shim for --task support
grep -R "add_argument.*task" -n launch/dynamo-run/src/subprocess/trtllm_inc.py || true
sed -n '1,200p' launch/dynamo-run/src/subprocess/trtllm_inc.py
# Inspect SGLANG Python shim for --task handling
grep -R "add_argument.*task" -n launch/dynamo-run/src/subprocess/sglang_inc.py || true
sed -n '1,200p' launch/dynamo-run/src/subprocess/sglang_inc.py
# Inspect VLLM Python shim for --task handling
grep -R "add_argument.*task" -n launch/dynamo-run/src/subprocess/vllm_inc.py || true
sed -n '1,200p' launch/dynamo-run/src/subprocess/vllm_inc.py
Length of output: 20737
Restrict and simplify --task
propagation
-
Use pattern‐matching to eliminate the panic and extra clone:
--- a/launch/dynamo-run/src/subprocess.rs +++ b/launch/dynamo-run/src/subprocess.rs // TRTLLM only -// Specific task for TRTLLM engine -if flags.task.is_some() { - args.push("--task".to_string()); - args.push(flags.task.clone().unwrap()); -} +// Specific task for TRTLLM engine +use crate::subprocess::trtllm; + +if py_script == trtllm::PY { + if let Some(task) = &flags.task { + args.push("--task".to_string()); + args.push(task.clone()); + } +} else if flags.task.is_some() { + anyhow::bail!("`--task` is only supported with the TRTLLM engine"); +}
-
This removes
unwrap()
, clones only once, and avoids forwarding--task
to SGLANG/VLLM (whose Python shims don’t accept it).
Locations to update:
- launch/dynamo-run/src/subprocess.rs (around the
// Specific task for TRTLLM engine
block)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// | |
// The worker node will only publish events and metrics if the router mode is KV | |
if flags.router_mode == RouterMode::KV { | |
args.push("--publish-events-and-metrics".to_string()); | |
} | |
// Specific task for TRTLLM engine | |
if flags.task.is_some() { | |
args.push("--task".to_string()); | |
args.push(flags.task.clone().unwrap()); | |
} | |
// | |
// The worker node will only publish events and metrics if the router mode is KV | |
if flags.router_mode == RouterMode::KV { | |
args.push("--publish-events-and-metrics".to_string()); | |
} | |
// Specific task for TRTLLM engine | |
use crate::subprocess::trtllm; | |
if py_script == trtllm::PY { | |
if let Some(task) = &flags.task { | |
args.push("--task".to_string()); | |
args.push(task.clone()); | |
} | |
} else if flags.task.is_some() { | |
anyhow::bail!("`--task` is only supported with the TRTLLM engine"); | |
} |
🤖 Prompt for AI Agents
In launch/dynamo-run/src/subprocess.rs around lines 56 to 66, the code currently
uses unwrap() and clone() to forward the --task flag unconditionally, which can
cause panics and forwards the flag to engines that do not support it. Refactor
this by using pattern matching on flags.task to safely extract the value without
unwrap or extra clone, and only push the --task argument and its value if the
selected engine is TRTLLM. This ensures safer handling and prevents passing
unsupported flags to other engines.
/// TRTLLM only | ||
/// | ||
/// The task to perform with the model. Can be used to execute engines in disaggregated mode. | ||
/// | ||
/// Supported tasks: | ||
/// - prefill_and_decode: Prefill and decode (default) | ||
/// - decode: Decode | ||
/// - prefill: Prefill | ||
#[arg(long)] | ||
pub task: Option<String>, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Validate --task
at parse-time with a ValueEnum
Accepting an arbitrary String
allows typos that only surface deep inside
TensorRT-LLM. Convert it to a typed enum so clap handles validation and
auto-documents the choices:
use clap::ValueEnum
@@
-#[arg(long)]
-pub task: Option<String>,
+#[derive(ValueEnum, Clone, Debug, PartialEq, Eq)]
+pub enum TrtllmTask {
+ #[value(name = "prefill_and_decode")]
+ PrefillAndDecode,
+ Decode,
+ Prefill,
+}
+
+#[arg(long, value_enum)]
+pub task: Option<TrtllmTask>,
Remember to update call-sites (subprocess.rs
, Flags::as_vec
) to convert
TrtllmTask
to its lowercase string with task.to_string()
.
This prevents invalid values and makes --help
self-documenting.
🤖 Prompt for AI Agents
In launch/dynamo-run/src/flags.rs around lines 49 to 59, the task field is
currently an Option<String>, which allows invalid values and lacks validation.
Change the task field to use a custom enum implementing clap's ValueEnum trait
to enforce valid choices at parse-time. Update all call-sites such as
subprocess.rs and Flags::as_vec to convert the enum variant to a lowercase
string using task.to_string() to maintain compatibility. This will ensure only
supported tasks are accepted and improve help documentation.
WalkthroughThis update introduces a new optional Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI (dynamo-run)
participant Subprocess (trtllm engine)
User->>CLI (dynamo-run): Run with --task <value>
CLI (dynamo-run)->>Subprocess (trtllm engine): Launch subprocess with --task <value>
Subprocess (trtllm engine)->>Subprocess (trtllm engine): Execute specified task (prefill, decode, etc.)
Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
launch/dynamo-run/src/main.rs (1)
38-38
: Keep top-level docs in sync with the new--task
flag
USAGE
now advertises--task …
, but the first Usage block indocs/guides/dynamo_run.md
(line 11) still lacks it. Users will see two conflicting sources of truth.Add the flag to the docs or point to
--help
instead of duplicating the list.launch/dynamo-run/src/subprocess.rs (1)
61-65
: Preferif let
and avoid the unnecessaryunwrap()
The current pattern clones then unwraps the Option. Slightly clearer/safer:
-if flags.task.is_some() { - args.push("--task".to_string()); - args.push(flags.task.clone().unwrap()); -} +if let Some(task) = &flags.task { + args.push("--task".to_string()); + args.push(task.clone()); +}No functional change, but eliminates the
unwrap
and the redundantis_some
check.docs/guides/dynamo_run.md (1)
458-486
: Minor docs polish: punctuation, heading style, fenced-code languageLint warnings:
- “Multi-GPU” line should be a heading (
### Multi-GPU
) rather than bold text (MD036).- Missing commas/articles in sentences 462 & 469.
- Specify a language for the fenced block starting line 471 (MD040).
Proposed patch:
-**Multi-GPU** - -Pass `--tensor-parallel-size <NUM-GPUS>` to `dynamo-run`. +### Multi-GPU + +Pass `--tensor-parallel-size <NUM-GPUS>` to `dynamo-run`. -To specify which GPUs to use set environment variable `CUDA_VISIBLE_DEVICES`. +To specify which GPUs to use, set the environment variable `CUDA_VISIBLE_DEVICES`. -**Disaggregated Serving:** -The trtllm engine can be configured to run in disaggregated way using `--task`. +### Disaggregated Serving +The trtllm engine can be configured to run in a disaggregated way using `--task`. ```bash # Ingress🧰 Tools
🪛 LanguageTool
[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run
. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable
CUDA_VISIBLE_DEVICES`. **Mul...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using--task
. ``` # Ingress dyna...(AI_EN_LECTOR_MISSING_DETERMINER_A)
🪛 markdownlint-cli2 (0.17.2)
458-458: Emphasis used instead of a heading
null(MD036, no-emphasis-as-heading)
471-471: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/guides/dynamo_run.md
(1 hunks)launch/dynamo-run/src/flags.rs
(1 hunks)launch/dynamo-run/src/main.rs
(1 hunks)launch/dynamo-run/src/subprocess.rs
(1 hunks)launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/guides/dynamo_run.md
[uncategorized] ~462-~462: A comma might be missing here.
Context: ...dynamo-run
. To specify which GPUs to use set environment variable `CUDA_VISIBLE_...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[uncategorized] ~462-~462: You might be missing the article “the” here.
Context: ...run. To specify which GPUs to use set environment variable
CUDA_VISIBLE_DEVICES`. **Mul...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~469-~469: You might be missing the article “a” here.
Context: ...tllm engine can be configured to run in disaggregated way using --task
. ``` # Ingress dyna...
(AI_EN_LECTOR_MISSING_DETERMINER_A)
🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_run.md
458-458: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
471-471: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (1)
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1)
23-25
:❓ Verification inconclusive
Confirm TRT-LLM really supports
disable_overlap_scheduler
Nice to see the flag exposed, but the trtllm Python entry points we ship (
trtllm_inc.py
) do not appear to parse this key today. If the engine silently ignores unknown YAML keys, users may assume the scheduler is disabled when it is not.
- Verify the current TensorRT-LLM release recognises this option.
- If it is not yet upstream, guard it with a comment indicating the minimum version required.
I’ll check the codebase for any references to
disable_overlap_scheduler
or its handling intrtllm_inc.py
.
🏁 Script executed:
#!/bin/bash # 1. Find any occurrences of the flag in code rg "disable_overlap_scheduler" -n # 2. Find any overlap_scheduler logic rg "overlap_scheduler" -n # 3. Locate the Python entrypoint and inspect scheduler handling rg --files-with-matches "trtllm_inc.py" -n | xargs -I{} sh -c 'echo "----- {} -----"; sed -n "1,200p" {}'Length of output: 13212
Verify TRT-LLM support for
disable_overlap_scheduler
I searched the code and didn’t find any parsing of this key in our Python entrypoint:
- launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml:23 (defines
disable_overlap_scheduler
)- launch/dynamo-run/src/subprocess/trtllm_inc.py (no references to
disable_overlap_scheduler
orenable_overlap_scheduler
)- Examples show the flag in YAML but only as comments or user-provided fields
Please confirm whether the TRT-LLM release you’re using actually recognizes
disable_overlap_scheduler
. If it doesn’t, guard this setting with a comment indicating the minimum TRT-LLM version required, for example:# Requires TRT-LLM ≥ X.Y.Z to honor this flag disable_overlap_scheduler: true
/// TRTLLM only | ||
/// | ||
/// The task to perform with the model. Can be used to execute engines in disaggregated mode. | ||
/// | ||
/// Supported tasks: | ||
/// - prefill_and_decode: Prefill and decode (default) | ||
/// - decode: Decode | ||
/// - prefill: Prefill | ||
#[arg(long)] | ||
pub task: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use an enum instead of a raw String
for task
Accepting an arbitrary string postpones validation until the Python side fails. Clap already supports ValueEnum
; we can exhaustively validate at parse-time and get --help
text for free.
-#[arg(long)]
-pub task: Option<String>,
+#[derive(ValueEnum, Clone, Debug)]
+pub enum TrtTask {
+ #[value(name = "prefill_and_decode")]
+ PrefillAndDecode,
+ Decode,
+ Prefill,
+}
+
+#[arg(long)]
+pub task: Option<TrtTask>,
Downstream (subprocess.rs) you’d convert with task.to_string()
.
This removes a whole class of “unknown task” runtime errors.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In launch/dynamo-run/src/flags.rs around lines 49 to 58, replace the raw String
type for the `task` field with a custom enum that derives `ValueEnum` from Clap.
Define the enum with variants for each supported task ("prefill_and_decode",
"decode", "prefill") and update the `task` field to use `Option<YourEnum>`. This
enables compile-time validation of input values and automatic help text
generation. Downstream code that uses `task` should convert the enum variant to
a string with `to_string()` as needed.
Overview:
Allow passing task flag from dynamo-run cli.
Tested the instructions locally.
Follow-up work from: #1355
Details:
There is planned future work for describing the workflow via
--next
argument and self-discovery for remote prefill worker.Currently, I have described the limitation in the instructions.
Graham had ideas to express component graph in dynamo run which is somewhat more work. We are having more conversations around how to better synchronize dynamo-run and dynamo-serve here.
Summary by CodeRabbit
prefill_and_decode
,decode
, orprefill
) when running the TensorRT-LLM engine, enabling more flexible serving configurations.disable_overlap_scheduler
option in the sample configuration file for advanced engine tuning.