
feat: Dynamic Endpoint Exposure Based on Model Type #1447


Open
wants to merge 2 commits into main from shivaniab/dyn_endpoints

Conversation

@abrarshivani (Contributor) commented Jun 9, 2025

Overview:

This PR enables dynamic management of HTTP endpoints (chat, completion, embeddings) based on model availability.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • HTTP service endpoints for chat, completion, and embeddings can now be dynamically enabled or disabled at runtime based on model updates.
    • Endpoints automatically adjust their availability as models are added or removed, providing a more responsive and adaptive user experience.
  • Bug Fixes

    • Improved error handling for endpoint availability, returning appropriate HTTP status codes when endpoints are disabled.
  • Refactor

    • Endpoint availability is now managed dynamically through runtime flags and middleware, replacing previous static configuration.


copy-pr-bot bot commented Jun 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


github-actions bot commented Jun 9, 2025

👋 Hi abrarshivani! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving them.

🚀

@github-actions github-actions bot added the external-contribution (Pull request is from an external contributor) and feat labels Jun 9, 2025

coderabbitai bot commented Jun 9, 2025

Walkthrough

The changes implement dynamic HTTP endpoint control based on model lifecycle events. Model addition or removal triggers asynchronous notifications, which update the availability of chat, completion, and embedding endpoints in the HTTP service at runtime. This is achieved through new runtime flags and middleware, replacing previous static configuration and route inclusion.

Changes

File(s) and change summary:

  • launch/dynamo-run/src/input/http.rs: Refactored HTTP service setup to use dynamic endpoint enabling/disabling via model watcher notifications. Added helper for endpoint toggling.
  • lib/llm/src/discovery/watcher.rs: Added ModelUpdate enum, notification channel to ModelWatcher, and logic to send updates on model add/remove.
  • lib/llm/src/discovery.rs: Re-exported ModelUpdate alongside ModelWatcher.
  • lib/llm/src/http/service/service_v2.rs: Added runtime flags and methods for toggling endpoint availability. Switched to middleware-based gating for endpoint routes.
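The wiring is easiest to see as a channel between the watcher and the HTTP service. Below is a minimal, self-contained sketch of that flow (the ModelUpdate/ModelType shapes are reduced from the PR, and the println! calls stand in for the real enable_*_endpoints methods); the sequence diagram that follows shows the same exchange:

use tokio::sync::mpsc;

#[derive(Debug, Clone, Copy)]
enum ModelType { Chat, Completion, Embedding }

#[derive(Debug, Clone, Copy)]
enum ModelUpdate { Added(ModelType), Removed(ModelType) }

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<ModelUpdate>(8);

    // Watcher side: notify when a model appears or disappears.
    tx.send(ModelUpdate::Added(ModelType::Chat)).await.unwrap();
    tx.send(ModelUpdate::Removed(ModelType::Chat)).await.unwrap();
    drop(tx); // close the channel so the receive loop below ends

    // HTTP-service side: flip endpoint flags per event.
    while let Some(update) = rx.recv().await {
        match update {
            ModelUpdate::Added(t) => println!("enable endpoints for {t:?}"),
            ModelUpdate::Removed(t) => println!("disable endpoints for {t:?}"),
        }
    }
}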

Sequence Diagram(s)

sequenceDiagram
    participant ModelWatcher
    participant HttpService
    participant Client

    ModelWatcher->>HttpService: Send ModelUpdate (Added/Removed)
    HttpService->>HttpService: update_http_endpoints()
    Client->>HttpService: HTTP request to endpoint
    HttpService->>HttpService: Check endpoint flag (middleware)
    alt Endpoint enabled
        HttpService->>Client: Process request normally
    else Endpoint disabled
        HttpService->>Client: Return 503/404 error
    end

Poem

In the meadow of code where models appear,
Endpoints now dance—enabled by cheer!
A watcher on duty, flags in its paw,
Turns routes on and off with a hop and a jaw.
Dynamic and nimble, the service now sings,
As models come and go—see what change brings!
🐇✨



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🔭 Outside diff range comments (1)
lib/llm/src/http/service/service_v2.rs (1)

307-311: ⚠️ Potential issue

Build-breaking bug: with_request_template nests Option
request_template already has type Option<RequestTemplate>, but the setter wraps it in Some(_), yielding Option<Option<_>> and causing a type mismatch.

-pub fn with_request_template(mut self, request_template: Option<RequestTemplate>) -> Self {
-    self.request_template = Some(request_template);
+pub fn with_request_template(mut self, request_template: Option<RequestTemplate>) -> Self {
+    self.request_template = request_template;
     self
 }
🧹 Nitpick comments (7)
lib/llm/src/discovery/watcher.rs (3)

35-39: Add documentation for the public ModelUpdate enum.

This enum is part of the public API and should be documented to help users understand its purpose and usage.

Apply this diff to add documentation:

+/// Represents a model lifecycle event for dynamic endpoint management.
+///
+/// This enum is used to notify external components about model additions
+/// and removals, enabling dynamic control of HTTP endpoints based on
+/// available model types.
 #[derive(Debug, Clone, Copy, PartialEq)]
 pub enum ModelUpdate {
+    /// A model of the specified type has been added to the system.
     Added(ModelType),
+    /// A model of the specified type has been removed from the system.
     Removed(ModelType),
 }

67-69: Add documentation for the public method.

This method is part of the public API and should be documented.

Apply this diff to add documentation:

+    /// Sets the channel sender for model update notifications.
+    ///
+    /// When set, the watcher will send `ModelUpdate` events through this channel
+    /// whenever models are added or removed from the system.
     pub fn set_notify_on_model_update(&mut self, tx: Sender<ModelUpdate>) {
         self.model_update_tx = Some(tx);
     }

120-124: Consider logging when channel send fails.

The .ok() silently ignores send errors, which could occur if the receiver is dropped. Consider logging these failures for better observability.

Apply this diff to log send failures:

             if let Some(tx) = &self.model_update_tx {
-                tx.send(ModelUpdate::Added(model_entry.model_type))
-                    .await
-                    .ok();
+                if let Err(e) = tx.send(ModelUpdate::Added(model_entry.model_type)).await {
+                    tracing::warn!("Failed to send model update notification: {}", e);
+                }
             }
launch/dynamo-run/src/input/http.rs (2)

125-126: Consider increasing the channel buffer size.

The channel buffer is set to 1, which might cause backpressure if multiple model updates occur in quick succession. Consider using a larger buffer or an unbounded channel.

Apply this diff to increase the buffer size:

     // Create a channel to receive model type updates
-    let (tx, mut rx) = tokio::sync::mpsc::channel(1);
+    let (tx, mut rx) = tokio::sync::mpsc::channel(32);
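If the watcher must never block on notification, an unbounded channel is the other standard choice; a sketch (it trades the bounded buffer for unbounded memory growth if the consumer stalls):

// UnboundedSender::send is synchronous and only fails once the receiver
// has been dropped, so the watcher never awaits on notification.
let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<ModelUpdate>();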

156-157: Consider logging when Backend model updates are received.

Since Backend models don't trigger endpoint changes, it might be helpful to log when they're received to aid in debugging.

Apply this diff to add logging:

-            ModelType::Backend => {}
+            ModelType::Backend => {
+                tracing::debug!("Backend model added - no endpoint changes required");
+            }
lib/llm/src/http/service/service_v2.rs (2)

145-182: enable_* helpers are triplicated – extract a common setter
The three nearly-identical methods differ only in the field that is toggled and the log message. Consolidating them into one generic helper (or macro) will reduce boilerplate and prevent future inconsistencies.

Example sketch:

fn set_flag<F>(&self, f: F, flag_name: &str, enable: bool)
where
    F: FnOnce(&mut StateFlags) -> &mut bool,
{
    match self.state.flags.write() {
        Ok(mut flags) => {
            *f(&mut *flags) = enable;
            tracing::info!("{flag_name} {}", if enable { "enabled" } else { "disabled" });
        }
        Err(e) => tracing::error!("Failed to acquire write lock for {flag_name}: {e}"),
    }
}

214-292: Different HTTP codes for the same “disabled” condition
Chat returns 503 Service Unavailable, while completions/embeddings return 404 Not Found. The difference is confusing for clients toggling models dynamically. Pick one status (typically 503 with a Retry-After header) for all disabled endpoints, or document the rationale.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d49970 and c680058.

📒 Files selected for processing (4)
  • launch/dynamo-run/src/input/http.rs (5 hunks)
  • lib/llm/src/discovery.rs (1 hunks)
  • lib/llm/src/discovery/watcher.rs (5 hunks)
  • lib/llm/src/http/service/service_v2.rs (6 hunks)
🔇 Additional comments (3)
lib/llm/src/discovery.rs (1)

10-11: LGTM! Clean re-export of the new ModelUpdate enum.

The change appropriately expands the public API to include the ModelUpdate enum alongside ModelWatcher, enabling external modules to subscribe to model lifecycle events.

launch/dynamo-run/src/input/http.rs (1)

65-68: 💡 Verification agent

🧩 Analysis chain

Verify that chat completion endpoints include both /v1/chat/completions and /chat/completions.

The code enables chat endpoints after adding models. Ensure that both OpenAI-compatible and legacy endpoints are properly enabled.


🏁 Script executed:

#!/bin/bash
# Description: Check the implementation of enable_chat_endpoints to verify it enables all expected routes

# Search for the enable_chat_endpoints implementation
ast-grep --pattern 'fn enable_chat_endpoints($$$) {
  $$$
}'

Length of output: 62


Let’s locate the actual implementation to confirm which routes are registered:


🏁 Script executed:

#!/bin/bash
# Search for the enable_chat_endpoints method and show surrounding lines
rg -n "enable_chat_endpoints" -C5 --type rs

Length of output: 70


Let’s try locating the function definition in all Rust files without restricting by type:


🏁 Script executed:

#!/bin/bash
# Find the enable_chat_endpoints function in any .rs file
rg -n "fn enable_chat_endpoints" -C5 -g '*.rs'

Length of output: 964


Let’s inspect where the flag is used and whether both routes are registered:


🏁 Script executed:

#!/bin/bash
# Locate flag usage in service_v2.rs
rg -n 'chat_endpoints_enabled' -C10 lib/llm/src/http/service/service_v2.rs

# Search for explicit registration of each endpoint
rg -n '/v1/chat/completions' -C5 -g '*.rs'
rg -n '/chat/completions' -C5 -g '*.rs'

Length of output: 16955


Add legacy /chat/completions route alongside /v1/chat/completions

The current build registers only the default /v1/chat/completions endpoint. To maintain backward compatibility, you should also register the legacy /chat/completions path.

Locations to update:

  • In lib/llm/src/http/service/service_v2.rs inside HttpServiceConfigBuilder::build (around line 213): after you configure chat_route, invoke chat_completions_router again with Some("/chat/completions".into()) and merge it into your router.
  • Optionally, in lib/llm/src/http/service/openai.rs you could update chat_completions_router so that when path is None it registers both paths automatically.

Example diff in service_v2.rs:

@@ // Add chat completions route with conditional middleware
- let (chat_docs, chat_route) =
-     super::openai::chat_completions_router(state.clone(), config.request_template, None);
+ let (chat_docs, chat_route) =
+     super::openai::chat_completions_router(state.clone(), config.request_template, None);
  let chat_route = chat_route.route_layer(axum::middleware::from_fn(...));

+ // Support legacy endpoint
+ let (_legacy_docs, legacy_route) =
+     super::openai::chat_completions_router(
+         state.clone(),
+         config.request_template,
+         Some("/chat/completions".into()),
+     );
+ let legacy_route = legacy_route.route_layer(axum::middleware::from_fn(...));
+
+ // Merge both routers
+ router = router
+     .merge(chat_route)
+     .merge(legacy_route);

Likely an incorrect or invalid review comment.

lib/llm/src/http/service/service_v2.rs (1)

193-197: Lock-poisoning is silently swallowed
If a previous panic poisoned the flags lock, the write attempt fails and the configuration silently remains stale. Consider propagating the error or at least returning a Result so callers can react instead of continuing in an undefined state.
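One possible shape, sketched with names assumed to mirror the diffs above, where poisoning becomes an error the caller must handle rather than a log line:

use std::sync::{Arc, RwLock};

#[derive(Default)]
struct StateFlags {
    chat_endpoints_enabled: bool,
}

struct State {
    flags: RwLock<StateFlags>,
}

struct HttpService {
    state: Arc<State>,
}

impl HttpService {
    /// Propagates lock poisoning instead of logging and carrying on
    /// with stale flags.
    fn enable_chat_endpoints(&self, enable: bool) -> Result<(), String> {
        let mut flags = self
            .state
            .flags
            .write()
            .map_err(|e| format!("flags lock poisoned: {e}"))?;
        flags.chat_endpoints_enabled = enable;
        Ok(())
    }
}

fn main() {
    let svc = HttpService {
        state: Arc::new(State {
            flags: RwLock::new(StateFlags::default()),
        }),
    };
    svc.enable_chat_endpoints(true).expect("flags lock poisoned");
}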

Comment on lines +130 to +137
let _endpoint_enabler_task = tokio::spawn(async move {
    while let Some(model_type) = rx.recv().await {
        tracing::debug!("Received model type update: {:?}", model_type);
        update_http_endpoints(http_service.clone(), model_type);
    }
});

🛠️ Refactor suggestion

Store the spawned task handle for proper lifecycle management.

The spawned task is not stored, which means it cannot be properly joined or cancelled. This could lead to the task continuing to run after the watcher has stopped.

You should store the task handle and ensure proper cleanup. Consider returning it from the function or storing it in a struct field for proper lifecycle management.

🤖 Prompt for AI Agents
In launch/dynamo-run/src/input/http.rs around lines 130 to 135, the spawned task
handle from tokio::spawn is not stored, preventing proper lifecycle management
such as joining or cancelling the task. To fix this, assign the result of
tokio::spawn to a variable that is either returned from the function or stored
in a struct field, ensuring the task handle can be awaited or aborted as needed
for cleanup.
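A self-contained sketch of one way to own the handle (names hypothetical; the real loop would receive ModelUpdate values and call update_http_endpoints):

use tokio::sync::mpsc;
use tokio::task::JoinHandle;

/// Hypothetical owner for the notifier task so it can be cancelled.
struct EndpointNotifier {
    task: JoinHandle<()>,
}

impl EndpointNotifier {
    fn spawn(mut rx: mpsc::Receiver<String>) -> Self {
        let task = tokio::spawn(async move {
            while let Some(update) = rx.recv().await {
                println!("model update: {update}");
            }
        });
        Self { task }
    }

    /// Abort the loop and reap the handle; awaiting an aborted task
    /// yields a JoinError whose is_cancelled() is true.
    async fn shutdown(self) {
        self.task.abort();
        let _ = self.task.await;
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(8);
    let notifier = EndpointNotifier::spawn(rx);
    tx.send("chat model added".to_string()).await.unwrap();
    notifier.shutdown().await;
}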

Comment on lines +145 to +167
/// Updates HTTP service endpoints based on available model types
fn update_http_endpoints(service: Arc<HttpService>, model_type: ModelUpdate) {
    tracing::debug!(
        "Updating HTTP service endpoints for model type: {:?}",
        model_type
    );
    match model_type {
        ModelUpdate::Added(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(true),
            ModelType::Completion => service.enable_cmpl_endpoints(true),
            ModelType::Embedding => service.enable_embeddings_endpoints(true),
            ModelType::Backend => {}
        },
        ModelUpdate::Removed(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(false),
            ModelType::Completion => service.enable_cmpl_endpoints(false),
            ModelType::Embedding => service.enable_embeddings_endpoints(false),
            ModelType::Backend => {}
        },
    }
}

💡 Verification agent

🧩 Analysis chain

Consider handling Backend model type for endpoint management.

The Backend model type doesn't trigger any endpoint changes, but according to the watcher code, Backend models support both chat and completions. Verify if this is intentional.

Based on the watcher implementation, Backend models register both chat and completion engines. Should we enable both endpoints when a Backend model is added?


🏁 Script executed:

#!/bin/bash
# Description: Verify how Backend models are handled in the watcher

# Search for Backend model handling in handle_put
ast-grep --pattern 'ModelType::Backend => {
  $$$
}'

Length of output: 54


I’ll search for any references to ModelType::Backend and inspect the watcher’s handle_put implementation to confirm how Backend models are treated.


🏁 Script executed:

#!/bin/bash
# Search for direct references to Backend in the ModelType enum usage
rg "ModelType::Backend" -n .

# Locate the watcher’s handle_put function and inspect surrounding code
rg "fn handle_put" -n launch

# If found, show context around ModelType matching in handle_put
rg -A5 -B5 "ModelType::" -n launch

Length of output: 5535


I’ll inspect how the watcher handles ModelType::Backend to confirm which engines it registers:


🏁 Script executed:

#!/bin/bash
# Show the watcher’s handling of Backend models
rg -A5 -B5 "ModelType::Backend =>" -n lib/llm/src/discovery/watcher.rs

Length of output: 527


Enable chat and completion endpoints for Backend models

The watcher explicitly registers both chat and completion engines for ModelType::Backend, so the HTTP service should mirror that by toggling both endpoint sets when a Backend model is added or removed.

Please update launch/dynamo-run/src/input/http.rs accordingly:

fn update_http_endpoints(service: Arc<HttpService>, model_type: ModelUpdate) {
    tracing::debug!(
        "Updating HTTP service endpoints for model type: {:?}",
        model_type
    );
    match model_type {
        ModelUpdate::Added(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(true),
            ModelType::Completion => service.enable_cmpl_endpoints(true),
            ModelType::Embedding => service.enable_embeddings_endpoints(true),
-           ModelType::Backend => {}
+           ModelType::Backend => {
+               service.enable_chat_endpoints(true);
+               service.enable_cmpl_endpoints(true);
+           }
        },
        ModelUpdate::Removed(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(false),
            ModelType::Completion => service.enable_cmpl_endpoints(false),
            ModelType::Embedding => service.enable_embeddings_endpoints(false),
-           ModelType::Backend => {}
+           ModelType::Backend => {
+               service.enable_chat_endpoints(false);
+               service.enable_cmpl_endpoints(false);
+           }
        },
    }
}

This ensures the HTTP layer stays in sync with the watcher’s multi‐engine Backend handling.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/// Updates HTTP service endpoints based on available model types
fn update_http_endpoints(service: Arc<HttpService>, model_type: ModelUpdate) {
    tracing::debug!(
        "Updating HTTP service endpoints for model type: {:?}",
        model_type
    );
    match model_type {
        ModelUpdate::Added(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(true),
            ModelType::Completion => service.enable_cmpl_endpoints(true),
            ModelType::Embedding => service.enable_embeddings_endpoints(true),
            ModelType::Backend => {
                service.enable_chat_endpoints(true);
                service.enable_cmpl_endpoints(true);
            }
        },
        ModelUpdate::Removed(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(false),
            ModelType::Completion => service.enable_cmpl_endpoints(false),
            ModelType::Embedding => service.enable_embeddings_endpoints(false),
            ModelType::Backend => {
                service.enable_chat_endpoints(false);
                service.enable_cmpl_endpoints(false);
            }
        },
    }
}
🤖 Prompt for AI Agents
In launch/dynamo-run/src/input/http.rs around lines 145 to 165, the Backend
model type currently does not trigger any endpoint changes in
update_http_endpoints. Since the watcher registers both chat and completion
engines for Backend models, update the function to enable or disable both chat
and completion endpoints when a Backend model is added or removed. Modify the
match arms for ModelType::Backend to call service.enable_chat_endpoints and
service.enable_cmpl_endpoints with the appropriate boolean value.

@@ -1,7 +1,7 @@
 // SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 // SPDX-License-Identifier: Apache-2.0

-use std::sync::Arc;
+use std::sync::{Arc, RwLock};

🛠️ Refactor suggestion

Use an async-friendly RwLock to avoid blocking the Tokio runtime
Every request hits state.flags.read(); with std::sync::RwLock this can park the entire executor thread when contention occurs, defeating cooperative multitasking and hurting tail-latency.
Switch to tokio::sync::RwLock (or parking_lot::RwLock) and update imports/usages accordingly.

-use std::sync::{Arc, RwLock};
+use std::sync::Arc;
+use tokio::sync::RwLock;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-use std::sync::{Arc, RwLock};
+use std::sync::Arc;
+use tokio::sync::RwLock;
🤖 Prompt for AI Agents
In lib/llm/src/http/service/service_v2.rs at line 4, replace the import of
std::sync::RwLock with tokio::sync::RwLock to use an async-friendly lock. Update
all usages of RwLock in the file to work with the async version, ensuring that
calls to read() or write() are awaited properly to prevent blocking the Tokio
runtime and improve concurrency.
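A self-contained sketch of the suggested direction (the field name is assumed from the PR; tokio's lock has no poisoning, so the Err-handling branches disappear):

use std::sync::Arc;
use tokio::sync::RwLock;

#[derive(Default)]
struct StateFlags {
    chat_endpoints_enabled: bool,
}

// read()/write() yield to the scheduler instead of parking the OS thread,
// and they return guards directly rather than a Result.
async fn is_chat_enabled(flags: &Arc<RwLock<StateFlags>>) -> bool {
    flags.read().await.chat_endpoints_enabled
}

#[tokio::main]
async fn main() {
    let flags = Arc::new(RwLock::new(StateFlags::default()));
    flags.write().await.chat_endpoints_enabled = true;
    assert!(is_chat_enabled(&flags).await);
}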

@abrarshivani abrarshivani force-pushed the shivaniab/dyn_endpoints branch from c680058 to 11c31be on June 9, 2025 21:06
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
launch/dynamo-run/src/input/http.rs (2)

126-137: ⚠️ Potential issue

Critical issue: Task handle not stored for proper lifecycle management.

The spawned task handle is not stored, preventing proper cleanup and potentially causing the task to run indefinitely.

Apply this diff to store the task handle:

-    // Spawn a task to watch for model type changes and update HTTP service endpoints
-    let _endpoint_enabler_task = tokio::spawn(async move {
+    // Spawn a task to watch for model type changes and update HTTP service endpoints
+    let endpoint_enabler_task = tokio::spawn(async move {
         while let Some(model_type) = rx.recv().await {
             tracing::debug!("Received model type update: {:?}", model_type);
             update_http_endpoints(http_service.clone(), model_type);
         }
     });

Consider returning the task handle from this function or storing it in a struct for proper lifecycle management.


158-158: ⚠️ Potential issue

Missing functionality: Backend model type should enable both chat and completion endpoints.

The ModelType::Backend case is empty but should enable both chat and completion endpoints since the watcher registers both engines for Backend models.

Apply this diff to handle Backend models correctly:

-           ModelType::Backend => {}
+           ModelType::Backend => {
+               service.enable_chat_endpoints(true);
+               service.enable_cmpl_endpoints(true);
+           }
        },
        ModelUpdate::Removed(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(false),
            ModelType::Completion => service.enable_cmpl_endpoints(false),
            ModelType::Embedding => service.enable_embeddings_endpoints(false),
-           ModelType::Backend => {}
+           ModelType::Backend => {
+               service.enable_chat_endpoints(false);
+               service.enable_cmpl_endpoints(false);
+           }

Also applies to: 164-164

lib/llm/src/http/service/service_v2.rs (1)

4-4: 🛠️ Refactor suggestion

Use async-friendly RwLock to avoid blocking the Tokio runtime.

The middleware functions acquire read locks on every request, and std::sync::RwLock can block the entire executor thread under contention, hurting performance.

Switch to tokio::sync::RwLock:

-use std::sync::{Arc, RwLock};
+use std::sync::Arc;
+use tokio::sync::RwLock;

Update all lock operations to use .await:

  • state.flags.read()state.flags.read().await
  • state.flags.write()state.flags.write().await
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c680058 and 11c31be.

📒 Files selected for processing (4)
  • launch/dynamo-run/src/input/http.rs (5 hunks)
  • lib/llm/src/discovery.rs (1 hunks)
  • lib/llm/src/discovery/watcher.rs (5 hunks)
  • lib/llm/src/http/service/service_v2.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/llm/src/discovery.rs
  • lib/llm/src/discovery/watcher.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (13)
launch/dynamo-run/src/input/http.rs (6)

10-14: LGTM: Clean import additions for dynamic endpoint management.

The new imports for ModelUpdate, HttpService, and ModelType are necessary for the dynamic endpoint functionality and properly organized.


51-51: LGTM: Proper Arc wrapping for shared HTTP service.

The Arc::new(http_service.clone()) correctly wraps the HttpService for shared ownership across async tasks.


65-69: LGTM: Appropriate static endpoint enabling placement.

Moving the endpoint enabling after model registration ensures the endpoints are available when the models are ready to serve requests.


90-94: LGTM: Consistent endpoint enabling for StaticCore configuration.

The endpoint enabling pattern is consistently applied across both StaticFull and StaticCore configurations.


119-119: LGTM: Function signature updated for dynamic endpoint management.

The addition of the http_service: Arc<HttpService> parameter enables the watcher to dynamically control HTTP endpoints.


147-167: LGTM: Well-structured endpoint update function with proper logging.

The function correctly matches on ModelUpdate variants and toggles the appropriate endpoints. The debug logging provides good visibility into the dynamic endpoint management.

lib/llm/src/http/service/service_v2.rs (7)

25-30: LGTM: Clean state flags structure for dynamic endpoint control.

The StateFlags struct provides a clear and organized way to track endpoint enablement state with appropriate defaults.


37-41: LGTM: Consistent initialization of state flags.

The flags are properly initialized to disabled state, aligning with the new dynamic enablement approach.


86-89: LGTM: Appropriate default value changes for dynamic endpoint management.

Changing defaults to false for chat and completion endpoints aligns with the new dynamic enablement model where endpoints are enabled based on available models.


145-156: LGTM: Well-implemented endpoint toggle method with proper error handling.

The enable_chat_endpoints method correctly uses write locks and provides appropriate logging for both success and failure cases.


158-169: LGTM: Consistent implementation for completion endpoint control.

The enable_cmpl_endpoints method follows the same pattern as the chat method with proper locking and logging.


171-182: LGTM: Complete endpoint control API with consistent error handling.

The enable_embeddings_endpoints method completes the endpoint control API with the same reliable pattern.


192-197: LGTM: Proper initialization of runtime flags from configuration.

The initial flag setting correctly transfers configuration values to the runtime state flags during service construction.

Comment on lines +241 to +266
// Add completions route with conditional middleware
let state_cmpl = state.clone();
let (cmpl_docs, cmpl_route) = super::openai::completions_router(state_cmpl, None);

let state_cmpl_route = state.clone();
let cmpl_route = cmpl_route.route_layer(axum::middleware::from_fn(
    move |req, next: axum::middleware::Next| {
        let state_api = state_cmpl_route.clone();
        async move {
            let enabled = {
                let guard = state_api
                    .flags
                    .read()
                    .map_err(|_| axum::http::StatusCode::INTERNAL_SERVER_ERROR)?;
                guard.cmpl_endpoints_enabled
            };

            if enabled {
                Ok(next.run(req).await)
            } else {
                Err(axum::http::StatusCode::NOT_FOUND)
            }
        }
    },
));
routes.push((cmpl_docs, cmpl_route));

🛠️ Refactor suggestion

Apply consistent error handling pattern for completion endpoints.

The completion endpoint middleware has the same issues as the chat middleware regarding async locks and error handling.

Apply the same fixes as suggested for the chat middleware, using async locks and proper response conversion.

🤖 Prompt for AI Agents
In lib/llm/src/http/service/service_v2.rs around lines 241 to 266, the
completion endpoint middleware uses synchronous locks and lacks proper error
handling similar to the chat middleware. Refactor the middleware to use async
locks for accessing the flags and ensure errors are converted into appropriate
HTTP responses consistently. Follow the pattern used in the chat middleware to
handle locking asynchronously and convert errors into proper axum responses.

Comment on lines +268 to +291
// Add embeddings route with conditional middleware
let (embed_docs, embed_route) = super::openai::embeddings_router(state.clone(), None);
let state_embed_route = state.clone();
let embed_route = embed_route.route_layer(axum::middleware::from_fn(
    move |req, next: axum::middleware::Next| {
        let state_api = state_embed_route.clone();
        async move {
            let enabled = {
                let guard = state_api
                    .flags
                    .read()
                    .map_err(|_| axum::http::StatusCode::INTERNAL_SERVER_ERROR)?;
                guard.embeddings_endpoints_enabled
            };

            if enabled {
                Ok(next.run(req).await)
            } else {
                Err(axum::http::StatusCode::NOT_FOUND)
            }
        }
    },
));
routes.push((embed_docs, embed_route));

🛠️ Refactor suggestion

Apply consistent error handling pattern for embeddings endpoints.

The embeddings endpoint middleware should use the same async lock and error handling improvements as the other endpoints.

Apply the same fixes as suggested for the chat middleware, using async locks and proper response conversion.

🤖 Prompt for AI Agents
In lib/llm/src/http/service/service_v2.rs between lines 268 and 291, the
embeddings endpoint middleware uses a synchronous read lock and returns raw
status codes on error, which is inconsistent with other endpoints. Refactor the
middleware to use an async lock (e.g., tokio::sync::RwLock's async read) for
accessing the flags, and convert errors into proper HTTP responses using the
same error handling pattern as the chat middleware. This includes awaiting the
lock asynchronously and mapping errors to appropriate axum response types.

Comment on lines +213 to +238
// Add chat completions route with conditional middleware
let (chat_docs, chat_route) =
    super::openai::chat_completions_router(state.clone(), config.request_template, None);
let state_chat_route = state.clone();
let chat_route = chat_route.route_layer(axum::middleware::from_fn(
    move |req, next: axum::middleware::Next| {
        let state = state_chat_route.clone();
        async move {
            // Read the flag value and drop the lock before async operations
            let enabled = match state.flags.read() {
                Ok(guard) => guard.chat_endpoints_enabled,
                Err(e) => {
                    tracing::error!("Failed to read feature flags: {}", e);
                    return Err(axum::http::StatusCode::INTERNAL_SERVER_ERROR);
                }
            };

            if enabled {
                Ok(next.run(req).await)
            } else {
                tracing::debug!("Chat endpoints are disabled");
                Err(axum::http::StatusCode::SERVICE_UNAVAILABLE)
            }
        }
    },
));

🛠️ Refactor suggestion

Improve error handling and consider async lock usage.

The middleware correctly implements runtime endpoint checking, but has error handling concerns and uses blocking locks in async context.

Current issues:

  1. Uses blocking RwLock::read() in async middleware
  2. Inconsistent error return type (Err(StatusCode) instead of Response)

Consider this pattern with async locks:

let chat_route = chat_route.route_layer(axum::middleware::from_fn(
    move |req, next: axum::middleware::Next| {
        let state = state_chat_route.clone();
        async move {
-           let enabled = match state.flags.read() {
+           let enabled = match state.flags.read().await {
                Ok(guard) => guard.chat_endpoints_enabled,
                Err(e) => {
                    tracing::error!("Failed to read feature flags: {}", e);
-                   return Err(axum::http::StatusCode::INTERNAL_SERVER_ERROR);
+                   return Ok(axum::http::StatusCode::INTERNAL_SERVER_ERROR.into_response());
                }
            };

            if enabled {
                Ok(next.run(req).await)
            } else {
                tracing::debug!("Chat endpoints are disabled");
-               Err(axum::http::StatusCode::SERVICE_UNAVAILABLE)
+               Ok(axum::http::StatusCode::SERVICE_UNAVAILABLE.into_response())
            }
        }
    },
));

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In lib/llm/src/http/service/service_v2.rs around lines 213 to 238, the
middleware uses a blocking RwLock::read() in an async context and returns
Err(StatusCode) which is inconsistent with expected response types. Replace the
blocking lock with an async-compatible lock (e.g., tokio::sync::RwLock) and
await the read lock acquisition. Also, modify the error handling to return a
proper HTTP response instead of Err(StatusCode) to maintain consistent return
types in the middleware.
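One wrinkle with the diff above: once every branch returns Ok(...), the closure's Err type has nothing to constrain it and inference can fail. A simpler shape, assuming axum 0.7 where from_fn closures may return any IntoResponse, is to drop the Result entirely:

use axum::extract::Request;
use axum::http::StatusCode;
use axum::middleware::Next;
use axum::response::{IntoResponse, Response};

// Gate sketch: returns Response directly, so no unused error type exists.
async fn gate(req: Request, next: Next) -> Response {
    let enabled = true; // in the real service, read the endpoint flag here
    if enabled {
        next.run(req).await
    } else {
        StatusCode::SERVICE_UNAVAILABLE.into_response()
    }
}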

Labels: external-contribution (Pull request is from an external contributor), feat, size/L

1 participant