feat: Dynamic Endpoint Exposure Based on Model Type #1447
base: main
Conversation
👋 Hi abrarshivani! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough

The changes implement dynamic HTTP endpoint control based on model lifecycle events. Model addition or removal triggers asynchronous notifications, which update the availability of chat, completion, and embedding endpoints in the HTTP service at runtime. This is achieved through new runtime flags and middleware, replacing previous static configuration and route inclusion.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant ModelWatcher
    participant HttpService
    participant Client

    ModelWatcher->>HttpService: Send ModelUpdate (Added/Removed)
    HttpService->>HttpService: update_http_endpoints()
    Client->>HttpService: HTTP request to endpoint
    HttpService->>HttpService: Check endpoint flag (middleware)
    alt Endpoint enabled
        HttpService->>Client: Process request normally
    else Endpoint disabled
        HttpService->>Client: Return 503/404 error
    end
```
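The gating pattern in the diagram can be sketched in isolation. Below is a minimal, hypothetical axum 0.7 example, not the PR's actual code: `EndpointFlags`, `gate_chat`, and `app` are illustrative names for a shared flag behind an `RwLock`, checked by a `route_layer` middleware before a request reaches the handler.

```rust
// Minimal sketch of the runtime-flag pattern (axum 0.7 assumed; names are
// illustrative, not taken from the PR).
use std::sync::{Arc, RwLock};

use axum::{
    extract::{Request, State},
    http::StatusCode,
    middleware::{self, Next},
    response::Response,
    routing::post,
    Router,
};

#[derive(Default)]
struct EndpointFlags {
    chat_enabled: bool,
}

// Middleware: consult the shared flag, then either forward or reject.
async fn gate_chat(
    State(flags): State<Arc<RwLock<EndpointFlags>>>,
    req: Request,
    next: Next,
) -> Result<Response, StatusCode> {
    // Copy the bool and drop the guard before any .await point.
    let enabled = flags
        .read()
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
        .chat_enabled;
    if enabled {
        Ok(next.run(req).await)
    } else {
        Err(StatusCode::SERVICE_UNAVAILABLE)
    }
}

fn app(flags: Arc<RwLock<EndpointFlags>>) -> Router {
    Router::new()
        .route("/v1/chat/completions", post(|| async { "stub handler" }))
        // route_layer applies the gate only to the routes registered above it
        .route_layer(middleware::from_fn_with_state(flags, gate_chat))
}
```

Flipping `chat_enabled` at runtime then changes the observable behavior of the route without rebuilding the router, which is the core idea this PR applies per endpoint family.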
Actionable comments posted: 4
🔭 Outside diff range comments (1)
lib/llm/src/http/service/service_v2.rs (1)
307-311: ⚠️ Potential issue. Build-breaking bug: `with_request_template` nests `Option`.

`request_template` already has type `Option<RequestTemplate>`, but the setter wraps it in `Some(_)`, yielding `Option<Option<_>>` and causing a type mismatch.

```diff
 pub fn with_request_template(mut self, request_template: Option<RequestTemplate>) -> Self {
-    self.request_template = Some(request_template);
+    self.request_template = request_template;
     self
 }
```
🧹 Nitpick comments (7)
lib/llm/src/discovery/watcher.rs (3)
35-39: Add documentation for the public `ModelUpdate` enum.

This enum is part of the public API and should be documented to help users understand its purpose and usage.

Apply this diff to add documentation:

```diff
+/// Represents a model lifecycle event for dynamic endpoint management.
+///
+/// This enum is used to notify external components about model additions
+/// and removals, enabling dynamic control of HTTP endpoints based on
+/// available model types.
 #[derive(Debug, Clone, Copy, PartialEq)]
 pub enum ModelUpdate {
+    /// A model of the specified type has been added to the system.
     Added(ModelType),
+    /// A model of the specified type has been removed from the system.
     Removed(ModelType),
 }
```
67-69: Add documentation for the public method.

This method is part of the public API and should be documented.

Apply this diff to add documentation:

```diff
+    /// Sets the channel sender for model update notifications.
+    ///
+    /// When set, the watcher will send `ModelUpdate` events through this channel
+    /// whenever models are added or removed from the system.
     pub fn set_notify_on_model_update(&mut self, tx: Sender<ModelUpdate>) {
         self.model_update_tx = Some(tx);
     }
```
120-124: Consider logging when channel send fails.

The `.ok()` silently ignores send errors, which could occur if the receiver is dropped. Consider logging these failures for better observability.

Apply this diff to log send failures:

```diff
 if let Some(tx) = &self.model_update_tx {
-    tx.send(ModelUpdate::Added(model_entry.model_type))
-        .await
-        .ok();
+    if let Err(e) = tx.send(ModelUpdate::Added(model_entry.model_type)).await {
+        tracing::warn!("Failed to send model update notification: {}", e);
+    }
 }
```

launch/dynamo-run/src/input/http.rs (2)
125-126: Consider increasing the channel buffer size.

The channel buffer is set to 1, which might cause backpressure if multiple model updates occur in quick succession. Consider using a larger buffer or an unbounded channel.

Apply this diff to increase the buffer size:

```diff
 // Create a channel to receive model type updates
-let (tx, mut rx) = tokio::sync::mpsc::channel(1);
+let (tx, mut rx) = tokio::sync::mpsc::channel(32);
```
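If bursts beyond any fixed buffer are a concern, the unbounded variant trades memory for a sender that never blocks; a small sketch under the assumption that the PR's `ModelUpdate`/`ModelType` types are in scope:

```rust
// Sketch of the unbounded alternative: UnboundedSender::send is synchronous
// and only fails once the receiver has been dropped, so the watcher never
// awaits on a full buffer.
let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<ModelUpdate>();
if let Err(e) = tx.send(ModelUpdate::Added(ModelType::Chat)) {
    tracing::warn!("model update receiver dropped: {}", e);
}
```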
156-157: Consider logging when Backend model updates are received.

Since Backend models don't trigger endpoint changes, it might be helpful to log when they're received to aid in debugging.

Apply this diff to add logging:

```diff
-ModelType::Backend => {}
+ModelType::Backend => {
+    tracing::debug!("Backend model added - no endpoint changes required");
+}
```

lib/llm/src/http/service/service_v2.rs (2)
145-182: `enable_*` helpers are triplicated: extract a common setter.

The three nearly identical methods differ only in the field that is toggled and the log message. Consolidating them into one generic helper (or macro) will reduce boilerplate and prevent future inconsistencies.

Example sketch:

```rust
fn set_flag<F>(&self, f: F, flag_name: &str, enable: bool)
where
    F: FnOnce(&mut StateFlags) -> &mut bool,
{
    match self.state.flags.write() {
        Ok(mut flags) => {
            *f(&mut *flags) = enable;
            tracing::info!("{flag_name} {}", if enable { "enabled" } else { "disabled" });
        }
        Err(e) => tracing::error!("Failed to acquire write lock for {flag_name}: {e}"),
    }
}
```
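For illustration, here is a hypothetical rewiring of the three public setters on top of that helper; field names follow the PR's `StateFlags`, and the helper is the sketch above, not existing code:

```rust
// Each setter now reduces to picking a field and a log label.
pub fn enable_chat_endpoints(&self, enable: bool) {
    self.set_flag(|f| &mut f.chat_endpoints_enabled, "chat endpoints", enable);
}

pub fn enable_cmpl_endpoints(&self, enable: bool) {
    self.set_flag(|f| &mut f.cmpl_endpoints_enabled, "completion endpoints", enable);
}

pub fn enable_embeddings_endpoints(&self, enable: bool) {
    self.set_flag(|f| &mut f.embeddings_endpoints_enabled, "embeddings endpoints", enable);
}
```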
214-292: Different HTTP codes for the same "disabled" condition.

Chat returns `503 Service Unavailable`, while completions/embeddings return `404 Not Found`. The difference is confusing for clients toggling models dynamically. Pick one status (typically `503` with a `Retry-After` header) for all disabled endpoints, or document the rationale.
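A uniform response could look like the following sketch (axum; the 5-second retry interval and the message body are placeholders, not part of the PR):

```rust
// Sketch of one shared "disabled" response: 503 plus a Retry-After hint.
use axum::{
    http::{header::RETRY_AFTER, StatusCode},
    response::{IntoResponse, Response},
};

fn endpoint_disabled() -> Response {
    (
        StatusCode::SERVICE_UNAVAILABLE,
        [(RETRY_AFTER, "5")],
        "endpoint disabled: no model of the required type is registered",
    )
        .into_response()
}
```

Every gated route could then return this one value, so clients see consistent semantics regardless of which model type is missing.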
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- launch/dynamo-run/src/input/http.rs (5 hunks)
- lib/llm/src/discovery.rs (1 hunks)
- lib/llm/src/discovery/watcher.rs (5 hunks)
- lib/llm/src/http/service/service_v2.rs (6 hunks)
🔇 Additional comments (3)
lib/llm/src/discovery.rs (1)
10-11: LGTM! Clean re-export of the new `ModelUpdate` enum.

The change appropriately expands the public API to include the `ModelUpdate` enum alongside `ModelWatcher`, enabling external modules to subscribe to model lifecycle events.

launch/dynamo-run/src/input/http.rs (1)
65-68: 💡 Verification agent 🧩 Analysis chain

Verify that chat completion endpoints include both `/v1/chat/completions` and `/chat/completions`.

The code enables chat endpoints after adding models. Ensure that both OpenAI-compatible and legacy endpoints are properly enabled.

🏁 Script executed:

```bash
#!/bin/bash
# Description: Check the implementation of enable_chat_endpoints to verify it enables all expected routes

# Search for the enable_chat_endpoints implementation
ast-grep --pattern 'fn enable_chat_endpoints($$$) { $$$ }'
```

Length of output: 62
Let's locate the actual implementation to confirm which routes are registered:

🏁 Script executed:

```bash
#!/bin/bash
# Search for the enable_chat_endpoints method and show surrounding lines
rg -n "enable_chat_endpoints" -C5 --type rs
```

Length of output: 70
Let's try locating the function definition in all Rust files without restricting by type:

🏁 Script executed:

```bash
#!/bin/bash
# Find the enable_chat_endpoints function in any .rs file
rg -n "fn enable_chat_endpoints" -C5 -g '*.rs'
```

Length of output: 964
Let's inspect where the flag is used and whether both routes are registered:

🏁 Script executed:

```bash
#!/bin/bash
# Locate flag usage in service_v2.rs
rg -n 'chat_endpoints_enabled' -C10 lib/llm/src/http/service/service_v2.rs

# Search for explicit registration of each endpoint
rg -n '/v1/chat/completions' -C5 -g '*.rs'
rg -n '/chat/completions' -C5 -g '*.rs'
```

Length of output: 16955
Add legacy `/chat/completions` route alongside `/v1/chat/completions`

The current build registers only the default `/v1/chat/completions` endpoint. To maintain backward compatibility, you should also register the legacy `/chat/completions` path.

Locations to update:

- In `lib/llm/src/http/service/service_v2.rs` inside `HttpServiceConfigBuilder::build` (around line 213): after you configure `chat_route`, invoke `chat_completions_router` again with `Some("/chat/completions".into())` and merge it into your router.
- Optionally, in `lib/llm/src/http/service/openai.rs` you could update `chat_completions_router` so that when `path` is `None` it registers both paths automatically.

Example diff in `service_v2.rs`:

```diff
@@
 // Add chat completions route with conditional middleware
 let (chat_docs, chat_route) =
     super::openai::chat_completions_router(state.clone(), config.request_template, None);
 let chat_route = chat_route.route_layer(axum::middleware::from_fn(...));
+// Support legacy endpoint
+let (_legacy_docs, legacy_route) = super::openai::chat_completions_router(
+    state.clone(),
+    config.request_template,
+    Some("/chat/completions".into()),
+);
+let legacy_route = legacy_route.route_layer(axum::middleware::from_fn(...));
+
+// Merge both routers
+router = router
+    .merge(chat_route)
+    .merge(legacy_route);
```

Likely an incorrect or invalid review comment.
lib/llm/src/http/service/service_v2.rs (1)
193-197: Lock poisoning is silently swallowed.

If a previous panic poisoned the `flags` lock, the write attempt fails and the configuration silently remains stale. Consider propagating the error or at least returning a `Result` so callers can react instead of continuing in an undefined state.
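One possible shape for a poisoning-aware setter, shown as a self-contained sketch rather than the PR's actual method (the PR keeps the lock on `self.state.flags`; the error type here is illustrative):

```rust
use std::sync::RwLock;

struct StateFlags {
    chat_endpoints_enabled: bool,
}

// Returns Err instead of silently leaving stale state when the lock is poisoned.
fn set_chat_enabled(flags: &RwLock<StateFlags>, enable: bool) -> Result<(), String> {
    let mut guard = flags
        .write()
        .map_err(|e| format!("feature-flag lock poisoned: {e}"))?;
    guard.chat_endpoints_enabled = enable;
    Ok(()) // the caller can now log, retry, or shut down instead of going stale
}
```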
```rust
let _endpoint_enabler_task = tokio::spawn(async move {
    while let Some(model_type) = rx.recv().await {
        tracing::debug!("Received model type update: {:?}", model_type);
        update_http_endpoints(http_service.clone(), model_type);
    }
});
```
🛠️ Refactor suggestion
Store the spawned task handle for proper lifecycle management.
The spawned task is not stored, which means it cannot be properly joined or cancelled. This could lead to the task continuing to run after the watcher has stopped.
You should store the task handle and ensure proper cleanup. Consider returning it from the function or storing it in a struct field for proper lifecycle management.
🤖 Prompt for AI Agents
In launch/dynamo-run/src/input/http.rs around lines 130 to 135, the spawned task
handle from tokio::spawn is not stored, preventing proper lifecycle management
such as joining or cancelling the task. To fix this, assign the result of
tokio::spawn to a variable that is either returned from the function or stored
in a struct field, ensuring the task handle can be awaited or aborted as needed
for cleanup.
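A hypothetical way to keep the handle around; the `EndpointUpdater` struct and its methods are illustrative, not part of the PR, and the `ModelUpdate` here is a simplified placeholder for the PR's enum:

```rust
use tokio::{sync::mpsc, task::JoinHandle};

// Placeholder for the PR's ModelUpdate type.
#[derive(Debug)]
enum ModelUpdate {
    Added,
    Removed,
}

struct EndpointUpdater {
    task: JoinHandle<()>,
}

impl EndpointUpdater {
    fn spawn(mut rx: mpsc::Receiver<ModelUpdate>) -> Self {
        let task = tokio::spawn(async move {
            while let Some(update) = rx.recv().await {
                // apply the update to the HTTP service here
                let _ = update;
            }
        });
        Self { task }
    }

    // Abort on shutdown; alternatively, drop every Sender and await the task,
    // which lets the loop drain remaining updates and exit cleanly.
    fn shutdown(self) {
        self.task.abort();
    }
}
```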
```rust
/// Updates HTTP service endpoints based on available model types
fn update_http_endpoints(service: Arc<HttpService>, model_type: ModelUpdate) {
    tracing::debug!(
        "Updating HTTP service endpoints for model type: {:?}",
        model_type
    );
    match model_type {
        ModelUpdate::Added(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(true),
            ModelType::Completion => service.enable_cmpl_endpoints(true),
            ModelType::Embedding => service.enable_embeddings_endpoints(true),
            ModelType::Backend => {}
        },
        ModelUpdate::Removed(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(false),
            ModelType::Completion => service.enable_cmpl_endpoints(false),
            ModelType::Embedding => service.enable_embeddings_endpoints(false),
            ModelType::Backend => {}
        },
    }
}
```
💡 Verification agent
🧩 Analysis chain
Consider handling Backend model type for endpoint management.
The Backend
model type doesn't trigger any endpoint changes, but according to the watcher code, Backend models support both chat and completions. Verify if this is intentional.
Based on the watcher implementation, Backend models register both chat and completion engines. Should we enable both endpoints when a Backend model is added?
🏁 Script executed:

```bash
#!/bin/bash
# Description: Verify how Backend models are handled in the watcher

# Search for Backend model handling in handle_put
ast-grep --pattern 'ModelType::Backend => {
  $$$
}'
```

Length of output: 54
I'll search for any references to `ModelType::Backend` and inspect the watcher's `handle_put` implementation to confirm how Backend models are treated.
🏁 Script executed:

```bash
#!/bin/bash
# Search for direct references to Backend in the ModelType enum usage
rg "ModelType::Backend" -n .

# Locate the watcher's handle_put function and inspect surrounding code
rg "fn handle_put" -n launch

# If found, show context around ModelType matching in handle_put
rg -A5 -B5 "ModelType::" -n launch
```

Length of output: 5535
I'll inspect how the watcher handles `ModelType::Backend` to confirm which engines it registers:
🏁 Script executed:

```bash
#!/bin/bash
# Show the watcher's handling of Backend models
rg -A5 -B5 "ModelType::Backend =>" -n lib/llm/src/discovery/watcher.rs
```

Length of output: 527
Enable chat and completion endpoints for Backend models

The watcher explicitly registers both chat and completion engines for `ModelType::Backend`, so the HTTP service should mirror that by toggling both endpoint sets when a Backend model is added or removed.

Please update `launch/dynamo-run/src/input/http.rs` accordingly:
```diff
 fn update_http_endpoints(service: Arc<HttpService>, model_type: ModelUpdate) {
     tracing::debug!(
         "Updating HTTP service endpoints for model type: {:?}",
         model_type
     );
     match model_type {
         ModelUpdate::Added(model_type) => match model_type {
             ModelType::Chat => service.enable_chat_endpoints(true),
             ModelType::Completion => service.enable_cmpl_endpoints(true),
             ModelType::Embedding => service.enable_embeddings_endpoints(true),
-            ModelType::Backend => {}
+            ModelType::Backend => {
+                service.enable_chat_endpoints(true);
+                service.enable_cmpl_endpoints(true);
+            }
         },
         ModelUpdate::Removed(model_type) => match model_type {
             ModelType::Chat => service.enable_chat_endpoints(false),
             ModelType::Completion => service.enable_cmpl_endpoints(false),
             ModelType::Embedding => service.enable_embeddings_endpoints(false),
-            ModelType::Backend => {}
+            ModelType::Backend => {
+                service.enable_chat_endpoints(false);
+                service.enable_cmpl_endpoints(false);
+            }
         },
     }
 }
```

This ensures the HTTP layer stays in sync with the watcher's multi-engine Backend handling.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
/// Updates HTTP service endpoints based on available model types
fn update_http_endpoints(service: Arc<HttpService>, model_type: ModelUpdate) {
    tracing::debug!(
        "Updating HTTP service endpoints for model type: {:?}",
        model_type
    );
    match model_type {
        ModelUpdate::Added(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(true),
            ModelType::Completion => service.enable_cmpl_endpoints(true),
            ModelType::Embedding => service.enable_embeddings_endpoints(true),
            ModelType::Backend => {
                service.enable_chat_endpoints(true);
                service.enable_cmpl_endpoints(true);
            }
        },
        ModelUpdate::Removed(model_type) => match model_type {
            ModelType::Chat => service.enable_chat_endpoints(false),
            ModelType::Completion => service.enable_cmpl_endpoints(false),
            ModelType::Embedding => service.enable_embeddings_endpoints(false),
            ModelType::Backend => {
                service.enable_chat_endpoints(false);
                service.enable_cmpl_endpoints(false);
            }
        },
    }
}
```
🤖 Prompt for AI Agents
In launch/dynamo-run/src/input/http.rs around lines 145 to 165, the Backend
model type currently does not trigger any endpoint changes in
update_http_endpoints. Since the watcher registers both chat and completion
engines for Backend models, update the function to enable or disable both chat
and completion endpoints when a Backend model is added or removed. Modify the
match arms for ModelType::Backend to call service.enable_chat_endpoints and
service.enable_cmpl_endpoints with the appropriate boolean value.
```diff
@@ -1,7 +1,7 @@
 // SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 // SPDX-License-Identifier: Apache-2.0

-use std::sync::Arc;
+use std::sync::{Arc, RwLock};
```
🛠️ Refactor suggestion
Use an async-friendly `RwLock` to avoid blocking the Tokio runtime

Every request hits `state.flags.read()`; with `std::sync::RwLock` this can park the entire executor thread when contention occurs, defeating cooperative multitasking and hurting tail latency.

Switch to `tokio::sync::RwLock` (or `parking_lot::RwLock`) and update imports/usages accordingly.

```diff
-use std::sync::{Arc, RwLock};
+use std::sync::Arc;
+use tokio::sync::RwLock;
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
use std::sync::Arc;
use tokio::sync::RwLock;
```
🤖 Prompt for AI Agents
In lib/llm/src/http/service/service_v2.rs at line 4, replace the import of
std::sync::RwLock with tokio::sync::RwLock to use an async-friendly lock. Update
all usages of RwLock in the file to work with the async version, ensuring that
calls to read() or write() are awaited properly to prevent blocking the Tokio
runtime and improve concurrency.
c680058 to 11c31be (Compare)
Actionable comments posted: 3
♻️ Duplicate comments (3)
launch/dynamo-run/src/input/http.rs (2)
126-137: ⚠️ Potential issue. Critical: the task handle is not stored for proper lifecycle management.

The spawned task handle is not stored, preventing proper cleanup and potentially causing the task to run indefinitely.

Apply this diff to store the task handle:

```diff
-    // Spawn a task to watch for model type changes and update HTTP service endpoints
-    let _endpoint_enabler_task = tokio::spawn(async move {
+    // Spawn a task to watch for model type changes and update HTTP service endpoints
+    let endpoint_enabler_task = tokio::spawn(async move {
         while let Some(model_type) = rx.recv().await {
             tracing::debug!("Received model type update: {:?}", model_type);
             update_http_endpoints(http_service.clone(), model_type);
         }
     });
```

Consider returning the task handle from this function or storing it in a struct field for proper lifecycle management.
158-158: ⚠️ Potential issue. Missing functionality: the Backend model type should enable both chat and completion endpoints.

The `ModelType::Backend` case is empty but should enable both chat and completion endpoints, since the watcher registers both engines for Backend models.

Apply this diff to handle Backend models correctly:

```diff
-            ModelType::Backend => {}
+            ModelType::Backend => {
+                service.enable_chat_endpoints(true);
+                service.enable_cmpl_endpoints(true);
+            }
         },
         ModelUpdate::Removed(model_type) => match model_type {
             ModelType::Chat => service.enable_chat_endpoints(false),
             ModelType::Completion => service.enable_cmpl_endpoints(false),
             ModelType::Embedding => service.enable_embeddings_endpoints(false),
-            ModelType::Backend => {}
+            ModelType::Backend => {
+                service.enable_chat_endpoints(false);
+                service.enable_cmpl_endpoints(false);
+            }
```

Also applies to: 164-164
lib/llm/src/http/service/service_v2.rs (1)
4-4: 🛠️ Refactor suggestion. Use an async-friendly RwLock to avoid blocking the Tokio runtime.

The middleware functions acquire read locks on every request, and `std::sync::RwLock` can block the entire executor thread under contention, hurting performance.

Switch to `tokio::sync::RwLock`:

```diff
-use std::sync::{Arc, RwLock};
+use std::sync::Arc;
+use tokio::sync::RwLock;
```

Update all lock operations to use `.await`:

- `state.flags.read()` becomes `state.flags.read().await`
- `state.flags.write()` becomes `state.flags.write().await`
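For reference, with the tokio lock the poisoning branch disappears entirely, since `tokio::sync::RwLock` acquisition returns the guard directly rather than a `Result`; a minimal sketch:

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

#[derive(Default)]
struct StateFlags {
    chat_endpoints_enabled: bool,
}

async fn chat_enabled(flags: &Arc<RwLock<StateFlags>>) -> bool {
    // read().await yields the guard directly; there is no Err case to map.
    flags.read().await.chat_endpoints_enabled
}
```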
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- launch/dynamo-run/src/input/http.rs (5 hunks)
- lib/llm/src/discovery.rs (1 hunks)
- lib/llm/src/discovery/watcher.rs (5 hunks)
- lib/llm/src/http/service/service_v2.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/llm/src/discovery.rs
- lib/llm/src/discovery/watcher.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (13)
launch/dynamo-run/src/input/http.rs (6)
10-14: LGTM: Clean import additions for dynamic endpoint management.

The new imports for `ModelUpdate`, `HttpService`, and `ModelType` are necessary for the dynamic endpoint functionality and properly organized.
51-51: LGTM: Proper Arc wrapping for shared HTTP service.

The `Arc::new(http_service.clone())` correctly wraps the HttpService for shared ownership across async tasks.
65-69: LGTM: Appropriate static endpoint enabling placement.

Moving the endpoint enabling after model registration ensures the endpoints are available when the models are ready to serve requests.
90-94: LGTM: Consistent endpoint enabling for StaticCore configuration.

The endpoint enabling pattern is consistently applied across both StaticFull and StaticCore configurations.
119-119: LGTM: Function signature updated for dynamic endpoint management.

The addition of the `http_service: Arc<HttpService>` parameter enables the watcher to dynamically control HTTP endpoints.
147-167: LGTM: Well-structured endpoint update function with proper logging.

The function correctly matches on `ModelUpdate` variants and toggles the appropriate endpoints. The debug logging provides good visibility into the dynamic endpoint management.

lib/llm/src/http/service/service_v2.rs (7)
25-30: LGTM: Clean state flags structure for dynamic endpoint control.

The `StateFlags` struct provides a clear and organized way to track endpoint enablement state with appropriate defaults.
37-41: LGTM: Consistent initialization of state flags.

The flags are properly initialized to the disabled state, aligning with the new dynamic enablement approach.
86-89: LGTM: Appropriate default value changes for dynamic endpoint management.

Changing defaults to `false` for chat and completion endpoints aligns with the new dynamic enablement model where endpoints are enabled based on available models.
145-156: LGTM: Well-implemented endpoint toggle method with proper error handling.

The `enable_chat_endpoints` method correctly uses write locks and provides appropriate logging for both success and failure cases.
158-169: LGTM: Consistent implementation for completion endpoint control.

The `enable_cmpl_endpoints` method follows the same pattern as the chat method with proper locking and logging.
171-182: LGTM: Complete endpoint control API with consistent error handling.

The `enable_embeddings_endpoints` method completes the endpoint control API with the same reliable pattern.
192-197: LGTM: Proper initialization of runtime flags from configuration.

The initial flag setting correctly transfers configuration values to the runtime state flags during service construction.
```rust
// Add completions route with conditional middleware
let state_cmpl = state.clone();
let (cmpl_docs, cmpl_route) = super::openai::completions_router(state_cmpl, None);

let state_cmpl_route = state.clone();
let cmpl_route = cmpl_route.route_layer(axum::middleware::from_fn(
    move |req, next: axum::middleware::Next| {
        let state_api = state_cmpl_route.clone();
        async move {
            let enabled = {
                let guard = state_api
                    .flags
                    .read()
                    .map_err(|_| axum::http::StatusCode::INTERNAL_SERVER_ERROR)?;
                guard.cmpl_endpoints_enabled
            };

            if enabled {
                Ok(next.run(req).await)
            } else {
                Err(axum::http::StatusCode::NOT_FOUND)
            }
        }
    },
));
routes.push((cmpl_docs, cmpl_route));
```
🛠️ Refactor suggestion
Apply consistent error handling pattern for completion endpoints.
The completion endpoint middleware has the same issues as the chat middleware regarding async locks and error handling.
Apply the same fixes as suggested for the chat middleware, using async locks and proper response conversion.
🤖 Prompt for AI Agents
In lib/llm/src/http/service/service_v2.rs around lines 241 to 266, the
completion endpoint middleware uses synchronous locks and lacks proper error
handling similar to the chat middleware. Refactor the middleware to use async
locks for accessing the flags and ensure errors are converted into appropriate
HTTP responses consistently. Follow the pattern used in the chat middleware to
handle locking asynchronously and convert errors into proper axum responses.
```rust
// Add embeddings route with conditional middleware
let (embed_docs, embed_route) = super::openai::embeddings_router(state.clone(), None);
let state_embed_route = state.clone();
let embed_route = embed_route.route_layer(axum::middleware::from_fn(
    move |req, next: axum::middleware::Next| {
        let state_api = state_embed_route.clone();
        async move {
            let enabled = {
                let guard = state_api
                    .flags
                    .read()
                    .map_err(|_| axum::http::StatusCode::INTERNAL_SERVER_ERROR)?;
                guard.embeddings_endpoints_enabled
            };

            if enabled {
                Ok(next.run(req).await)
            } else {
                Err(axum::http::StatusCode::NOT_FOUND)
            }
        }
    },
));
routes.push((embed_docs, embed_route));
```
🛠️ Refactor suggestion
Apply consistent error handling pattern for embeddings endpoints.
The embeddings endpoint middleware should use the same async lock and error handling improvements as the other endpoints.
Apply the same fixes as suggested for the chat middleware, using async locks and proper response conversion.
🤖 Prompt for AI Agents
In lib/llm/src/http/service/service_v2.rs between lines 268 and 291, the
embeddings endpoint middleware uses a synchronous read lock and returns raw
status codes on error, which is inconsistent with other endpoints. Refactor the
middleware to use an async lock (e.g., tokio::sync::RwLock's async read) for
accessing the flags, and convert errors into proper HTTP responses using the
same error handling pattern as the chat middleware. This includes awaiting the
lock asynchronously and mapping errors to appropriate axum response types.
```rust
// Add chat completions route with conditional middleware
let (chat_docs, chat_route) =
    super::openai::chat_completions_router(state.clone(), config.request_template, None);
let state_chat_route = state.clone();
let chat_route = chat_route.route_layer(axum::middleware::from_fn(
    move |req, next: axum::middleware::Next| {
        let state = state_chat_route.clone();
        async move {
            // Read the flag value and drop the lock before async operations
            let enabled = match state.flags.read() {
                Ok(guard) => guard.chat_endpoints_enabled,
                Err(e) => {
                    tracing::error!("Failed to read feature flags: {}", e);
                    return Err(axum::http::StatusCode::INTERNAL_SERVER_ERROR);
                }
            };

            if enabled {
                Ok(next.run(req).await)
            } else {
                tracing::debug!("Chat endpoints are disabled");
                Err(axum::http::StatusCode::SERVICE_UNAVAILABLE)
            }
        }
    },
));
```
🛠️ Refactor suggestion
Improve error handling and consider async lock usage.
The middleware correctly implements runtime endpoint checking, but has error handling concerns and uses blocking locks in async context.
Current issues:

- Uses blocking `RwLock::read()` in async middleware
- Inconsistent error return type (`Err(StatusCode)` instead of `Response`)
Consider this pattern with async locks:
```diff
 let chat_route = chat_route.route_layer(axum::middleware::from_fn(
     move |req, next: axum::middleware::Next| {
         let state = state_chat_route.clone();
         async move {
-            let enabled = match state.flags.read() {
+            let enabled = match state.flags.read().await {
                 Ok(guard) => guard.chat_endpoints_enabled,
                 Err(e) => {
                     tracing::error!("Failed to read feature flags: {}", e);
-                    return Err(axum::http::StatusCode::INTERNAL_SERVER_ERROR);
+                    return Ok(axum::http::StatusCode::INTERNAL_SERVER_ERROR.into_response());
                 }
             };
             if enabled {
                 Ok(next.run(req).await)
             } else {
                 tracing::debug!("Chat endpoints are disabled");
-                Err(axum::http::StatusCode::SERVICE_UNAVAILABLE)
+                Ok(axum::http::StatusCode::SERVICE_UNAVAILABLE.into_response())
             }
         }
     },
 ));
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In lib/llm/src/http/service/service_v2.rs around lines 213 to 238, the
middleware uses a blocking RwLock::read() in an async context and returns
Err(StatusCode) which is inconsistent with expected response types. Replace the
blocking lock with an async-compatible lock (e.g., tokio::sync::RwLock) and
await the read lock acquisition. Also, modify the error handling to return a
proper HTTP response instead of Err(StatusCode) to maintain consistent return
types in the middleware.
Overview:
This PR enables dynamic management of HTTP endpoints (chat, completion, embeddings) based on model availability.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit

- New Features
- Bug Fixes
- Refactor