Skip to content

feat: add embedding support to sgl backend #1481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Conversation

ishandhanani
Copy link
Contributor

@ishandhanani ishandhanani commented Jun 11, 2025

Summary by CodeRabbit

  • New Features
    • Introduced an embedding service with GPU acceleration supporting single and batch inputs.
    • Added an API endpoint for generating embeddings compatible with OpenAI's format.
    • Provided a new configuration file for easy setup and customization of the embedding service.
    • Updated documentation with instructions for running the aggregated embedding service.
  • Enhancements
    • Integrated the embedding worker into the frontend service for streamlined access.
    • Added support for flexible embedding input types and encoding formats.

closes: DYN-532

Copy link

copy-pr-bot bot commented Jun 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copy link
Contributor

coderabbitai bot commented Jun 11, 2025

Warning

Rate limit exceeded

@ishandhanani has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 21 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 0499a6a and 72aca24.

📒 Files selected for processing (2)
  • examples/sglang/configs/embedding.yaml (1 hunks)
  • examples/sglang/utils/protocol.py (2 hunks)

Walkthrough

A new embedding service infrastructure is introduced, including a worker class to serve embedding models, a frontend integration, protocol definitions for embedding requests, a configuration YAML, and a graph linkage file. The changes collectively enable serving embedding models via SGLang and Dynamo, with OpenAI-compatible request and response handling.

Changes

File(s) Change Summary
examples/sglang/components/embedding_worker.py Added SGLangEmbeddingWorker class to serve embedding models, with async initialization, request handling, and response transformation.
examples/sglang/components/frontend.py Integrated SGLangEmbeddingWorker as a dependency in the Frontend class.
examples/sglang/configs/embedding.yaml Added configuration file for embedding service, specifying model, worker, and frontend parameters.
examples/sglang/graphs/embedding.py Added file linking SGLangEmbeddingWorker to Frontend for embedding service graph setup.
examples/sglang/utils/protocol.py Added type aliases for embedding input and encoding format; introduced EmbeddingRequest Pydantic model.
examples/sglang/README.md Added "Embedding Models" section with example command to run embedding service using SGLang and Dynamo.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Frontend
    participant SGLangEmbeddingWorker
    participant SGLangEngine

    Client->>Frontend: POST /generate (EmbeddingRequest)
    Frontend->>SGLangEmbeddingWorker: generate(request)
    SGLangEmbeddingWorker->>SGLangEngine: async_encode(input)
    SGLangEngine-->>SGLangEmbeddingWorker: Embedding vectors
    SGLangEmbeddingWorker-->>Frontend: OpenAI-compatible response
    Frontend-->>Client: Embedding result
Loading

Poem

In the meadow of code, embeddings now bloom,
SGLang and Dynamo dispel any gloom.
With YAML for guidance and protocols neat,
Vectors are summoned—responses complete!
The worker and frontend, now linked as a pair,
Bring OpenAI magic to rabbits everywhere.
🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (6)
examples/sglang/utils/protocol.py (1)

73-78: Consider honouring encoding_format and dimensions

The spec adds encoding_format (float | base64) and dimensions, but no component in this PR uses them.
At minimum, document that they are ignored or raise NotImplementedError when the caller sets them; silent ignore leads to subtle bugs.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 73-73: Too few public methods (0/2)

(R0903)

examples/sglang/components/frontend.py (1)

21-22: Unused import can be dropped

SGLangEmbeddingWorker is already referenced through depends() below; the explicit import is unnecessary unless you access the class elsewhere in this module.

-from components.embedding_worker import SGLangEmbeddingWorker
+from components.embedding_worker import SGLangEmbeddingWorker  # noqa: F401 – keep if PyDynamo’s import-time registration relies on it

If registration does not rely on the import, consider removing it.

examples/sglang/graphs/embedding.py (1)

17-20: File is fine but missing __all__ guard

Very small nit: add __all__ = [] (or similar) to avoid accidental re-exports when this module is imported elsewhere.

examples/sglang/components/embedding_worker.py (3)

20-24: Remove unused imports

asyncio, random, socket, Any, and depends are unused – they trigger Ruff F401 and increase cold-start time.

-import asyncio
-import random
-import socket
-from typing import Any
-
-from dynamo.sdk import async_on_start, depends, dynamo_context, endpoint, service
+from dynamo.sdk import async_on_start, dynamo_context, endpoint, service
🧰 Tools
🪛 Ruff (0.11.9)

20-20: asyncio imported but unused

Remove unused import: asyncio

(F401)


22-22: random imported but unused

Remove unused import: random

(F401)


23-23: socket imported but unused

Remove unused import: socket

(F401)


24-24: typing.Any imported but unused

Remove unused import: typing.Any

(F401)


82-97: Robustness of _transform_response

ret_item["meta_info"]["prompt_tokens"] will raise KeyError if the engine omits stats. Use .get() with fallback or guard explicitly.

-            prompt_tokens += ret_item["meta_info"]["prompt_tokens"]
+            prompt_tokens += ret_item.get("meta_info", {}).get("prompt_tokens", 0)

82-97: Consider returning encoding_format-aware embeddings

If encoding_format == "base64" the embedding vectors must be base64-encoded per OpenAI spec. Handle here after converting format.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e924a7c and 1dfb1a8.

📒 Files selected for processing (5)
  • examples/sglang/components/embedding_worker.py (1 hunks)
  • examples/sglang/components/frontend.py (2 hunks)
  • examples/sglang/configs/embedding.yaml (1 hunks)
  • examples/sglang/graphs/embedding.py (1 hunks)
  • examples/sglang/utils/protocol.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
examples/sglang/graphs/embedding.py (2)
examples/sglang/components/frontend.py (1)
  • Frontend (59-89)
examples/sglang/components/embedding_worker.py (1)
  • SGLangEmbeddingWorker (43-106)
🪛 YAMLlint (1.37.1)
examples/sglang/configs/embedding.yaml

[error] 11-11: duplication of key "is-embedding" in mapping

(key-duplicates)

🪛 Ruff (0.11.9)
examples/sglang/components/embedding_worker.py

20-20: asyncio imported but unused

Remove unused import: asyncio

(F401)


22-22: random imported but unused

Remove unused import: random

(F401)


23-23: socket imported but unused

Remove unused import: socket

(F401)


24-24: typing.Any imported but unused

Remove unused import: typing.Any

(F401)


31-31: dynamo.sdk.depends imported but unused

Remove unused import: dynamo.sdk.depends

(F401)

🪛 Pylint (3.3.7)
examples/sglang/components/embedding_worker.py

[error] 56-56: Class 'SGLangEmbeddingWorker' has no 'dynamo_address' member

(E1101)


[refactor] 70-70: Unnecessary use of a comprehension, use list(request.input) instead.

(R1721)

examples/sglang/utils/protocol.py

[refactor] 73-73: Too few public methods (0/2)

(R0903)

🔇 Additional comments (2)
examples/sglang/components/frontend.py (1)

60-62: Dependency injected but never referenced

embedding_worker = depends(SGLangEmbeddingWorker) sets up the wiring, yet the Frontend class never uses the dependency.
Verify that:

  1. Dynamo still spins the worker even if the attribute is unused (it usually does, but double-check).
  2. You expose an HTTP route or RPC that actually forwards embedding requests to this worker, otherwise the frontend cannot serve embeddings.
examples/sglang/components/embedding_worker.py (1)

56-58: Static analysis flags missing dynamo_address

SGLangEmbeddingWorker.dynamo_address() is called but not defined in this class.
If the method is injected by Dynamo at runtime the call is safe; otherwise, implement it or import the mix-in that provides it.

🧰 Tools
🪛 Pylint (3.3.7)

[error] 56-56: Class 'SGLangEmbeddingWorker' has no 'dynamo_address' member

(E1101)

Copy link
Contributor

coderabbitai bot commented Jun 11, 2025

Walkthrough

A new embedding service infrastructure is added, introducing an embedding worker component, its configuration, and protocol definitions. The worker is integrated with the frontend and registered in the service graph. A new YAML config and protocol types for embedding requests are provided, supporting both string and tokenized inputs, and aligning the response format with OpenAI's API.

Changes

File(s) Change Summary
examples/sglang/components/embedding_worker.py Added SGLangEmbeddingWorker service class for serving embedding models, with async initialization and request handling.
examples/sglang/components/frontend.py Imported SGLangEmbeddingWorker and added as a dependency in the Frontend class.
examples/sglang/configs/embedding.yaml Added new YAML config for embedding service, specifying model, endpoints, and worker resources.
examples/sglang/graphs/embedding.py New file linking SGLangEmbeddingWorker to Frontend in the service graph.
examples/sglang/utils/protocol.py Added embedding protocol types: EmbeddingInput, EncodingFormat, and EmbeddingRequest Pydantic model.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Frontend
    participant SGLangEmbeddingWorker
    participant SGLangEngine
    participant DynamoRuntime

    Client->>Frontend: POST /generate (EmbeddingRequest)
    Frontend->>SGLangEmbeddingWorker: generate(request)
    SGLangEmbeddingWorker->>SGLangEngine: encode(input)
    SGLangEngine-->>SGLangEmbeddingWorker: embedding result
    SGLangEmbeddingWorker->>SGLangEmbeddingWorker: _transform_response(result)
    SGLangEmbeddingWorker-->>Frontend: OpenAI-compatible embedding response
    Frontend-->>Client: Embedding response

    Note over SGLangEmbeddingWorker, DynamoRuntime: On startup, SGLangEmbeddingWorker registers the model with DynamoRuntime
Loading

Poem

In the warren, code now sings,
Embeddings leap on rabbit wings.
A worker wakes, requests to greet,
With tokens, strings, and tidy feet.
Configs bloom, protocols grow—
Now embeddings swiftly flow!
🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
examples/sglang/components/frontend.py (2)

21-22: Prefer explicit-relative import to avoid package-resolution surprises

Inside examples/sglang/components/, importing a sibling module via the absolute name components.embedding_worker only works if examples/sglang is on PYTHONPATH at runtime.
Using a relative import (from .embedding_worker import SGLangEmbeddingWorker) is safer and keeps the component self-contained.

-from components.embedding_worker import SGLangEmbeddingWorker
+from .embedding_worker import SGLangEmbeddingWorker

60-62: Unused dependency: wire it or drop it

embedding_worker is declared as a dependency but never referenced inside Frontend.
If the intent is only service discovery, add a short comment so future readers know it is intentional; otherwise, remove it to keep the dependency graph minimal.

examples/sglang/components/embedding_worker.py (2)

20-25: Remove unused imports to keep module clean

asyncio, random, socket, Any, and depends are never referenced.

-import asyncio
-import random
-import socket
-from typing import Any
-...
-from dynamo.sdk import async_on_start, depends, dynamo_context, endpoint, service
+from dynamo.sdk import async_on_start, dynamo_context, endpoint, service
🧰 Tools
🪛 Ruff (0.11.9)

20-20: asyncio imported but unused

Remove unused import: asyncio

(F401)


22-22: random imported but unused

Remove unused import: random

(F401)


23-23: socket imported but unused

Remove unused import: socket

(F401)


24-24: typing.Any imported but unused

Remove unused import: typing.Any

(F401)


82-106: Consider returning a plain dict instead of yielding once

@endpoint() can accept either a return value or an async generator. If only one response is produced, return response avoids the extra generator wrapper and is easier for clients.

-        response = self._transform_response(g, request.model)
-        yield response
+        response = self._transform_response(g, request.model)
+        return response

Verify that upstream expectations (e.g., streaming vs. single reply) align with this.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e924a7c and 1dfb1a8.

📒 Files selected for processing (5)
  • examples/sglang/components/embedding_worker.py (1 hunks)
  • examples/sglang/components/frontend.py (2 hunks)
  • examples/sglang/configs/embedding.yaml (1 hunks)
  • examples/sglang/graphs/embedding.py (1 hunks)
  • examples/sglang/utils/protocol.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
examples/sglang/graphs/embedding.py (2)
examples/sglang/components/frontend.py (1)
  • Frontend (59-89)
examples/sglang/components/embedding_worker.py (1)
  • SGLangEmbeddingWorker (43-106)
🪛 YAMLlint (1.37.1)
examples/sglang/configs/embedding.yaml

[error] 11-11: duplication of key "is-embedding" in mapping

(key-duplicates)

🪛 Pylint (3.3.7)
examples/sglang/utils/protocol.py

[refactor] 73-73: Too few public methods (0/2)

(R0903)

examples/sglang/components/embedding_worker.py

[error] 56-56: Class 'SGLangEmbeddingWorker' has no 'dynamo_address' member

(E1101)


[refactor] 70-70: Unnecessary use of a comprehension, use list(request.input) instead.

(R1721)

🪛 Ruff (0.11.9)
examples/sglang/components/embedding_worker.py

20-20: asyncio imported but unused

Remove unused import: asyncio

(F401)


22-22: random imported but unused

Remove unused import: random

(F401)


23-23: socket imported but unused

Remove unused import: socket

(F401)


24-24: typing.Any imported but unused

Remove unused import: typing.Any

(F401)


31-31: dynamo.sdk.depends imported but unused

Remove unused import: dynamo.sdk.depends

(F401)

🔇 Additional comments (2)
examples/sglang/graphs/embedding.py (1)

17-21: Graph looks correct – link established
The single-line link cleanly connects Frontend to SGLangEmbeddingWorker; no issues spotted.

examples/sglang/utils/protocol.py (1)

73-79: Minor: zero-method Pydantic model warning can be ignored

Pylint flags too-few-public-methods on EmbeddingRequest; that’s normal for schema classes. No change needed.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 73-73: Too few public methods (0/2)

(R0903)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
examples/sglang/utils/protocol.py (1)

66-66: ⚠️ Potential issue

EmbeddingInput still exposes numeric variants that are not supported downstream
The worker implementation (SGLangEmbeddingWorker.generate) still only distinguishes str vs list and forwards the items as-is to async_encode, which expects text. Keeping List[int] / List[List[int]] here will crash at runtime if a client really sends token-ids.

-EmbeddingInput = Union[str, List[str], List[int], List[List[int]]]
+# NOTE: keep numeric inputs disabled until the worker properly converts them
+EmbeddingInput = Union[str, List[str]]

If you want to keep the numeric forms, extend the worker to detect integers and tokenizer.decode() them before calling async_encode.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6899bc8 and 4090901.

📒 Files selected for processing (2)
  • examples/sglang/README.md (1 hunks)
  • examples/sglang/utils/protocol.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/sglang/README.md
🧰 Additional context used
🪛 Pylint (3.3.7)
examples/sglang/utils/protocol.py

[refactor] 71-71: Too few public methods (0/2)

(R0903)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
examples/sglang/components/embedding_worker.py (1)

60-82: ⚠️ Potential issue

encoding_format still ignored – honour spec or reject unsupported values

This block never inspects request.encoding_format; callers asking for "base64" will silently receive raw float vectors, violating the OpenAI contract raised in the previous review.

@@
-        g = await self.engine.async_encode(
-            prompt=prompt,
-        )
-
-        response = self._transform_response(g, request.model)
+        g = await self.engine.async_encode(prompt=prompt)
+
+        # honour encoding_format per OpenAI spec
+        if request.encoding_format is None or request.encoding_format == "float":
+            transformed = self._transform_response(g, request.model)
+        elif request.encoding_format == "base64":
+            import base64, struct
+            for item in g:
+                item["embedding"] = base64.b64encode(
+                    struct.pack(f"{len(item['embedding'])}f", *item["embedding"])
+                ).decode()
+            transformed = self._transform_response(g, request.model)
+        else:
+            raise ValueError(
+                f"Unsupported encoding_format '{request.encoding_format}'. "
+                "Use 'float' (default) or 'base64'."
+            )
+
+        response = transformed
         yield response

[follows previous feedback]

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 72-72: Unnecessary use of a comprehension, use list(request.input) instead.

(R1721)

🧹 Nitpick comments (1)
examples/sglang/components/embedding_worker.py (1)

66-74: Minor refactor: avoid comprehension & validate nested inputs

A simple list(request.input) avoids the unnecessary comprehension and pylint warning (R1721).
Additionally, nested lists or non-string items are quietly forwarded to SGLang, which will likely fail deeper in the stack. Consider a stricter validation.

-            if len(request.input) == 1:
-                prompt = request.input[0]
-            else:
-                prompt = [i for i in request.input]
+            if len(request.input) == 1:
+                prompt = request.input[0]
+            else:
+                if not all(isinstance(i, str) for i in request.input):
+                    raise ValueError("Embedding input list must contain only strings")
+                prompt = list(request.input)
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 72-72: Unnecessary use of a comprehension, use list(request.input) instead.

(R1721)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4090901 and 7218854.

📒 Files selected for processing (1)
  • examples/sglang/components/embedding_worker.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
examples/sglang/components/embedding_worker.py

[error] 51-51: Class 'SGLangEmbeddingWorker' has no 'dynamo_address' member

(E1101)


[refactor] 72-72: Unnecessary use of a comprehension, use list(request.input) instead.

(R1721)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (2)
examples/sglang/components/embedding_worker.py (2)

48-58: Possible runtime AttributeError – double-check dynamo_address helper

SGLangEmbeddingWorker.dynamo_address() is invoked as a class attribute, yet no such member is defined in this class or its bases. The decorator may inject it at runtime, but static analysis flags it (E1101) and IDEs will treat it as an error.

comp_ns, comp_name = self.dynamo_address()     # safer if decorator binds to the *instance*

If the helper is not injected, register_llm will raise at start-up.
Please verify the mix-in behaviour or switch to the instance form as shown.

🧰 Tools
🪛 Pylint (3.3.7)

[error] 51-51: Class 'SGLangEmbeddingWorker' has no 'dynamo_address' member

(E1101)


80-82: Confirm whether yield is intentional

OpenAI’s embedding endpoint is non-streaming; returning a plain dict would suffice and avoids an extra async generator layer. If Dynamo expects streaming, keep as is; otherwise return response is simpler.

@ishandhanani ishandhanani enabled auto-merge (squash) June 12, 2025 06:10
if len(request.input) == 1:
prompt = request.input[0]
else:
prompt = [i for i in request.input]
Copy link
Contributor

@biswapanda biswapanda Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need list comprehension? we've already asserted isinstance(request.input, list)

should we concat the strings or take first string in the list?

Copy link
Contributor

@biswapanda biswapanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm.

There is a redundant list comprehension logic - we need to check/test this code path

@ishandhanani ishandhanani disabled auto-merge June 12, 2025 07:22
@@ -57,6 +58,7 @@ class FrontendConfig(BaseModel):
)
class Frontend:
worker = depends(SGLangWorker)
embedding_worker = depends(SGLangEmbeddingWorker)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need embedding_worker for all deployment?

Copy link
Contributor Author

@ishandhanani ishandhanani Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's required to be here for things to work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants