Replies: 1 comment
Hi @khasinski and thank you for the interest in RubyLLM! RubyLLM is designed to be a client for LLMs, not a model host. Adding model serving capabilities would completely change the performance profile: from being IO-bound to CPU/GPU/memory-bound. That's a fundamentally different library with different concerns. I'd recommend keeping your ONNX Runtime implementation separate, and if you want, build a provider for RubyLLM that speaks to your server over HTTP.
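A minimal sketch of what that separation could look like, assuming the ONNX models sit behind a small OpenAI-compatible HTTP endpoint and that the installed RubyLLM version supports pointing the OpenAI provider at a custom base URL; the URL and model id below are placeholders, not anything shipped with RubyLLM:

```ruby
require "ruby_llm"

RubyLLM.configure do |config|
  config.openai_api_key  = "local"                     # a local server will typically ignore this
  config.openai_api_base = "http://localhost:8080/v1"  # hypothetical ONNX-backed server, if this option exists in your version
end

# "local-minilm" is a placeholder id served by the hypothetical local process;
# a custom id may need extra options (e.g. a provider override) depending on
# the RubyLLM version.
embedding = RubyLLM.embed("fast local embeddings", model: "local-minilm")
```

Whether that extra server process is worth running is exactly the IO-bound vs. compute-bound trade-off described above.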
The original post from @khasinski:

Hey, I'm currently using ONNX models that I run with ONNX Runtime for faster embeddings. I'm planning on extending it a bit to support token-generating models (something with an API similar to ONNX Runtime GenAI). If you add the docs on adding providers, I'd be happy to write a wrapper that translates both formats so I could use your DSL for interacting with those models.
Config would probably just be a link to an HF repo, and calling those models wouldn't need any HTTP, just regular function calls.
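As a rough illustration of that no-HTTP path, here is a hedged sketch using the onnxruntime and tokenizers gems; the model path, the input/output names, and the pooling step are assumptions that depend on how the embedding model was exported:

```ruby
require "onnxruntime"
require "tokenizers"

# The tokenizer comes straight from an HF repo, matching the "config is just a
# link to an HF repo" idea; the repo id here is only an example.
tokenizer = Tokenizers.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model     = OnnxRuntime::Model.new("model.onnx") # local ONNX export of the same model

def embed(text, tokenizer:, model:)
  ids  = tokenizer.encode(text).ids
  mask = Array.new(ids.length, 1) # single unpadded sequence, so all ones

  outputs = model.predict(
    "input_ids"      => [ids],
    "attention_mask" => [mask],
    "token_type_ids" => [Array.new(ids.length, 0)] # drop if the export doesn't declare it (check model.inputs)
  )

  # Mean-pool the token embeddings into one sentence vector.
  tokens = outputs["last_hidden_state"].first
  tokens.transpose.map { |dim| dim.sum / dim.length.to_f }
end

vector = embed("fast local embeddings", tokenizer: tokenizer, model: model)
```

A provider wrapper would essentially translate between calls like this and the request/response shapes RubyLLM's DSL expects.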