Running two different models on one GPU #1046
Unanswered
daytonturner asked this question in Q&A
I have an A6000 with 48 GB of VRAM, and I'd like to serve both a quantized Llama-2 and WizardCoder, which together easily fit within the 48 GB available. I'm unsure of the best way to go about this, or whether it's a bad idea for some reason.
Initially, I thought simply running two TGI instances, each pointing to its respective model, would be a reasonable approach, but are my assumptions correct? Any thoughts?

Replies: 1 comment 1 reply

This is the correct way to go about it. Use …
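To make the two-instance approach concrete, here is a minimal sketch. It relies on the `text-generation-launcher` CLI and its `--model-id`, `--quantize`, `--port`, and `--cuda-memory-fraction` flags, which exist in recent TGI releases; the model IDs, ports, and memory fractions below are illustrative placeholders, not a confirmed configuration:

```bash
# Sketch: two independent TGI servers sharing one 48 GB A6000.
# --cuda-memory-fraction caps each server's slice of GPU memory,
# so the two instances don't contend for the same VRAM.

# Instance 1: a quantized Llama-2 (example model ID) on port 8080
text-generation-launcher \
  --model-id TheBloke/Llama-2-13B-GPTQ \
  --quantize gptq \
  --port 8080 \
  --cuda-memory-fraction 0.5 &

# Instance 2: a quantized WizardCoder (example model ID) on port 8081
text-generation-launcher \
  --model-id TheBloke/WizardCoder-15B-1.0-GPTQ \
  --quantize gptq \
  --port 8081 \
  --cuda-memory-fraction 0.45 &
```

Each instance then serves its own HTTP endpoint, so clients simply target port 8080 for Llama-2 and port 8081 for WizardCoder. Keeping the combined fractions a little under 1.0 leaves headroom for CUDA overhead on the shared card.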