H100 & TGI getting 25 tokens per second? #1952
daz-williams started this conversation in General
According to this HF blog, you can get up to 1,200 tokens per second:
https://huggingface.co/blog/optimum-nvidia
However, in my custom app with TGI, I'm seeing 25 tokens per second on an H100 rented from vast.ai, while the same app on a 4090 gets 135 tokens per second.
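For context, this is roughly how I'm measuring throughput: a single non-streaming request to TGI's `/generate` endpoint with `details` enabled, timed end to end. This is a minimal sketch, not my exact client code; the endpoint URL, prompt, and `max_new_tokens` are placeholders.

```python
import time

import requests

# Assumption: TGI is reachable at this URL (placeholder, not my real endpoint).
TGI_URL = "http://localhost:8080/generate"

payload = {
    "inputs": "Write a short story about a robot learning to paint.",
    "parameters": {
        "max_new_tokens": 200,
        # Ask TGI to return generation details so we can read the exact token count.
        "details": True,
    },
}

start = time.perf_counter()
resp = requests.post(TGI_URL, json=payload, timeout=300)
resp.raise_for_status()
elapsed = time.perf_counter() - start

generated = resp.json()["details"]["generated_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tokens/s")
```

This counts prefill time in the total, so it slightly understates pure decode speed, but it's the same measurement on both the H100 and the 4090.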
This doesn't seem right. If you're getting more than that with an H100, please share your TGI config.
Thanks!
Replies: 1 comment

@OlivierDehaene any ideas please?