Do you have any plan to expand the scope to CPU? #1118
mengniwang95 started this conversation in General
Replies: 1 comment
-
Any updates on this? I'm looking for one-click installers to connect multiple LLM models for testing on an Intel Iris Xe GPU on a Lenovo i7, but it seems impossible to get anything running at even 50% of what NVIDIA hardware can handle. I just got this machine as a gift, otherwise I would replace it. Thanks in advance.
-
Hi, I find this repo currently focuses mainly on LLM inference on GPUs. Do you have any plan to expand the scope to CPUs?
Our team develops the Intel® Extension for Transformers, an innovative toolkit to accelerate Transformer-based models on Intel platforms, particularly effective on 4th Gen Intel® Xeon® Scalable processors (codenamed Sapphire Rapids). The toolkit provides the following key features:
- Seamless user experience for model compression (including RTN, AWQ, GPTQ, bitsandbytes, and more of our own algorithms in the future for weight-only quantization) on Transformer-based models, by extending the Hugging Face transformers APIs and leveraging Intel® Neural Compressor (a minimal usage sketch follows this list).
- Advanced software optimizations and a unique compression-aware runtime.
- Optimized Transformer-based model packages.
- NeuralChat, a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of plugins and SOTA optimizations (see the second sketch after this list).
- Inference of Large Language Models (LLMs) in pure C/C++ with weight-only quantization kernels.
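To make the first bullet concrete, below is a minimal sketch of weight-only 4-bit loading and generation through the extended transformers API on CPU. The package path `intel_extension_for_transformers.transformers`, the `load_in_4bit` argument, and the `Intel/neural-chat-7b-v3-1` checkpoint are assumptions taken from the project's public examples; verify them against your installed version.

```python
# A minimal sketch of CPU weight-only quantization via the extended
# Hugging Face transformers API (package path and arguments assumed
# from the project's documentation; check your installed release).
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v3-1"  # assumed example checkpoint
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# load_in_4bit triggers weight-only quantization at load time, mirroring
# the familiar bitsandbytes-style flag on from_pretrained.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```

The design point here is that the quantized path keeps the stock transformers calling convention, so existing generation code only changes at the `from_pretrained` import and flag.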
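Likewise, for the NeuralChat bullet, a minimal sketch of the advertised few-lines chatbot, assuming the `build_chatbot` entry point shown in the project's documentation (default model and CPU backend):

```python
# A minimal NeuralChat sketch (entry point and defaults assumed from
# the project's documentation; the default config picks a built-in
# model and runs on CPU).
from intel_extension_for_transformers.neural_chat import build_chatbot

chatbot = build_chatbot()  # default configuration
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
print(response)
```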
We sincerely want to contribute to the LLM ecosystem, and TGI is a really popular project. So, is there any chance of integrating part of our work into TGI?
Thanks