🚀 Optimum-ExecuTorch: Major Out-of-the-box Performance Breakthrough for using HuggingFace Transformers on-device #12124
guangy10 started this conversation in Show and tell
We're thrilled to announce significant performance improvements in Optimum-ExecuTorch that make deploying Hugging Face models on mobile using ExecuTorch more efficient than ever before!
🔥 Outstanding Out-of-the-box Performance
Our latest optimizations deliver significant performance improvements on the CPU backend, including:
- 8da4w quantization while maintaining good generation quality

Note that these optimizations are composable, and the relative performance gain of each is quantitatively measured by the CI in optimum-executorch. Through the combination of all these optimizations, you can run, for example, gemma-3-1b-it at about 25 tokens/s on a Samsung Galaxy S22 Android phone and about 20 tokens/s on a base iPhone 15, out of the box. You can read more in the optimum-executorch repo. Ready to experience these performance gains? Getting started is as simple as following the instructions there.
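To make the getting-started path concrete, here is a minimal sketch of exporting and running a model through the Python API. The names below (ExecuTorchModelForCausalLM, the recipe="xnnpack" argument, and text_generation) follow the optimum-executorch README at the time of writing; treat them as illustrative and check the repo for the current interface.

```python
# A minimal sketch of exporting and running a Hugging Face model with
# optimum-executorch. API names follow the project's README at the time
# of writing; check the repo for the current interface.
from transformers import AutoTokenizer

from optimum.executorch import ExecuTorchModelForCausalLM

model_id = "google/gemma-3-1b-it"  # example model from the announcement

# Load the checkpoint as an ExecuTorch program targeting the XNNPACK
# CPU backend; the export to ExecuTorch runs on the fly if no
# pre-exported program is available.
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate text with the exported program.
output = model.text_generation(
    tokenizer=tokenizer,
    prompt="Explain quantization in one sentence.",
    max_seq_len=128,
)
print(output)
```

The repo also documents an optimum-cli export executorch command for ahead-of-time export from the command line, including quantization options such as 8da4w; see the optimum-executorch README for the exact flags.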
🔮 What's Next
We're continuously pushing the boundaries of on-device AI performance. Stay tuned for more.
🛠️ Want to Contribute?
Contributions are welcome in the optimum-executorch repo.
🙏 Acknowledgments
Huge thanks to the incredible Hugging Face teams and ExecuTorch developers who made this breakthrough possible!