1 parent 64d83e1 commit a81d4a5
README.md
@@ -1,11 +1,13 @@
 # PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
 ---
 
-*Demo* 🔥
+## Demo 🔥
 
-https://github.com/hodlen/PowerInfer/assets/34213478/b782ccc8-0a2a-42b6-a6aa-07b2224a66f7
+https://github.com/SJTU-IPADS/PowerInfer/assets/34213478/d26ae05b-d0cf-40b6-8788-bda3fe447e28
 
-<sub>The demo is running with a single 24G 4090 GPU, the model is Falcon (ReLU)-40B, and the precision is FP16.</sub>
+PowerInfer vs. llama.cpp on a single RTX 4090 (24G) running Falcon(ReLU)-40B-FP16, with an 11x speedup!
+
+<sub>Both PowerInfer and llama.cpp were running on the same hardware and fully utilized the VRAM of the RTX 4090.</sub>
 
 
 ## Abstract