Examples
Benson Wong edited this page May 26, 2025 · 6 revisions
Contributor | Link | OS | Server | Model | VRAM | Description |
---|---|---|---|---|---|---|
@mostlygeek | view | linux | llama.cpp | llama3.3 70B | 52.5GB across 3 GPUs | 13 to 20 tok/sec with speculative decoding |
@mostlygeek | view | linux | llama.cpp | qwen3-30B-3A | 24GB | Running the latest Qwen3 models with and without thinking |
@mostlygeek | view | linux | llama.cpp | various VLMs | 8GB to 24GB | Running various VLMs with llama-server |