
# Guides and Configuration

| Contributor | Link | OS | Server | Model | VRAM | Description |
|---|---|---|---|---|---|---|
| @mostlygeek | view | linux | llama.cpp | llama3.3 70B | 52.5GB over 3 GPUs | 13 to 20 tok/sec with speculative decoding (see the first sketch below) |
| @mostlygeek | view | linux | llama.cpp | qwen3-30B-A3B | 24GB | Running the latest Qwen3 models with thinking enabled and disabled (see the second sketch below) |
| @mostlygeek | view | linux | llama.cpp | various VLMs | 8GB to 24GB | Running various VLMs with llama-server |
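For readers who want a feel for what a setup like the first row looks like, here is a minimal sketch of a llama-swap `config.yaml` entry running a 70B model across multiple GPUs with a small draft model for speculative decoding. It assumes llama-swap's `models`/`cmd` keys and `${PORT}` macro, and llama-server's `-md`, `-ngld`, `--draft-max`/`--draft-min`, and `-ts` flags; the model paths, quantizations, split ratios, and draft-window sizes are placeholders, not the values from the linked guide.

```yaml
# Hypothetical llama-swap entry: a 70B main model split across three GPUs,
# with a small same-family draft model for speculative decoding.
# -m / -md:  main and draft model paths
# -ngl / -ngld:  GPU layers to offload for each
# -ts 1,1,1:  split the main model's tensors evenly across 3 GPUs
# --draft-max / --draft-min:  how many tokens the draft model proposes per step
models:
  "llama3.3-70b":
    cmd: >
      llama-server --port ${PORT}
      -m /models/Llama-3.3-70B-Instruct-Q4_K_M.gguf
      -ngl 99 -ts 1,1,1
      -md /models/Llama-3.2-1B-Instruct-Q8_0.gguf
      -ngld 99
      --draft-max 16 --draft-min 4
```

Speculative decoding only pays off when the draft model's guesses are usually accepted, which is why guides pair a large model with a much smaller one from the same family; the 13 to 20 tok/sec range in the table reflects that variance.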
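Similarly, a common pattern for the second row is to expose the same Qwen3 GGUF twice, once with thinking left on and once with it turned off, so clients pick behavior by model name. The sketch below assumes a llama-server build with the `--reasoning-budget` flag (where `0` disables thinking) and llama-swap's `ttl` option for unloading idle models; the path, context size, and TTL are placeholders.

```yaml
# Hypothetical pair of llama-swap entries serving one Qwen3 GGUF two ways.
# ttl: seconds of idleness before llama-swap unloads the instance (assumed option)
models:
  "qwen3-30b-thinking":
    cmd: >
      llama-server --port ${PORT}
      -m /models/Qwen3-30B-A3B-Q4_K_M.gguf
      -ngl 99 -c 32768
    ttl: 300
  "qwen3-30b-nothink":
    cmd: >
      llama-server --port ${PORT}
      -m /models/Qwen3-30B-A3B-Q4_K_M.gguf
      -ngl 99 -c 32768
      --reasoning-budget 0
    ttl: 300
```

Because llama-swap loads models on demand and only one entry runs at a time, both variants fit in the same 24GB budget listed in the table.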