
whisper.cpp large-v3-turbo


Config

models: 
  "whisper":
    checkEndpoint: /v1/audio/transcriptions/
    cmd: |
      /path/to/llama-server/whisper-server-30cf30c
        --host 127.0.0.1 --port ${PORT}
        -m ggml-large-v3-turbo-q8_0.bin
        # required to be compatible w/ OpenAI's API
        --request-path /v1/audio/transcriptions --inference-path ""
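
llama-swap launches whisper-server on demand when a request for the "whisper" model arrives. A minimal way to run it against this config is sketched below, assuming the config is saved as config.yaml and that llama-swap's --config and --listen flags match your build (check llama-swap --help), listening on the port used in the test below:

llama-swap --config config.yaml --listen 0.0.0.0:8080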

Testing

$ curl 10.0.1.50:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@jfk.wav" \
  -F temperature="0.0" \
  -F temperature_inc="0.2" \
  -F response_format="json" \
  -F model="whisper"

Note

The -F model="whisper" field is required for llama-swap to load the right configuration; the value must match the model key ("whisper") defined in the config above.

Compiling whisper.cpp

#!/bin/sh
set -e  # stop if any step fails so a stale binary is not copied

# First-time setup: git clone https://github.com/ggml-org/whisper.cpp

# pull latest code
cd "$HOME/whisper.cpp"
git pull

# For reference, the build was configured with:
# CUDACXX=/usr/local/cuda-12.6/bin/nvcc cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=1

cmake --build build --config Release -j 16

# Copy new version with hash in its filename
VERSION=$(git rev-parse --short HEAD)
NEW_FILE="whisper-server-$VERSION"
echo "New version: $NEW_FILE"
cp ./build/bin/whisper-server "/mnt/nvme/llama-server/$NEW_FILE"
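
The config above loads ggml-large-v3-turbo-q8_0.bin, which the build does not fetch. One way to get it is whisper.cpp's bundled download script; a sketch, assuming the large-v3-turbo-q8_0 name is listed in models/download-ggml-model.sh (check the script for the names your checkout supports):

cd "$HOME/whisper.cpp"
sh ./models/download-ggml-model.sh large-v3-turbo-q8_0
# place the model where the relative -m path in the config resolves,
# e.g. next to the server binary if llama-swap runs from that directory
mv ./models/ggml-large-v3-turbo-q8_0.bin /mnt/nvme/llama-server/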