# whisper cpp large v3 turbo
llama-swap configuration for serving whisper.cpp's `whisper-server` with the large-v3-turbo model:

```yaml
models:
  "whisper":
    checkEndpoint: /v1/audio/transcriptions/
    cmd: |
      /path/to/llama-server/whisper-server-30cf30c
      --host 127.0.0.1 --port ${PORT}
      -m ggml-large-v3-turbo-q8_0.bin
      # required to be compatible w/ OpenAI's API
      --request-path /v1/audio/transcriptions --inference-path ""
```
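If you don't have the model file yet, whisper.cpp ships a download helper. A minimal sketch, assuming the script still lists the `large-v3-turbo-q8_0` variant (run it without arguments to see the current model list):

```shell
# From the whisper.cpp checkout: fetch the quantized large-v3-turbo model.
cd "$HOME/whisper.cpp"
./models/download-ggml-model.sh large-v3-turbo-q8_0
# Produces models/ggml-large-v3-turbo-q8_0.bin
```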
Test the transcription endpoint through llama-swap:

```shell
$ curl 10.0.1.50:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@jfk.wav" \
    -F temperature="0.0" \
    -F temperature_inc="0.2" \
    -F response_format="json" \
    -F model="whisper"
```
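With `response_format="json"` the server returns an OpenAI-style JSON body. For whisper.cpp's bundled `jfk.wav` sample the transcription should be close to this (exact text and whitespace can differ between builds and models):

```json
{
  "text": "And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."
}
```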
> **Note:** `-F model="whisper"` is required for llama-swap to load the right configuration.
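To confirm the model name llama-swap expects, you can query its OpenAI-compatible model listing (assuming your llama-swap version exposes `/v1/models`):

```shell
$ curl 10.0.1.50:8080/v1/models
# The "whisper" entry should appear in the returned model list.
```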
Update script to rebuild `whisper-server` from the latest whisper.cpp source and install it under a commit-hash filename:

```sh
#!/bin/sh
# git clone https://github.com/ggml-org/whisper.cpp

# Pull the latest code
cd "$HOME/whisper.cpp"
git pull

# For reference, configure the build:
# CUDACXX=/usr/local/cuda-12.6/bin/nvcc cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=1
cmake --build build --config Release -j 16

# Copy the new binary with the commit hash in its filename
VERSION=$(git rev-parse --short HEAD)
NEW_FILE="whisper-server-$VERSION"
echo "New version: $NEW_FILE"
cp ./build/bin/whisper-server "/mnt/nvme/llama-server/$NEW_FILE"
```
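To avoid editing the llama-swap config on every rebuild, one option is to also maintain a stable symlink to the newest build (a hypothetical convention, not part of the script above):

```shell
# Point a stable name at the freshly built binary, then reference
# whisper-server-latest in the llama-swap cmd instead of a hashed name.
ln -sf "/mnt/nvme/llama-server/$NEW_FILE" /mnt/nvme/llama-server/whisper-server-latest
```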