
llama server embedding

Benson Wong edited this page May 29, 2025 · 1 revision

Config

models:
  "embedding":
    unlisted: true
    cmd: |
      /path/to/llama-server-latest --port ${PORT}
      -m /models/nomic-embed-text-v1.5.Q8_0.gguf
      --ctx-size 8192
      --batch-size 8192
      --rope-scaling yarn
      --rope-freq-scale 0.75
      -ngl 99
      --embeddings
      --no-mmap
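With `unlisted: true`, llama-swap leaves this model out of its `/v1/models` listing. Clients can still request it by name. The sketch below checks this. It assumes jq is installed and that the response uses the OpenAI list shape (`data[].id`). `model_listed` is a hypothetical helper, not part of llama-swap:

```shell
#!/usr/bin/env bash
# Sketch only: assumes jq is installed (the curl test on this page uses it too).
# model_listed is a hypothetical helper name.
# Reads an OpenAI-style /v1/models response on stdin and exits 0
# if a model with the given id appears in .data, non-zero otherwise.
model_listed() {
  jq -e --arg id "$1" 'any(.data[]?; .id == $id)' > /dev/null
}

# Example (requires llama-swap running at the address used on this page):
#   curl -s 10.0.1.50:8080/v1/models | model_listed embedding \
#     || echo "embedding is unlisted but can still be requested by name"
```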

Testing the Model

$ curl -s 10.0.1.50:8080/v1/embeddings \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"model": "embedding", "input": "the text to embed"}' | jq .data;
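A quick sanity check beyond printing `.data` is to embed two related sentences in one request and compare the vectors. This is a minimal sketch with several assumptions. It needs jq 1.6 or newer for `--args`. It assumes the server accepts an array of strings as `input`, as in the OpenAI embeddings API. `embed_body` and `cosine_first_two` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch only: assumes jq >= 1.6 and array-valued "input" support on the server.
# embed_body and cosine_first_two are hypothetical helper names.

# Build a compact OpenAI-style request body: first arg is the model name,
# remaining args become the "input" array.
embed_body() {
  local model="$1"
  shift
  jq -nc --arg model "$model" '{model: $model, input: $ARGS.positional}' --args "$@"
}

# Read an embeddings response on stdin, order .data by .index, and print the
# cosine similarity between the first two vectors (dot product / product of norms).
cosine_first_two() {
  jq '(.data | sort_by(.index)) as $d
      | $d[0].embedding as $a
      | $d[1].embedding as $b
      | ([range(0; $a | length) | $a[.] * $b[.]] | add) as $dot
      | ($a | map(. * .) | add | sqrt) as $na
      | ($b | map(. * .) | add | sqrt) as $nb
      | $dot / ($na * $nb)'
}

# Example (requires the server to be running at the address used above):
#   embed_body embedding "the cat sat on the mat" "a cat was sitting on a rug" \
#     | curl -s 10.0.1.50:8080/v1/embeddings -H "Content-Type: application/json" -d @- \
#     | cosine_first_two
```

Values closer to 1 mean the two texts landed close together in embedding space. The exact numbers depend on the model and quantization.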