Benson Wong edited this page May 29, 2025 · 5 revisions

Configuration for supporting the v1/rerank endpoint with llama-server and the BGE reranker v2 model.

Config

models:
  "reranker":
    env:
      - "CUDA_VISIBLE_DEVICES=GPU-eb1"
    cmd: |
      /path/to/llama-server/llama-server-latest
      --port ${PORT}
      -ngl 99
      -m /path/to/models/bge-reranker-v2-m3-Q4_K_M.gguf
      --ctx-size 8192
      --reranking
      --no-mmap

Tip

Placeholder paths (/path/to/...) are used in this example; substitute the actual locations of your llama-server binary and the model file.
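llama-swap exposes an OpenAI-compatible /v1/models listing, which is a quick way to confirm the "reranker" entry was picked up before exercising the endpoint itself. In practice you would run `curl -s http://10.0.1.50:8080/v1/models | jq -r '.data[].id'` against your server (host and port taken from the testing example below); the snippet embeds a sample response so the jq filter can be tried offline — the response shape is an assumption based on the OpenAI models API, not captured from llama-swap.

```shell
# Extract the configured model IDs from a /v1/models response.
# The sample response below is illustrative; pipe real curl
# output into the same jq filter instead.
cat <<'EOF' | jq -r '.data[].id'
{"object": "list", "data": [{"id": "reranker", "object": "model"}]}
EOF
```

If "reranker" does not appear in the list, the config file was not loaded or the model name does not match.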

Testing

$ curl -s http://10.0.1.50:8080/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "reranker",
    "query": "What is the best way to learn Python?",
    "documents": [
      "Python is a popular programming language used for web development and data analysis.",
      "The best way to learn Python is through online courses and practice.",
      "Python is also used for artificial intelligence and machine learning applications.",
      "To learn Python, start with the basics and build small projects to gain experience."
    ],
    "max_reranked": 2
  }' | jq .

Output

{
  "model": "reranker",
  "object": "list",
  "usage": {
    "prompt_tokens": 110,
    "total_tokens": 110
  },
  "results": [
    {
      "index": 0,
      "relevance_score": -2.9403347969055176
    },
    {
      "index": 1,
      "relevance_score": 7.181779861450195
    },
    {
      "index": 2,
      "relevance_score": -4.595512866973877
    },
    {
      "index": 3,
      "relevance_score": 3.0560922622680664
    }
  ]
}