llama cpp reranker
Configuration for supporting the v1/rerank endpoint with llama-server and the BGE reranker v2 model.

- Download the model from gpustack/bge-reranker-v2-m3-GGUF (see the download sketch below)
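As a rough sketch of the download step, huggingface-cli can fetch the quantized GGUF referenced in the config below. The target directory /path/to/models is a placeholder matching the other paths on this page, so adjust it to your setup.

```sh
# Sketch: download the Q4_K_M quant used in the config below.
# /path/to/models mirrors the placeholder paths used throughout this page.
huggingface-cli download gpustack/bge-reranker-v2-m3-GGUF \
  bge-reranker-v2-m3-Q4_K_M.gguf \
  --local-dir /path/to/models
```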
    models:
      "reranker":
        env:
          - "CUDA_VISIBLE_DEVICES=GPU-eb1"
        cmd: |
          /path/to/llama-server/llama-server-latest
          --port ${PORT}
          -ngl 99
          -m /path/to/models/bge-reranker-v2-m3-Q4_K_M.gguf
          --ctx-size 8192
          --reranking
          --no-mmap
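Once llama-swap has loaded this configuration, a quick sanity check is to list the models it exposes. This is only a sketch: it assumes llama-swap serves the standard OpenAI-style /v1/models listing and reuses the host and port (10.0.1.50:8080) from the curl example below.

```sh
# Sketch: confirm the "reranker" entry is visible before sending rerank requests.
# Host/port taken from the curl example below; adjust for your deployment.
curl -s http://10.0.1.50:8080/v1/models | jq '.data[].id'
```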
Tip: path.to.sh is used for the /path/to/models/... paths in this example.
    $ curl -s http://10.0.1.50:8080/v1/rerank \
        -H 'Content-Type: application/json' \
        -d '{
          "model": "reranker",
          "query": "What is the best way to learn Python?",
          "documents": [
            "Python is a popular programming language used for web development and data analysis.",
            "The best way to learn Python is through online courses and practice.",
            "Python is also used for artificial intelligence and machine learning applications.",
            "To learn Python, start with the basics and build small projects to gain experience."
          ],
          "max_reranked": 2
        }' | jq .
Output
    {
      "model": "reranker",
      "object": "list",
      "usage": {
        "prompt_tokens": 110,
        "total_tokens": 110
      },
      "results": [
        {
          "index": 0,
          "relevance_score": -2.9403347969055176
        },
        {
          "index": 1,
          "relevance_score": 7.181779861450195
        },
        {
          "index": 2,
          "relevance_score": -4.595512866973877
        },
        {
          "index": 3,
          "relevance_score": 3.0560922622680664
        }
      ]
    }
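Note that the results come back in the order the documents were submitted, not sorted by score, and the relevance_score values are raw model scores rather than normalized probabilities. A small jq sketch (not part of the original example) can pull out the top-ranked documents, assuming the response above was saved to rerank.json:

```sh
# Sketch: sort by descending relevance_score and keep the two best documents.
# Assumes the rerank response was saved, e.g. by appending "> rerank.json" to the curl above.
jq '.results | sort_by(-.relevance_score) | .[:2]' rerank.json
```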