output normalize embedding in '/v1/embeddings' #5956


Merged: 3 commits merged into ggml-org:master on Mar 9, 2024

Conversation

redlion0929 (Contributor) commented Mar 9, 2024

Description

Configure server.cpp so that the "/v1/embeddings" endpoint outputs normalized embeddings.
This PR is related to issue #5954.
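
For reference, "normalized" here means L2 (unit) normalization: each component is divided by the vector's Euclidean norm, so the sum of squared components becomes 1. A minimal Python sketch of the math (not the server's actual C++ code):

import math

def l2_normalize(vec):
    # Scale the vector so its Euclidean (L2) norm is 1.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)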

Test Method

I used the code below to check that the output embeddings are normalized.

from openai import OpenAI

# Point an OpenAI-compatible client at the llama.cpp server
# (the URL and API key below are placeholders).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")
input_texts = ["hello world"]  # any test input

def check_normalize(embedding):
    # Sum of squared components; ~1.0 for an L2-normalized vector.
    s = 0
    for i in embedding:
        s += i * i
    print(f"sum: {s}")

response = client.embeddings.create(input=input_texts, model="test")
print(response)
check_normalize(response.data[0].embedding)

Before

CreateEmbeddingResponse(data=[Embedding(embedding=[0.779549777507782, -1.7770930528640747, -0.5943143963813782, ... ,], index=0, object='embedding')], model='test', object='list', usage=Usage(prompt_tokens=0, total_tokens=0))

sum: 85301.80714120051

After

CreateEmbeddingResponse(data=[Embedding(embedding=[0.0026690999511629343, -0.006084587424993515, -0.0020348725374788046, ..., ], index=0, object='embedding')], model='test', object='list', usage=Usage(prompt_tokens=0, total_tokens=0))

sum: 1.0000004424428977
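
The post-change sum is 1.0 up to floating-point rounding, which confirms the output is a unit vector. A quick tolerance check on the same response (illustrative):

import math

s = sum(x * x for x in response.data[0].embedding)
assert math.isclose(s, 1.0, rel_tol=1e-4), f"not normalized: {s}"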

ngxson (Collaborator) commented Mar 9, 2024

Quick note: this may be a breaking change. Can we add an option to skip normalization, something like --embeddings-skip-norm?
By default, OpenAI normalizes the vector, so I prefer to have normalization enabled by default in llama.cpp.

ggerganov (Member) commented

The flag would be better as a request option than a CLI option.
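
A per-request option could look something like this (purely hypothetical; no such "normalize" field existed at the time of this PR):

import requests

# "normalize" is a hypothetical per-request flag, shown for illustration only.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"input": ["hello world"], "model": "test", "normalize": False},
)
print(resp.json())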

ggerganov (Member) left a comment

I avoided an extra copy of the data and moved the normalization into common so it can be reused

We can add the request option for disabling the normalization in a separate PR
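
Writing into a caller-provided buffer is what avoids the intermediate copy. A Python sketch of that pattern (the actual helper, llama_embd_normalize, lives in common and is C++):

import math

def embd_normalize(inp, out):
    # Write L2-normalized values directly into the caller-provided
    # buffer, avoiding an intermediate copy of the embedding.
    norm = math.sqrt(sum(x * x for x in inp))
    inv = 1.0 / norm if norm > 0 else 0.0
    for i, x in enumerate(inp):
        out[i] = x * inv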

ggerganov merged commit fb215c3 into ggml-org:master on Mar 9, 2024
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024
* output normalize embedding in '/v1/embeddings'

* common : reuse llama_embd_normalize

* common : better normalize impl

---------

Co-authored-by: Georgi Gerganov <[email protected]>
NeoZhangJianyu pushed a commit to NeoZhangJianyu/llama.cpp that referenced this pull request Mar 12, 2024
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024