Update model name to gemini-embedding-001 in code snippets #13388
Changes from all commits: cf33c74, 1c31328, 48b1046, 41e3221, e6b2843, 62b985c
File: code_retrieval_example.py (filename per the review comment below)

```diff
@@ -17,24 +17,31 @@
 # [START generativeaionvertexai_embedding_code_retrieval]
 from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

-MODEL_NAME = "text-embedding-005"
-DIMENSIONALITY = 256
+MODEL_NAME = "gemini-embedding-001"
+DIMENSIONALITY = 3072


 def embed_text(
     texts: list[str] = ["Retrieve a function that adds two numbers"],
     task: str = "CODE_RETRIEVAL_QUERY",
-    model_name: str = "text-embedding-005",
-    dimensionality: int | None = 256,
+    model_name: str = "gemini-embedding-001",
+    dimensionality: int | None = 3072,
 ) -> list[list[float]]:
     """Embeds texts with a pre-trained, foundational model."""
     model = TextEmbeddingModel.from_pretrained(model_name)
-    inputs = [TextEmbeddingInput(text, task) for text in texts]
     kwargs = dict(output_dimensionality=dimensionality) if dimensionality else {}
-    embeddings = model.get_embeddings(inputs, **kwargs)
-    # Example response:
-    # [[0.025890009477734566, -0.05553026497364044, 0.006374752148985863,...],
-    return [embedding.values for embedding in embeddings]
+
+    embeddings = []
+    # gemini-embedding-001 takes one input at a time
+    for text in texts:
+        text_input = TextEmbeddingInput(text, task)
+        embedding = model.get_embeddings([text_input], **kwargs)
+        print(embedding)
+        # Example response:
+        # [[0.006135190837085247, -0.01462465338408947, 0.004978656303137541, ...]]
+        embeddings.append(embedding[0].values)
+
+    return embeddings


 if __name__ == "__main__":
```

Inline review note on the `MODEL_NAME` change: Different models can have varying support for task types or optimal dimensionalities, so confirming this would help ensure the example remains accurate and effective.
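The key pattern in the new code is submitting one input per `get_embeddings` call instead of the whole batch. The sketch below exercises that loop without GCP credentials: the `TextEmbeddingInput`, `Embedding`, and `StubModel` classes here are hypothetical stand-ins for the Vertex AI SDK types, not the real library.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the vertexai.language_models types, so the
# one-input-per-call pattern can be run without GCP credentials.
@dataclass
class TextEmbeddingInput:
    text: str
    task_type: str


@dataclass
class Embedding:
    values: list


class StubModel:
    """Returns a fixed-size fake embedding for each submitted input."""

    def get_embeddings(self, inputs, output_dimensionality=3072):
        return [Embedding(values=[0.0] * output_dimensionality) for _ in inputs]


def embed_text(model, texts, task="CODE_RETRIEVAL_QUERY", dimensionality=3072):
    kwargs = dict(output_dimensionality=dimensionality) if dimensionality else {}
    embeddings = []
    # Mirror the PR's loop: one input per call, collect each result's values.
    for text in texts:
        text_input = TextEmbeddingInput(text, task)
        result = model.get_embeddings([text_input], **kwargs)
        embeddings.append(result[0].values)
    return embeddings


vectors = embed_text(StubModel(), ["add two numbers", "sort a list"])
print(len(vectors), len(vectors[0]))  # 2 3072
```

Swapping `StubModel()` for `TextEmbeddingModel.from_pretrained("gemini-embedding-001")` would give the real behavior, at the cost of one API round trip per text.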
File: document_retrieval_example.py (filename per the review comment below)

```diff
@@ -28,19 +28,24 @@ def embed_text() -> list[list[float]]:
     # A list of texts to be embedded.
     texts = ["banana muffins? ", "banana bread? banana muffins?"]
     # The dimensionality of the output embeddings.
-    dimensionality = 256
+    dimensionality = 3072
     # The task type for embedding. Check the available tasks in the model's documentation.
     task = "RETRIEVAL_DOCUMENT"

-    model = TextEmbeddingModel.from_pretrained("text-embedding-005")
-    inputs = [TextEmbeddingInput(text, task) for text in texts]
+    model = TextEmbeddingModel.from_pretrained("gemini-embedding-001")
     kwargs = dict(output_dimensionality=dimensionality) if dimensionality else {}
-    embeddings = model.get_embeddings(inputs, **kwargs)
-
-    print(embeddings)
-    # Example response:
-    # [[0.006135190837085247, -0.01462465338408947, 0.004978656303137541, ...], [0.1234434666, ...]],
-    return [embedding.values for embedding in embeddings]
+
+    embeddings = []
+    # gemini-embedding-001 takes one input at a time
+    for text in texts:
+        text_input = TextEmbeddingInput(text, task)
+        embedding = model.get_embeddings([text_input], **kwargs)
+        print(embedding)
+        # Example response:
+        # [[0.006135190837085247, -0.01462465338408947, 0.004978656303137541, ...]]
+        embeddings.append(embedding[0].values)
+
+    return embeddings


 # [END generativeaionvertexai_embedding]
```
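Both samples build the keyword arguments with `dict(output_dimensionality=dimensionality) if dimensionality else {}`, so a falsy value falls back to the model's default dimensionality. A minimal illustration of that guard (the `build_kwargs` name is introduced here for illustration only):

```python
def build_kwargs(dimensionality):
    # Forward output_dimensionality only when a truthy value is given;
    # otherwise pass no kwargs and let the model use its default size.
    return dict(output_dimensionality=dimensionality) if dimensionality else {}


print(build_kwargs(3072))  # {'output_dimensionality': 3072}
print(build_kwargs(None))  # {}
```

Note that `0` is also falsy, so a caller cannot request a zero dimensionality through this guard, which is the intended behavior.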
Review comment:

> The model is updated to `gemini-embedding-001`. This is a key change. Could you please confirm if this new model has been verified for full compatibility with the existing usage patterns in this and other updated examples? Specifically for the examples touched in this PR:
>
> - Are the `task` types used in `code_retrieval_example.py` (`CODE_RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`) and `document_retrieval_example.py` (`RETRIEVAL_DOCUMENT`) supported and appropriate for `gemini-embedding-001`?
> - Is `output_dimensionality=256` (used in `code_retrieval_example.py` and `document_retrieval_example.py`) a valid, supported, and optimal setting for `gemini-embedding-001`?
> - Does `gemini-embedding-001` (when used via `TextEmbeddingModel.from_pretrained`) behave as expected with the `batch_predict` method and the input/output formats used in this specific `batch_example.py` file?
>
> Ensuring these compatibilities are verified is important for the correctness and reliability of these samples. If the new model has different characteristics or requirements, further code adjustments might be needed.
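One concrete follow-up to the reviewer's dimensionality question: if a reduced `output_dimensionality` is used and the returned vectors are not unit-length (worth verifying against the model's documentation), re-normalizing before similarity search is a common precaution. This helper is an illustration added here, not part of the PR:

```python
import math


def normalize(values):
    # Rescale an embedding to unit length so that dot products behave
    # as cosine similarity; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm else values


vec = normalize([3.0, 4.0])
print(vec)  # [0.6, 0.8]
```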