Update `embedders` settings, hybrid search, and add tests for AI search methods #1087

Strift · 2025-03-08T09:36:27Z

Pull Request

Related issue

Requires #1086

What does this PR do?

Update settings to handle embedders

Docs: https://www.meilisearch.com/docs/reference/api/settings#embedders

Add the embedders setting: update the methods getEmbedders, updateEmbedders, and resetEmbedders. The updateSettings method should also accept the new embedders field. Here is the list of the acceptable sub fields:

Review comment:

Add a test to check that the format of each embedder type has the fields you need it to have.
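As a rough illustration of the kind of check this review comment asks for, here is a minimal sketch. It assumes the source-dependent requirements quoted in the docstring under review further down ('rest' needs 'request' and 'response', 'userProvided' needs 'dimensions'). The helper `missing_required_fields`, the `REQUIRED_FIELDS` table, and the sample configs are hypothetical and not part of meilisearch-python.

```python
# Hypothetical helper for illustration only; not part of meilisearch-python.
# The required fields mirror the docstring quoted in this PR's review thread.
REQUIRED_FIELDS = {
    "openAi": set(),
    "huggingFace": set(),
    "ollama": set(),
    "rest": {"request", "response"},
    "userProvided": {"dimensions"},
}


def missing_required_fields(config):
    """Return the sorted required fields that one embedder config lacks."""
    source = config.get("source")
    if source not in REQUIRED_FIELDS:
        raise ValueError(f"unknown embedder source: {source!r}")
    return sorted(REQUIRED_FIELDS[source] - set(config))


# Sample configs (made up for this sketch).
embedders = {
    "default": {"source": "userProvided", "dimensions": 2},
    "remote": {"source": "rest", "request": {"input": "{{text}}"}},
}

for name, config in embedders.items():
    print(name, missing_required_fields(config))
```

A test could assert that the returned list is empty for each embedder type's fixture.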

Update search to handle vector search and hybrid search

Docs: https://www.meilisearch.com/docs/reference/api/search

Update search method:

- hybrid search parameter, with sub fields semanticRatio and embedder. embedder is mandatory if hybrid is set.
- vector parameter is available
- retrieveVectors parameter is available
- semanticHitCount in search response
- Accept _semanticScore in the search response (optional)
- _vectors are returned in the hit objects, when retrieveVectors is true
- _vectors NOT present in the search response
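The parameter rules above can be sketched as a small payload builder. This is an illustration under stated assumptions, not the library's implementation: `build_hybrid_search_params` is a hypothetical name, and the 0.0 to 1.0 range check and 0.5 default for semanticRatio reflect my reading of the Meilisearch search documentation linked above.

```python
# Hypothetical helper for illustration; not part of meilisearch-python.
def build_hybrid_search_params(embedder, semantic_ratio=0.5, vector=None,
                               retrieve_vectors=False):
    """Build search parameters for a hybrid (keyword + semantic) query."""
    # The PR description states that embedder is mandatory when hybrid is set.
    if not embedder:
        raise ValueError("'embedder' is mandatory when 'hybrid' is set")
    # Assumption: semanticRatio must lie between 0.0 (keyword only)
    # and 1.0 (semantic only).
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("'semanticRatio' must be between 0.0 and 1.0")

    params = {"hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder}}
    if vector is not None:
        params["vector"] = list(vector)
    if retrieve_vectors:
        params["retrieveVectors"] = True
    return params


params = build_hybrid_search_params("default", semantic_ratio=0.9,
                                    retrieve_vectors=True)
print(params)
```

With a configured client, a dictionary like this would typically be passed as the optional parameters of the index's search call.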

Add similar documents endpoint

Docs: https://www.meilisearch.com/docs/reference/api/similar

Implement searchSimilarDocuments associated with the POST /indexes/:uid/similar. Do NOT implement with GET.
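A minimal sketch of what the request for this endpoint could look like. The helper `build_similar_request` is hypothetical, and the body field names (`id`, `embedder`, `limit`, `offset`, `filter`) are assumptions based on my reading of the similar documents API reference linked above; check that page before relying on them.

```python
# Hypothetical helper for illustration; not part of meilisearch-python.
def build_similar_request(index_uid, document_id, embedder,
                          limit=None, offset=None, filter_expr=None):
    """Return (method, path, body) for a similar-documents request."""
    body = {"id": document_id, "embedder": embedder}
    # Only include optional fields that were explicitly provided.
    if limit is not None:
        body["limit"] = limit
    if offset is not None:
        body["offset"] = offset
    if filter_expr is not None:
        body["filter"] = filter_expr
    # The PR description asks for POST only, never GET.
    return "POST", f"/indexes/{index_uid}/similar", body


method, path, body = build_similar_request("movies", 42, "default", limit=5)
print(method, path, body)
```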

PR checklist

Please check if your PR fulfills the following requirements:

Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
Have you read the contributing guidelines?
Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

New Features
- Added support for configuring and managing various embedder types, enabling advanced vector and hybrid search capabilities.
- Introduced hybrid search functionality that combines keyword and semantic search, with adjustable balance between the two.
Documentation
- Updated documentation to include a new section on hybrid search, detailing its usage and configuration options.
Tests
- Added comprehensive tests for vector and hybrid search, as well as for validating the format and configuration of different embedder types.

meilisearch/index.py

brunoocasali · 2025-03-18T13:36:46Z

meilisearch/index.py

+            Supported embedder sources:
+            - 'openAi': OpenAI embedder
+            - 'huggingFace': HuggingFace embedder
+            - 'ollama': Ollama embedder
+            - 'rest': REST API embedder
+            - 'userProvided': User-provided embedder
+
+            Required fields depend on the embedder source:
+            - 'rest' requires 'request' and 'response' fields
+            - 'userProvided' requires 'dimensions' field
+
+            Optional fields (availability depends on source):
+            - 'url': The URL Meilisearch contacts when querying the embedder
+            - 'apiKey': Authentication token for the embedder
+            - 'model': The model used for generating vectors
+            - 'documentTemplate': Template defining the data sent to the embedder
+            - 'documentTemplateMaxBytes': Maximum size of rendered document template
+            - 'dimensions': Number of dimensions in the chosen model
+            - 'revision': Model revision hash (only for 'huggingFace')
+            - 'distribution': Object with 'mean' and 'sigma' fields
+            - 'binaryQuantized': Boolean to convert vector dimensions to 1-bit values


I wonder if these docs follow any pattern to describe the fields, and whether this is the correct definition. Can you confirm?

It's AI-generated based on my input (the meilisearch docs). It does not follow any particular conventions.

I think it might be better to have less information here, and let the user refer to the documentation. I will remove it.

brunoocasali · 2025-03-18T13:38:00Z

meilisearch/index.py

-
-        if body:
-            for _, v in body.items():
-                if "documentTemplateMaxBytes" in v and v["documentTemplateMaxBytes"] is None:


Is this handling done by Meili now?

Removing it did not trigger any test failure but it might simply be untested, so I added it back to avoid any unwanted side effects
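The hunk above is cut off before the body of the `if`, so what the original code did with a null documentTemplateMaxBytes is not shown here. Purely as an illustration, one way a client could avoid sending an explicit null for that field is to drop the key before building the request. The function name `drop_null_template_limit` and this behavior are assumptions, not the PR's code.

```python
import copy


# Hypothetical helper for illustration; the actual handling in the PR
# is not visible in the truncated hunk.
def drop_null_template_limit(embedders):
    """Return a copy of the embedders body without null documentTemplateMaxBytes."""
    cleaned = copy.deepcopy(embedders)  # leave the caller's dict untouched
    for config in cleaned.values():
        if "documentTemplateMaxBytes" in config and config["documentTemplateMaxBytes"] is None:
            del config["documentTemplateMaxBytes"]
    return cleaned


body = {
    "default": {"source": "userProvided", "dimensions": 2,
                "documentTemplateMaxBytes": None},
}
print(drop_null_template_limit(body))
```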

meilisearch/index.py

meilisearch/models/embedders.py

brunoocasali · 2025-03-18T13:44:06Z

tests/index/test_index_search_meilisearch.py

+    assert "default" in response["hits"][0]["_vectors"]
+
+
+def test_get_similar_documents_with_identical_vectors(empty_index):


If I understood it correctly, you're manually creating the vector for that given document, so you don't need to define any model in your test instance before, right?

If you're referring to the test_get_similar_documents_with_identical_vectors test, that's correct.

I'm only creating an embedder so Meilisearch knows which embedder to use to compute the vector similarity:

    # Configure the embedder
    settings_update_task = index.update_embedders(
        {
            "default": {
                "source": "userProvided",
                "dimensions": 2,
            }
        }
    )
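To illustrate why identical user-provided vectors make a convenient test case, here is a small stand-alone sketch that ranks documents by similarity to a reference document. It assumes cosine similarity as the metric, which may not match what Meilisearch computes internally. The documents and the `similar_to` helper are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Made-up documents carrying 2-dimensional vectors for the "default" embedder.
documents = [
    {"id": 1, "_vectors": {"default": [0.1, 0.2]}},
    {"id": 2, "_vectors": {"default": [0.1, 0.2]}},
    {"id": 3, "_vectors": {"default": [-0.2, 0.1]}},
]


def similar_to(doc_id, docs, embedder="default"):
    """Rank the other documents by similarity to the given one, best first."""
    target = next(d for d in docs if d["id"] == doc_id)["_vectors"][embedder]
    scored = [
        (d["id"], cosine_similarity(target, d["_vectors"][embedder]))
        for d in docs
        if d["id"] != doc_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for other_id, score in similar_to(1, documents):
    print(other_id, round(score, 3))
```

Because the vectors are supplied by hand, no embedding model has to be available to the test instance, which matches the reply above.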

Strift

Ty for the review @brunoocasali, I answered your comments and made the changes

sanders41

Sorry, I've been busy and only had a chance to take a really quick look at this. Can we test this against an older v1 release of Meilisearch? I'm thinking there may be some changes here that make the package incompatible with older versions, but I haven't had time to test it myself.

curquiza · 2025-03-30T16:05:23Z

@Strift I merged the PR of Ellnix first: #1075 and now there are conflicts on your PR

Maybe I shouldn't have merged it
It's either his PR or your PR with conflict anyway...

Sorry again!

brunoocasali · 2025-03-30T19:54:20Z

> I'm thinking there may be some changes here that make the package incompatible with older versions, but haven't had time to test myself.

This is expected, @sanders41, since this version introduced the stabilization of the AI capabilities. @curquiza it looks good to me, but I would wait for @sanders41's review before merging :)

brunoocasali

@Strift let's move on with this PR. It looks good enough to me, so let's put it in production and let users guide us toward further improvements :)

coderabbitai · 2025-05-15T04:43:55Z

Caution

Review failed

The pull request is closed.

Walkthrough

The changes introduce a new meilisearch.models.embedders module to define structured embedder configuration models, update the index and settings logic to use these models, and improve embedder handling and documentation. Test coverage is expanded for embedders, hybrid/vector search, and similar document retrieval. All embedder-related classes are removed from models/index.py and now reside in the new module.

Changes

| File(s) | Change Summary |
|---|---|
| README.md | Added a "Hybrid Search" section documenting hybrid search usage, parameters, and example code. |
| meilisearch/models/embedders.py | New module defining Pydantic-based models for embedder configurations (OpenAI, HuggingFace, Ollama, Rest, UserProvided), a distribution class, and a container class for embedders. |
| meilisearch/models/index.py | Removed all embedder classes and the Embedders container (moved to `models/embedders.py`). Cleaned up imports. |
| meilisearch/index.py | Updated imports to use new embedders module. Improved docstrings for search and settings methods. Enhanced embedder handling in settings methods, including better type annotations and explicit handling of embedder sources. Added serialization for embedder models when updating settings. |
| tests/conftest.py, tests/settings/test_settings.py | Updated imports to source embedder classes from `models/embedders.py` instead of `models/index.py`. |
| tests/settings/test_settings_embedders.py | Updated imports. Added comprehensive tests for each embedder type, verifying configuration, required/optional fields, and correct retrieval after update. |
| tests/index/test_index_search_meilisearch.py | Added/extended tests for vector search, hybrid search, retrieval of vectors in results, and similar document search with identical vectors. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Index
    participant EmbeddersModel

    Client->>Index: update_embedders(embedder_dict)
    Index->>EmbeddersModel: Parse embedder_dict by source
    EmbeddersModel-->>Index: Return structured embedder objects
    Index->>Index: Serialize embedders for API
    Index->>MeiliSearch API: PATCH /indexes/:uid/settings/embedders

    Client->>Index: get_embedders()
    Index->>MeiliSearch API: GET /indexes/:uid/settings/embedders
    MeiliSearch API-->>Index: Return embedder configs
    Index->>EmbeddersModel: Parse configs into objects
    EmbeddersModel-->>Index: Return structured embedders
    Index-->>Client: Return embedders object

sequenceDiagram
    participant Client
    participant Index

    Client->>Index: search(query, {hybrid: {semanticRatio, embedder}})
    Index->>MeiliSearch API: POST /indexes/:uid/search (with hybrid params)
    MeiliSearch API-->>Index: Return results with semanticHitCount, etc.
    Index-->>Client: Return search results
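The parse-by-source and serialize steps in the first diagram can be sketched roughly as follows. This is not the module's code: it uses standard-library dataclasses as a stand-in for the Pydantic-based models mentioned in the walkthrough, covers only two sources, and all class and function names here are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


# Stand-ins for the real models; field sets are deliberately reduced.
@dataclass
class UserProvidedEmbedder:
    dimensions: int
    source: str = "userProvided"


@dataclass
class OpenAiEmbedder:
    source: str = "openAi"
    model: Optional[str] = None
    dimensions: Optional[int] = None


MODELS_BY_SOURCE = {
    "userProvided": UserProvidedEmbedder,
    "openAi": OpenAiEmbedder,
}


def parse_embedders(raw):
    """Turn {name: config_dict} into {name: model instance}, keyed on 'source'."""
    parsed = {}
    for name, config in raw.items():
        model = MODELS_BY_SOURCE[config["source"]]
        fields = {k: v for k, v in config.items() if k != "source"}
        parsed[name] = model(**fields)
    return parsed


def serialize_embedders(embedders):
    """Turn model instances back into plain dicts, omitting unset fields."""
    return {
        name: {k: v for k, v in asdict(embedder).items() if v is not None}
        for name, embedder in embedders.items()
    }


raw = {"default": {"source": "userProvided", "dimensions": 2}}
print(serialize_embedders(parse_embedders(raw)))
```

Dispatching on `source` keeps each embedder type's fields in its own class, which is the separation the walkthrough describes for the new module.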

Assessment against linked issues

| Objective | Addressed | Explanation |
|---|---|---|
| Embedders setting: support all sources and subfields (`embedders`, methods, fields, serialization) (#1081) | ✅ | |
| Search: hybrid/vector params, retrieveVectors, semanticHitCount, response fields (#1081) | ✅ | |
| Similar documents endpoint: implement `searchSimilarDocuments` via POST (#1081) | ✅ | |
| Remove `_vectors` from search response, optional vector field (#1081) | ✅ | |

Poem

A bunny hopped through embedders anew,
With OpenAI, HuggingFace, and Ollama too!
Rest and UserProvided, all in a row,
Hybrid searches now smarter, results in tow.
Tests abound for every case,
This codebase leaps with rabbit grace!
🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 1603f44 and 44a68a5.

📒 Files selected for processing (8)

README.md (1 hunks)
meilisearch/index.py (10 hunks)
meilisearch/models/embedders.py (1 hunks)
meilisearch/models/index.py (1 hunks)
tests/conftest.py (1 hunks)
tests/index/test_index_search_meilisearch.py (2 hunks)
tests/settings/test_settings.py (1 hunks)
tests/settings/test_settings_embedders.py (2 hunks)

Strift marked this pull request as draft March 8, 2025 09:36

Strift marked this pull request as ready for review March 9, 2025 05:58

Strift changed the title ~~Feat/add embedders settings~~ Update embedders settings, hybrid search, and add tests for AI search methods Mar 9, 2025

brunoocasali reviewed Mar 18, 2025

Strift commented Mar 20, 2025

Strift requested a review from brunoocasali March 21, 2025 08:50

Strift and others added 20 commits March 26, 2025 15:37

Update embedders

c125d64

Update embedders models

7f61d2c

Add docs

93e8f69

Allow updating embedders via update_settings

d3aa65b

Refactor config validation to avoid duplicate code

2fcbc47

Update validation code

f9258e9

Remove validation to let meilisearch handle it

8a4369d

Remove unused parameters

742ef5e

Add hybrid search

c4e26d7

Add test for retrieving vectors

5e954ac

Add semanticHitCount test

05291f4

Update comment

b064b0b

Add test for similar documents

d5d928e

Fix linters errors

b49cb42

Sort imports

ef1b771

Update meilisearch/models/embedders.py

297b3e4

Co-authored-by: Bruno Casali <[email protected]>

Avoid repeating embedder type

c7c1700

Remove docs

b1258c7

Add unintentionally removed

8960bc2

Fix mypy issues

268aa4c

Strift force-pushed the feat/add-embedders-settings branch from bf4e7cb to 268aa4c Compare March 26, 2025 07:39

Strift added 3 commits March 26, 2025 16:09

Add test for embedders fields

d8825aa

Add tests for fields presence

b324323

Split tests

057377b

sanders41 reviewed Mar 26, 2025

Strift added 2 commits April 2, 2025 16:17

Merge branch 'main' into feat/add-embedders-settings

8082344

Fix missing imports

e515f29

curquiza requested a review from sanders41 April 3, 2025 17:56

Strift and others added 2 commits April 8, 2025 10:45

Merge branch 'main' into feat/add-embedders-settings

dc98a1e

Merge branch 'main' into feat/add-embedders-settings

44a68a5

brunoocasali approved these changes May 14, 2025

Strift merged commit 2b0bd13 into main May 15, 2025
10 checks passed

Strift deleted the feat/add-embedders-settings branch May 15, 2025 04:42

Strift added the enhancement New feature or request label May 15, 2025

coderabbitai bot mentioned this pull request May 26, 2025

Add composite embedders and pooling for hf models #1104

Merged

3 tasks
