Adding unit tests for dspy.retrievers.Embeddings #8129

Merged: 3 commits, May 3, 2025

Conversation

Krishn1412
Contributor

Added three unit tests for dspy.retrievers.Embeddings:

  1. test_embeddings_basic_search: Verifies that, for a single query, the retriever returns the correct top-k relevant passages and their indices.

  2. test_embeddings_forward_batch: Ensures the retriever handles batch queries correctly, returning the top-k relevant passages and their indices for each query.

  3. test_normalization: Confirms that the embeddings are normalized after processing, so that each has a norm close to 1.
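
For readers unfamiliar with what these tests check, here is a minimal sketch of the two properties involved: top-k retrieval by cosine similarity and unit-norm embeddings. It is a standard-library stand-in, not the dspy implementation and not the test code from this PR. The names dummy_embedder, normalize, and search are hypothetical and exist only for illustration.

```python
import math

def dummy_embedder(texts):
    """Hypothetical embedder: counts of a few fixed keywords per text."""
    vocab = ["cat", "dog", "car"]
    return [[float(t.lower().count(w)) for w in vocab] for t in texts]

def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def search(query, corpus, embedder, k):
    """Return the top-k passages and their corpus indices by cosine similarity."""
    corpus_vecs = [normalize(v) for v in embedder(corpus)]
    query_vec = normalize(embedder([query])[0])
    # With unit-norm vectors, the dot product equals the cosine similarity.
    scores = [sum(a * b for a, b in zip(query_vec, v)) for v in corpus_vecs]
    indices = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
    return [corpus[i] for i in indices], indices

corpus = [
    "The cat sat on the mat.",
    "The dog barked at the mailman.",
    "The car would not start.",
]
passages, indices = search("I saw a cat", corpus, dummy_embedder, k=1)

# The kind of assertions the described tests make:
assert passages == ["The cat sat on the mat."] and indices == [0]
assert abs(math.sqrt(sum(x * x for x in normalize([3.0, 4.0]))) - 1.0) < 1e-9
```

A deterministic embedder like this keeps the expected ranking obvious, which is what makes exact assertions on passages and indices possible.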

Collaborator

@chenmoneygithub chenmoneygithub left a comment

Thanks for the PR! Left some comments


from dspy.retrievers.embeddings import Embeddings

@pytest.fixture
Collaborator

nit: we don't need a fixture for the list, since it's only used in this test

assert isinstance(passage, str)
assert passage in dummy_corpus

def test_embeddings_forward_batch(dummy_corpus, dummy_embedder):
Collaborator

We can remove this test and the one below, since they test private methods.

1) Removed the tests that called private functions.

2) Added a new test to check robustness under high concurrency.

3) Updated the dummy embedder so that similar inputs map to nearby vectors and dissimilar inputs map to distant ones.
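
The last two changes can be illustrated with a rough sketch, assuming a thread pool is an acceptable way to simulate concurrent callers. This is not the test that was merged. TOPICS, topic_embedder, cosine, and best_match are hypothetical names, and the retrieval logic is a standard-library stand-in rather than dspy.retrievers.Embeddings.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical topic map: related words share an axis, unrelated words do not.
TOPICS = {"cat": 0, "kitten": 0, "dog": 1, "puppy": 1, "car": 2, "truck": 2}

def topic_embedder(texts):
    """Embed each text as per-topic word counts, so similar texts land close together."""
    vectors = []
    for text in texts:
        vec = [0.0, 0.0, 0.0]
        for word in text.lower().replace(".", "").split():
            if word in TOPICS:
                vec[TOPICS[word]] += 1.0
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def best_match(query, corpus):
    """Return the index of the corpus passage most similar to the query."""
    query_vec = topic_embedder([query])[0]
    scores = [cosine(query_vec, v) for v in topic_embedder(corpus)]
    return max(range(len(corpus)), key=lambda i: scores[i])

corpus = [
    "A kitten chased the string.",
    "The puppy fetched a ball.",
    "A truck blocked the road.",
]
queries = ["cat", "dog", "car"] * 50

# Issue many queries from concurrent threads; map() preserves input order.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(lambda q: best_match(q, corpus), queries))

assert results == [0, 1, 2] * 50
```

Because the embedder is deterministic, any wrong or reordered result under concurrency shows up as a failed equality check instead of a flaky threshold.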
@Krishn1412
Contributor Author

Hey @chenmoneygithub , can you take a look now? TIA.

Collaborator

@chenmoneygithub chenmoneygithub left a comment

LGTM! I will do some minor cleanup and merge. Thanks for the contribution again!

@chenmoneygithub chenmoneygithub merged commit a3709ba into stanfordnlp:main May 3, 2025
3 checks passed
2 participants