
Commit 66ca46f

Merge pull request #12 from mongodb-labs/multi-index-support
PYTHON-4287: Support Creation of Multiple Vector Search Indexes for Database Generation Step
2 parents 5b6c4d7 + d82597f commit 66ca46f

File tree

6 files changed: +31 -20 lines changed

.evergreen/provision-atlas.sh

Lines changed: 4 additions & 2 deletions

```diff
@@ -42,6 +42,8 @@ DATABASE=$DATABASE \
 $PYTHON_BINARY $SCAFFOLD_SCRIPT
 
 # If a search index configuration can be found, create the index
-if [ -f "$TARGET_DIR/indexConfig.json" ]; then
-    $atlas deployments search indexes create --file $TARGET_DIR/indexConfig.json --deploymentName $DIR
+if [ -d "$TARGET_DIR/indexes" ]; then
+    for file in "$TARGET_DIR/indexes"/*.json; do
+        $atlas deployments search indexes create --file $file --deploymentName $DIR
+    done
 fi
```
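Note that the loop relies on shell globbing: the `*.json` pattern must sit outside the quotes, since a fully quoted string like `"$TARGET_DIR/indexes/*.json"` would be passed through literally instead of expanding to one path per file. A minimal, self-contained sketch of the new convention (the directory, file name, and index contents here are hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical test subdirectory; names are illustrative only.
TARGET_DIR=$(mktemp -d)
mkdir -p "$TARGET_DIR/indexes"

# One JSON file per index to create, named after its target collection.
cat > "$TARGET_DIR/indexes/test_collection_vectorsearch_index.json" <<'EOF'
{ "name": "test_collection_vectorsearch_index", "type": "vectorSearch" }
EOF

# Mirror of the provisioning loop: the glob sits OUTSIDE the quotes so
# the shell expands it into one path per matching file.
count=0
if [ -d "$TARGET_DIR/indexes" ]; then
    for file in "$TARGET_DIR/indexes"/*.json; do
        # In CI this line would instead run:
        # $atlas deployments search indexes create --file $file --deploymentName $DIR
        echo "would create index from: $file"
        count=$((count + 1))
    done
fi
echo "created $count index(es)"
```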

README.md

Lines changed: 27 additions & 18 deletions

````diff
@@ -6,10 +6,11 @@ This repository exists to test our integrations in Third-Party AI/ML libraries.
 
 ## Motivation
 
-With the public release of `$vectorSearch`, we have needed to integrate into these AI/ML sponsored libraries.
+With the public release of `$vectorSearch`, we have needed to integrate into these AI/ML sponsored libraries.
 ([LangChain](https://github.com/langchain-ai/langchainjs), [LlamaIndex](https://github.com/run-llama/llama_index), [Semantic Kernel](https://github.com/microsoft/semantic-kernel)... etc) This repository runs continuous testing against each of these repos.
 
 ## How to add a test
+
 > **NOTE** All tests run against this repo are now required to work against a local Atlas deployment. See details below to ensure proper setup.
 
 ### Test Layout
@@ -19,23 +20,26 @@ Each AI/ML pipeline is sorted by the composite of the name of the library, and t
 Each subdirectory is scoped to run only one AI/ML integration's suite of tests for one language within that cloned repository. For example, if an AI/ML integration has both a Python and C# implementation of Atlas Vector Search, two subdirectories need to be made: one for Python, titled `{repo}-python`, and one for C#, titled `{repo}-csharp`. See `semantic-kernel-*` subdirectories in the layout example below.
 
 Within each subdirectory you should expect to have:
-- `run.sh` -- A script that should handle any additional library installations and steps for executing the test suite. This script should not populate the Atlas database with any required test data.
+
+- `run.sh` -- A script that should handle any additional library installations and steps for executing the test suite. This script should not populate the Atlas database with any required test data.
 - `database/` -- An optional directory used by `.evergreen/scaffold_atlas.py` to populate a MongoDB database with test data. Only provide this if your tests require pre-populated data.
 - `database/{collection}.json` -- An optional JSON file containing one or more MongoDB documents that will be uploaded to `$DATABASE.{collection}` in the local Atlas instance. Only provide this if your tests require pre-populated data.
 - `indexConfig.json` -- An optional file containing configuration for a specified Atlas Search Index.
 - Additionally, you can add other useful files, including `.env` files, if required by your tests.
 
-The general layout of this repo will looks like this:
+The general layout of this repo looks like this:
+
 ```bash
 ├── LICENSE                     # License Agreement
 ├── README.md                   # This Document
 ├── langchain-python            # Folder scoped for one Integration
 │   └── run.sh                  # Script that executes test
 ├── semantic-kernel-csharp      # Folder scoped for one Integration
-│   ├── database                # Optional database definition
+│   ├── database                # Optional database definition directory
 │   │   └── nearestSearch.json  # Populates $DATABASE.nearestSearch
 │   │   └── furthestSearch.json # Populates $DATABASE.furthestSearch
-│   ├── indexConfig.json        # Creates Search Index on $DATABASE
+│   ├── indexes                 # Optional Index definitions directory
+│   │   └── indexConfig.json    # Optional Search index definition
 │   └── run.sh                  # Script that executes test
 ├── semantic-kernel-python      # Folder scoped for one Integration
 │   ├── database                # Optional database definition
@@ -46,18 +50,22 @@ The general layout of this repo will looks like this:
 ```
 
 ### Configuring the Atlas CLI for testing
+
 Each test subdirectory will automatically have its own local Atlas deployment. As a result, database and collection names will not conflict between different AI/ML integrations. To connect to your local Atlas using a connection string, call `$atlas` from the `run.sh` script within your subdirectory. This exposes the Atlas CLI binary. For example:
 
 ```bash
 CONN_STRING=$($atlas deployments connect $DIR --connectWith connectionString)
 ```
+
 This stores the local Atlas URI within the `CONN_STRING` var. The script can then pass `CONN_STRING` as an environment variable to the test suite.
 
 #### Pre-populating the Local Atlas Deployment
+
 You can pre-populate a test's local Atlas deployment before running the `run.sh` script by providing JSON files in the optional `database` directory of the created subdirectory. The `.evergreen/scaffold_atlas.py` file will search for every JSON file within this database directory and upload the documents to the database defined by the `DATABASE` expansion in the build variant of the `.evergreen/config.yml` setup. The collection the script uploads to is based on the name of your JSON file:
+
 - `critical_search.json` with `DATABASE: app` will upload all of its contents to the `app.critical_search` collection.
 
-To create a search index, provide the search index configuration in the `indexConfig.json` file. The evergreen script will then create the search index before the `run.sh` script is executed. Adding multiple search indexes in the setup stage is not supported, but more indexes can be included in the `run.sh` by using the referenced `$atlas` binary. See the ["Create an Atlas Search Index and Run a Query"](https://www.mongodb.com/docs/atlas/cli/stable/atlas-cli-deploy-fts/#create-an-atlas-search-index-and-run-a-query) documentation instructions on how to create a search index using Atlas CLI.
+To create search indexes, place each index configuration in its own JSON file in the `indexes` directory: one file per index you wish to create. For readability, name each file after the collection the index is defined on, or after the index itself. If multiple indexes are defined on one collection, or collections within a database share an index name, append the index's purpose to the file name (e.g. `test_collection_vectorsearch_index.json`). The evergreen script will then create each search index before the `run.sh` script is executed. See the ["Create an Atlas Search Index and Run a Query"](https://www.mongodb.com/docs/atlas/cli/stable/atlas-cli-deploy-fts/#create-an-atlas-search-index-and-run-a-query) documentation instructions on how to create a search index using Atlas CLI.
 
 If you need more customized behavior when populating your database or configuring your local Atlas deployment, include that behavior in your `run.sh` script. The path to the `$atlas` binary is provided to that script. You can view more ways to configure local Atlas by visiting the [Atlas CLI local deployments documentation](https://www.mongodb.com/docs/atlas/cli/stable/atlas-cli-local-cloud/).
 
@@ -67,22 +75,23 @@ Test execution flow is defined in `.evergreen/config.yml`. The test pipeline's c
 
 **[Build variants](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#build-variants)** -- This is the highest granularity we will use to define how and when a test pipeline will run. A build variant should only ever be scoped to service one test pipeline. There can be multiple tasks run within a build variant, but they should all only scope themselves to a singular test pipeline in order to maintain an ease of traceability for testing.
 
-- `name` -- This should be in the format `test-{pipeline}-{language}-{os}`
-- `display_name` -- This can be named however you see fit. Ensure it is easy to understand. See `.evergreen/config.yml` for examples
-- [`expansions`](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files/#expansions) -- Build variant specific variables. Expansions that need to be maintained as secrets should be stored in [the Evergreen project settings](https://spruce.mongodb.com/project/ai-ml-pipeline-testing/settings/variables) using [variables](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-and-Distro-Settings#variables). Some common expansions needed are:
-  - `DIR` -- The subdirectory where the tasks will run
-  - `REPO_NAME` -- The name of the AI/ML framework repository that will get cloned
-  - `CLONE_URL` -- The GitHub URL to clone into the specified `DIR`
-  - `DATABASE` -- The optional database where the Atlas CLI will load your index configs
+- `name` -- This should be in the format `test-{pipeline}-{language}-{os}`
+- `display_name` -- This can be named however you see fit. Ensure it is easy to understand. See `.evergreen/config.yml` for examples
+- [`expansions`](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files/#expansions) -- Build variant specific variables. Expansions that need to be maintained as secrets should be stored in [the Evergreen project settings](https://spruce.mongodb.com/project/ai-ml-pipeline-testing/settings/variables) using [variables](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-and-Distro-Settings#variables). Some common expansions needed are:
+
+  - `DIR` -- The subdirectory where the tasks will run
+  - `REPO_NAME` -- The name of the AI/ML framework repository that will get cloned
+  - `CLONE_URL` -- The GitHub URL to clone into the specified `DIR`
+  - `DATABASE` -- The optional database where the Atlas CLI will load your index configs
 
-- `run_on` -- Specified platform to run on. `rhel87-small` should be used by default. Any other distro may fail Atlas CLI setup.
-- `tasks` -- Tasks to run. See below for more details
-- `cron` -- The tests are run via a cron job on a nightly cadence. This can be modified by setting a different cadence. Cron jobs can be scheduled using [cron syntax](https://crontab.guru/#0_0_*_*_*)
+- `run_on` -- Specified platform to run on. `rhel87-small` should be used by default. Any other distro may fail Atlas CLI setup.
+- `tasks` -- Tasks to run. See below for more details
+- `cron` -- The tests are run via a cron job on a nightly cadence. This can be modified by setting a different cadence. Cron jobs can be scheduled using [cron syntax](https://crontab.guru/#0_0_*_*_*)
 
 **[Tasks](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#tasks)** -- These are the "building blocks" of our runs. Here is where we consolidate the specific set of functions. The basic parameters to add are shown below
 
-- `name` -- This should be in the format `test-{pipeline}-{language}`
-- `commands` -- See below.
+- `name` -- This should be in the format `test-{pipeline}-{language}`
+- `commands` -- See below.
 
 **[Functions](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Configuration-Files#functions)** -- We've defined some common functions that will be used. See the `.evergreen/config.yml` for example cases. The standard procedure is to fetch the repository, provision Atlas as needed, and then execute the tests specified in the `run.sh` script you create. Ensure that the expansions are provided for these functions, otherwise the tests will run improperly and most likely fail.
````
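As a concrete illustration of the `indexes` directory convention described above, a single index file might look like the following hypothetical vector search definition. The database, collection, field path, and dimension count are placeholders, and the exact schema should be verified against the linked Atlas CLI documentation:

```json
{
  "database": "app",
  "collectionName": "test_collection",
  "name": "test_collection_vectorsearch_index",
  "type": "vectorSearch",
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
```

Saved as, e.g., `indexes/test_collection_vectorsearch_index.json`, a file like this would be picked up by the provisioning loop in `.evergreen/provision-atlas.sh`.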

File renamed without changes.
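The collection-naming rule for `database/{collection}.json` seed files described in the README can be sketched as follows (a hypothetical illustration of the rule, not the actual `.evergreen/scaffold_atlas.py` logic):

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical inputs: the DATABASE expansion and one seed-data file.
DATABASE=app
file="database/critical_search.json"

# The target collection is the JSON file's base name without the extension,
# so the documents land in $DATABASE.{collection}.
collection=$(basename "$file" .json)
namespace="$DATABASE.$collection"
echo "$namespace"   # prints app.critical_search
```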
