PYTHON-4391: [Bugfix] Use Docker to run LocalAtlas #30

Merged: 20 commits (Sep 9, 2024)
21 changes: 3 additions & 18 deletions .evergreen/provision-atlas.sh
@@ -8,21 +8,13 @@ PYTHON_BINARY=$(find_python3)
EVERGREEN_PATH=$(pwd)/.evergreen
TARGET_DIR=$(pwd)/$DIR
PING_ATLAS=$EVERGREEN_PATH/ping_atlas.py
SCAFFOLD_SCRIPT=$EVERGREEN_PATH/scaffold_atlas.py
DEPLOYMENT_NAME=$DIR

# Download the mongodb tar and extract the binary into the atlas directory
set -ex
curl https://fastdl.mongodb.org/mongocli/mongodb-atlas-cli_1.18.0_linux_x86_64.tar.gz -o atlas.tgz
tar zxf atlas.tgz
mv mongodb-atlas-cli_1.18.0* atlas
mkdir atlas

# Create a local atlas deployment and store the connection string as an env var
$atlas deployments setup $DIR --type local --force --debug
$atlas deployments start $DIR
CONN_STRING=$($atlas deployments connect $DIR --connectWith connectionString)
setup_local_atlas

# Make the atlas directory hold the virtualenv for provisioning
cd atlas

$PYTHON_BINARY -m venv .
@@ -33,17 +25,10 @@ $PYTHON_BINARY -m pip install pymongo
CONN_STRING=$CONN_STRING \
$PYTHON_BINARY -c "from pymongo import MongoClient; import os; MongoClient(os.environ['CONN_STRING']).db.command('ping')"

# Add database configuration
# Add database and index configurations
DATABASE=$DATABASE \
CONN_STRING=$CONN_STRING \
REPO_NAME=$REPO_NAME \
DIR=$DIR \
TARGET_DIR=$TARGET_DIR \
$PYTHON_BINARY $SCAFFOLD_SCRIPT

# If a search index configuration can be found, create the index
if [ -d "$TARGET_DIR/indexes" ]; then
    for file in $TARGET_DIR/indexes/*.json; do
        $atlas deployments search indexes create --file $file --deploymentName $DIR
    done
fi
68 changes: 62 additions & 6 deletions .evergreen/scaffold_atlas.py
@@ -8,6 +8,7 @@

from pymongo import MongoClient
from pymongo.database import Database
from pymongo.operations import SearchIndexModel
from pymongo.results import InsertManyResult

logging.basicConfig()
@@ -20,6 +21,7 @@
DIR = os.environ.get("DIR")
TARGET_DIR = os.environ.get("TARGET_DIR")
DB_PATH = "database"
INDEX_PATH = "indexes"


def upload_data(db: Database, filename: Path) -> None:
@@ -49,20 +51,49 @@ def upload_data(db: Database, filename: Path) -> None:
        db.create_collection(collection_name)


def walk_collection_directory() -> list[str]:
def create_index(client: MongoClient, filename: Path) -> None:
"""Create indexes based on the JSONs provided from the index_json files

Args:
client (MongoClient): MongoClient
filename (Path): Index configuration filepath
"""
with filename.open() as f:
loaded_index_configuration = json.load(f)

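    # The routing keys below (collectionName, database, name, type) are
    # removed; everything left in loaded_index_configuration is passed to
    # SearchIndexModel as the raw index definition.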
    collection_name = loaded_index_configuration.pop("collectionName")
    database_name = loaded_index_configuration.pop("database")
    index_name = loaded_index_configuration.pop("name")
    index_type = loaded_index_configuration.pop("type", None)

    collection = client[database_name][collection_name]

    search_index = SearchIndexModel(
        loaded_index_configuration, name=index_name, type=index_type
    )
    collection.create_search_index(search_index)


def walk_directory(filepath: str) -> list[Path]:
    """Return all *.json files in the given subdirectory of TARGET_DIR."""
    database_dir = Path(TARGET_DIR).joinpath(DB_PATH)
    database_dir = Path(TARGET_DIR).joinpath(filepath)
    return (
        [file for file in database_dir.iterdir() if file.suffix == ".json"]
        if database_dir.exists()
        else []
    )


def main() -> None:
    database = MongoClient(CONN_STRING)[DATABASE_NAME]
    collection_jsons = walk_collection_directory()
    logger.debug("%s files found: %s", len(collection_jsons), collection_jsons)
def generate_collections(database: Database, collection_jsons: list[Path]) -> None:
    """Generate collections based on the collection_json filepaths

    Args:
        database (Database): Mongo Database
        collection_jsons (list[Path]): List of collection filepaths
    """
    logger.debug(
        "%s collection files found: %s", len(collection_jsons), collection_jsons
    )
    if not collection_jsons:
        return logger.warning(
            "No collections found in %s; check if the database folder exists", TARGET_DIR
@@ -71,5 +102,30 @@ def main() -> None:
        upload_data(database, collection_json)


def generate_indexes(client: MongoClient, index_jsons: list[Path]) -> None:
    """Generate search indexes based on the index_json filepaths

    Args:
        client (MongoClient): MongoClient
        index_jsons (list[Path]): List of index configuration filepaths
    """
    logger.debug("%s index files found: %s", len(index_jsons), index_jsons)
    if not index_jsons:
        return logger.warning(
            "No indexes found in %s; check if the indexes folder exists", TARGET_DIR
        )
    for index_json in index_jsons:
        create_index(client, index_json)


def main() -> None:
    client = MongoClient(CONN_STRING)
    database = client[DATABASE_NAME]
    collection_jsons = walk_directory(DB_PATH)
    index_jsons = walk_directory(INDEX_PATH)
    generate_collections(database, collection_jsons)
    generate_indexes(client, index_jsons)


if __name__ == "__main__":
    main()
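
With these changes, the scaffold script reads collection documents from `$TARGET_DIR/database/*.json` and index configurations from `$TARGET_DIR/indexes/*.json`. A minimal sketch of driving it by hand (the directory, database, and repo names are hypothetical, and a local deployment is assumed to be reachable at `CONN_STRING`):

```bash
# Hypothetical invocation; mirrors the variables provision-atlas.sh exports.
DATABASE=my_test_db \
CONN_STRING="mongodb://127.0.0.1:27017/?directConnection=true" \
REPO_NAME=my-repo \
DIR=my-test \
TARGET_DIR=$(pwd)/my-test \
python3 .evergreen/scaffold_atlas.py
```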
53 changes: 53 additions & 0 deletions .evergreen/utils.sh
@@ -50,3 +50,56 @@ is_python_310() {
        return 1
    fi
}


# Start the mongodb-atlas-local container. Because of a bug in podman we have to
# define the healthcheck ourselves (it is the same as the one in the image).
# Stores the connection string in the .local_atlas_uri file.
setup_local_atlas() {
    echo "Starting the container"
    CONTAINER_ID=$(podman run --rm -d -e DO_NOT_TRACK=1 -P --health-cmd "/usr/local/bin/runner healthcheck" mongodb/mongodb-atlas-local:latest)

    echo "waiting for container to become healthy..."
    function wait() {
        CONTAINER_ID=$1
        echo "waiting for container to become healthy..."
        podman healthcheck run "$CONTAINER_ID"
        for _ in $(seq 600); do
            STATE=$(podman inspect -f '{{ .State.Health.Status }}' "$CONTAINER_ID")

            case $STATE in
                healthy)
                    echo "container is healthy"
                    return 0
                    ;;
                unhealthy)
                    echo "container is unhealthy"
                    podman logs "$CONTAINER_ID"
                    stop
                    exit 1
                    ;;
                *)
                    echo "Unrecognized state $STATE"
                    sleep 1
            esac
        done

        echo "container did not become healthy within 600 seconds, quitting"
        podman logs "$CONTAINER_ID"
        stop
        exit 2
    }

    wait "$CONTAINER_ID"
    EXPOSED_PORT=$(podman inspect --format='{{ (index (index .NetworkSettings.Ports "27017/tcp") 0).HostPort }}' "$CONTAINER_ID")
    export CONN_STRING="mongodb://127.0.0.1:$EXPOSED_PORT/?directConnection=true"
    echo "CONN_STRING=mongodb://127.0.0.1:$EXPOSED_PORT/?directConnection=true" > $workdir/src/.evergreen/.local_atlas_uri
}

fetch_local_atlas_uri() {
    . $workdir/src/.evergreen/.local_atlas_uri

    export CONN_STRING=$CONN_STRING
    echo "$CONN_STRING"
}


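Taken together, the two helpers form a handoff: `setup_local_atlas` starts the container during provisioning and persists the connection string in `.evergreen/.local_atlas_uri`, and `fetch_local_atlas_uri` re-reads that file from a later, separate shell. A minimal sketch of the pattern (paths assume the Evergreen `$workdir` layout used above):

```bash
# Provisioning step (e.g. provision-atlas.sh): start the container and
# persist the connection string for later steps.
. $workdir/src/.evergreen/utils.sh
setup_local_atlas

# Test step (a subdirectory's run.sh, in a separate shell): recover the URI.
. $workdir/src/.evergreen/utils.sh
CONN_STRING=$(fetch_local_atlas_uri)
```
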
8 changes: 5 additions & 3 deletions README.md
@@ -51,10 +51,12 @@ The general layout of this repo looks like this:

### Configuring an Atlas CLI for testing

Each test subdirectory will automatically have its own local Atlas deployment. As a result, database and collection names will not conflict between different AI/ML integrations. To connect to your local Atlas using a connection string, call `$atlas` from the `run.sh` script within your subdirectory. This exposes the Atlas CLI binary. For example:
Each test subdirectory will automatically have its own local Atlas deployment. As a result, database and collection names will not conflict between different AI/ML integrations. To connect to your local Atlas using a connection string, `utils.sh` provides a `fetch_local_atlas_uri` function that you can call from the `run.sh` script within your subdirectory. For example:

```bash
CONN_STRING=$($atlas deployments connect $DIR --connectWith connectionString)
. $workdir/src/.evergreen/utils.sh

CONN_STRING=$(fetch_local_atlas_uri)
```

This stores the local Atlas URI in the `CONN_STRING` variable. The script can then pass `CONN_STRING` as an environment variable to the test suite.
@@ -67,7 +69,7 @@ You can pre-populate a test's local Atlas deployment before running the `run.sh`

To create a search index, provide the search index configuration in the `indexes` dir. Each configuration must be its own JSON file, one file per index you wish to create. For readability, name each file after the collection the index is defined on, or after the index itself; if multiple indexes are defined on one collection, or multiple collections within a database share the same index name, append a "purpose" suffix (e.g. `test_collection_vectorsearch_index`). The Evergreen script will then create the search index before the `run.sh` script is executed. See the ["Create an Atlas Search Index and Run a Query"](https://www.mongodb.com/docs/atlas/cli/stable/atlas-cli-deploy-fts/#create-an-atlas-search-index-and-run-a-query) documentation for instructions on creating a search index with the Atlas CLI.
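
As an illustration, a hypothetical `indexes/test_collection_vectorsearch_index.json` could be created like this; `collectionName`, `database`, `name`, and `type` are consumed by the scaffold script, and the remaining keys are passed through verbatim as the index definition:

```bash
# Hypothetical example; the vector-search definition fields are illustrative.
cat > indexes/test_collection_vectorsearch_index.json <<'EOF'
{
  "collectionName": "test_collection",
  "database": "test_database",
  "name": "test_collection_vectorsearch_index",
  "type": "vectorSearch",
  "fields": [
    {"type": "vector", "path": "embedding", "numDimensions": 1536, "similarity": "cosine"}
  ]
}
EOF
```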

If you need more customized behavior when populating your database or configuring your local Atlas deployment, include that behavior in your `run.sh` script. The path to the `$atlas` binary is provided to that script. You can view more ways to configure local Atlas by visiting the [Atlas CLI local deployments documentation](https://www.mongodb.com/docs/atlas/cli/stable/atlas-cli-local-cloud/).
If you need more customized behavior when populating your database or configuring your local Atlas deployment, include that behavior in your `run.sh` script.

### Unpacking the Evergreen config file

2 changes: 1 addition & 1 deletion langchain-python/run.sh
@@ -16,7 +16,7 @@ pip install poetry

poetry install --with test

export MONGODB_ATLAS_URI=$($atlas deployments connect $DIR --connectWith connectionString)
export MONGODB_ATLAS_URI=$(fetch_local_atlas_uri)

make test

3 changes: 2 additions & 1 deletion semantic-kernel-csharp/run.sh
@@ -2,6 +2,7 @@

set -x

. $workdir/src/.evergreen/utils.sh
# WORKING_DIR = src/semantic-kernel-csharp/semantic-kernel

# Install .NET
@@ -18,5 +19,5 @@ sed -i -e 's/"MongoDB Atlas cluster is required"/null/g' dotnet/src/IntegrationT

# Run tests
echo "Running MongoDBMemoryStoreTests"
MongoDB__ConnectionString=$($atlas deployments connect $DIR --connectWith connectionString) \
MongoDB__ConnectionString=$(fetch_local_atlas_uri) \
$DOTNET_SDK_PATH/dotnet test dotnet/src/IntegrationTests/IntegrationTests.csproj --filter SemanticKernel.IntegrationTests.Connectors.MongoDB.MongoDBMemoryStoreTests
2 changes: 1 addition & 1 deletion semantic-kernel-python/run.sh
Expand Up @@ -4,7 +4,7 @@ set -x

. $workdir/src/.evergreen/utils.sh

CONN_STRING=$($atlas deployments connect $DIR --connectWith connectionString)
CONN_STRING=$(fetch_local_atlas_uri)
PYTHON_BINARY=$(find_python3)

