|
2 | 2 |
|
3 | 3 | ## What is it?
|
4 | 4 |
|
5 |
| -This repository exists to test our integrations in Third-Party AI/ML testing libraries. |
| 5 | +This repository exists to test our integrations in Third-Party AI/ML libraries. |
6 | 6 |
|
7 | 7 | ## Motivation
|
8 | 8 |
|
@@ -90,3 +90,51 @@ Test execution flow is defined in `.evergreen/config.yml`. The test pipeline's c
|
90 | 90 | - [`execute tests`](https://github.com/mongodb-labs/ai-ml-pipeline-testing/blob/main/.evergreen/config.yml#L51) -- Uses [subprocess.exec](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Commands#subprocessexec) to run the provided `run.sh` file. `run.sh` must be within the specified `DIR` path.
|
91 | 91 | - `fetch source` -- Retrieves the current (`ai-ml-pipeline-testing`) repo
|
92 | 92 | - `setup atlas cli` -- Sets up the local Atlas deployment
|
| 93 | + |
| 94 | +## Upstream Repo Considerations |
| 95 | + |
| 96 | +For better or worse, we do not maintain AI/ML libraries with which we integrate. |
| 97 | +We provide workarounds for a few common issues that we encounter. |
| 98 | + |
| 99 | +### Third-party AI/ML library maintainers have not merged our changes |
| 100 | + |
| 101 | +As we develop the testing infrastructure, we often change our integrations with the third-party library. |
| 102 | +This is especially true when we add a new integration. |
| 103 | +Over time, we may make bug fixes, add new features, and update the API. |
| 104 | +At the start, we will hopefully add the integration tests themselves. |
| 105 | + |
| 106 | +The bad news is that the maintainers of the AI/ML packages may take considerable |
| 107 | +time to review and merge our changes. The good news is that we can begin testing |
| 108 | +without pointing to the main branch of the upstream repo. |
| 109 | +The `CLONE_URL` parameter is very flexible. |
| 110 | +We simply call `git clone $CLONE_URL`. |
| 111 | +As such, we can point to an arbitrary branch on an arbitrary repo. |
| 112 | +While developing, we encourage developers to point to a feature branch |
| 113 | +on their own fork, and add a TODO with the JIRA ticket to update the URL |
| 114 | +once the pull request has been merged. |
| 115 | + |
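The branch-on-a-fork approach described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual script: it assumes `$CLONE_URL` is expanded unquoted, so the value can carry `git clone` options such as `--branch` alongside the URL. The throwaway local repository, the branch name `my-feature-branch`, and the `checkout` target directory are all hypothetical stand-ins so the example runs without network access.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a developer's fork: a throwaway local repository.
workdir="$(mktemp -d)"
git init --quiet "$workdir/fork"
git -C "$workdir/fork" -c user.name=dev -c user.email=dev@example.com \
    commit --quiet --allow-empty -m "initial commit"
git -C "$workdir/fork" checkout --quiet -b my-feature-branch

# Assumption: the value may hold clone options plus the URL, because it is
# expanded without quotes. In a real build variant this would be a fork URL.
# TODO(<JIRA ticket>): point back at the upstream repo once the PR is merged.
CLONE_URL="--branch my-feature-branch $workdir/fork"

cd "$workdir"
# Intentionally unquoted so the options and the URL split into separate words.
git clone --quiet $CLONE_URL checkout

# Confirm which branch the clone ended up on.
cloned_branch="$(git -C "$workdir/checkout" rev-parse --abbrev-ref HEAD)"
echo "$cloned_branch"
```

Keeping the TODO next to the `CLONE_URL` value makes it easy to find and revert once the upstream maintainers merge the change.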
| 116 | +### Patching upstream repos |
| 117 | + |
| 118 | +We provide a simple mechanism to make changes to the third-party packages |
| 119 | +without requiring a pull-request (and acceptance by the upstream maintainers). |
| 120 | +This is done via Git Patch files. |
| 121 | + |
| 122 | +Patch files are created very simply: `git diff > mypatch.patch`. |
| 123 | +If you can believe it, this was the primary mechanism to share code with another maintainer |
| 124 | +before pull-requests existed! |
| 125 | +To apply patches, add them to a `patches` directory within the `$DIR` of your build variant. |
| 126 | +As of this writing, the `chatgpt-retrieval-plugin` contains an example that you may use as a reference. |
| 127 | +You can create a number of different patch files, which will be applied recursively. |
| 128 | +This is useful to describe rationale, or to separate out ones that will be removed |
| 129 | +upon a merged pull-request to the upstream repo. |
| 130 | + |
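The patch workflow above can be sketched end to end as follows. This is a hedged illustration only: it assumes the patches are applied with `git apply` in filename order, which the text does not confirm. The local `upstream` repository, the file `requirements.txt`, and the patch name `0001-make-lib-optional.patch` are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
upstream="$workdir/upstream"   # hypothetical stand-in for the cloned third-party repo
patches="$workdir/patches"     # hypothetical stand-in for the patches directory in $DIR
mkdir -p "$patches"

# Build a tiny repository with one committed file.
git init --quiet "$upstream"
cd "$upstream"
printf 'some-optional-lib\n' > requirements.txt
git add requirements.txt
git -c user.name=dev -c user.email=dev@example.com commit --quiet -m "initial commit"

# Make the local change, capture it as a patch file, then restore the clean tree.
printf '# some-optional-lib\n' > requirements.txt
git diff > "$patches/0001-make-lib-optional.patch"
git checkout --quiet -- requirements.txt

# Assumption: apply every patch in the directory, in filename order.
for patch in "$patches"/*.patch; do
    git apply "$patch"
done

patched_line="$(cat requirements.txt)"
echo "$patched_line"
```

Numbering the patch files keeps the order explicit and lets each file carry one rationale, so a patch can be deleted on its own once the matching pull request lands upstream.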
| 131 | +During the ChatGPT Retrieval Plugin integration, we ran into build issues on Evergreen hosts. |
| 132 | +In this case, the package failed to build from source. |
| 133 | +It required a library that wasn't available on the host and had no wheel on PyPI. |
| 134 | +As it turned out, the package was actually an optional requirement, |
| 135 | +and so a one-line change to `pyproject.toml` solved our problem. |
| 136 | + |
| 137 | +We realized that we could easily get this working without changing the upstream |
| 138 | +simply by applying a git patch file. |
| 139 | +This is a standard practice used by conda package maintainers, |
| 140 | +as they often have to build for a broader set of scenarios than the original authors intended. |