Commit eb9c209

[PYTHON-4289-patching-docs Added new section: Upstream Repo Considerations describing patching, and CLONE_URLs (#11)
1 parent bd6bc89 commit eb9c209

2 files changed (+50, -1 lines)

.evergreen/config.yml

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ functions:
 echo '${REPO_NAME} could not be found' 1>&2
 exit 1
 fi
+# Apply patches to upstream repo if desired.
 cd ${DIR}
 git clone ${CLONE_URL}
 if [ -d "patches" ]; then
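
This config hunk stops at the `patches` directory check, so the commands that actually apply the patches are not shown in this diff. As a rough, self-contained sketch of the create-and-apply round trip described in the README, the script below assumes patches are applied with `git apply` from inside a fresh checkout. The `upstream` repo, its one-line `pyproject.toml`, and the patch file name are all hypothetical stand-ins, not the pipeline's real layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing else is touched.
work="$(mktemp -d)"
cd "$work"

# Hypothetical stand-in for a freshly cloned upstream repo.
git init -q upstream
cd upstream
printf 'dependencies = ["core", "optional-extra"]\n' > pyproject.toml
git add pyproject.toml
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Make the local fix, then capture it as a patch file with `git diff`.
printf 'dependencies = ["core"]\n' > pyproject.toml
mkdir -p ../patches
git diff > ../patches/0001-drop-optional-extra.patch

# Discard the edit so the checkout matches upstream again.
git checkout -q -- pyproject.toml

# Apply every patch in ../patches, in sorted glob order (an assumption
# about how a pipeline might iterate over the directory).
for patch in ../patches/*.patch; do
  git apply "$patch"
done

cat pyproject.toml
```

Prefixing patch names with numbers is one way to make the application order explicit, since the shell sorts glob matches.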

README.md

Lines changed: 49 additions & 1 deletion
@@ -2,7 +2,7 @@
 
 ## What is it?
 
-This repository exists to test our integrations in Third-Party AI/ML testing libraries.
+This repository exists to test our integrations in Third-Party AI/ML libraries.
 
 ## Motivation
 
@@ -90,3 +90,51 @@ Test execution flow is defined in `.evergreen/config.yml`. The test pipeline's c
 - [`execute tests`](https://github.com/mongodb-labs/ai-ml-pipeline-testing/blob/main/.evergreen/config.yml#L51) -- Uses [subprocess.exec](https://docs.devprod.prod.corp.mongodb.com/evergreen/Project-Configuration/Project-Commands#subprocessexec) to run the provided `run.sh` file. `run.sh` must be within the specified `DIR` path.
 - `fetch source` -- Retrieves the current (`ai-ml-pipeline-testing`) repo
 - `setup atlas cli` -- Sets up the local Atlas deployment
+
+## Upstream Repo Considerations
+
+For better or worse, we do not maintain the AI/ML libraries with which we integrate.
+We provide workarounds for a few common issues that we encounter.
+
+### Third-party AI/ML library maintainers have not merged our changes
+
+As we develop our testing infrastructure, we commonly make changes to our integrations with third-party libraries.
+This is especially true when we add a new integration.
+Over time, we may make bug fixes, add new features, and update the API.
+At the start, we will hopefully add the integration tests themselves.
+
+The bad news is that the maintainers of the AI/ML packages may take considerable
+time to review and merge our changes. The good news is that we can begin testing
+without pointing to the main branch of the upstream repo.
+The value of the `CLONE_URL` parameter is very flexible.
+We literally just call `git clone $CLONE_URL`.
+As such, we can point to an arbitrary branch on an arbitrary repo (e.g., by putting `--branch <name>` before the URL).
+While developing, we encourage developers to point to a feature branch
+on their own fork and add a TODO with the JIRA ticket to update the URL
+once the pull request has been merged.
+
+### Patching upstream repos
+
+We provide a simple mechanism to make changes to the third-party packages
+without requiring a pull request (and acceptance by the upstream maintainers).
+This is done via Git patch files.
+
+Patch files are created very simply: `git diff > mypatch.patch`.
+If you can believe it, this was the primary mechanism to share code with another maintainer
+before pull requests existed!
+To apply patches, add them to a `patches` directory within the `$DIR` of your build variant.
+As of this writing, the `chatgpt-retrieval-plugin` directory contains an example that you may use as a reference.
+You can create several patch files, each of which will be applied.
+This is useful for documenting the rationale for each change, or for separating out patches that will be removed
+once the corresponding pull request is merged into the upstream repo.
+
+During the ChatGPT Retrieval Plugin integration, we ran into build issues on Evergreen hosts.
+In this case, one of the plugin's dependencies failed to build from source.
+It required a library that wasn't available on the host, and it had no wheel on PyPI.
+As it turned out, that dependency was only an optional requirement,
+so a one-line change to `pyproject.toml` solved our problem.
+
+We realized that we could get this working without changing the upstream repo
+simply by applying a Git patch file.
+This is a standard practice among conda package maintainers,
+as they often have to build for a broader set of scenarios than the original authors intended.
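
The README notes that the pipeline just calls `git clone $CLONE_URL`. Assuming the value reaches that line unquoted (as `git clone ${CLONE_URL}` in the config hunk suggests), it can carry `git clone` options as well as a URL, which is one way to target a feature branch on a fork. The sketch below simulates this locally: a hypothetical local `fork` repository stands in for a contributor's GitHub fork, and `my-feature` is an illustrative branch name.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"

# Hypothetical local "fork" with a feature branch (stands in for a GitHub fork).
git init -q "$work/origin/fork"
cd "$work/origin/fork"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "main work"
git checkout -q -b my-feature
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "feature work"
git checkout -q -   # return to the default branch so --branch actually matters

cd "$work"

# TODO(hypothetical JIRA ticket): point back at upstream once the PR is merged.
CLONE_URL="--branch my-feature ${work}/origin/fork"

# Deliberately unquoted, mirroring `git clone ${CLONE_URL}` in the config,
# so the shell splits the value into an option plus a repository path.
# shellcheck disable=SC2086
git clone -q ${CLONE_URL}

# The clone lands in ./fork and is checked out on the feature branch.
git -C fork rev-parse --abbrev-ref HEAD
```

The same trick relies on the value staying unquoted; quoting `${CLONE_URL}` in the script would make Git treat the whole string as a single repository argument.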
