-
Notifications
You must be signed in to change notification settings - Fork 455
CDRIVER-3620 Audit compile and test scripts #1187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This are just some comments/responses to your summary writeup. I will follow with any code-specific comments that I might have in a separate review. Related to CDRIVER-3562 and the dropping of OpenSSL 0.9.8 and 1.0.0, note that the toolchain still contains older SSL implementations, including 0.9.8 and 1.0.0: https://github.com/10gen/mongo-c-toolchain/blob/c6f10761338dc7e94545d7fdc1d1a91fb1818f63/install-ssl.sh#L85-L86 Also, you might take a look at how the paralellism issue is handled in that script. Note that this is a different It might be worth updating that in conjunction with CDRIVER-3562. Here is the explanation of the The OpenSSL version being built is 1.1.0. It is evidently then being placed ahead of the system's libssl in the search path (whether by adding its location to ld.so.conf and executing ldconfig, or by setting
This happens when CMake was built and linked against a newer OpenSSL library that includes the In any case, the solution seems to be (and I say this without having looked at the implementation of the Evergreen task itself), to not push the newly build OpenSSL 1.1.0 into the front of the entire environment. Rather, build it and then just tell CMake to go find it only during the execution of the build itself. If you like, I can tackle this after the PR gets merged. |
Despite appearances, toolchain's solution to the OpenSSL parallel build problem seems to be to ignore it by setting parallelism to 1, making the use of the
I also attempted this workaround, but the goal of permitting CMake to run while dynamically linking against OpenSSL 1.1.1 conflicts with the intent of allowing it to find OpenSSL 1.1.0 during build configuration (it is required during the configuration step, not just during the build step). It is unclear why this only appears to be an issue with OpenSSL 1.1.0 and not with other OpenSSL versions (see linked-to patch, e.g. OpenSSL 1.0.2 seems to work just fine). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not fully reviewed, but I have some prelim comments.
BIG +1 for the build time improvements (my favorite kind of improvements).
That's a fair point.
Hmm. That is interesting. I'll try to take a closer look at it. |
Here is one possible way to deal with the issue:
Here is a patch build showing that it works. This would actually require changing the logic in
The reason for that is that OpenSSL 1.1.0 and 1.1.1 both generate a library file named Another potential workaround is to use an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a minor suggestion for what Colby brought up with bounding parallelism in make -j
commands to a reasonable upper bound.
@rcsanchez97 Thank you for the deep dive! Will explore the suggested changes (leaning towards using |
@rcsanchez97 Regarding CDRIVER-3562, I feel comfortable dropping 0.9.8 and 1.0.0 given the comment in CDRIVER-3562, which refers to libmongoc documentation that effectively mandates OpenSSL 1.0.1 or newer already:
I think this would also justify dropping 0.9.8 and 1.0.0 from the toolchain. |
Latest changes verified by this patch. Custom OpenSSL libraries (via This necessitated the addition of if command -v ldd >/dev/null; then
# MacOS does not have ldd.
LD_LIBRARY_PATH="${openssl_lib_prefix}" ldd "${ping}" | grep "libssl"
LD_LIBRARY_PATH="${openssl_lib_prefix}" ldd "${test_gssapi}" | grep "libssl"
fi However, the build-and-run-authentication-tests-openssl-1.1.0 task is continuing to fail with the following error when running the Atlas Free Tier test (the preceeding X.509 test appears to succeed):
Examining the logs, it appears to be both compiling and linking wih the custom OpenSSL library correctly, but the error seems to suggest there may still be either a mismatch in library versions or misconfiguration of the environment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Latest changes verified by this patch. Grep failing to match against |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work.
I fully support the decision to moving the mock server tests to a separate task to reduce build time on the debug-compile tasks.
(e.g. if the value of Evergreen expansions such as ${atlas_free} are considered sensitive).
The URIs for Atlas dev instances for driver testing are considered sensitive. Evergreen is transitioning to be entirely behind authentication (See e-mail with subject "Update: Evergreen Auth Policy"). I expect silent=True
is not needed regardless. run auth tests
does not appear to log the URIs. It seems OK to remove.
declare new_path | ||
new_path="$(readlink -f "/opt/mongo-c-toolchain/${ssl_ver}/bin")" | ||
new_path+=":${addl_path}" | ||
new_path+=":${old_path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_path+=":${old_path}" |
old_path
is included in addl_path
.evergreen/compile-unix.sh
Outdated
fi | ||
|
||
if [[ -n "${ZSTD}" ]]; then | ||
# Since zstd inconsitently installed on macos-1014. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Since zstd inconsitently installed on macos-1014. | |
# Since zstd is inconsistently installed on macos-1014. |
.evergreen/install-ssl.sh
Outdated
tar zxf fips.tar.gz | ||
pushd openssl-fips-2.0.16 | ||
( | ||
set -x xtrace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set -x xtrace | |
set -o xtrace |
Deferring further diagnosis of build-and-run-authentication-tests-openssl-1.1.0 task failure to CDRIVER-4562 to unblock work. |
Latest changes verified by this patch. Will merge PR to unblock further work once the patch has completed and there are no further feedback/objections. |
FYI, you can use
It's because the custom-built OpenSSL can't find the root certificate store, which contains the CA that ultimately signed the Atlas server certificate. By default, OpenSSL doesn't include root certificates (TTBOMK) because that is a decision that has wide security implications. The developers probably leave the decision to implementors and integrators because that is the sort of thing that an implementor or integrator would want to decide. Try this:
That's the default CA certificate directory on Arch, Debian, and Ubuntu. I'm not sure if it is the same or different on other platforms, but it wouldn't be difficult to add some logic of we are running tasks with custom builds of OpenSSL on other platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, so I am approving. Note that I made a couple of minor comments/suggestions in the review, plus in the main PR conversation I provided a possible solution to the Atlas certificate issue. I will leave it up to you to decide whether to merge as is and defer my suggestions to later, or whether you want to make the tweaks before merging.
if [[ "${COMPILE_LIBMONGOCRYPT}" == "ON" ]]; then | ||
# Build libmongocrypt, using the previously fetched installed source. | ||
# TODO(CDRIVER-4394) update to use libmongocrypt 1.7.0 once there is a stable 1.7.0 release. | ||
git clone --depth=1 https://github.com/mongodb/libmongocrypt --branch 1.7.0-alpha1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 1.7.0 release has been made, so you can probably update this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libmongocrypt 1.7.0 requires cmake 3.12 (see mongodb/libmongocrypt#522 (comment)). The C driver builds with older versions of cmake in some distros. I plan to change the libmongocrypt build in CDRIVER-4394 to use libmongocrypt's compile.sh script to obtain the required version of cmake. I suggest leaving 1.7.0-alpha1 in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that particular detail.
.evergreen/compile-unix.sh
Outdated
# Hosts may have libmongocrypt installed from apt/yum. We do not want to pick those up | ||
# since those libmongocrypt packages statically link libbson. | ||
# Note: may be overwritten by ${EXTRA_CONFIGURE_FLAGS}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Hosts may have libmongocrypt installed from apt/yum. We do not want to pick those up | |
# since those libmongocrypt packages statically link libbson. | |
# Note: may be overwritten by ${EXTRA_CONFIGURE_FLAGS}. | |
# Avoid symbol collisions with libmongocrypt installed via apt/yum. | |
# Note: may be overwritten by ${EXTRA_CONFIGURE_FLAGS}. |
If libmongocrypt is installed from apt/yum (from official packages), then the libmongocrypt library will be dynamically linked to the libbson shared library which is a proper dependency:
root@build01:~# ldd /usr/lib/x86_64-linux-gnu/libmongocrypt.so.0.0.0 |grep bson
libbson-1.0.so.0 => /lib/x86_64-linux-gnu/libbson-1.0.so.0 (0x00007fdc8b9ef000)
In the case of libmongocrypt installed from our PPA, then it is correct to say that the libbson linkage is static (because we don't ship a separate libbson package in the libmongocrypt PPA).
if [ "${COMPILE_LIBMONGOCRYPT}" = "ON" ]; then | ||
# Build libmongocrypt, using the previously fetched installed source. | ||
# TODO(CDRIVER-4394) update to use libmongocrypt 1.7.0 once there is a stable 1.7.0 release. | ||
git clone --depth=1 https://github.com/mongodb/libmongocrypt --branch 1.7.0-alpha1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 1.7.0 release has been made, so you can probably update this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will refrain given CMake 3.12 dependency mentioned in the other comment.
Why does this only appear to be an issue for OpenSSL 1.1.0 and not with any other custom-installed OpenSSL version being tested? My understanding was that the copying of tls-ca-bundle.pem into |
That is something I don't understand at first glance. It seems like it should work (and I had that thought when reviewing the PR), but it seems that for some reason it isn't working. Figuring it out would likely involve some deep digging. I have some vague ideas, but nothing that I can dive into at the moment given the other things in my queue. |
Description
This PR is part of CDRIVER-3620 and is focused on improving the quality and use of various Evergreen scripts pertaining to compilation and testing.
Verified by this patch.
Do not run tests in
compile
tasksThe most significant change in thiS PR is the removal of tests run after compilation in many
debug-compile-*
tasks. These "mock tests" are almost always immediately followed by correspondingtest-*
tasks that have equal if not greater code coverage. Therefore, execution of tests has been completely moved out of compilation-related scripts.Note, mock server testing is now covered only by the
test-asan-memcheck-mock-server
task, as regulartest
tasks setMONGOC_TEST_SKIP_MOCK="on"
by default. This can be removed if we want to include execution of mock server tests in most test tasks.This reduces task duration of most compile tasks by over 50%. Some examples:
Dependent tasks are therefore able to be executed (in parallel) sooner, reducing the overall runtime of a given Evergreen patch.
Coverage Tasks
Because of the remove of test execution in compile tasks, the
debug-compile-coverage
task had to be redefined to run and collect coverage data separately. Note: the coverage of tests executed in compile tasks had been very small anyways (MONGOC_TEST_SKIP_LIVE=on
andMONGOC_TEST_SKIP_SLOW=on
)...The pattern used for existing
test-coverage-*
tasks was adopted. A singlecompile coverage
function is used to build the source withCOVERAGE=ON
, which is then used by subsequent commands in a giventest-coverage-*
task.For now, all coverage tasks has been consolidated into a single
test-coverage-latest-replica-set-auth-sasl-openssl-cse
task run on Ubuntu 18.04 is currently being tested. This set can be expanded as needed by manipulating the required/prohibited conditions inIntegrationTask
.add_expansions_to_env
This very useful parameter to Evergreen functions removes the need for the common
export X="${X}"
pattern used to translate Evergreen expansions into Bash environment variables. Relevant Evergreen functions and scripts have been updated accordingly to utilize Evergreen expansions (now automatically environment variables) directly within Bash scripts.OpenSSL
Per CDRIVER-3562, testing with OpenSSL versions older than 1.0.1 (0.9.8 and 1.0.0) have been removed.
Authentication Tests
The
silent=True
parameter for therun auth tests
function was removed to hopefully assist in debugging failures when they occur. This can be reverted if necessary (e.g. if the value of Evergreen expansions such as${atlas_free}
are considered sensitive).bypass_dlclose
A followup to #1177. Instead of evaluating a target command within a
bypass_dlclose
function (e.g.bypass_dlclose command with bypass
), the function was refactored to allow more flexible use ofLD_PRELOAD
by returning the paths to libraries to preload as a string (output) instead. This also included the removal oftrap 'rm -rf "$tmp"' EXIT
, which caused difficult-to-diagnose on-script-exiterrexit
failures due to thetmp
variable being overwritten during execution of the wrapped command. Instead, thebypass_dlclose.so
library is now left in a temporary directory without requiring use of complex cleanup routines.Bash and Hygienic Scripts
Many scripts were refactored to utilize Bash features to improve readability and robustness. Of note, use of
"${<varname>:?}"
to assert variables that should not be null or unset, arrays to satisfy shellcheck warnings regarding variable expansions, and use of"${OSTYPE}"
and"${HOSTTYPE}"
instead of$OS
+uname
to reduce cross-script variable dependencies/leakage.Bugfixes Galore
As part of auditing the Evergreen scripts, numerous bugs were discovered that were being hidden or ignored.
OpenSSL/LibreSSL Compilation, Installation, and Selection
The
install-ssl.sh
script masked compilation and installation failure with|| true
, leading to the subsequent libmongoc compilation linking against system-installed OpenSSL library (usually 1.1.1) that do not match the intended library. After fixing this issue, thebuild-and-run-authentication-tests-*
tasks now actually build and run against the correct SSL libraries.OpenSSL versions prior to 1.1.0 have trouble being built in parallel (probably what motivated the liberal use of
|| true
). The order of targets built has been explictly enumerated to address these build failures while still maximizing build parallelism (e.g.build-and-run-authentication-tests-openssl-1.0.1-fips
: 5m 36s -> 2m 36s).The OpenSSL 1.1.0 library "letter" release ("exclusively contain bug and security fixes and no new features") had to be bumped from
f
tol
to address Perl incompatibility issues during configuration:"glob" is not exported by the File::Glob module
.Measures were also taken to reduce the verbosity of output by silencing non-important build messages (warnings and errors are still emitted).
Fixing this behavior also triggered/exposed compilation failure for
build-and-run-authentication-tests-openssl-1.1.0
due to the following error when attempting to execute CMake:It is unclear why this error only occurs with OpenSSL 1.1.0. I have elected to leave the task in a failing status for now. I would appreciate some assistance with this issue, but I do not consider it a high priority.
CFLAGS
vs. Maintainer FlagsSome compiler flags such as
-Wno-redundant-decls
set viaCFLAGS
were being effectively ignored due to maintainer flags being appended toCMAKE_C_FLAGS
rather than prepended. This resulted in the order of flags being<cflags> <maintainer-flags>
such that-Wno-redundant-decls
was eventually undone by a following-Wredundant-decls
. Maintainer flags are now prepended toCMAKE_C_FLAGS
soCFLAGS
can continue to be used to silence or overwrite default maintainer flags.X
andENABLE_X
Environment VariablesThere were some inconsistencies regarding how flags such as
SSL
,SASL
, and etc. were being defined in Evergreen and forwarded/set within compilation or test scripts, making it difficult to discern if/when a given variable was not being handled properly (i.e. defined as1
but compared against'ON'
). An attempt to improve their consistency was made, assisted by the introduction ofcheck_var_opt
andcheck_var_req
utility functions to print and assert the values of environment variables used by scripts.Error Handling in BAT Scripts
Errors were being ignored in
.bat
scripts due to noset -o errexit
-like equivalent.|| exit /b
were added to potentially-failing commands appropriately.