
CDRIVER-3620 Add new config_generator #1193


Merged (44 commits) on Feb 1, 2023


@eramongodb (Contributor) commented Jan 27, 2023

Description

This PR is part of CDRIVER-3620 and introduces the new Evergreen config generator powered by https://github.com/evergreen-ci/shrub.py.

Due to the volume of changes in this PR, I recommend reviewing commit-by-commit to more easily identify how entities are being translated from the legacy config generator to the new config generator.

This PR primarily demonstrates how the new config_generator works by converting some tasks and functions. Changes to the Evergreen matrix (the set of tasks and variants) are deferred to follow-up PRs to minimize the volume of changes to review.

.evergreen/config_generator

The new config generator is powered by .evergreen/config_generator/generate-config.py. As documented in the file, this script must be invoked using the following command:

$ python .evergreen/config_generator/generate-config.py

It is recommended to create a virtual environment and use .evergreen/config_generator/requirements.txt to install dependencies. This includes dependencies required by the legacy config generator, which is invoked by the new config generator.

$ python -m venv venv
$ . venv/bin/activate # Assuming Unix-like environment.
$ python -m pip install -r .evergreen/config_generator/requirements.txt

PYTHONPATH must be set so that Python can locate the config_generator package (i.e. the .evergreen directory that contains it must be on the module search path); otherwise, you may observe the following error:

ModuleNotFoundError: No module named 'config_generator'

The structure of config_generator is as follows:

.evergreen/config_generator
├── components/         # Defines Evergreen config entities.
│   └── ...
├── etc/                # Misc. helper modules.
│   └── ...
├── generate-config.py  # The main script.
├── generators/         # Generates *.yml files.
│   └── <generator>.py
└── requirements.txt    # Required packages.

config_generator/generators

Generators use all_components(), defined in etc/utils.py, to recursively import the modules under components and invoke the appropriate generator functions they define. This allows components to define functions, tasks, task groups, and variants as needed in logical (groups of) modules. This is in contrast to the legacy config generator, where entities are defined across multiple files, making it difficult to discern the relationships between entities (e.g. on which variants is a given task executed?). Generators are not expected to be modified often.
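A minimal sketch of the recursive discovery described above, using only the standard library (pkgutil and importlib). This is not the actual all_components() from etc/utils.py. The functions() hook that each component is assumed to expose is a hypothetical convention chosen for illustration.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import Any, Dict, Iterator


def all_components(package: ModuleType) -> Iterator[ModuleType]:
    """Recursively import and yield every module and subpackage under `package`."""
    # walk_packages descends into subpackages (e.g. components/funcs), importing
    # each one so that its own submodules can be discovered as well.
    prefix = package.__name__ + "."
    for info in pkgutil.walk_packages(package.__path__, prefix=prefix):
        yield importlib.import_module(info.name)


def generate_functions(package: ModuleType) -> Dict[str, Any]:
    """Merge the Evergreen functions defined by every component module."""
    result: Dict[str, Any] = {}
    for module in all_components(package):
        # Hypothetical convention: a module opts in by defining `functions()`.
        functions = getattr(module, "functions", None)
        if callable(functions):
            result.update(functions())
    return result
```

Because discovery is recursive, adding a new module (or a new subdirectory such as funcs) requires no change to the generator itself.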

The pre.py and post.py generators are special, as they define top-level hooks that apply to all tasks (excluding task groups). Instead of recursively parsing components, the pre and post commands should be defined explicitly.

The legacy-config.py generator invokes the legacy config generator as a subprocess to generate legacy-config.yml. This means only .evergreen/config_generator/generate-config.py needs to be run to generate all Evergreen config files.
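A minimal sketch of invoking a legacy generator script as a child process, assuming the script path is passed in. The actual location of the legacy generator is not given here, and generate_legacy_config is a hypothetical name.

```python
import subprocess
import sys
from pathlib import Path


def generate_legacy_config(legacy_script: Path) -> None:
    """Run the (hypothetical) legacy generator script with the current interpreter."""
    # sys.executable reuses the same interpreter, and therefore the same virtual
    # environment and installed requirements, as the new config generator.
    # check=True raises CalledProcessError if the legacy generator fails, so a
    # broken legacy-config.yml does not go unnoticed.
    subprocess.run([sys.executable, str(legacy_script)], check=True)
```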

All generated YAML files are placed under .evergreen/generated_configs and are included by the top-level .evergreen/config.yml file. This structure is designed to reduce the line count of any single YAML file.

config_generator/etc

This directory is for modules that do not define Evergreen entities themselves but instead define useful Python entities used by other modules. Note that many of these utilities are not yet used by the changes in this PR.

distros.py

To facilitate validating and manipulating the distros in use, all distros of interest are defined along with properties that are useful during task generation. These are unused in this PR, but their utility will be demonstrated in upcoming PRs, such as permitting components to easily select small vs. large distro flavors for compile vs. test tasks (see the example matrix below).
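A hypothetical sketch of what "distros with useful properties" could look like. It is not the contents of distros.py. The Distro fields and find_distro are illustrative names, and the distro names are taken from the example matrix in this description.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional


@dataclass(frozen=True)
class Distro:
    """Hypothetical distro record: name plus properties useful for task generation."""

    name: str
    os_type: Literal["linux", "macos", "windows"]
    size: Optional[Literal["small", "large"]] = None  # None: no size flavors


DISTROS: List[Distro] = [
    Distro("ubuntu1804-large", "linux", "large"),
    Distro("ubuntu1804-small", "linux", "small"),
    Distro("macos-1014", "macos"),
    Distro("windows-64-vs2019-large", "windows", "large"),
    Distro("windows-64-vs2019-small", "windows", "small"),
]


def find_distro(os_type: str, size: Optional[str]) -> Distro:
    """Return the distro for `os_type`, preferring the requested size flavor."""
    candidates = [d for d in DISTROS if d.os_type == os_type]
    for d in candidates:
        if d.size == size:
            return d
    # Fall back to an unsized distro (e.g. macOS has no small/large flavors here).
    for d in candidates:
        if d.size is None:
            return d
    raise ValueError(f"no distro for {os_type!r} with size {size!r}")
```

A compile component could then ask for find_distro("linux", "large") and a test component for find_distro("linux", "small") without hard-coding distro names.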

utils.py

This defines a variety of helper classes and functions used by generators and components. Of note:

class EvgTaskWithRunOn

This allows tasks to define the distros they will be executed on rather than having to specify it in a variant definition. This will eventually allow for matrices to be defined as in this example:

tasks:
  - name: compile-on-linux
    tags: ["compile"]
    run_on: ubuntu1804-large
    commands: [{ func: "compile-source" }]
  - name: compile-on-macos
    tags: ["compile"]
    run_on: macos-1014
    commands: [{ func: "compile-source" }]
  - name: compile-on-windows
    tags: ["compile"]
    run_on: windows-64-vs2019-large
    commands: [{ func: "compile-source" }]

  - name: test-on-linux
    tags: ["test"]
    run_on: ubuntu1804-small
    commands: [{ func: "run-tests" }]
  - name: test-on-macos
    tags: ["test"]
    run_on: macos-1014
    commands: [{ func: "run-tests" }]
  - name: test-on-windows
    tags: ["test"]
    run_on: windows-64-vs2019-small
    commands: [{ func: "run-tests" }]

buildvariants:
  - name: compile-with-asan
    display_name: "Compile with ASAN"
    expansions: { SANITIZERS: "address" }
    tasks: [".compile", ".test"]
  - name: compile-with-tsan
    display_name: "Compile with TSAN"
    expansions: { SANITIZERS: "thread" }
    tasks: [".compile", ".test"]
  - name: compile-with-ubsan
    display_name: "Compile with UBSAN"
    expansions: { SANITIZERS: "undefined" }
    tasks: [".compile", ".test"]
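A hypothetical Python sketch of generating the tasks in the example matrix above, where each task carries its own run_on. To avoid guessing at APIs, it does not use shrub.py or the real EvgTaskWithRunOn. TaskWithRunOn and compile_and_test_tasks are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TaskWithRunOn:
    """Hypothetical stand-in for EvgTaskWithRunOn: a task that specifies its own distro."""

    name: str
    commands: List[Dict[str, Any]]
    tags: List[str] = field(default_factory=list)
    run_on: Optional[str] = None

    def as_dict(self) -> Dict[str, Any]:
        d: Dict[str, Any] = {"name": self.name}
        if self.tags:
            d["tags"] = list(self.tags)
        if self.run_on is not None:
            d["run_on"] = self.run_on
        d["commands"] = list(self.commands)
        return d


def compile_and_test_tasks() -> List[Dict[str, Any]]:
    # (large, small) distro per OS, taken from the example matrix above.
    distros = {
        "linux": ("ubuntu1804-large", "ubuntu1804-small"),
        "macos": ("macos-1014", "macos-1014"),
        "windows": ("windows-64-vs2019-large", "windows-64-vs2019-small"),
    }
    tasks: List[TaskWithRunOn] = []
    for os_name, (large, small) in distros.items():
        tasks.append(TaskWithRunOn(f"compile-on-{os_name}", [{"func": "compile-source"}], ["compile"], large))
        tasks.append(TaskWithRunOn(f"test-on-{os_name}", [{"func": "run-tests"}], ["test"], small))
    return [t.as_dict() for t in tasks]
```

Because the distro lives on the task, each variant only needs to select tasks by tag (".compile", ".test") and set its expansions.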

class ConfigDumper

Great effort was put into an improved alternative to the _Dumper class used in the legacy config generator. In particular, the generated YAML does its best to conform to the default VS Code YAML formatter, such as preferring " over ' when possible, inserting spaces inside curly braces (e.g. { abc: def } instead of {abc: def}), and indenting block sequences, e.g.:

abc:
  - def
  - geh

instead of:

abc:
- def
- geh

Furthermore, ConfigDumper goes out of its way to apply Evergreen-specific readability improvements such as:

  • sorting keys in alphabetical order
  • prioritizing important fields over alphabetical order (e.g. name comes first in tasks and variants)
  • "inlining" certain fields to reduce line count (e.g. tags, depends_on, and key-value pairs for expansions.update)
  • using | style for strings that span multiple lines
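A standard-library sketch of the key-ordering rule in the list above: priority fields first, then the remaining keys alphabetically. It is not the actual ConfigDumper, which presumably hooks into a YAML dumper. PRIORITY_FIELDS and order_keys are hypothetical names, and the priority list is an assumption.

```python
from typing import Any, Dict, Sequence, Tuple

# Hypothetical priority list: these fields are emitted first, in this order.
PRIORITY_FIELDS: Sequence[str] = ("name", "display_name")


def order_keys(mapping: Dict[str, Any], priority: Sequence[str] = PRIORITY_FIELDS) -> Dict[str, Any]:
    """Return a copy of `mapping` with priority fields first and the rest alphabetical."""
    rank = {key: i for i, key in enumerate(priority)}

    def sort_key(key: str) -> Tuple[bool, int, str]:
        # Priority fields sort before all others (False < True), ordered by rank;
        # every other key shares rank 0 and falls back to alphabetical order.
        return (key not in rank, rank.get(key, 0), key)

    # Python dicts preserve insertion order, so the sorted order is kept when dumped.
    return {key: mapping[key] for key in sorted(mapping, key=sort_key)}
```

For example, a task with keys tags, commands, name, and run_on would be emitted as name, commands, run_on, tags.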

config_generator/components

To reduce the volume of initial changes to review, as well as to demonstrate how components work, several tasks and functions have been "relocated" from the legacy config generator. No variants have been modified yet, and task names have been preserved to limit changes to the generation process only. Modifications to the Evergreen matrix itself are deferred to follow-up PRs.

Tasks that have been relocated (under components) are:

  • "abi-compliance-check" -> abi_compliance_check.py
  • "make-release-archive" -> make_release_archive.py
  • "check-headers" -> check_mongoc_public_headers.py

Functions that have been relocated (under components/funcs) are:

  • "fetch source" -> fetch_source.py
  • "windows fix" -> removed (unnecessary)
  • "make files executable" -> merged into fetch_source.py
  • "prepare kerberos" -> prepare_kerberos.py
  • "upload build" -> upload_build.py
  • "upload release" -> merged into make_release_archive.py
  • "release archive" -> merged into make_release_archive.py
  • "upload docs" -> merged into make_release_archive.py
  • "upload man pages" -> merged into make_release_archive.py
  • "backtrace" -> backtrace.py
  • "bootstrap mongo-orchestration" -> bootstrap_mongo_orchestration.py
  • "fetch build" -> fetch_build.py
  • "clone drivers-evergreen-tools" -> fetch_det.py
  • "run kms servers" -> run_mock_kms_servers.py
  • "run tests" -> run_tests.py
  • "test versioned api" -> merged into run_tests.py
  • "stop load balancer" -> stop_load_balancer.py
  • "cleanup" -> stop_mongo_orchestration.py
  • "upload test results" -> upload_test_results.py
  • "upload mo artifacts" -> upload_mo_artifacts.py
  • "upload working dir" -> removed (unnecessary)

The intent behind the funcs subdirectory is to group components that define functions only, for the purpose of being used by one or more other components. Components outside of funcs define at least one task, task group, or variant. The funcs directory also demonstrates the power of generators' recursive parsing of the components directory, which allows logically associated groups of modules to be defined in a corresponding subdirectory.

With the exception of fetch-source, all pre commands have been moved into only the tasks that need them, which slightly increases the total line count of legacy-config.yml. This permitted setting pre_error_fails_task: true in the top-level config file. Upcoming changes that modify the Evergreen matrix (redefining tasks and variants) are expected to significantly reduce the number of lines in legacy-config.yml.

Classes

A notable pattern is that every Evergreen function is defined as a class with name(), defn(), and call(). This is to facilitate their reuse across components via import and will be utilized heavily in upcoming PRs. An example of its intended effect can be seen in the make_release_archive.py component, which imports UploadBuild from config_generator.components.funcs.upload_build. Although not yet used, FetchBuild.call() in fetch_build.py also demonstrates an example of how required parameters to Evergreen functions can be validated and enforced by the Python class.
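A hypothetical sketch of the name()/defn()/call() pattern, including generation-time validation of a required parameter. The function name, the BUILD_NAME variable, and the placeholder command are assumptions. This is not the real fetch_build.py.

```python
from typing import Any, Dict, List, Optional


class FetchBuild:
    """Hypothetical Evergreen function following the name()/defn()/call() pattern."""

    @classmethod
    def name(cls) -> str:
        return "fetch-build"  # hypothetical function name

    @classmethod
    def defn(cls) -> Dict[str, List[Dict[str, Any]]]:
        # The function definition: the commands Evergreen runs when it is called.
        # The command below is a placeholder, not the real fetch-build steps.
        return {cls.name(): [{"command": "shell.exec", "params": {"script": "echo ${BUILD_NAME}"}}]}

    @classmethod
    def call(cls, build_name: Optional[str] = None) -> Dict[str, Any]:
        # Required parameters are enforced when the config is generated, instead of
        # surfacing later as a confusing failure inside an Evergreen task.
        if not build_name:
            raise ValueError(f"{cls.name()} requires build_name")
        return {"func": cls.name(), "vars": {"BUILD_NAME": build_name}}
```

Another component would import the class and use FetchBuild.call(build_name=...) in a task's commands list. The functions generator would collect FetchBuild.defn().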

Command Types

As documented in the top-level config file, closer attention is paid to the type of each command defined in the new config generator. This is intended to improve the experience of reviewing patch results by dividing potential failures into three categories rather than just two:

  • setup failure (lavender): a command failed before the intended test.
  • test failure (red): the intended test failed.
  • system failure (purple): a command failed after the intended test.
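A minimal sketch of tagging each command with an explicit Evergreen command type ("setup", "test", or "system") at generation time. The command helper is hypothetical. The example task's parameter values (the directory, script path, and results file) are placeholders.

```python
from typing import Any, Dict, List, Literal

CommandType = Literal["setup", "test", "system"]
_COMMAND_TYPES = ("setup", "test", "system")


def command(name: str, command_type: CommandType, **params: Any) -> Dict[str, Any]:
    """Build an Evergreen command dict with an explicit command type (hypothetical helper)."""
    # setup  -> reported as a setup failure (before the intended test)
    # test   -> reported as a test failure (the intended test itself)
    # system -> reported as a system failure (after the intended test)
    if command_type not in _COMMAND_TYPES:
        raise ValueError(f"unknown command type: {command_type!r}")
    cmd: Dict[str, Any] = {"command": name, "type": command_type}
    if params:
        cmd["params"] = params
    return cmd


# Example task body; all parameter values are placeholders.
example_commands: List[Dict[str, Any]] = [
    command("git.get_project", "setup", directory="mongoc"),
    command("subprocess.exec", "test", binary="./path/to/run-tests.sh"),
    command("attach.results", "system", file_location="test-results.json"),
]
```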

This is also motivated by the goal of eventually enabling post_error_fails_task: true to reduce the volume of errors being masked or ignored, as well as eventually relocating all top-level post commands into relevant teardown_group commands of a task group instead.

Script Invocation

During the relocation process, an effort was made to invoke scripts under .evergreen/scripts directly via ./path/to/script rather than via sh ./path/to/script or bash ./path/to/script, so that script shebangs are validated and respected. Additionally, export commands have been replaced by expansions.update, add_expansions_to_env, and env={...} where possible, both to reduce verbosity and to make the "inputs" to commands (via Evergreen expansions) more apparent. include_expansions_in_env could be used instead of add_expansions_to_env to be even more explicit about expected inputs, but for convenience, I have elected not to (validation can be done in the relevant scripts instead if necessary).
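A minimal sketch of the command shapes described above: an expansions.update that declares inputs, followed by a subprocess.exec that runs a script directly so that its shebang selects the interpreter. The helper names expansions_update and run_script are hypothetical. The script path and expansion values used with them are placeholders.

```python
from typing import Any, Dict, List, Optional


def expansions_update(**updates: str) -> Dict[str, Any]:
    """expansions.update command declaring expansions (inputs) for later commands."""
    return {
        "command": "expansions.update",
        "params": {"updates": [{"key": k, "value": v} for k, v in updates.items()]},
    }


def run_script(
    script: str,
    args: Optional[List[str]] = None,
    env: Optional[Dict[str, str]] = None,
    include: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """subprocess.exec a script directly (binary=./path/to/script, not bash + args)."""
    params: Dict[str, Any] = {"binary": script}
    if include is not None:
        # Explicit: only the listed expansions are exported as environment variables.
        params["include_expansions_in_env"] = list(include)
    else:
        # Convenient: all expansions are exported as environment variables.
        params["add_expansions_to_env"] = True
    if args:
        params["args"] = list(args)
    if env:
        params["env"] = dict(env)  # replaces ad hoc `export VAR=...` lines
    return {"command": "subprocess.exec", "params": params}
```

For example, [expansions_update(SANITIZERS="address"), run_script("./path/to/script")] makes SANITIZERS a visible input of the script invocation instead of an export buried in an inline shell snippet.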

@eramongodb (Contributor, Author) commented Jan 27, 2023

After some consideration, I removed the unique pre.py and post.py generators in favor of explicitly defining pre and post commands in the top-level config file. This means that, with the exception of legacy-config.py, all generators under config_generator/generators behave the same way: they recursively parse config_generator/components to generate functions, tasks, task_groups, and variants.

@eramongodb (Contributor, Author) commented:

Resolved merge conflicts. Latest changes verified by this patch.

@rcsanchez97 (Contributor) left a comment:

This is phenomenal work! I am super excited to see our Evergreen configuration improved and this is an enormous improvement already, with more to come I'm sure.

LGTM

@kkloberdanz (Contributor) left a comment:

LGTM

@kevinAlbs (Collaborator) left a comment:

Nice work. Left minor comments. LGTM. The separate commits were helpful for reviewing. The effort for concise and consistent YAML formatting is appreciated.

@vector-of-bool (Contributor) left a comment:

Very welcome improvements. My comments are mostly suggestions of Python idioms and code cleanup.

@eramongodb merged commit a88f4d3 into mongodb:master on Feb 1, 2023
@eramongodb deleted the cdriver-3620 branch on February 1, 2023 22:39