Skip to content

devconfig,shellgen: option to patch ELF binaries with newer glibc #1574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Oct 31, 2023

Conversation

gcurtis
Copy link
Collaborator

@gcurtis gcurtis commented Oct 19, 2023

Add an optional patch_glibc boolean field to packages in devbox.json. When true, devbox will patch any ELF binaries in a package to link against the latest nixpkgs#glibc at runtime. This is a workaround for dynamic linking issues that can arise when using older packages.

For example, an old Python interpreter that's linked against glibc 2.35 might run a script that loads a native library that requires glibc 2.37. Because the linker only loads a library into a process once (regardless of version), linking will fail when the script references a 2.37 symbol.

The following devbox.json will reproduce a glibc version bug when the virtenv and crash scripts are run on x86-64-linux (aarch64 will not work). Setting the patch_glibc field to true will fix it:

{
  "packages": {
    "binutils": "latest",
    "libpqxx": "latest",
    "libxcrypt": "latest",
    "libz": "latest",
    "python37Packages.pip": "latest",
    "python": {
      "version": "3.7",
      "patch_glibc": false
    }
  },
  "shell": {
    "scripts": {
      "crash": "python3 -c 'import psycopg2'",
      "virtenv": [
        "rm -rf \"$VENV_DIR\"",
        "echo \"python3 path: $(readlink -f $(command -v python3))\"",
        "python3 -m venv \"$VENV_DIR\"",
        ". \"$VENV_DIR/bin/activate\"",
        "echo \"pip3 path: $(readlink -f $(command -v pip3))\"",
        "pip3 install psycopg2==2.9.5"
      ]
    }
  }
}

This field is intended to be a "last resort" for when a package cannot be updated to a newer version. Upgrading to a newer version of Python in the example above is preferable.

@gcurtis gcurtis requested review from mikeland73 and savil October 19, 2023 19:53
Add an optional `patch_glibc` boolean field to packages in devbox.json.
When true, devbox will patch any ELF binaries in a package to link
against the latest `nixpkgs#glibc` at runtime. This is a workaround for
dynamic linking issues that can arise when using older packages.

For example, an old Python interpreter that's linked against glibc 2.35
might run a script that loads a native library that requires glibc
2.37. Because the linker only loads a library into a process once
(regardless of version), linking will fail when the script references a
2.37 symbol.

The following devbox.json will reproduce a glibc version bug when the
`virtenv` and `crash` scripts are run on x86-64-linux (aarch64 will not
work). Setting the `patch_glibc` field to true will fix it:

```json
{
  "packages": {
    "binutils": "latest",
    "libpqxx": "latest",
    "libxcrypt": "latest",
    "libz": "latest",
    "python37Packages.pip": "latest",
    "python": {
      "version": "3.7",
      "patch_glibc": false
    }
  },
  "shell": {
    "scripts": {
      "crash": "python3 -c 'import psycopg2'",
      "virtenv": [
        "rm -rf \"$VENV_DIR\"",
        "echo \"python3 path: $(readlink -f $(command -v python3))\"",
        "python3 -m venv \"$VENV_DIR\"",
        ". \"$VENV_DIR/bin/activate\"",
        "echo \"pip3 path: $(readlink -f $(command -v pip3))\"",
        "pip3 install psycopg2==2.9.5"
      ]
    }
  }
}
```

This field is intended to be a "last resort" for when a package cannot
be updated to a newer version. Upgrading to a newer version of Python
in the example above is preferable.
Copy link
Collaborator

@savil savil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to get to review this!

I'd like to better understand why the python plugin wasn't working. In https://github.com/savil/devbox-issue-1276, I hadn't needed to reset the venv-dir.

{{- range .FlakeInputs }}
{{.Name}}.url = "{{.URLWithCaching}}";
{{- end }}

{{- if .PatchGlibc }}
nixpkgs-unstable.url = "nixpkgs";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the semantics for when the concrete resolution for this changes:

  • is it everytime we use the generated flake.nix?

How can we ensure it is locked and reproducible for users?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be locked by the flake.lock file. I'm not as familiar with the updating logic, but I think it should only change when devbox update is run.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, generally speaking, devbox update doesn't affect the flake.lock directly. It does affect how the flake.nix's inputs are defined, and in that manner will affect the flake.lock file.

I think this could be problematic. My understanding is:

  1. In general, in a flake, nixpkgs will resolve to the current nix channel for the user.
  2. The flake.lock will be generated from that. For users on a team, this could pin nixpkgs to different commit-hashes in the generated flake.lock, assuming they are even on the same nix channel.

Let me think some more (have to go to a meeting, jsut sharing my prelim. thoughts)

# Set these attributes so that glibc-patch.bash has access to their
# paths via environment variables of the same name.
inherit pkg;
glibc = nixpkgs.legacyPackages."{{.System}}".glibc;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using stdenv.cc.libc_lib. I don't know if glibc nix package is always reliably updated. For example, glibc 2.36 versions are missing https://www.nixhub.io/packages/glibc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specified glibc because the patch would break if the stdenv libc was different (for example, it could be musl).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

@gcurtis
Copy link
Collaborator Author

gcurtis commented Oct 23, 2023

@savil PTAL. I fixed the overlay, removed the lib patching, and added a system check.

@gcurtis gcurtis requested a review from savil October 23, 2023 23:22
Copy link
Collaborator

@savil savil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally lgtm, assuming overlays is how we wanna go. I think the nixpkgs reference in the generated flake.nix is problematic due to huirting reproducibility. Will circle back shortlky....

@@ -42,7 +42,7 @@
{{- range $flake.Packages }}
{{- if .PatchGlibc }}
(final: prev: {
{{.PackageAttributePath}} = prev.{{.PackageAttributePath}};
{{.PackageAttributePath}} = (patchGlibc prev.{{.PackageAttributePath}});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, moving it here makes sense

# Set these attributes so that glibc-patch.bash has access to their
# paths via environment variables of the same name.
inherit pkg;
glibc = nixpkgs.legacyPackages."{{.System}}".glibc;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

{{- range .FlakeInputs }}
{{.Name}}.url = "{{.URLWithCaching}}";
{{- end }}

{{- if .PatchGlibc }}
nixpkgs-unstable.url = "nixpkgs";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, generally speaking, devbox update doesn't affect the flake.lock directly. It does affect how the flake.nix's inputs are defined, and in that manner will affect the flake.lock file.

I think this could be problematic. My understanding is:

  1. In general, in a flake, nixpkgs will resolve to the current nix channel for the user.
  2. The flake.lock will be generated from that. For users on a team, this could pin nixpkgs to different commit-hashes in the generated flake.lock, assuming they are even on the same nix channel.

Let me think some more (have to go to a meeting, jsut sharing my prelim. thoughts)

{{- range .FlakeInputs }}
{{.Name}}.url = "{{.URLWithCaching}}";
{{- end }}

{{- if .PatchGlibc }}
nixpkgs-unstable.url = "nixpkgs";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a hard coded reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed:

avoid creating nixpkgs-unstable and just use nixpkgs from above. Just set nixpkgs to be the hard coded commit you want in go code.

Copy link
Contributor

@mikeland73 mikeland73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recapping offline discussion:

This PR works because bin wrappers point to the nix store directly (and skip the profile). We are wanting to get rid of bin wrappers soon, so we don't want to rely on this functionality.

For a fix in a follow up:

  • Move glibc patching to a flake (one flake per package that requires patching)
  • Revert top level flake so it no longer has awareness of patching (we decided for now to not do the overlay)
  • You can modify AllInstallablePackages (similar to how ProcessPluginPackages works) to replace the original package with the local flake. This will ensure the package gets installed to profile and also included in the top level flake. You should not need any custom code to handle the patching flake inside the top level flake.

PatchGlibc bool
}

func (f *flakeInput) BuildInputs() ([]buildInput, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to revert this as well once you do the individual flake approach since PatchGlibc is not needed in the top level flake.

@gcurtis
Copy link
Collaborator Author

gcurtis commented Oct 26, 2023

@savil @mikeland73 please take another look.

  • Patching now happens in a separate .devbox/gen/flake/glibc-patch flake.
  • Any packages that need to be patched have their flake input overridden to refer to the glibc-patch flake.
  • There's some duplication between the top-level flake template and the glibc flake template to handle cached packages and initializing each nixpkgs input.
  • The virtual environment bug still happens, which I think should go away once wrappers are removed.

Copy link
Collaborator

@savil savil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to ship a fix for glibc dynamic linking issues!

Could you add an example devbox-project under examples/ that exercises the patch glibc? That way, we'll ensure good CICD coverage and also have a ready example to debug issues if someone reports something related to patch-glibc.

@@ -77,48 +83,65 @@ func (f *flakeInput) BuildInputs() ([]string, error) {
// i.e. have a commit hash and always resolve to the same package/version.
// Note: inputs returned by this function include plugin packages. (php only for now)
// It's not entirely clear we always want to add plugin packages to the top level
func flakeInputs(ctx context.Context, packages []*devpkg.Package) []*flakeInput {
func flakeInputs(ctx context.Context, packages []*devpkg.Package) []flakeInput {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the motivation to returning the struct by value, instead of a pointer to the struct?

I generally find that returning a pointer like []*flakeInput leads to easier-to-follow code. For example, line 136 below feels kinda complicated because we're trying to return the struct-by-value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slices should generally contain values unless there's a specific reason they're needed. A slice of pointers allocates each element separately on the heap and adds a layer of indirection. A slice of values is a single chunk of memory which is nicer for GC and locality. We weren't using any of the pointer semantics (other than keying the lookup while building the slice) so I figured I'd switch it.

I moved the "get-or-append" logic into its own method to try and address your feedback about line 136.

@gcurtis
Copy link
Collaborator Author

gcurtis commented Oct 31, 2023

After rebasing main the patch was no longer working. I attempted to get the patched flake into the nix profile, but kept running into bugs with the getting the various normalized/full/resolved/etc attribute paths right. The other problem is that right now we install to the profile before generating the flake. This breaks because the patched flake doesn't exist to install to the profile yet. I don't want to flip that logic right now because it's a much bigger change.

For now I'm just going to hack this by not removing the store path from the PATH. I really want to clean up the devpkg and flake generation code so that it's easier to reason about and revisit this.

@gcurtis gcurtis merged commit df1982d into main Oct 31, 2023
@gcurtis gcurtis deleted the gcurtis/glibc branch October 31, 2023 20:08
Copy link

sentry-io bot commented Nov 1, 2023

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ **Generic Error: <redacted errors.withStack>: write shell.nix to file: <redacted fs.PathError>: <redacted syscal... go.jetpack.io/devbox/internal/shellgen in write... View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants