-
Notifications
You must be signed in to change notification settings - Fork 249
devconfig,shellgen: option to patch ELF binaries with newer glibc #1574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add an optional `patch_glibc` boolean field to packages in devbox.json. When true, devbox will patch any ELF binaries in a package to link against the latest `nixpkgs#glibc` at runtime. This is a workaround for dynamic linking issues that can arise when using older packages. For example, an old Python interpreter that's linked against glibc 2.35 might run a script that loads a native library that requires glibc 2.37. Because the linker only loads a library into a process once (regardless of version), linking will fail when the script references a 2.37 symbol. The following devbox.json will reproduce a glibc version bug when the `virtenv` and `crash` scripts are run on x86-64-linux (aarch64 will not work). Setting the `patch_glibc` field to true will fix it: ```json { "packages": { "binutils": "latest", "libpqxx": "latest", "libxcrypt": "latest", "libz": "latest", "python37Packages.pip": "latest", "python": { "version": "3.7", "patch_glibc": false } }, "shell": { "scripts": { "crash": "python3 -c 'import psycopg2'", "virtenv": [ "rm -rf \"$VENV_DIR\"", "echo \"python3 path: $(readlink -f $(command -v python3))\"", "python3 -m venv \"$VENV_DIR\"", ". \"$VENV_DIR/bin/activate\"", "echo \"pip3 path: $(readlink -f $(command -v pip3))\"", "pip3 install psycopg2==2.9.5" ] } } } ``` This field is intended to be a "last resort" for when a package cannot be updated to a newer version. Upgrading to a newer version of Python in the example above is preferable.
33cfa71
to
24ef38a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to get to review this!
I'd like to better understand why the python plugin wasn't working. In https://github.com/savil/devbox-issue-1276, I hadn't needed to reset the venv-dir.
{{- range .FlakeInputs }} | ||
{{.Name}}.url = "{{.URLWithCaching}}"; | ||
{{- end }} | ||
|
||
{{- if .PatchGlibc }} | ||
nixpkgs-unstable.url = "nixpkgs"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the semantics for when the concrete resolution for this changes:
- is it everytime we use the generated flake.nix?
How can we ensure it is locked and reproducible for users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be locked by the flake.lock file. I'm not as familiar with the updating logic, but I think it should only change when devbox update
is run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, generally speaking, devbox update
doesn't affect the flake.lock
directly. It does affect how the flake.nix
's inputs are defined, and in that manner will affect the flake.lock
file.
I think this could be problematic. My understanding is:
- In general, in a flake,
nixpkgs
will resolve to the currentnix channel
for the user. - The
flake.lock
will be generated from that. For users on a team, this could pinnixpkgs
to different commit-hashes in the generated flake.lock, assuming they are even on the same nix channel.
Let me think some more (have to go to a meeting, jsut sharing my prelim. thoughts)
# Set these attributes so that glibc-patch.bash has access to their | ||
# paths via environment variables of the same name. | ||
inherit pkg; | ||
glibc = nixpkgs.legacyPackages."{{.System}}".glibc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using stdenv.cc.libc_lib
. I don't know if glibc
nix package is always reliably updated. For example, glibc 2.36 versions are missing https://www.nixhub.io/packages/glibc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I specified glibc
because the patch would break if the stdenv libc was different (for example, it could be musl).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
@savil PTAL. I fixed the overlay, removed the lib patching, and added a system check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally lgtm, assuming overlays is how we wanna go. I think the nixpkgs
reference in the generated flake.nix is problematic due to huirting reproducibility. Will circle back shortlky....
@@ -42,7 +42,7 @@ | |||
{{- range $flake.Packages }} | |||
{{- if .PatchGlibc }} | |||
(final: prev: { | |||
{{.PackageAttributePath}} = prev.{{.PackageAttributePath}}; | |||
{{.PackageAttributePath}} = (patchGlibc prev.{{.PackageAttributePath}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, moving it here makes sense
# Set these attributes so that glibc-patch.bash has access to their | ||
# paths via environment variables of the same name. | ||
inherit pkg; | ||
glibc = nixpkgs.legacyPackages."{{.System}}".glibc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
{{- range .FlakeInputs }} | ||
{{.Name}}.url = "{{.URLWithCaching}}"; | ||
{{- end }} | ||
|
||
{{- if .PatchGlibc }} | ||
nixpkgs-unstable.url = "nixpkgs"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, generally speaking, devbox update
doesn't affect the flake.lock
directly. It does affect how the flake.nix
's inputs are defined, and in that manner will affect the flake.lock
file.
I think this could be problematic. My understanding is:
- In general, in a flake,
nixpkgs
will resolve to the currentnix channel
for the user. - The
flake.lock
will be generated from that. For users on a team, this could pinnixpkgs
to different commit-hashes in the generated flake.lock, assuming they are even on the same nix channel.
Let me think some more (have to go to a meeting, jsut sharing my prelim. thoughts)
{{- range .FlakeInputs }} | ||
{{.Name}}.url = "{{.URLWithCaching}}"; | ||
{{- end }} | ||
|
||
{{- if .PatchGlibc }} | ||
nixpkgs-unstable.url = "nixpkgs"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a hard coded reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed:
avoid creating nixpkgs-unstable
and just use nixpkgs
from above. Just set nixpkgs
to be the hard coded commit you want in go code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recapping offline discussion:
This PR works because bin wrappers point to the nix store directly (and skip the profile). We are wanting to get rid of bin wrappers soon, so we don't want to rely on this functionality.
For a fix in a follow up:
- Move glibc patching to a flake (one flake per package that requires patching)
- Revert top level flake so it no longer has awareness of patching (we decided for now to not do the overlay)
- You can modify
AllInstallablePackages
(similar to howProcessPluginPackages
works) to replace the original package with the local flake. This will ensure the package gets installed to profile and also included in the top level flake. You should not need any custom code to handle the patching flake inside the top level flake.
internal/shellgen/flake_input.go
Outdated
PatchGlibc bool | ||
} | ||
|
||
func (f *flakeInput) BuildInputs() ([]buildInput, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to revert this as well once you do the individual flake approach since PatchGlibc
is not needed in the top level flake.
@savil @mikeland73 please take another look.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to ship a fix for glibc dynamic linking issues!
Could you add an example devbox-project under examples/
that exercises the patch glibc? That way, we'll ensure good CICD coverage and also have a ready example to debug issues if someone reports something related to patch-glibc.
@@ -77,48 +83,65 @@ func (f *flakeInput) BuildInputs() ([]string, error) { | |||
// i.e. have a commit hash and always resolve to the same package/version. | |||
// Note: inputs returned by this function include plugin packages. (php only for now) | |||
// It's not entirely clear we always want to add plugin packages to the top level | |||
func flakeInputs(ctx context.Context, packages []*devpkg.Package) []*flakeInput { | |||
func flakeInputs(ctx context.Context, packages []*devpkg.Package) []flakeInput { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the motivation to returning the struct by value, instead of a pointer to the struct?
I generally find that returning a pointer like []*flakeInput
leads to easier-to-follow code. For example, line 136 below feels kinda complicated because we're trying to return the struct-by-value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slices should generally contain values unless there's a specific reason they're needed. A slice of pointers allocates each element separately on the heap and adds a layer of indirection. A slice of values is a single chunk of memory which is nicer for GC and locality. We weren't using any of the pointer semantics (other than keying the lookup while building the slice) so I figured I'd switch it.
I moved the "get-or-append" logic into its own method to try and address your feedback about line 136.
After rebasing main the patch was no longer working. I attempted to get the patched flake into the nix profile, but kept running into bugs with the getting the various normalized/full/resolved/etc attribute paths right. The other problem is that right now we install to the profile before generating the flake. This breaks because the patched flake doesn't exist to install to the profile yet. I don't want to flip that logic right now because it's a much bigger change. For now I'm just going to hack this by not removing the store path from the PATH. I really want to clean up the devpkg and flake generation code so that it's easier to reason about and revisit this. |
Suspect IssuesThis pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎 |
Add an optional
patch_glibc
boolean field to packages in devbox.json. When true, devbox will patch any ELF binaries in a package to link against the latestnixpkgs#glibc
at runtime. This is a workaround for dynamic linking issues that can arise when using older packages.For example, an old Python interpreter that's linked against glibc 2.35 might run a script that loads a native library that requires glibc 2.37. Because the linker only loads a library into a process once (regardless of version), linking will fail when the script references a 2.37 symbol.
The following devbox.json will reproduce a glibc version bug when the
virtenv
andcrash
scripts are run on x86-64-linux (aarch64 will not work). Setting thepatch_glibc
field to true will fix it:This field is intended to be a "last resort" for when a package cannot be updated to a newer version. Upgrading to a newer version of Python in the example above is preferable.