Skip to content

[rm nixpkgs] make HEAD request to BinaryCache to ensure binary is cached #1318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 17, 2023

Conversation

savil
Copy link
Collaborator

@savil savil commented Jul 27, 2023

Summary

This PR makes a HTTP HEAD request to https://cache.nixos.org/<hash>.narinfo to verify that the binary is indeed cached.

If not, we fallback to the legacy path that is dependent on downloading nixpkgs.

Implementation note
In IsInBinaryCache, I wanted to insert the HEAD request. However, this function can be called in a loop for all packages, which is bad for perf.

Instead, in this design, a new function FillNarInfoCache(packageList) makes the HEAD requests concurrently, storing results in an in-memory cache.

We require that callers invoke FillNarInfoCache(packageList) prior to invoking IsInBinaryCache. Failing this, IsInBinaryCache will return an error. Not thrilled with this design, since it may lead to inadvertent errors, but this code is pretty core and our test coverage should expose any codepaths that call IsInBinaryCache without FillNarInfoCache.

How was it tested?

I could do devbox shell in a project that has hello package. I inserted some print statements to verify that the HEAD
response was 200 and that the caching for the boolean value of isNarInfoInCache was working.

@savil savil requested review from ipince, gcurtis and mikeland73 July 27, 2023 00:45

// Check if the narinfo is present in the binary cache
if exists, ok := isNarInfoInCache[p.Raw]; ok {
fmt.Printf("narInfo cache hit: %v\n", exists)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the Printfs be debug logs instead? I'm guessing we want this to be a transparent thing for the user.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I'll be deleting the Printfs prior to landing! Just have them there to ensure this code works as I want. (this entire section is also gated by the feature-flag)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove this after I test the final version of this PR on devbox-cloud

Copy link
Contributor

@mikeland73 mikeland73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this approach will introduce a perf regression which scales O(n) with number of packages. Thoughts on alternatives in comment?

@jetify-com jetify-com deleted a comment from vercel bot Aug 2, 2023
@savil savil marked this pull request as draft August 7, 2023 21:03
@savil
Copy link
Collaborator Author

savil commented Aug 7, 2023

needs revision so marking as Draft.

@savil savil force-pushed the savil/narinfo-in-cache branch from 36c196d to 857dda1 Compare August 15, 2023 00:29
Copy link
Collaborator Author

savil commented Aug 15, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@savil savil requested a review from mikeland73 August 15, 2023 05:09
@savil savil marked this pull request as ready for review August 15, 2023 05:09
Copy link
Collaborator

@gcurtis gcurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We require that callers invoke FillNarInfoCache(packageList) prior to invoking IsInBinaryCache. Failing this, IsInBinaryCache will return an error. Not thrilled with this design, since it may lead to inadvertent errors, but this code is pretty core and our test coverage should expose any codepaths that call IsInBinaryCache without FillNarInfoCache.

If you're looking for an alternate approach, each package could have a sync.Once that makes the HEAD request. The PackageFromString(s) functions would call go pkg.checkCache.Do(...) to kick off the requests in the background. The Package.IsInBinaryCache method would use the same once to block until the network request has finished.

reqURL := BinaryCache + "/" + pathParts.hash + ".narinfo"
res, err := httpClient.Head(reqURL)
if err != nil {
if os.IsTimeout(err) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.IsTimeout is deprecated because it's vague in what's considered a timeout. Use http.NewRequestWithContext instead of setting a timeout in the client. Then you can check with errors.Is(ctx.Err(), context.DeadlineExceeded).

dashIndex = i
}
}
return &storePathParts{hash, name, "" /*version*/}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storePathParts{Version: ""} instead of a comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then hash, and name are not set in the struct, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant something like storePathParts{hash: hash, name: name}.

return &storePathParts{hash, name, "" /*version*/}
}

func (p *storePathParts) Equal(other *storePathParts) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use storePathParts as a value then you can just compare with ==.

group, _ := errgroup.WithContext(ctx)
for _, p := range packages {
// If the package's NarInfo status is already known, skip it
if _, ok := isNarInfoInCache[p.Raw]; ok {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if _, ok := isNarInfoInCache[p.Raw]; ok {
if isNarInfoInCache[p.Raw] {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, here, the value can be false, so I need to check if it is set.

}

// enable for nix >= 2.17
return vercheck.SemverCompare(version, "2.17.0") >= 0, nil
httpClient := &http.Client{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a single http.Client for all packages to reuse TCP connections.

Comment on lines 706 to 718
group.Go(func() error {
return pkg.fillNarInfoCache()
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this become group.Go(pkg.fillNarInfoCache)?

Copy link
Collaborator

@gcurtis gcurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (correct me if I'm wrong) that fillNarInfoCache is being run concurrently for each package, but it calls a bunch of stuff that isn't safe (writing to isNarInfoInCache, nix.Version, etc.).

}
pkg := p
group.Go(func() error {
return pkg.fillNarInfoCache()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think pkg.fillNarInfoCache is safe to run concurrently. The map, Nix system/version, etc. all need locking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh good catch! I can ensure the nix.System value is cached prior to executing this, but may need to add some locking for the map.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@savil savil force-pushed the savil/narinfo-in-cache branch from cf321c6 to 1afca73 Compare August 16, 2023 22:30
Copy link
Collaborator Author

@savil savil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gcurtis updated the code without doing the sync.Once.

If you're looking for an alternate approach, each package could have a sync.Once that makes the HEAD request. The PackageFromString(s) functions would call go pkg.checkCache.Do(...) to kick off the requests in the background. The Package.IsInBinaryCache method would use the same once to block until the network request has finished.

Ooh I like this approach. It would simplify the code, and ensure it works by default: The current pattern of "ensure you call X before calling Y method" is really ugly :(

The reason I didn't directly kick off a HEAD request in one of the package constructors is:

  1. Unfortunately, we recreate the Package structs for the same packages in different places. The structs are not reused. This is why I have a devpkg variable isNarInfoInCache, and not a struct field.
  2. We may create a Package struct which is not used for IsBinaryInCache, and so we'd do wasted work.

One idea is to modify your idea from the PackageFromString constructor to the IsInBinaryCache where we read the result (the modification is: lazily initialize it):

  1. Inside pkg.IsInBinaryCache(), we can do go sync.Once(pkg.fillNarInfoCache). And call that sync.Once(pkg.fillNarInfoCache) again to block for the return value.
  • If its the first time, it'll kick off a goroutine in the background.

Let me try in a stacked PR.... so as not to block this PR but also try to improve it. EDIT: nope, this doesn't work: it'll be block other packages.

dashIndex = i
}
}
return &storePathParts{hash, name, "" /*version*/}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then hash, and name are not set in the struct, no?

group, _ := errgroup.WithContext(ctx)
for _, p := range packages {
// If the package's NarInfo status is already known, skip it
if _, ok := isNarInfoInCache[p.Raw]; ok {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, here, the value can be false, so I need to check if it is set.

}
pkg := p
group.Go(func() error {
return pkg.fillNarInfoCache()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@savil savil requested a review from gcurtis August 16, 2023 23:12
@savil
Copy link
Collaborator Author

savil commented Aug 16, 2023

Hmm, I need to make some fixes...moving to draft.

@savil savil force-pushed the savil/narinfo-in-cache branch 2 times, most recently from 5850449 to d2d0a57 Compare August 17, 2023 00:56
@savil savil marked this pull request as draft August 17, 2023 00:57
@savil savil force-pushed the savil/narinfo-in-cache branch from d2d0a57 to 1afca73 Compare August 17, 2023 00:59
@savil savil force-pushed the savil/narinfo-in-cache branch 2 times, most recently from dcc715b to 4e30ce1 Compare August 17, 2023 03:25
@savil savil marked this pull request as ready for review August 17, 2023 03:58
@savil savil force-pushed the savil/narinfo-in-cache branch from 4e30ce1 to c3af318 Compare August 17, 2023 03:59
@savil
Copy link
Collaborator Author

savil commented Aug 17, 2023

@gcurtis PTAL

Copy link
Collaborator

@gcurtis gcurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock, but I think are a couple more race conditions in fillNarInfoCache.

pathParts := newStorePathParts(sysInfo.StorePath)
reqURL := BinaryCache + "/" + pathParts.hash + ".narinfo"
//nolint:govet
ctx, _ := context.WithDeadline(context.Background(), time.Now().Add(5*time.Second))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cancel() always needs to be called (even after the action is completed) otherwise it'll leak resources.

You can also use WithTimeout to avoid adding the time yourself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

dashIndex = i
}
}
return &storePathParts{hash, name, "" /*version*/}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant something like storePathParts{hash: hash, name: name}.

return nil
}

sysInfo, err := p.sysInfoIfExists()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are more races here:

  • this eventually calls nix.Version which could cause multiple nix --version commands to launch at once. It looks like they'd also race to write to the same global version variable.
  • assuming packages share the same lockfile, it calls p.lockfile.Resolve which can update lock.File.Packages.
  • the call to nix.System can call ComputeSystem which cachedSystem == "" which runs a Nix command and writes to the global cachedSystem variable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah, you are right.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: Instead of sprinkling locks everywhere, I could ensure these are invoked for all packages prior to starting the go-routines. Then within the goroutines, they'll be in read-only mode. Would that work?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would work. It might be easier to just split apart the HTTP request from everything else. Then you can calculate the store hash and check the map synchronously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit we should cache nix.Version the same way we do with nix.System

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the call to nix.System can call ComputeSystem

This should be pre-computed. I think the ComputeSystem in there was just out of abundance of caution.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nix.Version is cached already

@savil
Copy link
Collaborator Author

savil commented Aug 17, 2023

Gah, I really dislike the implementation in this PR. It makes the code brittle and harder to reason about. Inclined to try other approaches. Will land, but keep thinking about refactoring this.

@savil savil merged commit ff28666 into main Aug 17, 2023
@savil savil deleted the savil/narinfo-in-cache branch August 17, 2023 22:20
Copy link
Contributor

@mikeland73 mikeland73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@savil this is great!

I wonder if we can split out much of the nar cache stuff in devpkg into own file to keep it a bit more separate.

exists, ok := isNarInfoInCache.status[p.Raw]
isNarInfoInCache.lock.RUnlock()
if !ok {
return false, errors.Errorf("narInfo cache miss: %v. call XYZ before invoking IsInBinaryCache", p.Raw)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad error string? XYZ ?

return nil
}

sysInfo, err := p.sysInfoIfExists()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit we should cache nix.Version the same way we do with nix.System

return nil
}

sysInfo, err := p.sysInfoIfExists()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the call to nix.System can call ComputeSystem

This should be pre-computed. I think the ComputeSystem in there was just out of abundance of caution.

savil added a commit that referenced this pull request Aug 22, 2023
## Summary

No logic change. Just grouping this code into its own file, ordered by
public functions/methods on top.

Also fixed this placeholder error message:
#1318 (comment)

## How was it tested?

compiles
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants