-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Refactor generation of SHA256 checksums #3116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci please smoke test |
this is great! |
looks like the new integration tests from #3088 are failing unrelated to this PR change (reproduced the same failure locally on main branch). cc @abertelrud @friedbunny |
@mattt This looks great! But wouldn't it have made more sense to do this in |
@mattt this looks great. we are going into a merge freeze for non-critical bug fixes until after the 5.4 branch is created on 1/7 (see https://forums.swift.org/t/swift-5-4-release-process/), so will merge it then |
/// This property uses the CryptoKit implementation of | ||
/// Secure Hashing Algorithm 2 (SHA-2) hashing with a 256-bit digest, when available, | ||
/// falling back on a native implementation in Swift provided by TSCBasic. | ||
public var sha256Checksum: String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to return the actual data here, and then the call site can convert it to hex, or base-whatever, etc? The representation seems orthogonal to the computation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the existing call sites, we only call SHA256().hash(bytes)
to produce a hexadecimal representation. Returning ByteString
here would be more flexible, but that may be unnecessary. Really, it's a matter of taste; I'm happy with any solution that makes it easy to opt-in to a faster hash function when available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to leave this in this PR. I just have a general preference for separating out functionality into orthogonal parts, but since this is already an extension ByteString, it's fine to leave as is.
ByteString came about back before String had a UTF-8 backing, so performance was a problem in going between UTF-8 data and String. My sense is that we could probably transition to String here over time, at which point we might want to revisit this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A big 👍 for transitioning away from ByteString
at some point. When I started working on the codebase for registry support, I found the use of TSC / SPM-specific APIs instead of Swift Standard Library and Foundation (and now System) counterparts to be a frequent point of frustration. Many of these APIs were necessary at the time (e.g. JSON
before Codable), but have now become technical debt.
@swift-ci please smoke test macOS Platform |
@mattt could you plz look at resolving the conflicts with this one |
@swift-ci please smoke test |
if checksumAlgorithm is SHA256, #available(macOS 10.15, *) { | ||
checksumAlgorithm = CryptoKitSHA256() | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels a bit hacky... could we change the ctor argument checksumAlgorithm: HashAlgorithm = SHA256()
to checksumAlgorithm: HashAlgorithm? = nil
and create default instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this version has clearer semantics; the default is SHA256, not nil
, and we're upgrading to a faster alternative when available. I also value the consistency with the preceding archiver
parameter in the Workspace
initializer and the checksumAlgorithm
parameter MockWorkspace
. But I yield to your preference on this. Here's the equivalent using a nil
argument:
public init(
dataPath: AbsolutePath,
editablesPath: AbsolutePath,
pinsFile: AbsolutePath,
manifestLoader: ManifestLoaderProtocol,
repositoryManager: RepositoryManager? = nil,
currentToolsVersion: ToolsVersion = ToolsVersion.currentToolsVersion,
toolsVersionLoader: ToolsVersionLoaderProtocol = ToolsVersionLoader(),
delegate: WorkspaceDelegate? = nil,
config: Workspace.Configuration = Workspace.Configuration(),
fileSystem: FileSystem = localFileSystem,
repositoryProvider: RepositoryProvider = GitRepositoryProvider(),
downloader: Downloader = FoundationDownloader(),
netrcFilePath: AbsolutePath? = nil,
archiver: Archiver = ZipArchiver(),
checksumAlgorithm: HashAlgorithm? = nil,
) {
// ...
var checksumAlgorithm = checksumAlgorithm ?? SHA256()
#if canImport(CryptoKit)
if #available(macOS 10.15, *) {
checksumAlgorithm = CryptoKitSHA256()
}
#endif
self.checksumAlgorithm = checksumAlgorithm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point was to try and avoid allocating memory twice. since this is the workspace and it usually only initialized once I guess this is not very critical. lets take the PR as is and we can modify it later if we decide to
@swift-ci please smoke test |
Add PackageRegistry module target Add PackageRegistryTests module target Add --enable-package-registry command-line option Cache registry managers by base URL Set _useLegacyIdentities in SwiftTool initialization Incorporate review feedback for use of tsc_await Reuse HTTP clients in registry manager Remove custom userAgent This is already provided by HTTPClient Add internal archiverFactory to RegistryManager for DI Add queue parameter to RegistryManager.discover Add diagnosticsEngine parameter to RegistryManager Replace MockPackageRegistryURLProtocol with HTTPClient handlers Remove use of tsc_await in RegistryManager.discover Stash progress Merge swiftlang#3116 Merge main
Improves performance and convenience of SHA256 checksum generation.
Motivation:
While working on #3023, dependency resolution with package registries was significantly slower than I expected. After a bit of profiling, I determined that a call to
SHA256().hash
for the downloaded Zip archive of each package release was responsible for the slowdown. Switching to the CryptoKit implementation of SHA256 created improved end-to-end performance from 70s to 15s. (See #3023 (comment))Modifications:
This PR starts by creating a
sha256Checksum
property in an extension onByteString
that provides a lowercase hexadecimal representation of the SHA256 hash, using CryptoKit when available (0bd6a54).Next, it refactors existing usage of SHA256 to use this new property (8512a88, 85fb8da, c92682c).
Finally, it changes the
Workspace
initializer to upgrade the providedchecksumAlgorithm
parameter toCryptoKitSHA256
when CryptoKit is available and a defaultSHA256
value is provided (246cadb).Result:
With these changes, Swift Package Manager should perform certain tasks somewhat faster. In particular, I believe this should significantly improve the performance of working with binary artifacts, which currently uses the slower, native TSCBasic
SHA256
implementation.This change will also be useful for #3023, which will rebase to consolidate its existing conditional use of CryptoKit.