Skip to content

Refactor generation of SHA256 checksums #3116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 8, 2021
Merged

Conversation

mattt
Copy link
Contributor

@mattt mattt commented Dec 15, 2020

Improves performance and convenience of SHA256 checksum generation.

Motivation:

While working on #3023, dependency resolution with package registries was significantly slower than I expected. After a bit of profiling, I determined that a call to SHA256().hash for the downloaded Zip archive of each package release was responsible for the slowdown. Switching to the CryptoKit implementation of SHA256 created improved end-to-end performance from 70s to 15s. (See #3023 (comment))

Modifications:

This PR starts by creating a sha256Checksum property in an extension on ByteString that provides a lowercase hexadecimal representation of the SHA256 hash, using CryptoKit when available (0bd6a54).

Next, it refactors existing usage of SHA256 to use this new property (8512a88, 85fb8da, c92682c).

Finally, it changes the Workspace initializer to upgrade the provided checksumAlgorithm parameter to CryptoKitSHA256 when CryptoKit is available and a default SHA256 value is provided (246cadb).

Result:

With these changes, Swift Package Manager should perform certain tasks somewhat faster. In particular, I believe this should significantly improve the performance of working with binary artifacts, which currently uses the slower, native TSCBasic SHA256 implementation.

This change will also be useful for #3023, which will rebase to consolidate its existing conditional use of CryptoKit.

@mattt
Copy link
Contributor Author

mattt commented Dec 15, 2020

@swift-ci please smoke test

@tomerd
Copy link
Contributor

tomerd commented Dec 15, 2020

this is great!

@tomerd
Copy link
Contributor

tomerd commented Dec 15, 2020

looks like the new integration tests from #3088 are failing unrelated to this PR change (reproduced the same failure locally on main branch). cc @abertelrud @friedbunny

@hartbit
Copy link
Contributor

hartbit commented Dec 15, 2020

@mattt This looks great! But wouldn't it have made more sense to do this in TSCBasic so that other projects that use TSCBasic could also benefit from a faster sha256 when CryptoKit is available? I think that was the idea initially.

@tomerd
Copy link
Contributor

tomerd commented Dec 15, 2020

@mattt this looks great. we are going into a merge freeze for non-critical bug fixes until after the 5.4 branch is created on 1/7 (see https://forums.swift.org/t/swift-5-4-release-process/), so will merge it then

/// This property uses the CryptoKit implementation of
/// Secure Hashing Algorithm 2 (SHA-2) hashing with a 256-bit digest, when available,
/// falling back on a native implementation in Swift provided by TSCBasic.
public var sha256Checksum: String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to return the actual data here, and then the call site can convert it to hex, or base-whatever, etc? The representation seems orthogonal to the computation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the existing call sites, we only call SHA256().hash(bytes) to produce a hexadecimal representation. Returning ByteString here would be more flexible, but that may be unnecessary. Really, it's a matter of taste; I'm happy with any solution that makes it easy to opt-in to a faster hash function when available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to leave this in this PR. I just have a general preference for separating out functionality into orthogonal parts, but since this is already an extension ByteString, it's fine to leave as is.

ByteString came about back before String had a UTF-8 backing, so performance was a problem in going between UTF-8 data and String. My sense is that we could probably transition to String here over time, at which point we might want to revisit this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A big 👍 for transitioning away from ByteString at some point. When I started working on the codebase for registry support, I found the use of TSC / SPM-specific APIs instead of Swift Standard Library and Foundation (and now System) counterparts to be a frequent point of frustration. Many of these APIs were necessary at the time (e.g. JSON before Codable), but have now become technical debt.

mattt added a commit to mattt/swift-package-manager that referenced this pull request Dec 15, 2020
@tomerd tomerd added the next waiting for next merge window label Jan 5, 2021
@tomerd
Copy link
Contributor

tomerd commented Jan 5, 2021

@swift-ci please smoke test macOS Platform

@tomerd
Copy link
Contributor

tomerd commented Jan 5, 2021

@mattt could you plz look at resolving the conflicts with this one

@tomerd tomerd added ready Author believes the PR is ready to be merged & any feedback has been addressed and removed next waiting for next merge window labels Jan 5, 2021
@mattt
Copy link
Contributor Author

mattt commented Jan 7, 2021

@swift-ci please smoke test

if checksumAlgorithm is SHA256, #available(macOS 10.15, *) {
checksumAlgorithm = CryptoKitSHA256()
}
#endif
Copy link
Contributor

@tomerd tomerd Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels a bit hacky... could we change the ctor argument checksumAlgorithm: HashAlgorithm = SHA256() to checksumAlgorithm: HashAlgorithm? = nil and create default instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this version has clearer semantics; the default is SHA256, not nil, and we're upgrading to a faster alternative when available. I also value the consistency with the preceding archiver parameter in the Workspace initializer and the checksumAlgorithm parameter MockWorkspace. But I yield to your preference on this. Here's the equivalent using a nil argument:

public init(
        dataPath: AbsolutePath,
        editablesPath: AbsolutePath,
        pinsFile: AbsolutePath,
        manifestLoader: ManifestLoaderProtocol,
        repositoryManager: RepositoryManager? = nil,
        currentToolsVersion: ToolsVersion = ToolsVersion.currentToolsVersion,
        toolsVersionLoader: ToolsVersionLoaderProtocol = ToolsVersionLoader(),
        delegate: WorkspaceDelegate? = nil,
        config: Workspace.Configuration = Workspace.Configuration(),
        fileSystem: FileSystem = localFileSystem,
        repositoryProvider: RepositoryProvider = GitRepositoryProvider(),
        downloader: Downloader = FoundationDownloader(),
        netrcFilePath: AbsolutePath? = nil,
        archiver: Archiver = ZipArchiver(),
        checksumAlgorithm: HashAlgorithm? = nil,
    ) {
       // ...

        var checksumAlgorithm = checksumAlgorithm ?? SHA256()
        #if canImport(CryptoKit)
        if #available(macOS 10.15, *) {
            checksumAlgorithm = CryptoKitSHA256()
        }
        #endif
        self.checksumAlgorithm = checksumAlgorithm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point was to try and avoid allocating memory twice. since this is the workspace and it usually only initialized once I guess this is not very critical. lets take the PR as is and we can modify it later if we decide to

@tomerd
Copy link
Contributor

tomerd commented Jan 7, 2021

@swift-ci please smoke test

@tomerd tomerd merged commit ea0b31e into swiftlang:main Jan 8, 2021
@mattt mattt deleted the sha256 branch January 8, 2021 12:22
mattt added a commit to mattt/swift-package-manager that referenced this pull request Jan 30, 2021
Add PackageRegistry module target

Add PackageRegistryTests module target

Add --enable-package-registry command-line option

Cache registry managers by base URL

Set _useLegacyIdentities in SwiftTool initialization

Incorporate review feedback for use of tsc_await

Reuse HTTP clients in registry manager

Remove custom userAgent

This is already provided by HTTPClient

Add internal archiverFactory to RegistryManager for DI

Add queue parameter to RegistryManager.discover

Add diagnosticsEngine parameter to RegistryManager

Replace MockPackageRegistryURLProtocol with HTTPClient handlers

Remove use of tsc_await in RegistryManager.discover

Stash progress

Merge swiftlang#3116

Merge main
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready Author believes the PR is ready to be merged & any feedback has been addressed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants