Skip to content

Refactor generation of SHA256 checksums #3116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 8, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions Sources/Basics/ByteString+Extensions.swift
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
/*
This source file is part of the Swift.org open source project
Copyright (c) 2020 Apple Inc. and the Swift project authors
Licensed under Apache License v2.0 with Runtime Library Exception
See http://swift.org/LICENSE.txt for license information
See http://swift.org/CONTRIBUTORS.txt for Swift project authors
*/

import TSCBasic

extension ByteString {
/// A lowercase, hexadecimal representation of the SHA256 hash
/// generated for the byte string's contents.
///
/// This property uses the CryptoKit implementation of
/// Secure Hashing Algorithm 2 (SHA-2) hashing with a 256-bit digest, when available,
/// falling back on a native implementation in Swift provided by TSCBasic.
public var sha256Checksum: String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to return the actual data here, and then the call site can convert it to hex, or base-whatever, etc? The representation seems orthogonal to the computation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the existing call sites, we only call SHA256().hash(bytes) to produce a hexadecimal representation. Returning ByteString here would be more flexible, but that may be unnecessary. Really, it's a matter of taste; I'm happy with any solution that makes it easy to opt-in to a faster hash function when available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to leave this in this PR. I just have a general preference for separating out functionality into orthogonal parts, but since this is already an extension ByteString, it's fine to leave as is.

ByteString came about back before String had a UTF-8 backing, so performance was a problem in going between UTF-8 data and String. My sense is that we could probably transition to String here over time, at which point we might want to revisit this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A big 👍 for transitioning away from ByteString at some point. When I started working on the codebase for registry support, I found the use of TSC / SPM-specific APIs instead of Swift Standard Library and Foundation (and now System) counterparts to be a frequent point of frustration. Many of these APIs were necessary at the time (e.g. JSON before Codable), but have now become technical debt.

#if canImport(CryptoKit)
if #available(macOS 10.15, *) {
return CryptoKitSHA256().hash(self).hexadecimalRepresentation
}
#endif

return SHA256().hash(self).hexadecimalRepresentation
}
}
1 change: 1 addition & 0 deletions Sources/Basics/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
# See http://swift.org/CONTRIBUTORS.txt for Swift project authors

add_library(Basics
ByteString+Extensions.swift
ConcurrencyHelpers.swift
Dictionary+Extensions.swift
DispatchTimeInterval+Extensions.swift
Expand Down
50 changes: 25 additions & 25 deletions Sources/PackageLoading/ManifestLoader.swift
Original file line number Diff line number Diff line change
Expand Up @@ -664,6 +664,7 @@ public final class ManifestLoader: ManifestLoaderProtocol {
let toolsVersion: ToolsVersion
let env: [String: String]
let swiftpmVersion: String
let sha256Checksum: String

init (packageIdentity: PackageIdentity,
manifestPath: AbsolutePath,
Expand All @@ -672,37 +673,38 @@ public final class ManifestLoader: ManifestLoaderProtocol {
swiftpmVersion: String,
fileSystem: FileSystem
) throws {
let manifestContents = try fileSystem.readFileContents(manifestPath).contents
let sha256Checksum = try Self.computeSHA256Checksum(packageIdentity: packageIdentity, manifestContents: manifestContents, toolsVersion: toolsVersion, env: env, swiftpmVersion: swiftpmVersion)

self.packageIdentity = packageIdentity
self.manifestPath = manifestPath
self.manifestContents = try fileSystem.readFileContents(manifestPath).contents
self.manifestContents = manifestContents
self.toolsVersion = toolsVersion
self.env = env
self.swiftpmVersion = swiftpmVersion
self.sha256Checksum = sha256Checksum
}

// TODO: is there a way to avoid the dual hashing?
func hash(into hasher: inout Hasher) {
hasher.combine(self.packageIdentity)
hasher.combine(self.manifestContents)
hasher.combine(self.toolsVersion.description)
for key in self.env.keys.sorted(by: >) {
hasher.combine(key)
hasher.combine(env[key]!) // forced unwrap safe
}
hasher.combine(self.swiftpmVersion)
hasher.combine(self.sha256Checksum)
}

// TODO: is there a way to avoid the dual hashing?
func computeHash() throws -> ByteString {
private static func computeSHA256Checksum(
packageIdentity: PackageIdentity,
manifestContents: [UInt8],
toolsVersion: ToolsVersion,
env: [String: String],
swiftpmVersion: String
) throws -> String {
let stream = BufferedOutputByteStream()
stream <<< self.packageIdentity
stream <<< self.manifestContents
stream <<< self.toolsVersion.description
for key in self.env.keys.sorted(by: >) {
stream <<< key <<< env[key]! // forced unwrap safe
stream <<< packageIdentity
stream <<< manifestContents
stream <<< toolsVersion.description
for key in env.keys.sorted(by: >) {
stream <<< key <<< env[key]! // forced unwrap safe
}
stream <<< self.swiftpmVersion
return SHA256().hash(stream.bytes)
stream <<< swiftpmVersion
return stream.bytes.sha256Checksum
}
}

Expand Down Expand Up @@ -849,7 +851,7 @@ public final class ManifestLoader: ManifestLoaderProtocol {
if compilerResult.exitStatus != .terminated(code: 0) {
return
}

// Pass an open file descriptor of a file to which the JSON representation of the manifest will be written.
let jsonOutputFile = tmpDir.appending(component: "\(packageIdentity)-output.json")
guard let jsonOutputFileDesc = fopen(jsonOutputFile.pathString, "w") else {
Expand Down Expand Up @@ -1135,12 +1137,11 @@ private final class SQLiteManifestCache: Closable {
}

func put(key: ManifestLoader.ManifestCacheKey, manifest: ManifestLoader.ManifestParseResult) throws {
let query = "INSERT OR IGNORE INTO MANIFEST_CACHE VALUES (?, ?);"
let query = "INSERT OR IGNORE INTO MANIFEST_CACHE VALUES (?, ?);"
try self.executeStatement(query) { statement -> Void in
let keyHash = try key.computeHash()
let data = try self.jsonEncoder.encode(manifest)
let bindings: [SQLite.SQLiteValue] = [
.string(keyHash.hexadecimalRepresentation),
.string(key.sha256Checksum),
.blob(data),
]
try statement.bind(bindings)
Expand All @@ -1151,8 +1152,7 @@ private final class SQLiteManifestCache: Closable {
func get(key: ManifestLoader.ManifestCacheKey) throws -> ManifestLoader.ManifestParseResult? {
let query = "SELECT value FROM MANIFEST_CACHE WHERE key == ? LIMIT 1;"
return try self.executeStatement(query) { statement -> ManifestLoader.ManifestParseResult? in
let keyHash = try key.computeHash()
try statement.bind([.string(keyHash.hexadecimalRepresentation)])
try statement.bind([.string(key.sha256Checksum)])
let data = try statement.step()?.blob(at: 0)
return try data.flatMap {
try self.jsonDecoder.decode(ManifestLoader.ManifestParseResult.self, from: $0)
Expand Down
5 changes: 2 additions & 3 deletions Sources/SourceControl/Repository.swift
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,9 @@ public struct RepositorySpecifier: Hashable, Codable {
/// unique for each repository.
public var fileSystemIdentifier: String {
// Use first 8 chars of a stable hash.
let hash = SHA256().hash(url).hexadecimalRepresentation
let suffix = hash.dropLast(hash.count - 8)
let suffix = ByteString(encodingAsUTF8: url).sha256Checksum.prefix(8)

return basename + "-" + suffix
return "\(basename)-\(suffix)"
}

/// Returns the cleaned basename for the specifier.
Expand Down
7 changes: 7 additions & 0 deletions Sources/Workspace/Workspace.swift
Original file line number Diff line number Diff line change
Expand Up @@ -303,6 +303,13 @@ public class Workspace {
self.downloader = downloader
self.netrcFilePath = netrcFilePath
self.archiver = archiver

var checksumAlgorithm = checksumAlgorithm
#if canImport(CryptoKit)
if checksumAlgorithm is SHA256, #available(macOS 10.15, *) {
checksumAlgorithm = CryptoKitSHA256()
}
#endif
Copy link
Contributor

@tomerd tomerd Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels a bit hacky... could we change the ctor argument checksumAlgorithm: HashAlgorithm = SHA256() to checksumAlgorithm: HashAlgorithm? = nil and create default instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this version has clearer semantics; the default is SHA256, not nil, and we're upgrading to a faster alternative when available. I also value the consistency with the preceding archiver parameter in the Workspace initializer and the checksumAlgorithm parameter MockWorkspace. But I yield to your preference on this. Here's the equivalent using a nil argument:

public init(
        dataPath: AbsolutePath,
        editablesPath: AbsolutePath,
        pinsFile: AbsolutePath,
        manifestLoader: ManifestLoaderProtocol,
        repositoryManager: RepositoryManager? = nil,
        currentToolsVersion: ToolsVersion = ToolsVersion.currentToolsVersion,
        toolsVersionLoader: ToolsVersionLoaderProtocol = ToolsVersionLoader(),
        delegate: WorkspaceDelegate? = nil,
        config: Workspace.Configuration = Workspace.Configuration(),
        fileSystem: FileSystem = localFileSystem,
        repositoryProvider: RepositoryProvider = GitRepositoryProvider(),
        downloader: Downloader = FoundationDownloader(),
        netrcFilePath: AbsolutePath? = nil,
        archiver: Archiver = ZipArchiver(),
        checksumAlgorithm: HashAlgorithm? = nil,
    ) {
       // ...

        var checksumAlgorithm = checksumAlgorithm ?? SHA256()
        #if canImport(CryptoKit)
        if #available(macOS 10.15, *) {
            checksumAlgorithm = CryptoKitSHA256()
        }
        #endif
        self.checksumAlgorithm = checksumAlgorithm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point was to try and avoid allocating memory twice. since this is the workspace and it usually only initialized once I guess this is not very critical. lets take the PR as is and we can modify it later if we decide to

self.checksumAlgorithm = checksumAlgorithm
self.isResolverPrefetchingEnabled = isResolverPrefetchingEnabled
self.skipUpdate = skipUpdate
Expand Down
2 changes: 1 addition & 1 deletion Sources/XCBuildSupport/PIF.swift
Original file line number Diff line number Diff line change
Expand Up @@ -1205,7 +1205,7 @@ extension PIF {
func sign<T: PIFSignableObject & Encodable>(_ obj: T) throws {
let signatureContent = try encoder.encode(obj)
let bytes = ByteString(signatureContent)
obj.signature = SHA256().hash(bytes).hexadecimalRepresentation
obj.signature = bytes.sha256Checksum
}

let projects = workspace.projects
Expand Down
23 changes: 23 additions & 0 deletions Tests/BasicsTests/ByteStringExtensionsTests.swift
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
/*
This source file is part of the Swift.org open source project

Copyright (c) 2020 Apple Inc. and the Swift project authors
Licensed under Apache License v2.0 with Runtime Library Exception

See http://swift.org/LICENSE.txt for license information
See http://swift.org/CONTRIBUTORS.txt for Swift project authors
*/

import Basics
import TSCBasic
import XCTest

final class ByteStringExtensionsTests: XCTestCase {
func testSHA256Checksum() {
let byteString = ByteString(encodingAsUTF8: "abc")
XCTAssertEqual(byteString.contents, [0x61, 0x62, 0x63])

// See https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/examples/sha_all.pdf
XCTAssertEqual(byteString.sha256Checksum, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
}
}