
Add missing Builtin vector functions to SIMD with FloatingPoint #78744


Draft: wants to merge 2 commits into base: main


@kntkymt (Contributor) commented Jan 19, 2025

Motivation

SIMDn types whose Scalar is FloatingPoint don't have the concrete builtin vector operations that FixedWidthInteger scalars have. This means that operations on SIMDn<Float>/SIMDn<Double> are currently expanded into a for loop over the lanes.

Since one of the goals of the SIMD types is to provide an API for vector programming (as stated in SE-0229 SIMD), it would be better to provide concrete builtin vector operations for FloatingPoint scalars as well.

(Note: LLVM's auto-vectorization may turn the loop into vector instructions, but this is not guaranteed.)

In this PR, I added builtin vector operations (+, -, *, /) to SIMDn where Scalar is FloatingPoint.
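For illustration, here is a minimal sketch of the per-lane loop described above. It assumes the generic FloatingPoint fallback behaves like this loop; loopAdd is a hypothetical name, not stdlib source. Both paths compute the same lanes; the difference this PR targets is how the addition is lowered (a scalar loop that LLVM may or may not vectorize vs. a direct builtin vector operation).

```swift
// Hypothetical sketch of an element-wise fallback: one scalar add per lane.
// Not the stdlib's actual source; it only mirrors the loop shape described above.
func loopAdd<V: SIMD>(_ lhs: V, _ rhs: V) -> V where V.Scalar: FloatingPoint {
    var result = V()              // all lanes start at zero
    for i in result.indices {     // 0..<scalarCount
        result[i] = lhs[i] + rhs[i]
    }
    return result
}

let a = SIMD4<Float>(1, 2, 3, 4)
let b = SIMD4<Float>(5, 6, 7, 8)

let viaLoop = loopAdd(a, b)       // explicit per-lane loop
let viaOperator = a + b           // the stdlib + operator for SIMD4<Float>
print(viaLoop == viaOperator)     // true
```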

Assembly output

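```swift
// Note: a SIMD64 array literal must supply all 64 elements. These 4-element
// literals are enough to inspect the generated code, but trap at run time.
let a: SIMD64<Float> = [1, 2, 3, 4]
let b: SIMD64<Float> = [5, 6, 7, 8]
let c = a + b
```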

The assembly output contains fadd.4s, the arm64 vector instruction that adds four single-precision floats at once:

```
Lloh13:
	add	x8, x8, _$s6ForASM1bs6SIMD64VySfGvp@PAGEOFF
	stp	q0, q1, [x8, #192]
	ldp	q0, q1, [sp, #768]
	stp	q0, q1, [x8, #224]
	ldp	q0, q1, [sp, #672]
	stp	q0, q1, [x8, #128]
	ldp	q0, q1, [sp, #704]
	stp	q0, q1, [x8, #160]
	ldp	q0, q1, [sp, #608]
	stp	q0, q1, [x8, #64]
	ldp	q0, q1, [sp, #640]
	stp	q0, q1, [x8, #96]
	ldp	q0, q1, [sp, #544]
	stp	q0, q1, [x8]
	ldp	q0, q1, [sp, #576]
	stp	q0, q1, [x8, #32]
	ldr	q0, [sp, #544]
	ldp	q1, q2, [sp, #32]
	fadd.4s	v0, v1, v0
	ldr	q1, [sp, #560]
```


Benchmarks

```swift
// Runs `operation` `samples` times and returns min/max/median/average durations.
@discardableResult
func benchmark(samples: UInt64, operation: () -> Void) -> (min: Duration, max: Duration, median: Duration, average: Duration) {
    var results: [Duration] = []
    for _ in 0..<samples {
        let start = ContinuousClock.now
        operation()
        let end = ContinuousClock.now

        results.append(end - start)
    }

    let sorted = results.sorted(by: <)
    return (sorted.first!, sorted.last!, sorted[Int(samples) / 2], results.reduce(.zero, +) / samples)
}

// Times a binary SIMD64<Float> operation on two random vectors.
func benchmark(for function: (SIMD64<Float>, SIMD64<Float>) -> SIMD64<Float>, label: String) {
    let a: SIMD64<Float> = .random(in: 1..<1000)
    let b: SIMD64<Float> = .random(in: 1..<1000)

    let result = benchmark(samples: 1_000_000) {
        // The result is discarded; with -O the optimizer may remove part of this work.
        _ = function(a, b)
    }
    print("\(label): \(result)")
}
benchmark(for: +, label: "add")
benchmark(for: -, label: "sub")
benchmark(for: *, label: "mul")
benchmark(for: /, label: "div")
```

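-Onone: the builtin operations are roughly 7 to 8x faster than the current implementation (median 1.333e-06 s vs. 1.67e-07 s; averages differ by about 7x). The first block uses the Swift 6.0 toolchain (current implementation), the second a locally built toolchain with this PR.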

```
// swift-driver version: 1.115 Apple Swift version 6.0 (swiftlang-6.0.0.9.10 clang-1600.0.26.2)
// Target: arm64-apple-macosx14.0
// swift main.swift
add: (min: 1.208e-06 seconds, max: 0.000196375 seconds, median: 1.333e-06 seconds, average: 1.369681377e-06 seconds)
sub: (min: 1.166e-06 seconds, max: 0.00030225 seconds, median: 1.333e-06 seconds, average: 1.34110289e-06 seconds)
mul: (min: 1.208e-06 seconds, max: 0.000231125 seconds, median: 1.333e-06 seconds, average: 1.344088114e-06 seconds)
div: (min: 1.208e-06 seconds, max: 7.8709e-05 seconds, median: 1.333e-06 seconds, average: 1.333120559e-06 seconds)
```

```
// swift-project/build/Ninja-RelWithDebInfoAssert/swift-macosx-arm64/bin/swift main.swift
add: (min: 8.3e-08 seconds, max: 0.00022325 seconds, median: 1.67e-07 seconds, average: 1.93475469e-07 seconds)
sub: (min: 8.3e-08 seconds, max: 0.00100575 seconds, median: 1.67e-07 seconds, average: 1.8926893e-07 seconds)
mul: (min: 8.3e-08 seconds, max: 0.000154084 seconds, median: 1.67e-07 seconds, average: 1.8895175e-07 seconds)
div: (min: 8.3e-08 seconds, max: 0.000333 seconds, median: 1.67e-07 seconds, average: 1.90077136e-07 seconds)
```
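As a quick cross-check of the -Onone comparison, the speedups can be computed directly from the reported numbers (first block: Swift 6.0 toolchain; second block: locally built toolchain). The values are copied from the output above; nothing is re-measured.

```swift
// Reported -Onone timings in seconds, copied from the output above.
let currentMedian = 1.333e-06  // Swift 6.0 toolchain (same median for all four ops)
let patchedMedian = 1.67e-07   // locally built toolchain (same median for all four ops)

let currentAverage: [String: Double] = [
    "add": 1.369681377e-06, "sub": 1.34110289e-06,
    "mul": 1.344088114e-06, "div": 1.333120559e-06,
]
let patchedAverage: [String: Double] = [
    "add": 1.93475469e-07, "sub": 1.8926893e-07,
    "mul": 1.8895175e-07, "div": 1.90077136e-07,
]

// How many times faster the locally built toolchain is, by median.
let medianSpeedup = currentMedian / patchedMedian
print("median speedup:", medianSpeedup)

// Same ratio per operation, using the average durations.
var averageSpeedup: [String: Double] = [:]
for op in ["add", "sub", "mul", "div"] {
    averageSpeedup[op] = currentAverage[op]! / patchedAverage[op]!
    print("\(op) average speedup:", averageSpeedup[op]!)
}
```

By median the ratio is just under 8x; by average it is a little over 7x for every operation.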

-O: roughly the same in this benchmark (the builtin operations have slightly lower averages)

```
// swift -O main.swift
add: (min: 4.2e-08 seconds, max: 5.7042e-05 seconds, median: 1.67e-07 seconds, average: 1.63937871e-07 seconds)
sub: (min: 4.1e-08 seconds, max: 0.001872084 seconds, median: 1.67e-07 seconds, average: 1.68241621e-07 seconds)
mul: (min: 4.1e-08 seconds, max: 0.000124084 seconds, median: 1.67e-07 seconds, average: 1.6312866e-07 seconds)
div: (min: 8.3e-08 seconds, max: 0.001913042 seconds, median: 1.67e-07 seconds, average: 1.75056209e-07 seconds)
```

```
// swift-project/build/Ninja-RelWithDebInfoAssert/swift-macosx-arm64/bin/swift -O main.swift
add: (min: 4.2e-08 seconds, max: 3.475e-05 seconds, median: 1.67e-07 seconds, average: 1.60648905e-07 seconds)
sub: (min: 8.3e-08 seconds, max: 2.125e-05 seconds, median: 1.67e-07 seconds, average: 1.60266289e-07 seconds)
mul: (min: 4.2e-08 seconds, max: 1.8e-05 seconds, median: 1.67e-07 seconds, average: 1.58766867e-07 seconds)
div: (min: 4.1e-08 seconds, max: 3.5e-05 seconds, median: 1.67e-07 seconds, average: 1.6083846e-07 seconds)
```

Error in AutoDiff tests (need help 🙏)

With this PR, test/AutoDiff/stdlib/simd.swift fails because derivatives do not yet support @_alwaysEmitIntoClient. I have no idea how to fix this 😓. How can I move forward?


[edited] I have disabled these tests for now.

@kntkymt marked this pull request as ready for review January 19, 2025 17:02
@kntkymt requested a review from a team as a code owner January 19, 2025 17:02
@asl (Contributor) commented Jan 20, 2025

Tagging @kovdan01

Actually, he's currently working on #54445, so this might be resolved soon.

@kntkymt (Contributor Author) commented Jan 21, 2025

@asl @kovdan01
Thanks for your reply. Wow, that's amazing news 👀 I'll wait a while 👍

@kovdan01 (Contributor)

@kntkymt This particular error you are facing is actually expected: to differentiate a function defined in a different file, that function must either be explicitly marked as @differentiable or have an explicitly defined derivative. This error is not related to @_alwaysEmitIntoClient: if you remove those attributes from your code, you'll still see it. In order to get rid of the error, custom derivatives should be defined in stdlib/public/Differentiation/SIMDDifferentiation.swift.gyb.
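For readers unfamiliar with custom derivatives, here is a minimal sketch of registering a reverse-mode derivative with @derivative(of:). It assumes a toolchain that ships the experimental _Differentiation module (for example a swift.org toolchain). laneAdd is a hypothetical free function standing in for a new concrete SIMD operation; this is not the stdlib's code and it deliberately avoids @_alwaysEmitIntoClient.

```swift
import _Differentiation

// Hypothetical stand-in for a newly added concrete SIMD operation.
func laneAdd(_ lhs: SIMD4<Float>, _ rhs: SIMD4<Float>) -> SIMD4<Float> {
    return lhs + rhs
}

// Custom reverse-mode derivative (VJP) registered for laneAdd.
// For addition, the incoming cotangent flows unchanged to both operands.
@derivative(of: laneAdd)
func laneAddVJP(_ lhs: SIMD4<Float>, _ rhs: SIMD4<Float>)
    -> (value: SIMD4<Float>, pullback: (SIMD4<Float>) -> (SIMD4<Float>, SIMD4<Float>)) {
    return (laneAdd(lhs, rhs), { v in (v, v) })
}

let x = SIMD4<Float>(1, 2, 3, 4)
let y = SIMD4<Float>(5, 6, 7, 8)

// Differentiate through laneAdd; the registered VJP supplies the pullback.
let (value, pullback) = valueWithPullback(at: x, y, of: laneAdd)
let (dx, dy) = pullback(SIMD4<Float>(repeating: 1))
print(value, dx, dy)
```

In the stdlib the equivalent derivatives would live in SIMDDifferentiation.swift.gyb next to the existing SIMD derivatives.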

As for the issues related to @_alwaysEmitIntoClient, these should be resolved by #78908. In particular, it allows defining custom derivatives for @_alwaysEmitIntoClient functions, and defining such derivatives is the way to get rid of the error you are facing. Without #78908, defining such a derivative would lead to a linker error or a compiler crash (depending on the circumstances).

So I see two ways forward here.

  1. If the functions you introduced must be @_alwaysEmitIntoClient, you need to apply [AutoDiff] Support custom derivatives for @_alwaysEmitIntoClient functions #78908 locally (or wait until it's merged) and define custom derivatives for newly introduced functions in stdlib/public/Differentiation/SIMDDifferentiation.swift.gyb (and mark these derivatives as @_alwaysEmitIntoClient as well).

  2. If @_alwaysEmitIntoClient is not a hard requirement, you can omit it for now and define custom derivatives without waiting for [AutoDiff] Support custom derivatives for @_alwaysEmitIntoClient functions #78908. After that is merged, the newly introduced functions and their derivatives can be updated to become @_alwaysEmitIntoClient.

@kntkymt (Contributor Author) commented Jan 27, 2025

@kovdan01 Thanks for the detailed explanation 🙇

> In order to get rid of the error, custom derivatives should be defined in stdlib/public/Differentiation/SIMDDifferentiation.swift.gyb.

Understood.

> 1. If the functions you introduced must be @_alwaysEmitIntoClient, you need to apply [AutoDiff] Support custom derivatives for @_alwaysEmitIntoClient functions #78908 locally (or wait until it's merged) and define custom derivatives for newly introduced functions in stdlib/public/Differentiation/SIMDDifferentiation.swift.gyb (and mark these derivatives as @_alwaysEmitIntoClient as well).

Let me go with option 1. I will add derivative implementations in stdlib/public/Differentiation/SIMDDifferentiation.swift.gyb after #78908 is merged, since it's not urgent.

@kntkymt (Contributor Author) commented Jan 29, 2025

Let me convert this PR to a draft until I address the AutoDiff issue discussed above. Thanks!

@kntkymt marked this pull request as draft January 29, 2025 02:36