Skip to content

stdlib: excise the FP16 support routines on x64 #62988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

compnerd
Copy link
Member

Bump the minimum supported CPU to ~Nahelem so that we can take advantage of the F16C/CVT16 extensions to support half floating point rounding. This should avoid the need to replicate the compiler-rt functionality in the runtime and the associated problems with linking compiler-rt builtins on non-Windows targets when performing static linking.

@compnerd
Copy link
Member Author

compnerd commented Jan 12, 2023

@compnerd
Copy link
Member Author

CC: @rjmccall

@stephentyrone
Copy link
Contributor

This doesn’t actually solve the problem on its own; F16C provides conversions float <-> float16, but the compiler will still generate runtime calls for conversions to/from double and float80. I’ve worked around these in the stdlib for the conversions that were missing in macOS’s compiler-rt, but there are some others IIRC.

@compnerd
Copy link
Member Author

@stephentyrone - that is good to know! Do you have an example of that? I am hitting another issue in the compiler atm, so I haven't gotten to that point yet. At least via testing through clang, it appeared that the truncation and extension would get lowered to F16C.

@compnerd
Copy link
Member Author

@swift-ci please test Windows platform

@shahmishal
Copy link
Member

@swift-ci test

@stephentyrone
Copy link
Contributor

@compnerd here's an example of a runtime call generated even with f16c (by clang): https://godbolt.org/z/6b7513on6

@compnerd
Copy link
Member Author

compnerd commented Jan 12, 2023

Interesting; we should definitely add a test case for that - https://godbolt.org/z/nbGjYG1bs - doesn't generate the libcall on Windows as per the ABI. Seems like we will need to ensure that the builtins are linked on non-Windows targets.

@stephentyrone
Copy link
Contributor

stephentyrone commented Jan 12, 2023

Here's one where you generate a call on Windows: https://godbolt.org/z/qsYqTvb1c

@stephentyrone
Copy link
Contributor

I can work around that one in the stdlib for you, but more generally we need to be able to use the arithmetic builtins from the just-built compiler-rt, so making that work is the long-term solution we really want.

@compnerd
Copy link
Member Author

Most of the math routines shouldn't be forming libcalls on Windows. FP16 operations are odd since there is no FP16 on Windows AFAIK. We should be forcing LLVM to lower that properly in the longer term rather than having to workaround it at the library level.

@stephentyrone
Copy link
Contributor

stephentyrone commented Jan 12, 2023

I'm not sure what "forcing LLVM to lower that properly" means other than generating a libcall; that's the proper lowering. We don't want to unconditionally expand everything that's a libcall on any platform inline.

@compnerd
Copy link
Member Author

I was thinking that it should avoid the libcall on Windows specifically - as the OS doesn't expect to link compiler-rt ever (it should be similar to what MSVC does).

However, a trophy for you - you correctly identified that __truncdfhf2 is needed after the X86 CodeGen issue is resolved.

@compnerd
Copy link
Member Author

Please test with following PRs:
swiftlang/llvm-project#5995

@swift-ci please test

@compnerd
Copy link
Member Author

Please test with following PRs:
swiftlang/llvm-project#5995

@swift-ci please test Linux platform

@compnerd
Copy link
Member Author

@swift-ci please test Windows platform

@@ -41,4 +41,4 @@ bb0:
// the order of features differs.

// X86_64: define{{( protected)?}} swiftcc void @baz{{.*}}#0
// X86_64: #0 = {{.*}}"target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+ssse3,+x87"
// X86_64: #0 = {{.*}}"target-features"="+avx,+crc32,+cx16,+cx8,+f16c,+fxsr,+mmx,+popcnt,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're definitely not OK with bumping the baseline requirement for swift on x86 by ~4 years without a language workgroup and/or core team discussion. I think we should really try to fix this in a different manner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that the Language WG should be involved in that - I think that @airspeedswift was relayed this though, and @rjmccall was also CC'ed for that.

Copy link
Member Author

@compnerd compnerd Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should also mention that the f16c removes the __extendhfxf2 not the __truncsfhf2 and that the latter is something that I will restore for the time being. I think that one option might be to add __extendhfxf2 to the support routines (Linux specific - FP80)

@compnerd
Copy link
Member Author

-- Testing: 8865 tests, 36 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

3 warning(s) in tests

Testing Time: 390.24s
  Unsupported      : 2104
  Passed           : 6884
  Expectedly Failed:   51
Test project T:/3
      Start  1: dispatch_apply
 1/22 Test  #1: dispatch_apply ...................   Passed    0.18 sec
      Start  2: dispatch_api
 2/22 Test  #2: dispatch_api .....................   Passed    0.14 sec
      Start  3: dispatch_debug
 3/22 Test  #3: dispatch_debug ...................   Passed    0.14 sec
      Start  4: dispatch_queue_finalizer
 4/22 Test  #4: dispatch_queue_finalizer .........   Passed    1.14 sec
      Start  5: dispatch_overcommit
 5/22 Test  #5: dispatch_overcommit ..............   Passed    0.15 sec
      Start  6: dispatch_context_for_key
 6/22 Test  #6: dispatch_context_for_key .........   Passed    0.14 sec
      Start  7: dispatch_after
 7/22 Test  #7: dispatch_after ...................   Passed    9.16 sec
      Start  8: dispatch_timer
 8/22 Test  #8: dispatch_timer ...................   Passed    2.15 sec
      Start  9: dispatch_timer_short
 9/22 Test  #9: dispatch_timer_short .............   Passed    1.40 sec
      Start 10: dispatch_timer_timeout
10/22 Test #10: dispatch_timer_timeout ...........   Passed    6.15 sec
      Start 11: dispatch_sema
11/22 Test #11: dispatch_sema ....................   Passed    0.22 sec
      Start 12: dispatch_timer_bit31
12/22 Test #12: dispatch_timer_bit31 .............   Passed    2.30 sec
      Start 13: dispatch_timer_bit63
13/22 Test #13: dispatch_timer_bit63 .............   Passed    1.15 sec
      Start 14: dispatch_timer_set_time
14/22 Test #14: dispatch_timer_set_time ..........   Passed    2.45 sec
      Start 15: dispatch_data
15/22 Test #15: dispatch_data ....................   Passed    0.15 sec
      Start 16: dispatch_io_muxed
16/22 Test #16: dispatch_io_muxed ................   Passed    1.36 sec
      Start 17: dispatch_io_net
17/22 Test #17: dispatch_io_net ..................   Passed    0.47 sec
      Start 18: dispatch_io_pipe
18/22 Test #18: dispatch_io_pipe .................   Passed   10.17 sec
      Start 19: dispatch_io_pipe_close
19/22 Test #19: dispatch_io_pipe_close ...........   Passed    0.16 sec
      Start 20: dispatch_select
20/22 Test #20: dispatch_select ..................   Passed    0.22 sec
      Start 21: dispatch_c99
21/22 Test #21: dispatch_c99 .....................   Passed    0.15 sec
      Start 22: dispatch_plusplus
22/22 Test #22: dispatch_plusplus ................   Passed    0.15 sec

100% tests passed, 0 tests failed out of 22

Total Test time (real) =  39.81 sec
The following tests FAILED:
	1281 - TestFoundation.TestNSString-test_NSHomeDirectoryForUser (Failed)
	1283 - TestFoundation.TestNSString-test_expandingTildeInPath (Failed)
	1347 - TestFoundation.TestProcess-test_multiProcesses (Failed)
	1377 - TestFoundation.TestURL-test_URLByResolvingSymlinksInPathShouldUseTheCurrentDirectory (Failed)
	1378 - TestFoundation.TestURL-test_resolvingSymlinksInPathShouldAppendTrailingSlashWhenExistingDirectory (Failed)
	1379 - TestFoundation.TestURL-test_resolvingSymlinksInPathShouldResolveSymlinks (Failed)
[0/1][  0%][0.000s] Running XCTest functional test suite
-- Testing: 26 tests, 26 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

Testing Time: 24.11s
  Passed: 26

The Foundation test failures are intriguing as locally, I am seeing only 2 failures. But seems that Swift, libdispatch, and XCTest are good. The Foundation test failures seem to match the current ones in CI, so I think that Windows is still looking good assuming we figure out the FP16 issue.

@compnerd
Copy link
Member Author

swift-corelibs-foundation\Tests\Foundation\Tests\TestNSString.swift:1009: error: TestNSString.test_NSHomeDirectoryForUser :
  XCTAssertEqual failed: ("Optional("C:/Users/swift-ci")") is not equal to ("nil") - 

swift-corelibs-foundation\Tests\Foundation\Tests\TestNSString.swift:1036: error: TestNSString.test_expandingTildeInPath :
  XCTAssertEqual failed: ("~swift-ci") is not equal to ("C:/Users/swift-ci") - Could resolve home directory for specific user

swift-corelibs-foundation\Tests\Foundation\Tests\TestProcess.swift:803: error: TestProcess.test_multiProcesses :
  XCTAssertEqual failed: ("T:/4/TestFoundation.app") is not equal to ("C:/Users/swift-ci/jenkins/workspace/swift-PR-build-toolchain-windows/build/4/TestFoundation.app") - 

swift-corelibs-foundation\Tests\Foundation\Tests\TestURL.swift:562: error: TestURL.test_URLByResolvingSymlinksInPathShouldUseTheCurrentDirectory :
  XCTAssertEqual failed: ("file:///C:/Users/swift-ci/jenkins/workspace/swift-PR-build-toolchain-windows/build/tmp/org.swift.TestFoundation.TestURL.resourceValues.464/foo/bar/baz") is not equal to ("file:///T:/tmp/org.swift.TestFoundation.TestURL.resourceValues.464/foo/bar/baz") - 

swift-corelibs-foundation\Tests\Foundation\Tests\TestURL.swift:576: error: TestURL.test_resolvingSymlinksInPathShouldAppendTrailingSlashWhenExistingDirectory :
  XCTAssertEqual failed: ("file:///C:/Users/swift-ci/jenkins/workspace/swift-PR-build-toolchain-windows/build/tmp/org.swift.TestFoundation.TestURL.resourceValues.7768/") is not equal to ("file:///T:/tmp/org.swift.TestFoundation.TestURL.resourceValues.7768/") - 

swift-corelibs-foundation\Tests\Foundation\Tests\TestURL.swift:591: error: TestURL.test_resolvingSymlinksInPathShouldResolveSymlinks :
  XCTAssertEqual failed: ("file:///C:/Users/swift-ci/jenkins/workspace/swift-PR-build-toolchain-windows/build/tmp/org.swift.TestFoundation.TestURL.resourceValues.5012/destination") is not equal to ("file:///T:/tmp/org.swift.TestFoundation.TestURL.resourceValues.5012/destination") - 

Some text editing with the actual failures here indicates that the failures are the two that I see locally, and 4 related to symlink handling where the expectation doesn't match due to the build being on a different drive than the source.

@compnerd
Copy link
Member Author

@swift-ci please build toolchain Windows platform

@bnbarham
Copy link
Contributor

@swift-ci please test Linux platform

@compnerd
Copy link
Member Author

@swift-ci please test macOS platform

Bump the minimum supported CPU to IvyBridge so that we can take
advantage of the F16C/CVT16 extensions to support half floating point
rounding.  This avoids the undefined reference to `__extendsfhf2` in the
FP16 support.
@compnerd
Copy link
Member Author

@swift-ci please test

Add a simple special case implementation of the FP16 routines for the FP80 extension.

Implementation by @al45tair!
@compnerd
Copy link
Member Author

@swift-ci please test

@compnerd
Copy link
Member Author

@swift-ci please test

@compnerd
Copy link
Member Author

compnerd commented Feb 1, 2023

@swift-ci please test

@compnerd
Copy link
Member Author

compnerd commented Feb 1, 2023

@swift-ci please test

@compnerd
Copy link
Member Author

compnerd commented Feb 1, 2023

@swift-ci please test Linux platform

@compnerd
Copy link
Member Author

compnerd commented Feb 1, 2023

@swift-ci please test Linux platform

1 similar comment
@bnbarham
Copy link
Contributor

bnbarham commented Feb 2, 2023

@swift-ci please test Linux platform

@compnerd
Copy link
Member Author

compnerd commented Feb 2, 2023

@swift-ci please test Linux platform

@compnerd
Copy link
Member Author

compnerd commented Feb 3, 2023

I think that this is no longer needed!

@compnerd compnerd closed this Feb 3, 2023
@compnerd compnerd deleted the cvt16 branch February 3, 2023 18:44
@lin72h
Copy link

lin72h commented Feb 4, 2023

I've been watching the progress on it, can you elaborate more on the reason?

@etcwilde
Copy link
Member

etcwilde commented Feb 5, 2023

@lin72h I've been watching the progress on it, can you elaborate more on the reason?

@al45tair fixed it without requiring bumping the minimum CPU version in 125e54f and df1891e. Not having to bump the minimum processor version is desirable.

@lin72h
Copy link

lin72h commented Feb 6, 2023

@etcwilde perfect, Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants