-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[embedded] Add Set to the embedded stdlib #70217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci please test |
@@ -275,3 +275,10 @@ public func swift_deletedMethodError() -> Never { | |||
public func swift_willThrow() throws { | |||
} | |||
|
|||
@_silgen_name("arc4random_buf") | |||
public func arc4random_buf(buf: UnsafeMutableRawPointer, nbytes: Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can expect this to be on all embedded systems. Certainly AVR won't have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, and I think the implication is that if you can't provide arc4random_buf
then you cannot use Set or anything else that needs to compute hashes of values. Is that fair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this doesn't create a direct dependency. Only when you use something that needs it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure that works that way currently for package based builds; are you saying when we have LTO enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, even without LTO. Embedded Swift in the compiler causes all dependent functions/types/libraries to be brought in lazily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compiler causes all dependent functions/types/libraries to be brought in lazily.
Hmm then I should track down that bug it seems...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be marked @_extern(c, "arc4random_buf") instead.
Ooh, a new fancy attribute, perfect!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requirements for the hash seed are that
- it should be generated at process startup (or at least before/during the first time a hash table is initialized)
- it must not change after the first hash table is created
- it needs to be generated by a strong random number generator -- as strong as practical on the target system. It must not be feasible to predict what the seed will be, not even if the process starts on a freshly booted system with no memory of any previous executions.
If it is not possible to choose a good hash seed, then it likely won't be a good idea to allow the use any hashed collection type. These are inherently probabilistic data structures -- they deeply depend on the strength of the hash function, and they become a security liability if we cannot deliver on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be declared public
? Esp. as we have swift_stdlib_random
immediately below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't need to be, dropped 'public'.
@@ -251,6 +251,16 @@ extension Hasher { | |||
} | |||
} | |||
|
|||
#if $Embedded | |||
@usableFromInline | |||
var _swift_stdlib_Hashing_parameters: _SwiftHashingParameters = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably more for a embedded firmware toolchain maintainer to implement - the seed likely can be sourced from some sort of hardware support for random generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that the user / the toolchain maintainer handles that in arc4random_buf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: one consequence is that if you declare any reference type you then MUST implement arc4_random because you then support classes which require Hashable transitively
also IIUC this applies to enums with rawValues e.g. enum Foo: UInt8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not accurate -- see https://github.com/apple/swift/pull/70344/files#diff-1289fc8d34bfbbd931d86529683eb2b8f54c9c5d3d6b3f6111fa192d6650e71a where a test shows that even if you declare classes/subclasses, that alone doesn't trigger dependency on Hashable / arc4_random. Only if you actually ask for a hash of something (directly, or by using a Set) do you get the dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Kuba is right here -- hashing is not a baseline thing for native Swift classes. (It is part of the NSObject contract, but native Swift classes do not need to opt into that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turning this into a Swift variable is somewhat risky -- global variables in Swift are subject to concurrency-related constraints that may be overkill for this use case.
The seed is used every time we hash anything, so it ought to be trivially accessible, preferably with absolutely no synchronization overhead (or any branches, even).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constraints you're talking about are essentially a swift_once wrapper, correct? And the concern with that is about performance, but not correctness, right?
If so, I'm inclined to use this mechanism for its simplicity, and leave the performance concern as a future/followup improvement. Unconditionally initializing the hash seed won't be great for embedded Swift users who don't use hashing.
@@ -127,6 +127,22 @@ internal class __EmptySetSingleton: __RawSetStorage { | |||
#endif | |||
} | |||
|
|||
#if $Embedded | |||
public var _swiftEmptySetSingleton: (Int, Int, Int, Int, UInt8, UInt8, UInt16, UInt32, Int, Int, Int) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoa? is this even safe to do like this?!
if so we should consider init vectors to be done like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that only homogeneous tuples are guaranteed to be layout-stable… Do we have a guarantee about how this will be laid out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is paired to the same source that defines that - so ostensibly we could have a unit test that validates the expected layout and usage of such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, why is the original C-based solution not suitable to use in embedded platforms?
Making this a Swift var
is going to be risky even if we can somehow assume that tuple layout will match -- the empty singleton ought not be defined in writable memory. (It invariably leads to someone figuring out how to insert something into the empty singleton, leading to a really "fun" debugging session for somebody else. 😛)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because in embedded Swift, the whole stdlib is only compiled to SIL and distributed as a .swiftmodule, and there isn't any .a or .o with actual code produced. Only the client's / app's build actually IRGen code, including the stdlib code. This lets us allow users to specify their actual LLVM ABI and IRGen flags without the need to have a matching stdlib build (this is a desirable property because embedded hardware/environments/SDKs have tons of actively used combinations of ABI settings). Essentially a similar outcome that you get with "header-only" C/C++ libraries.
Flip side is that we can't have C/C++ code in the embedded stdlib. The two concerns mentioned (layout guarantees, const-ness / non-writable memory) seem manageable to me, I'll try to address them separately afterwards (by adding tests checking the layout, and by exploring IRGen flags or other options that would make the singleton variables constant), if that's okay with you.
Added a comment explaining this in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have another (though weaker) point to make: We should start considering moving parts of the regular Swift runtime from C to Swift, and the embedded Swift runtime can serve as a nice proving ground for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re. non-writable memory: AFAICT, the current regular definitions of the set/array singletons are also not in non-writable memory, and they are technically in writable memory. (Sounds like we should fix that there too.)
// Finally, take a slow path through the standard library to see if the | ||
// current environment can accept a larger stack allocation. | ||
guard #available(macOS 12.3, iOS 15.4, watchOS 8.5, tvOS 15.4, *) //SwiftStdlib 5.6 | ||
else { | ||
return false | ||
} | ||
return swift_stdlib_isStackAllocationSafe(byteCount, alignment) | ||
#else | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be always true? 'cause else wise it forces the temp allocations to be heap based
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See a few lines above -- there's a hardcoded limit of 1024, under which stack allocations are always allowed. Returning false here means allocations over 1024 move to the heap. That seems pretty reasonable to me, though of course 1024 is a very arbitrary limit...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well in a number of embedded systems the heap and stack are from the same ram segment so eh 6 one way 1/2 a dozen the other I guess...
@swift-ci please test |
633d27b
to
490ffe2
Compare
@swift-ci please test |
@swift-ci please test |
rawValue = _bridgeObject(taggingPayload: taggedPayload) | ||
#else | ||
rawValue = Builtin.reinterpretCast(taggedPayload) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I believe directly casting between reference types and trivial types is undefined behavior. If we need to keep this type enabled, then this reinterpret cast probably ought to be replaced with:
rawValue = Builtin.reinterpretCast(taggedPayload) | |
let p = UnsafeRawPointer(bitPattern: taggedPayload) | |
self.rawValue = Unmanaged.fromOpaque(p).takeUnretainedValue() |
(The takeUnretainedValue()
will emit a (depening on the value, probably, but not necessarily) noop call to swift_retain.)
However, the notion of initializing a regular strong reference from an arbitrary bit pattern is bogus; we should not do it, ever. The only users of this entry point are _modify
accessors and the like, where it is used to generate dummy instances. Instead of using a null pointer for that purpose, we need to start using the immortal empty singleton object we already have!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think I managed to get rid of this initializer and of the sketchy casts -- would you mind taking another look now?
@@ -31,9 +31,15 @@ internal struct _BridgeStorage<NativeClass: AnyObject> { | |||
|
|||
// rawValue is passed inout to _isUnique. Although its value | |||
// is unchanged, it must appear mutable to the optimizer. | |||
#if !$Embedded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of cutting out specific parts of this, we should instead #if
out the entire _BridgeStorage
type -- there is no ObjC runtime in embedded Swift, and this type is only ever used to facilitate verbatim bridging.
@@ -275,3 +275,10 @@ public func swift_deletedMethodError() -> Never { | |||
public func swift_willThrow() throws { | |||
} | |||
|
|||
@_silgen_name("arc4random_buf") | |||
public func arc4random_buf(buf: UnsafeMutableRawPointer, nbytes: Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requirements for the hash seed are that
- it should be generated at process startup (or at least before/during the first time a hash table is initialized)
- it must not change after the first hash table is created
- it needs to be generated by a strong random number generator -- as strong as practical on the target system. It must not be feasible to predict what the seed will be, not even if the process starts on a freshly booted system with no memory of any previous executions.
If it is not possible to choose a good hash seed, then it likely won't be a good idea to allow the use any hashed collection type. These are inherently probabilistic data structures -- they deeply depend on the strength of the hash function, and they become a security liability if we cannot deliver on that.
@@ -251,6 +251,16 @@ extension Hasher { | |||
} | |||
} | |||
|
|||
#if $Embedded | |||
@usableFromInline | |||
var _swift_stdlib_Hashing_parameters: _SwiftHashingParameters = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Kuba is right here -- hashing is not a baseline thing for native Swift classes. (It is part of the NSObject contract, but native Swift classes do not need to opt into that.)
@@ -275,3 +275,10 @@ public func swift_deletedMethodError() -> Never { | |||
public func swift_willThrow() throws { | |||
} | |||
|
|||
@_silgen_name("arc4random_buf") | |||
public func arc4random_buf(buf: UnsafeMutableRawPointer, nbytes: Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be declared public
? Esp. as we have swift_stdlib_random
immediately below.
@_extern(c, "arc4random_buf") | ||
public func arc4random_buf(buf: UnsafeMutableRawPointer, nbytes: Int) | ||
|
||
public func swift_stdlib_random(_ buf: UnsafeMutableRawPointer, _ nbytes: Int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have this take an UnsafeMutableRawBufferPointer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I wanted to match the existing swift_stdlib_random's signature that we have in the C/C++ runtime, so that other existing users of swift_stdlib_random in the stdlib can be left as they are (to avoid unnecessary #ifs).
@@ -102,7 +102,7 @@ extension _HashTable { | |||
} | |||
|
|||
internal static func hashSeed( | |||
for object: AnyObject, | |||
for object: Builtin.NativeObject, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this change? (This isn't an objection, I'm just curious.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnyObject
is a protocol, and object: AnyObject
is an existential, and existentials are disallowed in embedded Swift.
@@ -127,6 +127,22 @@ internal class __EmptySetSingleton: __RawSetStorage { | |||
#endif | |||
} | |||
|
|||
#if $Embedded | |||
public var _swiftEmptySetSingleton: (Int, Int, Int, Int, UInt8, UInt8, UInt16, UInt32, Int, Int, Int) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, why is the original C-based solution not suitable to use in embedded platforms?
Making this a Swift var
is going to be risky even if we can somehow assume that tuple layout will match -- the empty singleton ought not be defined in writable memory. (It invariably leads to someone figuring out how to insert something into the empty singleton, leading to a really "fun" debugging session for somebody else. 😛)
@@ -251,6 +251,16 @@ extension Hasher { | |||
} | |||
} | |||
|
|||
#if $Embedded | |||
@usableFromInline | |||
var _swift_stdlib_Hashing_parameters: _SwiftHashingParameters = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turning this into a Swift variable is somewhat risky -- global variables in Swift are subject to concurrency-related constraints that may be overkill for this use case.
The seed is used every time we hash anything, so it ought to be trivially accessible, preferably with absolutely no synchronization overhead (or any branches, even).
…tEmptySetSingleton
… a bunch of #ifs inside
@swift-ci please test |
1 similar comment
@swift-ci please test |
I'm going to proceed and pursue merging this PR, even though there are some very valid points in the discussion above that deserve more explorations and investigations. Given that embedded Swift is experimental and there's no promise of stability (yet), we can always iterate on it later, and even revert implementation choices. |
Mostly a straightforward addition of the Set-implementing source files to the embedded stdlib, except:
Builtin.BridgeObject
in BridgeStorage causes usage of swift_bridgeObjectRelease/swift_bridgeObjectRetain, which we could probably implement in the embedded runtime, but it looks like a better idea to just avoid usingBuiltin.BridgeObject
altogether_swift_stdlib_Hashing_parameters
are now needed, so they need to be initializedELEMENT_TYPE_OF_SET_VIOLATES_HASHABLE_REQUIREMENTS
cannot be used in embedded Swift because it uses a metatype as an argument_swiftEmptySetSingleton
needs to be defined