Skip to content

[embedded] Add Set to the embedded stdlib #70217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 3, 2024

Conversation

kubamracek
Copy link
Contributor

Mostly a straightforward addition of the Set-implementing source files to the embedded stdlib, except:

  • Builtin.BridgeObject in BridgeStorage causes usage of swift_bridgeObjectRelease/swift_bridgeObjectRetain, which we could probably implement in the embedded runtime, but it looks like a better idea to just avoid using Builtin.BridgeObject altogether
  • _swift_stdlib_Hashing_parameters are now needed, so they need to be initialized
  • ELEMENT_TYPE_OF_SET_VIOLATES_HASHABLE_REQUIREMENTS cannot be used in embedded Swift because it uses a metatype as an argument
  • _swiftEmptySetSingleton needs to be defined

@kubamracek kubamracek requested a review from a team as a code owner December 4, 2023 23:47
@kubamracek
Copy link
Contributor Author

@swift-ci please test

@kubamracek kubamracek added the embedded Embedded Swift label Dec 4, 2023
@@ -275,3 +275,10 @@ public func swift_deletedMethodError() -> Never {
public func swift_willThrow() throws {
}

@_silgen_name("arc4random_buf")
public func arc4random_buf(buf: UnsafeMutableRawPointer, nbytes: Int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can expect this to be on all embedded systems. Certainly AVR won't have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, and I think the implication is that if you can't provide arc4random_buf then you cannot use Set or anything else that needs to compute hashes of values. Is that fair?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this doesn't create a direct dependency. Only when you use something that needs it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not sure that works that way currently for package based builds; are you saying when we have LTO enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, even without LTO. Embedded Swift in the compiler causes all dependent functions/types/libraries to be brought in lazily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compiler causes all dependent functions/types/libraries to be brought in lazily.

Hmm then I should track down that bug it seems...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be marked @_extern(c, "arc4random_buf") instead.

Ooh, a new fancy attribute, perfect!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requirements for the hash seed are that

  • it should be generated at process startup (or at least before/during the first time a hash table is initialized)
  • it must not change after the first hash table is created
  • it needs to be generated by a strong random number generator -- as strong as practical on the target system. It must not be feasible to predict what the seed will be, not even if the process starts on a freshly booted system with no memory of any previous executions.

If it is not possible to choose a good hash seed, then it likely won't be a good idea to allow the use any hashed collection type. These are inherently probabilistic data structures -- they deeply depend on the strength of the hash function, and they become a security liability if we cannot deliver on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be declared public? Esp. as we have swift_stdlib_random immediately below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be, dropped 'public'.

@@ -251,6 +251,16 @@ extension Hasher {
}
}

#if $Embedded
@usableFromInline
var _swift_stdlib_Hashing_parameters: _SwiftHashingParameters = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably more for a embedded firmware toolchain maintainer to implement - the seed likely can be sourced from some sort of hardware support for random generation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that the user / the toolchain maintainer handles that in arc4random_buf.

Copy link
Contributor

@phausler phausler Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: one consequence is that if you declare any reference type you then MUST implement arc4_random because you then support classes which require Hashable transitively

also IIUC this applies to enums with rawValues e.g. enum Foo: UInt8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not accurate -- see https://github.com/apple/swift/pull/70344/files#diff-1289fc8d34bfbbd931d86529683eb2b8f54c9c5d3d6b3f6111fa192d6650e71a where a test shows that even if you declare classes/subclasses, that alone doesn't trigger dependency on Hashable / arc4_random. Only if you actually ask for a hash of something (directly, or by using a Set) do you get the dependency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Kuba is right here -- hashing is not a baseline thing for native Swift classes. (It is part of the NSObject contract, but native Swift classes do not need to opt into that.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turning this into a Swift variable is somewhat risky -- global variables in Swift are subject to concurrency-related constraints that may be overkill for this use case.

The seed is used every time we hash anything, so it ought to be trivially accessible, preferably with absolutely no synchronization overhead (or any branches, even).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constraints you're talking about are essentially a swift_once wrapper, correct? And the concern with that is about performance, but not correctness, right?

If so, I'm inclined to use this mechanism for its simplicity, and leave the performance concern as a future/followup improvement. Unconditionally initializing the hash seed won't be great for embedded Swift users who don't use hashing.

@@ -127,6 +127,22 @@ internal class __EmptySetSingleton: __RawSetStorage {
#endif
}

#if $Embedded
public var _swiftEmptySetSingleton: (Int, Int, Int, Int, UInt8, UInt8, UInt16, UInt32, Int, Int, Int) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoa? is this even safe to do like this?!
if so we should consider init vectors to be done like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that only homogeneous tuples are guaranteed to be layout-stable… Do we have a guarantee about how this will be laid out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is paired to the same source that defines that - so ostensibly we could have a unit test that validates the expected layout and usage of such.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why is the original C-based solution not suitable to use in embedded platforms?

Making this a Swift var is going to be risky even if we can somehow assume that tuple layout will match -- the empty singleton ought not be defined in writable memory. (It invariably leads to someone figuring out how to insert something into the empty singleton, leading to a really "fun" debugging session for somebody else. 😛)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because in embedded Swift, the whole stdlib is only compiled to SIL and distributed as a .swiftmodule, and there isn't any .a or .o with actual code produced. Only the client's / app's build actually IRGen code, including the stdlib code. This lets us allow users to specify their actual LLVM ABI and IRGen flags without the need to have a matching stdlib build (this is a desirable property because embedded hardware/environments/SDKs have tons of actively used combinations of ABI settings). Essentially a similar outcome that you get with "header-only" C/C++ libraries.

Flip side is that we can't have C/C++ code in the embedded stdlib. The two concerns mentioned (layout guarantees, const-ness / non-writable memory) seem manageable to me, I'll try to address them separately afterwards (by adding tests checking the layout, and by exploring IRGen flags or other options that would make the singleton variables constant), if that's okay with you.

Added a comment explaining this in the meantime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another (though weaker) point to make: We should start considering moving parts of the regular Swift runtime from C to Swift, and the embedded Swift runtime can serve as a nice proving ground for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re. non-writable memory: AFAICT, the current regular definitions of the set/array singletons are also not in non-writable memory, and they are technically in writable memory. (Sounds like we should fix that there too.)

// Finally, take a slow path through the standard library to see if the
// current environment can accept a larger stack allocation.
guard #available(macOS 12.3, iOS 15.4, watchOS 8.5, tvOS 15.4, *) //SwiftStdlib 5.6
else {
return false
}
return swift_stdlib_isStackAllocationSafe(byteCount, alignment)
#else
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be always true? 'cause else wise it forces the temp allocations to be heap based

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See a few lines above -- there's a hardcoded limit of 1024, under which stack allocations are always allowed. Returning false here means allocations over 1024 move to the heap. That seems pretty reasonable to me, though of course 1024 is a very arbitrary limit...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well in a number of embedded systems the heap and stack are from the same ram segment so eh 6 one way 1/2 a dozen the other I guess...

@kubamracek
Copy link
Contributor Author

@swift-ci please test

@kubamracek
Copy link
Contributor Author

@swift-ci please test

@kubamracek
Copy link
Contributor Author

@swift-ci please test

rawValue = _bridgeObject(taggingPayload: taggedPayload)
#else
rawValue = Builtin.reinterpretCast(taggedPayload)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I believe directly casting between reference types and trivial types is undefined behavior. If we need to keep this type enabled, then this reinterpret cast probably ought to be replaced with:

Suggested change
rawValue = Builtin.reinterpretCast(taggedPayload)
let p = UnsafeRawPointer(bitPattern: taggedPayload)
self.rawValue = Unmanaged.fromOpaque(p).takeUnretainedValue()

(The takeUnretainedValue() will emit a (depening on the value, probably, but not necessarily) noop call to swift_retain.)

However, the notion of initializing a regular strong reference from an arbitrary bit pattern is bogus; we should not do it, ever. The only users of this entry point are _modify accessors and the like, where it is used to generate dummy instances. Instead of using a null pointer for that purpose, we need to start using the immortal empty singleton object we already have!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think I managed to get rid of this initializer and of the sketchy casts -- would you mind taking another look now?

@@ -31,9 +31,15 @@ internal struct _BridgeStorage<NativeClass: AnyObject> {

// rawValue is passed inout to _isUnique. Although its value
// is unchanged, it must appear mutable to the optimizer.
#if !$Embedded
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of cutting out specific parts of this, we should instead #if out the entire _BridgeStorage type -- there is no ObjC runtime in embedded Swift, and this type is only ever used to facilitate verbatim bridging.

@@ -275,3 +275,10 @@ public func swift_deletedMethodError() -> Never {
public func swift_willThrow() throws {
}

@_silgen_name("arc4random_buf")
public func arc4random_buf(buf: UnsafeMutableRawPointer, nbytes: Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requirements for the hash seed are that

  • it should be generated at process startup (or at least before/during the first time a hash table is initialized)
  • it must not change after the first hash table is created
  • it needs to be generated by a strong random number generator -- as strong as practical on the target system. It must not be feasible to predict what the seed will be, not even if the process starts on a freshly booted system with no memory of any previous executions.

If it is not possible to choose a good hash seed, then it likely won't be a good idea to allow the use any hashed collection type. These are inherently probabilistic data structures -- they deeply depend on the strength of the hash function, and they become a security liability if we cannot deliver on that.

@@ -251,6 +251,16 @@ extension Hasher {
}
}

#if $Embedded
@usableFromInline
var _swift_stdlib_Hashing_parameters: _SwiftHashingParameters = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Kuba is right here -- hashing is not a baseline thing for native Swift classes. (It is part of the NSObject contract, but native Swift classes do not need to opt into that.)

@@ -275,3 +275,10 @@ public func swift_deletedMethodError() -> Never {
public func swift_willThrow() throws {
}

@_silgen_name("arc4random_buf")
public func arc4random_buf(buf: UnsafeMutableRawPointer, nbytes: Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be declared public? Esp. as we have swift_stdlib_random immediately below.

@_extern(c, "arc4random_buf")
public func arc4random_buf(buf: UnsafeMutableRawPointer, nbytes: Int)

public func swift_stdlib_random(_ buf: UnsafeMutableRawPointer, _ nbytes: Int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have this take an UnsafeMutableRawBufferPointer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wanted to match the existing swift_stdlib_random's signature that we have in the C/C++ runtime, so that other existing users of swift_stdlib_random in the stdlib can be left as they are (to avoid unnecessary #ifs).

@@ -102,7 +102,7 @@ extension _HashTable {
}

internal static func hashSeed(
for object: AnyObject,
for object: Builtin.NativeObject,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this change? (This isn't an objection, I'm just curious.)

Copy link
Contributor Author

@kubamracek kubamracek Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnyObject is a protocol, and object: AnyObject is an existential, and existentials are disallowed in embedded Swift.

@@ -127,6 +127,22 @@ internal class __EmptySetSingleton: __RawSetStorage {
#endif
}

#if $Embedded
public var _swiftEmptySetSingleton: (Int, Int, Int, Int, UInt8, UInt8, UInt16, UInt32, Int, Int, Int) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why is the original C-based solution not suitable to use in embedded platforms?

Making this a Swift var is going to be risky even if we can somehow assume that tuple layout will match -- the empty singleton ought not be defined in writable memory. (It invariably leads to someone figuring out how to insert something into the empty singleton, leading to a really "fun" debugging session for somebody else. 😛)

@@ -251,6 +251,16 @@ extension Hasher {
}
}

#if $Embedded
@usableFromInline
var _swift_stdlib_Hashing_parameters: _SwiftHashingParameters = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turning this into a Swift variable is somewhat risky -- global variables in Swift are subject to concurrency-related constraints that may be overkill for this use case.

The seed is used every time we hash anything, so it ought to be trivially accessible, preferably with absolutely no synchronization overhead (or any branches, even).

@kubamracek
Copy link
Contributor Author

@swift-ci please test

1 similar comment
@kubamracek
Copy link
Contributor Author

@swift-ci please test

@kubamracek
Copy link
Contributor Author

I'm going to proceed and pursue merging this PR, even though there are some very valid points in the discussion above that deserve more explorations and investigations. Given that embedded Swift is experimental and there's no promise of stability (yet), we can always iterate on it later, and even revert implementation choices.

CC @phausler @lorentey

@kubamracek kubamracek merged commit 1efcb41 into swiftlang:main Jan 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
embedded Embedded Swift
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants