Skip to content

Builtin.FixedArray #76831

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 23, 2024
Merged

Builtin.FixedArray #76831

merged 1 commit into from
Oct 23, 2024

Conversation

jckarter
Copy link
Contributor

@jckarter jckarter commented Oct 2, 2024

Add a primitive that can be used to represent uniform aggregates of contiguously-laid-out values, parameterized by a generic size and element type. Builtin.FixedArray gets all of its layout properties from its element type, and copies/moves/destroys elementwise.

@jckarter jckarter force-pushed the builtin-vector branch 3 times, most recently from 71f100b to f481919 Compare October 9, 2024 02:38
elements.push_back(Element.buildTypeLayoutEntry(IGM, eltTy,
useStructLayouts));
});
return IGM.typeLayoutCache.getOrCreateAlignedGroupEntry(elements, T,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to use the new ArrayLayoutEntry which takes an element layout, element type, and a count type (which is trivial to build back from an unsigned IntegerType::get(std::to_string(arraySize), false, ctx)).

}

template<typename...Addresses, typename Body>
void eachElementAddrLoop(IRGenFunction &IGF,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how useful it'd be here, but I added IRGenFunction::emitLoopOverElements to support the ArrayLayoutEntry work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see! I'll work on reconciling these in a follow up since it would be good to have only one implementation. emitLoopOverElements looks like it has some bugs in its ad-hoc element indexing that could be avoided by using TypeInfo::indexArray.

@jckarter
Copy link
Contributor Author

@swift-ci Please test

@jckarter
Copy link
Contributor Author

@swift-ci Please test

@jckarter
Copy link
Contributor Author

@swift-ci Please test

@jckarter
Copy link
Contributor Author

@swift-ci Please test

@jckarter
Copy link
Contributor Author

@swift-ci Please test

@jckarter
Copy link
Contributor Author

@swift-ci Please test

@jckarter jckarter marked this pull request as ready for review October 17, 2024 00:15
@jckarter jckarter requested a review from tshortli as a code owner October 17, 2024 00:15
@jckarter jckarter changed the title [wip] Builtin.Vector Builtin.Vector Oct 17, 2024
@jckarter
Copy link
Contributor Author

@swift-ci Please test

@jckarter jckarter changed the title Builtin.Vector Builtin.FixedArray Oct 17, 2024
@jckarter
Copy link
Contributor Author

@swift-ci Please test

StoredPointerDifference Count;
ConstTargetMetadataPointer<Runtime, swift::TargetMetadata> Element;

StoredPointerDifference getCount() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about callers naively thinking this is going to return the raw value, and then they won't be correctly round-tripping the metadata. Can we give this a name that makes it clear that it's a zero-bounded count?

CanType getSize() const { return Size; }

/// Get the fixed integer number of elements if known and zero or greater.
std::optional<unsigned> getFixedSize() const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a typedef for the value type here, and it should probably be represented with uint64_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point about uint64_t, though I don't know that a typedef would do much over just using uint64_t directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's communicatively valuable. I guess a 128-bit target is unlikely, though, you're right.

@@ -1537,3 +1539,682 @@ void irgen::emitBuiltinCall(IRGenFunction &IGF, const BuiltinInfo &Builtin,

llvm_unreachable("IRGen unimplemented for this builtin!");
}

template<typename BaseTypeInfo, typename ElementTypeInfo = BaseTypeInfo>
class ArrayTypeInfoBase : public BaseTypeInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put all of this in a new file.

virtual llvm::Value *getArraySize(IRGenFunction &IGF, SILType T) const = 0;
virtual std::optional<unsigned> getFixedArraySize(SILType T) const = 0;

template<typename Body>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please either add a comment explaining the expected signature of body or just make this use llvm::function_ref.

auto predBB = IGF.Builder.GetInsertBlock();

auto loopBB = IGF.createBasicBlock("each_vector_element");
auto endBB = IGF.createBasicBlock("end_vector_element");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array, please

buildTypeLayoutEntry(IRGenModule &IGM,
SILType T,
bool useStructLayouts) const override {
std::vector<TypeLayoutEntry *> elements;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dead, which seems wrong.

Copy link
Contributor Author

@jckarter jckarter Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not dead. This is necessary to override the pure virtual requirement of FixedArrayTypeInfo (note the override).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant the elements variable.

@@ -35,7 +35,7 @@ namespace irgen {
void emitBuiltinCall(IRGenFunction &IGF, const BuiltinInfo &builtin,
BuiltinInst *Inst, ArrayRef<SILType> argTypes,
Explosion &args, Explosion &result);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

}

template<typename Body>
void eachElementOffset(Body &&body) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either comment the signature or use llvm::function_ref; also, this should definitely be a uint64_t.


void addToAggLowering(IRGenModule &IGM, SwiftAggLowering &lowering,
Size offset) const override {
eachElementOffset([&](unsigned eltByteOffset){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At what point do you check that offset * elementSize doesn't overflow? And actually, the same question applies to the dynamic cases, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we don't anywhere else we do type layout, but you're right we probably should here since it's a lot easier to induce a size overflow now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably doesn't make any sense to treat huge arrays as loadable at any level, so I put a cap at 4096 after which we lower the type as address-only in SIL, and Fixed but not Loadable in IRGen.

For dynamic cases, I'm wondering if it'd be sufficient to have the standard library type do a precondition check that the value wouldn't be larger than address space before attempting initialization. If you can never construct such a value to begin with, then the value witness operations should never occur. Though I suppose it would be possible to have a type like

struct Foo<let N: Int> {
  var buffer: Vector<N, Int>
  var somethingElse: Int

  init() {
    self.somethingElse = 42
    self.buffer = initialBuffer()
  }
}

and if an attacker got control of N, they could trick the init() call into writing 42 somewhere bad before the self.buffer initializer got a chance to trap. Maybe we could cap the value witness table size of these array types at some practical limit (maybe 2^31-1 for 32-bit platforms, and 2^55-1 on 64-bit platforms), so that if a huge array shows up as part of an aggregate, the offsets for following fields have some cushion before they wrap around?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should probably step back and get this right in general. Semantically, we should really be checking legality in the initializer of every type, and it's just that fixed-size types can emit that check statically, hopefully by removing it. Fixed-size arrays obviously make this much easier to violate, but it's always been the right rule. And fixed-size arrays make it easier to violate in all cases, not just when the size of the fixed-size array type itself overflows. So I guess there should be a flag in the VWT for illegal types, and struct/enum/class dynamic layout needs to detect this and set the flag appropriately.

If the semantics are that illegal types are well-formed but invalid to instantiate, ideally they should have zero size. That might screw up extra-inhabitant enum layout, though, since it would make the entire case go away. Do we ever use extra-inhabitant layout strategies based on non-fixed enum cases, e.g. by knowing that the payload is a tuple whose first element has a fixed layout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the semantics are that illegal types are well-formed but invalid to instantiate, ideally they should have zero size. That might screw up extra-inhabitant enum layout, though, since it would make the entire case go away. Do we ever use extra-inhabitant layout strategies based on non-fixed enum cases, e.g. by knowing that the payload is a tuple whose first element has a fixed layout?

Yeah, for tuples and structs, we pick the element with the most extra inhabitants to provide extra inhabitants for the aggregate.

If we can properly track type legality, and make such types treated as zero-size, that would have the nice theoretical benefit that an illegal type could be substituted into an enum payload while still allowing the valid payloads of the enum to be practically usable. I wonder if we could retroactively track Never as being illegal too; that might "break" the size we expect of structs with unconditional Never fields, but those structs couldn't actually be instantiated…

Copy link
Contributor

@rjmccall rjmccall Oct 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we can make this work, we just have to find a rule that's consistent with our current assumptions about type layout. In particular, consider a type like this:

  struct A<T> {
    var x: SomeClassReference
    var y: T
  }

When we're laying out Optional<A<T>> generically, do we always assume A<T> has extra inhabitants because we can see that x does, such that e.g. constructing .none can just write a null pointer (as the first XI) into that field? If so, then the illegal-type layout rule probably has to stay consistent with that — whatever fields are assumed to always exist have to continue to exist, and it's only the other fields that can be removed when we recognize an illegal type.

Ultimately, I think we're deciding between three rules here:

  1. Product types with any illegal fields are zero-sized overall.
  2. All fields following illegal fields in product types are zero-sized.
  3. Product types just treat illegal fields as zero-sized but don't otherwise change layout.

(1) only works if we don't make any XI assumptions about known fields when laying out enums of generic non-fixed product types. (2) only works if those XI assumptions are only made about in the fixed-size prefix of the product. (3) should work in any case, but we might be in deep trouble if illegality starts in the product itself (i.e. all of the fields are legal, but the product layout overflows).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we're laying out Optional<A> generically, do we always assume A has extra inhabitants because we can see that x does, such that e.g. constructing .none can just write a null pointer (as the first XI) into that field? If so, then the illegal-type layout rule probably has to stay consistent with that — whatever fields are assumed to always exist have to continue to exist, and it's only the other fields that can be removed when we recognize an illegal type.

It looks like dynamic type layout for A dynamically picks the field with the most extra inhabitants:

  // We have extra inhabitants if any element does. Use the field with the most.
  unsigned extraInhabitantCount = 0;
  for (unsigned i = 0; i < numFields; ++i) {
    unsigned fieldExtraInhabitantCount =
        fieldTypes[i]->getNumExtraInhabitants();
    if (fieldExtraInhabitantCount > extraInhabitantCount) {
      extraInhabitantCount = fieldExtraInhabitantCount;
    }
  }

So we should be able to make it so that a product with an illegal field itself becomes fully illegal.

@jckarter
Copy link
Contributor Author

@swift-ci Please test

@lorentey
Copy link
Member

What happened to the builtin's name?

@jckarter
Copy link
Contributor Author

@lorentey We can't call the compiler type BuiltinVectorType because that's already taken for the type representing Builtin.Vec. Several people noted they would rather have the source-level name of the builtin match the compiler-level type name, and it seems like it doesn't really matter that much since you're the only person who has to see it. The standard library can still call its type whatever it wants.

@lorentey
Copy link
Member

@jckarter If you say so. The new builtin is clearly a close relative of Builtin.Vec<N>x<T>; it seems weird to invent a radically new name for it. (And FixedArray is a curious choice.) The compiler indeed does have a venerable tradition of intentionally using radically different language than the stdlib; I don't think that helps communication, but who am I to question tradition. 😛

@jckarter
Copy link
Contributor Author

@swift-ci Please test

@jckarter
Copy link
Contributor Author

@swift-ci Please test Windows

@jckarter
Copy link
Contributor Author

@swift-ci Please test

Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One request, but otherwise LGTM

@jckarter
Copy link
Contributor Author

@swift-ci Please test

@@ -13625,6 +13625,7 @@ _swift_getExtendedExistentialTypeMetadata
_swift_getExtendedExistentialTypeMetadata_unique
_swift_getExtendedExistentialTypeShape
_swift_getExtendedFunctionTypeMetadata
_swift_getFixedArrayTypeMetadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these tests, you should add an Added: _swift_getFixedArrayTypeMetadata entry to both abi/macOS/arm64/stdlib.swift and abi/macOS/x86_64/stdlib.swift

`Builtin.FixedArray<let N: Int, T: ~Copyable & ~Escapable>` has the layout of `N` elements of type `T` laid out
sequentially in memory (with the tail padding of every element occupied by the array). This provides a primitive
on which the standard library `Vector` type can be built.
@jckarter
Copy link
Contributor Author

@swift-ci Please test

@jckarter jckarter merged commit 8242110 into swiftlang:main Oct 23, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants