
[clang] Better bitfield access units #65742


Closed (wants to merge 3 commits)

Conversation

urnathan
Contributor

@urnathan urnathan commented Sep 8, 2023

Reimplement the SysV (non-Microsoft) bitfield access unit computation. For the avoidance of doubt, this is not an ABI change; it only changes the memory accesses Clang emits to access bitfields.

This is a stack of two commits, to make it easier to see the change in behaviour. The first commit is a set of new tests that verify the placement of access units under the current algorithm. The second commit is the change, and alters those tests as required. The guts of the change are in clang/lib/CodeGen/CGRecordLayoutBuilder.cpp.

The SysV ABI describes placing bitfields in 'storage units', which may end up overlapping each other or other data members (for instance 'int bf : 4;' may be in a 32-bit storage unit that overlaps a plain 'char c;' field). LLVM doesn't represent structures at that level, so Clang must create a non-overlapping representation of the structure. That's what's happening here. To avoid confusion, I'm using the phrase 'access unit' to refer to the integer type Clang uses.
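
For illustration, here's a minimal example of that overlap (my own sketch of the SysV layout described above, not code from the patch):

struct Overlap {
  int bf : 4;  // placed in a 32-bit storage unit covering bytes 0-3
  char c;      // allocated at byte 1, inside bf's storage unit
};

Clang can't emit an overlapping i32 member for bf here; it would typically give bf a one-byte access unit instead, with c as a separate member.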

There are two aspects to choosing access units:

  1. bitfields that share bits of a single byte must be in the same access unit -- you can't touch only one of those bitfields. This computation is unchanged.

  2. adjacent bitfields that occupy distinct bytes might want to share an access unit; one wants to do this if it makes for a better access size. This computation is changed (see the example below).
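
A small example of both cases (my own illustration, assuming the usual SysV layout with a 32-bit unsigned):

struct S {
  unsigned a : 4; // a and b share byte 0, so they must be in one access unit (aspect 1)
  unsigned b : 4;
  unsigned c : 8; // c occupies byte 1 by itself; whether it joins a/b's access
                  // unit is the separate profitability question (aspect 2)
};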

The current algorithm computes both of these in a single pass: it decides whether the next bitfield extends the current access unit by requiring that it exactly abut the previous one, plus a few other constraints. But there are some issues with this:

a) If there's even a single bit of padding between the two, no concatenation occurs. This seems suboptimal.

b) Because of the above-mentioned overlapping, bitfield access units might not start on an aligned boundary, and some architectures do not support unaligned accesses.

c) Likewise, the access unit might not be able to be a natural 2^n size, because the adjacent bytes are occupied (either by other data members or, in C++, by a derived class placed in the tail padding). We must then emit exactly a (say) 24-bit access (to avoid aliasing issues?) (example below).

d) Concatenating access units can lead to an unaligned larger access, and again that's problematic for strict-alignment ISAs. (Consider struct C { char a : 8; char b : 8; };. C has alignment 1, but a 2-byte access unit would naturally want 2-byte alignment.)

My understanding is that larger access units were desired, to reduce the total number of accesses, and there was an assertion that the access would be narrowed if only some of the bytes changed.

That's not what we've observed, and it's demonstrable on x86_64:

struct B {
  int a : 16;
  int b : 8;
  char c;
};

Here, B::a and B::b could be placed into a 3 byte access unit, but we can't use 4-byte accesses due to the 4th byte containing c. The code we end up generating for an increment of B::b is:

        movzbl  2(%rdi), %eax
        shll    $16, %eax
        movzwl  (%rdi), %ecx
        addl    %ecx, %eax
        addl    $65536, %eax                    # imm = 0x10000
        movw    %ax, (%rdi)
        shrl    $16, %eax
        movb    %al, 2(%rdi)

We're loading both B::a and B::b, combining them, incrementing B::b, and then writing the bytes back. I.e. we've reduced the number of memory accesses, but we're doing more work than necessary. Without the B::c field, there's 1 byte of padding, and a regular 4-byte access can be used (and is). x86_64 is not unique in this behaviour.

I'm of the opinion that one cannot do these in a single pass -- concatenating access units fundamentally requires knowing how large both units are, and you cannot tell that just by looking at the first bitfield of the second access unit.

That's what this patch does: separate the computation into two phases. We first determine which bitfields /must/ share an access unit, and then determine which of those access units could be concatenated. We pay attention to the storage size of the integral type -- an i24 naturally requires 4 bytes of storage. When generating the access units we note barriers such as :0 bitfields (on ABIs that care), and the starting location of the next non-bitfield or the structure's tail padding.
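
As a rough sketch of the two-phase structure (illustrative pseudocode only, not the actual CGRecordLayoutBuilder code; the Span type and helper name are invented, and fields are assumed to be in layout order with nonzero widths):

#include <cstdint>
#include <vector>

struct Span {
  uint64_t BeginBit, EndBit; // half-open bit range of a bitfield or unit
};

// Phase 1: group bitfields that share any byte -- these must be accessed
// together, since a store to one would otherwise clobber the other.
std::vector<Span> gatherMustMergeUnits(const std::vector<Span> &Fields) {
  std::vector<Span> Units;
  for (const Span &F : Fields) {
    if (!Units.empty() && F.BeginBit / 8 == (Units.back().EndBit - 1) / 8)
      Units.back().EndBit = F.EndBit; // shares a byte with the previous unit
    else
      Units.push_back(F);             // starts a fresh access unit
  }
  return Units;
}

// Phase 2 (not shown): walk the units in order and concatenate a unit into
// its predecessor only when the merged unit satisfies the conditions listed
// just below.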

The concatenation only occurs if:

a) The larger access is naturally aligned, or unaligned access is cheap.

b) The larger access's storage size doesn't expand into a forbidden zone.

c) The larger access is not bigger than a register.

d) Neither of the two access units contain a volatile bitfield.

(Note that if an access unit's storage size naturally overlaps a subsequent access unit, we'll merge the two, provided the conditions above hold. A partial overlap can occur with packed structures and is handled too.)

ETA: in essence, it only merges access units when the number of memory accesses to get at the merged unit is less than the number of accesses for the original unmerged set of access units.
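
A back-of-the-envelope version of that accounting (my own sketch, assuming a strict-alignment target that only issues naturally aligned power-of-2 accesses up to the register size):

#include <cstdint>

// Count the naturally aligned, power-of-2 memory accesses needed to cover
// Bytes bytes starting at byte Offset, never touching bytes outside the range.
unsigned accessesNeeded(uint64_t Offset, uint64_t Bytes, uint64_t RegBytes) {
  unsigned Count = 0;
  for (uint64_t End = Offset + Bytes; Offset < End; ++Count) {
    uint64_t Width = RegBytes;
    while (Width > 1 && (Offset % Width != 0 || Offset + Width > End))
      Width /= 2;
    Offset += Width;
  }
  return Count;
}

// Merging two adjacent units is a win only if the merged span needs fewer
// accesses than the spans did separately, e.g.:
//   accessesNeeded(0, 3, 4) == 2                             // a 3-byte (i24) unit
//   accessesNeeded(0, 2, 4) + accessesNeeded(2, 1, 4) == 2   // i16 + i8: no worse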

In order to know the ending barrier location, we adjust the call to accumulateBitFields to pass that in. There's a bit of reworking of accumulateFields to make that happen, and as an added bonus we can skip [[no_unique_address]] empty members that would otherwise split a bitfield run. (I suspect that never happens in practice though.)

An example of the change this makes is that the above x86_64 case now places the two bitfields in separate access units, because i16 and i8 can be better accessed than an i24. If there's no B::c member they're placed in an i32 access unit, which is effectively the same as now. (I see similar changes for arm32 target.)
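
To summarise the effect on the earlier example (annotations mine, based on the description above):

struct B {
  int a : 16; // before: a and b shared a single i24 access unit
  int b : 8;  // after: a gets an i16 unit and b an i8 unit
  char c;     // without c, a 4-byte access is used either way
};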

I added a new Clang TargetInfo hook to specify whether the target has cheap unaligned memory accesses or not. That’s used by the new access unit concatenation code. LLVM has a CodeGen hook to tell you this (TargetLoweringBase::allowsMisalignedMemoryAccesses), but that’s not
really accessible from where we are in Clang. AFAICT creating TargetLoweringBase requires an llvm MachineFunction, and we don’t have one when laying out records. The new hook defaults to false, which is in keeping with it being ‘a 32-bit RISC platform, like PPC or SPARC’. That is at odds with the current access unit algorithm, which behaves as-if unaligned access has no cost. I went through the target architectures’ implementation of allowsMisalignedMemoryAccesses as best I could, to override that default. It’s possible I missed cases. For the record, the following targets set the flag to true (possibly under control of specific flags): AArch64, ARM, LoongArch, Mips, PPC, SystemZ, VE, WebAssembly & X86.
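
The hook boils down to a per-target boolean; something in the spirit of the following sketch (illustrative only -- the exact name and plumbing in the patch may differ; the later commits describe it as promoting ARM & AArch64's HasUnaligned flag into TargetInfo):

class TargetInfoSketch {
protected:
  // Default: assume unaligned accesses are expensive (strict-alignment RISC).
  bool HasCheapUnalignedAccess = false;

public:
  bool hasCheapUnalignedAccess() const { return HasCheapUnalignedAccess; }
};

// A target such as X86 or AArch64 would flip the flag in its constructor:
//   HasCheapUnalignedAccess = true;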

Clang already has a target hook for register size, which actually reports pointer size -- a reasonable guess, but incorrect for Loongson64, for which only an ILP32 ABI is currently supported. But that's an orthogonal problem. (I didn't check for similar cases, for instance a 32-bit ABI on a mips64 CPU.)

The performance data obtained from our internal benchmarks for an out-of-tree strict-alignment ILP32 target shows code size reduction of .4% to .75%, and performance increase of .2% to 1.2%. Some benchmarks showed no change, they do not appear to be bitfield-centric.

The above-mentioned x86_64 case now places the two bitfields in different, naturally sized, access units (i16 and i8), and now generates the following assembly:

      incb    2(%rdi)

One of the ARM bitfield tests is likewise simplified.

@urnathan urnathan requested review from a team as code owners September 8, 2023 11:31
@github-actions github-actions bot added backend:ARM backend:AArch64 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. backend:loongarch labels Sep 8, 2023
@urnathan urnathan marked this pull request as draft September 8, 2023 12:33
@nunoplopes
Member

Note that changing the memory accesses performed by clang (load or store) is an ABI change at IR level because of UB.

Also, please have a look at the existing -ffine-grained-bitfield-accesses flag and the discussions around it and IPO.

@efriedma-quic
Collaborator

The primary issue with over-wide bitfield accesses is ensuring we don't cause data races. C++ [intro.memory]p3: "A memory location is either an object of scalar type that is not a bit-field or a maximal sequence of adjacent bit-fields all having nonzero width. Two or more threads of execution can access separate memory locations without interfering with each other." So that imposes an upper limit: we can't store any byte that might overlap another field.
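
For example (my illustration of that constraint, not code from the patch):

struct R {
  int bf : 8; // one memory location (a maximal run of non-zero-width bit-fields)
  char c;     // a separate memory location
};

// t1 and t2 may run concurrently; a 4-byte read-modify-write of bf would
// silently rewrite c and could lose t2's store.
void t1(R *r) { r->bf = 1; }
void t2(R *r) { r->c = 2; }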

There isn't really a lower limit: the ABI rule is that we access as many bytes as possible, but we aren't actually obligated to access those bytes; accesses that don't actually read or write a field aren't observable.

The advantage of exposing the wide accesses to the optimizer is that it allows memory optimizations, like CSE or DSE, to reason about the entire bitfield as a single unit, and easily eliminate accesses. Reducing the size of bitfield accesses in the frontend is basically throwing away information.

The disadvantage is, as you note here, that sometimes codegen doesn't manage to clean up the resulting code well.

I guess my primary question here is, instead of making clang try to guess which accesses will be optimal, can we improve the way the LLVM code generator handles the patterns currently generated by clang? I'm not exactly against changing the IR generated by clang, but changing the IR like this seems likely to improve some cases at the expense of others.

@urnathan
Contributor Author

urnathan commented Sep 8, 2023

Note that changing the memory accesses performed by clang (load or store) is an ABI change at IR level because of UB.

Could you clarify why UB is a consideration here?

@efriedma-quic
Collaborator

Note that changing the memory accesses performed by clang (load or store) is an ABI change at IR level because of UB.

Could you clarify why UB is a consideration here?

There's some weirdness with poison values; see https://reviews.llvm.org/D129541 etc. This is mostly an IR modeling thing; the machine instructions work fine, we just need to make sure we don't accidentally poison IR values that aren't supposed to be poison.

@urnathan
Contributor Author

Also, please have a look at the existing -ffine-grained-bitfield-accesses flag and the discussions around it and IPO.

That flag's behaviour is unchanged here -- it continues to not merge access units. My contention here is that (a) merging units can be advantageous, but (b) the existing algorithm has some deficiencies.

@urnathan
Contributor Author

Note that changing the memory accesses performed by clang (load or store) is an ABI change at IR level because of UB.

Could you clarify why UB is a consideration here?

There's some weirdness with poison values; see https://reviews.llvm.org/D129541 etc. This is mostly an IR modeling thing; the machine instructions work fine, we just need to make sure we don't accidentally poison IR values that aren't supposed to be poison.

Certainly, but could you clarify why changing the access patterns could do that? My understanding of poison is that should it cause a user-visible state change, that's indicative of the original program's UB, and the languages in question don't guarantee anything at that point -- nasal demons and all. That the observed behaviour of a UB-exhibiting program changes is not a defect. What am I missing?

@urnathan
Contributor Author

The advantage of exposing the wide accesses to the optimizer is that it allows memory optimizations, like CSE or DSE, to reason about the entire bitfield as a single unit, and easily eliminate accesses. Reducing the size of bitfield accesses in the frontend is basically throwing away information.

Hm, I've been thinking of it the opposite way round -- merging bitfields is throwing away information (about where cuts might be). And it's unclear to me how CSE or DSE could make use of a merged access unit to eliminate accesses -- it would seem to me that a merged access unit accessed at a single point would make it look like the whole unit was live? (Can you point me at an example of the analysis you describe happening?)

That the simple x86 example I showed doesn't show (complete) undoing of the merging suggests it is hard for CSE and DSE to do the analysis you indicate. DSE did work there, to undo the merge, but there's no dead load elimination happening. But, that DSE is merely undoing the gluing that the front end did -- if we didn't glue, then it would always happen.

The disadvantage is, as you note here, that sometimes codegen doesn't manage to clean up the resulting code well.

My contention is that the current algorithm both (a) fails to merge some mergeable access units and (b) inappropriately merges some access units especially on strict alignment machines.

I guess my primary question here is, instead of making clang try to guess which accesses will be optimal, can we improve the way the LLVM code generator handles the patterns currently generated by clang? I'm not exactly against changing the IR generated by clang, but changing the IR like this seems likely to improve some cases at the expense of others.

In the testcases that were changed, I did not encounter one that generated worse code. (An ARM testcase showed better code due to, IIRC, not merging more than a register width; as with the x86 case, it previously didn't eliminate unnecessary loads, whereas with this change those are gone.) I would be very interested in seeing cases that degrade, though.

@efriedma-quic
Collaborator

Hm, I've been thinking of it the opposite way round -- merging bitfields is throwing away information (about where cuts might be).

In essence, late optimizations can make accesses narrower, but they can't make them wider. The information about cuts is generally recoverable by analyzing the masking instructions.


When I refer to CSE/DSE, I'm mostly talking about keeping values in registers for longer. They don't know anything about individual fields in bitfields. If we split bitfields too narrowly, we end up with extra memory accesses for accessing multiple adjacent fields. And if you have accesses which overlap (access some, but not all, of the same memory), CSE/DSE get much worse.
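
A trivial illustration of the register-reuse point (my example, not from the thread):

struct F2 { unsigned a : 8; unsigned b : 8; };

// If a and b live in one access unit, this can be a single load whose value
// feeds both extracts; with separate i8 units it becomes two independent
// loads unless a later pass manages to re-merge them.
int sum(const F2 *f) { return f->a + f->b; }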

Having a bitfield unit that can't be loaded in a single memory access probably doesn't have much benefit, if there's a natural boundary we can use to split.

Certainly, but could you clarify why changing the access patterns could do that? My understanding of poison is that should it cause a user-visible state change, that's indicative of the original program's UB, and the languages in question don't guarantee anything at that point -- nasal demons and all. That the observed behaviour of a UB-exhibiting program changes is not a defect. What am I missing?

Depending on how we fix the interaction between poison and bitfields, there could be an invariant that accessing any bitfield has to freeze all adjacent bitfield values. If we do that, then the set of "adjacent" values actually matters. There's probably some way we can make this work without actually introducing memory accesses, though; probably not worth worrying about too much here.

@urnathan
Contributor Author

thanks for your comments,

When I refer to CSE/DSE, I'm mostly talking about keeping values in registers for longer. They don't know anything about individual fields in bitfields. If we split bitfields too narrowly, we end up with extra memory accesses for accessing multiple adjacent fields. And if you have accesses which overlap (access some, but not all, of the same memory), CSE/DSE get much worse.

To be clear, by 'accesses which overlap' do you mean accesses to bitfields that share parts of a byte? I agree -- that's an unchanged part of the algorithm: bitfields that share parts of a single byte must be in the same access unit.

Or do you mean an access to a non-bitfield adjacent to a bitfield? How does that square with the memory model you pointed to earlier? Or is it that if you see both such accesses, it must be safe to merge them, or there's some UB already happening? But anyway, this algorithm doesn't affect that -- in both before and after cases something outside of bitfieldAccumulation must be merging them.

Having a bitfield unit that can't be loaded in a single memory access probably doesn't have much benefit, if there's a natural boundary we can use to split.

I think we're agreeing here. The current algorithm generates access units that can require multiple accesses, even on a relaxed-alignment machine (I refer back to the x86 example). This replacement tries much harder not to do that -- it only merges access units if that demonstrably lowers the number of accesses needed, compared with accessing the original set of units separately. The motivating case we had was a sequence of four byte loads to fetch an unaligned 32-bit access unit, one byte of that being fiddled with, and then that one byte being written back.
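
Something like the following packed layout reproduces the shape of that motivating case (a hypothetical reconstruction, not the actual benchmark source):

struct M {
  char tag;        // pushes the bitfields to byte offset 1
  unsigned a : 8;
  unsigned b : 8;
  unsigned c : 16; // old behaviour: a, b and c glued into an unaligned i32 at
                   // offset 1 -- a strict-alignment target assembles it from
                   // narrow loads even when only b is being modified
} __attribute__((packed));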

@urnathan urnathan marked this pull request as ready for review September 11, 2023 18:46
@urnathan
Contributor Author

Ping? Anything more I can do to progress this?

@urnathan
Contributor Author

time for another ping, I think ...

@nigelp-xmos
Contributor

XCore: yes, the current setting of the new flag (the default false value) is right for XCore. Misaligned loads can generate a function call. We do not have benchmarks as such, but comparing the code generated for the test files touched here, there is an improvement. In particular bitfield-ir.cpp is a big improvement.

In bitfield.cpp, two misaligned loads remain. One disappears if I remove the struct's packed attribute. The other remains: N0::read31(). This is not a problem, and is not a degradation -- it's a function call intended for that purpose; just mentioning it for information.

@efriedma-quic
Collaborator

I'm planning to take a closer look at this when I have more time. Sorry I haven't been more responsive here.

One very brief note: in the comments in the code, you might want to distinguish between the semantic width of a bitfield (i.e. the C standard notion of a "memory location", which has ABI significance), vs. the accesses we choose to generate (we don't need to generate no-op reads/writes).

@urnathan
Contributor Author

One very brief note: in the comments in the code, you might want to distinguish between the semantic width of a bitfield (i.e. the C standard notion of a "memory location", which has ABI significance), vs. the accesses we choose to generate (we don't need to generate no-op reads/writes).

Indeed, the naming here is unfortunately overloaded. SysV psABIs use 'Storage Unit' (at least x86 does, and IIRC others follow the same nomenclature). But Clang also uses 'StorageInfo' and 'Storage$FOO' in its bitfield structures, which is unfortunate. I've used 'Access Unit' to describe the latter in this patch. If you meant something else, please clarify (or have I missed some places?).

@yonghong-song
Contributor

With this pull request, I tried to build linux kernel and kernel bpf selftest and run the selftest, and didn't find any issues. So presumably the implementation is correct.

I tried a particular kernel bpf selftest:
https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_tunnel_kern.c#L39-L42
which has a few bitfield usages.
The generated code is the same with or without this change.
The bitfield patterns in the above test_tunnel_kern.c are pretty typical in the kernel, which favours aligned bitfields. This patch tries to handle some not-commonly-used bitfield patterns better; those patterns (like the one in the pull request summary) are very rare in BPF programs or the kernel, so I guess it should have minimal impact on the BPF backend.

@urnathan urnathan force-pushed the push/bitfield branch 2 times, most recently from ce9168a to f95a126 Compare March 11, 2024 15:44
@urnathan
Contributor Author

@rjmccall here's a refactoring along the lines you suggested. Once one stops worrying about rescanning when the next access unit fails to accumulate into the current one, things get simpler. The compiler should be able to turn the boolean conditional flow within the loop body into direct control flow.

Would you like the API change I introduced in the previous patch (having accumulateBitFields return the following non-bitfield) broken out into a separate PR?

Contributor

@rjmccall rjmccall left a comment


Thank you, the structure looks great! Mostly style and clarity comments from here.

    Barrier = false;
    if (Field->isZeroLengthBitField(Context) &&
        (Context.getTargetInfo().useZeroLengthBitfieldAlignment() ||
         Context.getTargetInfo().useBitFieldTypeAlignment()))
Contributor


As I mentioned in a previous review, please consider zero-length bit-fields to be barriers unless we literally can't because they're not char-aligned. In this case, we know that the zero-length bit-field is char-aligned, so we can honor it as a barrier, and you don't need this target condition.

Please add a comment to the case above this, something like this:

This can be a zero-length bit-field if the target doesn't align them to a
char boundary.  This is non-conformant, but we don't have a choice but
to cross the bit-field.

Contributor Author


As I mentioned in a previous review, please consider zero-length bit-fields to be barriers unless we literally can't because they're not char-aligned. In this case, we know that the zero-length bit-field is char-aligned, so we can honor it as a barrier, and you don't need this target condition.

Oh, I thought you were concerned about a darwin-specific ABI behaviour (which I then noticed was different in this area).

Your request wouldn't match the existing behaviour, AFAICT. For instance:

struct X {
  unsigned a;
  unsigned b : 8;
  unsigned : 0;
  unsigned c : 8;
} x;

both x86_64-linux and aarch64-darwin with -fno-bitfield-type-align place those two bitfields in a single access unit:

 BitFields:[
    <CGBitFieldInfo Offset:0 Size:8 IsSigned:0 StorageSize:16 StorageOffset:4 VolatileOffset:0 VolatileStorageSize:0 VolatileStorageOffset:0>
    <CGBitFieldInfo Offset:8 Size:0 IsSigned:0 StorageSize:16 StorageOffset:4 VolatileOffset:0 VolatileStorageSize:0 VolatileStorageOffset:0>
    <CGBitFieldInfo Offset:8 Size:8 IsSigned:0 StorageSize:16 StorageOffset:4 VolatileOffset:0 VolatileStorageSize:0 VolatileStorageOffset:0>
]>

Do you have a case where those bitfields are currently placed in separate access units (when the zero length is not a barrier)?

From the C++ std, [class.bit]

As a special case, an unnamed bit-field with a width of zero specifies alignment of the next bit-field at an allocation unit boundary.

But for ABIs that reject using zero-length bitfields as such barriers, we've already stepped outside the language (so why separate the access units if such a bitfield happens to permit it?) Should I be looking at some other document? (The x86_64 psABI seems very vague about this, just mentioning bitfields may share storage units with other members.)

Are you taking the opportunity to alter the observed behaviour?

Contributor

@rjmccall rjmccall Mar 15, 2024


The most important standard quote to me is this, from C++ [intro.memory]p3 (which matches the definition in C):

A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having nonzero width.

Ultimately, this means that concurrent stores to bit-fields on opposite sides of a zero-width bit-field are not allowed to interfere with each other. This is the primary reason zero-width bit-fields exist, given how implementation-defined everything about their actual layout is. We obviously cannot guarantee that if the bit-fields overlap within a byte, but I do think we need to be making that guarantee whenever we can. And yes, I would like to take this opportunity to alter the observed behavior.
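
Concretely (my illustration of the guarantee being discussed):

struct Z {
  unsigned a : 8;
  unsigned : 0; // a and b are now separate memory locations
  unsigned b : 8;
};

// These may run concurrently without a data race, so a and b must not be
// read-modified-written as a single access unit when the split is byte-aligned.
void setA(Z *z) { z->a = 1; }
void setB(Z *z) { z->b = 2; }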

Contributor Author


It appears there are no existing tests for the current behaviour of such non-barrier zero-sized bitfields -- altering the patch in the way you describe results in no test failures beyond those in the newly added CodeGen/bitfield-access-pad.c. (Altering trunk in a similar way results in only similar failures in the baseline version of that new test.)

Also, the two tests for -ffine-grained-bitfield-access (CodeGenCXX/finegrain-bitfield-access.cpp & CodeGenCXX/finegrain-bitfield-type.cpp) do not have tests containing such zero-length fields falling on byte boundaries with -fno-bitfield-type-align. (I.e. patterns that would be combined now and with this diff, except for the -ffine-grained-bitfield-access disabling that.)

The behaviour here seems a little undertested.

For the record, I'd live with the behaviour you suggest, but I think that change belongs in a different patch (either before or after this patch).

Contributor

@rjmccall rjmccall Mar 15, 2024


I'm a little confused by your reticence here. If this were, broadly, an NFC patch, then I'd completely understand not wanting to fix a random bug that we found while working on it. But this is definitely not an NFC patch; you're changing our generated code patterns in very significant ways, including in ways that break apart access units in exactly the way that would happen if we considered these to be barriers. Given that, I don't understand why it'd only be okay to change code patterns in ways that aren't bug fixes, especially since the change strictly simplifies the logic in your patch.

@wzssyqa
Contributor

wzssyqa commented Mar 14, 2024

For MIPSr6, it is just like AArch64: some microarchitectures don't support misaligned accesses well at the hardware level, so we need an option to disable them; the kernel may need it.

For GCC, we have -mno-unaligned-access. We should also add this option to Clang.

#85174
Add -m(no-)unaligned-access for MIPSr6.

@urnathan
Contributor Author

@rjmccall here is a rebase and update, which I think addresses all your comments. I did make some material changes though:

  1. I removed the volatile handling. I was always a little uncomfortable with it because it didn't affect the access units of a non-volatile bitfield that ends up volatile-qualified via the structure's own volatile qualification. Users of volatile bitfields are probably confused, but if not, they have a std-blessed mechanism for isolating access units -- a zero-length bitfield. The heuristic also broke the idea that this patch /just/ implements a better access algorithm, so I was being inconsistent. AFAICT the ARM ABI, which seems to be one that describes volatile bitfields carefully, does not specify that volatile bitfields should be as isolated as possible. This change causes one difference, in CodeGen/aapcs-bitfield.c, with:
struct st5 {
  int a : 12;
  volatile char c : 5;
} st5;

The two bitfields are placed into a single 32-bit access unit, rather than separate 16-bit ones. Either behaviour is OK with the AAPCS. (The previous algorithm would have placed them into a single 24-bit unit if they abutted at a byte boundary; there is no change in that behaviour now.)

  2. Implementing the barrier behaviour you wanted. I tried to find a psABI that had the right attributes to place barriers at arbitrary bit positions, to see if it had any comment, but failed to find one. As you say, this simplifies things, but ...

  3. The barrier bitfield behaviour that we already had showed an interesting behaviour, which I suspect comes from a later scissoring pass. Namely, with:

struct A {
  unsigned a : 16;
  unsigned b : 8;
  char : 0;
  unsigned d;
} a;

we'd generate an llvm struct of { i24, i32}, but then adjust the access unit for the bitfields to be a 32-bit unit itself. That seems conformant, because nothing else uses that 4th byte, and the std merely says that the bitfield starts at an allocation unit boundary. This patch was originally treating that barrier as a limit to the current span, whereas we can use the next member with storage as that limit. This is actually the behaviour we had when we reached the end of a run of bitfields, so I have made that change as you can see from the changed handling setting Barrier.

  4. I adjusted the new testcases to reduce their complexity -- we don't care about endianness, which only affects the placement of the bitfields within their access unit, not the access units themselves.

It may be that the later pass that was adjusting bitfield accesses to natural sizes can be simplified or deleted. I've not investigated.

@rjmccall
Contributor

Yeah, we don't need to care about the actual bit offset of the zero-width bit-field as long as we honor the non-interference properties across it. I'll take a look at the patch, thanks.

@urnathan
Contributor Author

Sorry to push another update, but I realized the LimitOffset computation could be sunk to the point of use, and therefore not computed in all cases.

Contributor

@rjmccall rjmccall left a comment


This looks great, thanks! I have a few editorial nits, and there's an assertion that seems off, but otherwise this looks ready to go.

These new tests verify clang's choice of types to access
bitfields. This is in preparation for the next patch, which changes
that algorithm. With these tests already in place it'll be easier to
see what has changed.
@urnathan
Contributor Author

@rjmccall thanks, here is an update with those changes. I've collapsed it to the 3 commits mentioned earlier

  1. Tests marked up for the current behaviour
  2. TargetInfo strict/flexible alignment load predicate
  3. The new algorithm


✅ With the latest revision this PR passed the Python code formatter.

Contributor

@rjmccall rjmccall left a comment


Okay. Thanks for putting up with such a protracted review! I really like where we've ended up.

LGTM. Please wait a few days before merging to give other reviewers a chance to give final feedback.

urnathan added a commit that referenced this pull request Mar 29, 2024
urnathan added a commit that referenced this pull request Mar 29, 2024
Promote ARM & AArch64's HasUnaligned to TargetInfo and set for all
targets.
urnathan added a commit that referenced this pull request Mar 29, 2024
Reimplement bitfield access unit computation for SysV ABIs. Considers
expense of unaligned and non-power-of-2 accesses.
@AZero13
Contributor

AZero13 commented Mar 29, 2024

Ping?

@urnathan
Contributor Author

Committed; expect a tail-clipping cleanup PR soon.

@urnathan
Contributor Author

urnathan commented Apr 1, 2024

Be aware of bug #87227 and PR #87238 to address that

@vitalybuka
Collaborator

This failure is caused by the patch
https://lab.llvm.org/buildbot/#/builders/239/builds/6363/steps/10/logs/stdio
cc @hctim

vitalybuka added a commit to llvm/llvm-zorg that referenced this pull request Apr 2, 2024
Fixes stack overflow in clang/test/SemaTemplate/instantiation-depth-default.cpp
after llvm/llvm-project#65742