Skip to content

[TargetInstrInfo] enable foldMemoryOperand for InlineAsm #70743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions llvm/include/llvm/CodeGen/TargetInstrInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -2186,6 +2186,16 @@ class TargetInstrInfo : public MCInstrInfo {
// Get the call frame size just before MI.
unsigned getCallFrameSizeAt(MachineInstr &MI) const;

/// Fills in the necessary MachineOperands to refer to a frame index.
/// The best way to understand this is to print `asm(""::"m"(x));` after
/// finalize-isel. Example:
/// INLINEASM ... 262190 /* mem:m */, %stack.0.x.addr, 1, $noreg, 0, $noreg
/// we would add placeholders for: ^ ^ ^ ^
virtual void
getFrameIndexOperands(SmallVectorImpl<MachineOperand> &Ops) const {
llvm_unreachable("unknown number of operands necessary");
}

private:
mutable std::unique_ptr<MIRFormatter> Formatter;
unsigned CallFrameSetupOpcode, CallFrameDestroyOpcode;
Expand Down
62 changes: 62 additions & 0 deletions llvm/lib/CodeGen/TargetInstrInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -565,6 +565,64 @@ static MachineInstr *foldPatchpoint(MachineFunction &MF, MachineInstr &MI,
return NewMI;
}

static void foldInlineAsmMemOperand(MachineInstr *MI, unsigned OpNo, int FI,
const TargetInstrInfo &TII) {
MachineOperand &MO = MI->getOperand(OpNo);
const VirtRegInfo &RI = AnalyzeVirtRegInBundle(*MI, MO.getReg());

// If the machine operand is tied, untie it first.
if (MO.isTied()) {
unsigned TiedTo = MI->findTiedOperandIdx(OpNo);
MI->untieRegOperand(OpNo);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it generally correct to untie an operand?

I don't know how the tie constraints are set for inline asm so sorry if the question is dumb!

Copy link
Member Author

@nickdesaulniers nickdesaulniers Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only registers may be tied. Note how MachineInstr::untieRegOperand() does nothing if !MachineOperand::isReg() and calls MachineOperand::isTied() which asserts if !isReg().

Because we are folding register MachineOperands into Frame Indices, we must untie them, otherwise we will assert in MachineOperand::ChangeToFrameIndex (line 582). Many of the MachineOperand::ChangeTo* methods assert if the input is a tied register and you try to change them (without manually untieing them first).

This is also the reason why I had to "remember" which operands were tied when splicing new ones in a41b149 ; if you splice new operands in the middle, you need to remember the existing ties, untie them, move them, insert new operands, then manually retie them.


Additional context on what even is a tied operand in the context of inline asm.

For inline asm, you can have inputs, outputs, in/outs, or 2 variables where one is an input, one is an output, but for the sake of register allocation the location chosen for the inline asm is one and the same.

asm ("# %0 %1":"=r"(x):"0"(y));

x is an output, y is an input. But they will be allocated the same storage location for the duration of the inline asm; %0 will be equivalent to %1.

I don't know when that's ever preferable to just using one variable.

asm ("# %0":"+r"(x));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only registers may be tied.

Yes, what I'm saying is tied operands and untied operands don't have the same semantic and I was wondering if this code could break that.

What I was thinking is, with tied operand, we'll have something like:

a = a + ...

When we untie and fold, we'll end up with:

a = @a + ...

At this point, a and @a don't hold the same value, so we broke the tied operand semantic.
Now, I think this is not a problem for regular code because if @a is used somewhere else, regalloc should have produced the correct store operation, i.e.,

a = @a + ...
store a, @a <-- values properly tied again

My concern is around inline asm that regalloc cannot unbox.
What happens if the inline asm statement is a sequence of instructions:

a = a + ...
... = a

That we turn into:

a = @a
... = @a <-- @a != a

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: A safe temporary solution could be to disallow the folding (i.e., use the m variant only) for tied operands.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that we call this recursively on the tied operand, so we would produce:

@a = @a + ...

So in theory this is fine.

In practice, I don't believe this works all the time because IIRC, for instance, x86 cannot have a memory operand both as definition and as argument of the same instruction.

Copy link
Member Author

@nickdesaulniers nickdesaulniers Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: A safe temporary solution could be to disallow the folding (i.e., use the m variant only) for tied operands.

No. Support for tied operands is table stakes for resolving #20571 because clang will lower "+" modifiers ("in+out" variables) to tied operands. We need to support:

  1. inputs
  2. outputs
  3. in/outs

so 3 is table stakes.

In practice, I don't believe this works all the time because IIRC, for instance, x86 cannot have a memory operand both as definition and as argument of the same instruction.

If you have tied operands for inline asm, you may have 2 distinct variables (one as input, one as output), but for the duration of the inline asm blob, they have 1 (the same) storage location (register or stack slot). So in the inline asm, you'd only refer to the 1 storage location (be it register or memory) and not both values. So it doesn't matter if x86 cannot have a memory def and use as operands on the same instr. My example above wasn't clear in that regard, perhaps another example:

long foo (long x) {
    long y;
    asm (
        "mov %0, r8\n\t"
        "mov r8, %0"
        :"=r"(y):"0"(x):"r8");
    return y;
}

in the case, %1 is not referenced in the inline asm, only %0 is. But if it were, it would have the same value as %0 because that's what tying is (at least my understanding of it, which may be wrong).

what I'm saying is tied operands and untied operands don't have the same semantic

That is correct but I think only so for register operands. If you have 2 memory operands which refer to the same stack slot, they are in effect tied. IIUC, tied operands only matter for the purpose of register allocation. If an input value is tied to an output value, tieing them just means "put them in the same place (be that a register or memory location)."

So in MIR if we have:

INLINEASM &"# $0 $1", 0 /* attdialect */, 1076101130 /* regdef:GR32 spillable */, def %0, 2147483657 /* reguse tiedto:$0 */, %1(tied-def 3)

(i.e. %0 is tied to %1) then that means the developer is asking the register allocator "please assign %0 and %1 to the same storage location (be that register or stack slot) for the duration of the INLINEASM blob.

Given that example as input, after my change (this change), we first notice that %0 is tied, so we untie it, and transform the above (the recursive call) to:

INLINEASM &"# $0 $1", 0 /* attdialect */, 1076101130 /* regdef:GR32 spillable */, def %0, 262190 /* mem:m */, %stack.0, 1, $noreg, 0, $noreg

which replaces %1 (a tied register def) with a frame index. Then upon return, we continue transforming %0 replacing that with the same frame index (in this case %stack.o):

INLINEASM &"# $0 $1", 0 /* attdialect */, 262190 /* mem:m */, %stack.0, 1, $noreg, 0, $noreg, 262190 /* mem:m */, %stack.0, 1, $noreg, 0, $noreg

(this is better reflected in the test added in the child patch in #70832)

I just realized that we call this recursively on the tied operand, so we would produce:
@A = @A + ...

Yes! Essentially if the output is tied to the input, then @a = @a + ... is produced. If it's not tied, then you'd get a = @a + ... which is maybe what you want (for an input or output, but not an in/out).


It's still possible that I'm completely misunderstanding something about tied operands; I hope the above cleared up some questions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR Sounds fine to me, we're not creating new problems.

Agree on all what you're saying but my point is when we replace operands with memory locations, the resulting inline assembly may not be correct or put differently, I don't see where we check that the resulting asm is correct.
I actually realize now that this is not specific to tied operands.

E.g., consider the following inline asm constraints on x86 (with someop being something that makes sense)

    asm (
        "someop %0, %1, %2"
        : "=rm"(dst) : "rm"(a1), "rm"(a2));

The resulting code will be fine as long as regalloc spills at most one operand. If it spills two or more, we'll have more than one memory location on someop. Assuming someop is a single x86 operation, I believe this will produce something invalid.

With tied operands, I believe we just make this more likely to happen, but then considering that previously we were just using the m constraint in these cases, we already had this issue anyway.

Therefore, nothing to worry about, I guess this is the responsibility of the developers to set something that make sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resulting code will be fine as long as regalloc spills at most one operand. If it spills two or more, we'll have more than one memory location on someop. Assuming someop is a single x86 operation, I believe this will produce something invalid.

Correct, but for that case, I would say that the inline asm is invalid and should only use m on one or two of the three operands:

     asm (
         "someop %0, %1, %2"
-         : "=rm"(dst) : "rm"(a1), "rm"(a2));
+         : "=rm"(dst) : "r"(a1), "r"(a2));

That's just another instance of "the constraints were wrongly specified, but you didn't notice until there was register pressure (and the compiler was forced to choose m rather than r)." And that's not specific to clang or GCC.


With tied operands, I believe we just make this more likely to happen

A lot of the cases you're thinking of seem to be related to any given single x86 instruction within inline asm likely not supporting more than 1 (or is it 2) memory operands. But for tied operands, you likely have more than one instruction in your inline asm, so that's less likely to be problematic.

asm (
  "mov %1, %r8\n\t"
  "mov %r8, %0"
  : "=rm"(output) : "0"(input));

so if we can't allocate output to a register, and input is tied, the single stack slot chosen for the both of them is not an issue, since they are used in different instructions (which happen to support register or memory for those operands).

My point is that the use of tied operands is likely to be with inline asm w/ more than one instruction, so the limitations of most modern CISC ISAs (is x86 modern...well, relative to other CISC) around how many memory operands they can have I think is not a concern. (If it is, because only one instruction is being used, then it's probably a mistake for the programmer to specify "rm" rather than "r").

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but for that case, I would say that the inline asm is invalid [...]

Thanks for the confirmation.

Agree on everything you said :).
(And already approved the PR)

// Intentional recursion!
foldInlineAsmMemOperand(MI, TiedTo, FI, TII);
}

// Change the operand from a register to a frame index.
MO.ChangeToFrameIndex(FI, MO.getTargetFlags());

SmallVector<MachineOperand, 4> NewOps;
TII.getFrameIndexOperands(NewOps);
assert(!NewOps.empty() && "getFrameIndexOperands didn't create any operands");
MI->insert(MI->operands_begin() + OpNo + 1, NewOps);

// Change the previous operand to a MemKind InlineAsm::Flag. The second param
// is the per-target number of operands that represent the memory operand
// excluding this one (MD). This includes MO.
InlineAsm::Flag F(InlineAsm::Kind::Mem, NewOps.size() + 1);
F.setMemConstraint(InlineAsm::ConstraintCode::m);
MachineOperand &MD = MI->getOperand(OpNo - 1);
MD.setImm(F);

// Update mayload/maystore metadata.
MachineOperand &ExtraMO = MI->getOperand(InlineAsm::MIOp_ExtraInfo);
if (RI.Reads)
ExtraMO.setImm(ExtraMO.getImm() | InlineAsm::Extra_MayLoad);
if (RI.Writes)
ExtraMO.setImm(ExtraMO.getImm() | InlineAsm::Extra_MayStore);
}

// Returns nullptr if not possible to fold.
static MachineInstr *foldInlineAsmMemOperand(MachineInstr &MI,
ArrayRef<unsigned> Ops, int FI,
const TargetInstrInfo &TII) {
assert(MI.isInlineAsm() && "wrong opcode");
if (Ops.size() > 1)
return nullptr;
unsigned Op = Ops[0];
assert(Op && "should never be first operand");
assert(MI.getOperand(Op).isReg() && "shouldn't be folding non-reg operands");

if (!MI.mayFoldInlineAsmRegOp(Op))
return nullptr;

MachineInstr &NewMI = TII.duplicate(*MI.getParent(), MI.getIterator(), MI);

foldInlineAsmMemOperand(&NewMI, Op, FI, TII);

return &NewMI;
}

MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
ArrayRef<unsigned> Ops, int FI,
LiveIntervals *LIS,
Expand Down Expand Up @@ -612,6 +670,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
NewMI = foldPatchpoint(MF, MI, Ops, FI, *this);
if (NewMI)
MBB->insert(MI, NewMI);
} else if (MI.isInlineAsm()) {
NewMI = foldInlineAsmMemOperand(MI, Ops, FI, *this);
} else {
// Ask the target to do the actual folding.
NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, FI, LIS, VRM);
Expand Down Expand Up @@ -683,6 +743,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
NewMI = foldPatchpoint(MF, MI, Ops, FrameIndex, *this);
if (NewMI)
NewMI = &*MBB.insert(MI, NewMI);
} else if (MI.isInlineAsm() && isLoadFromStackSlot(LoadMI, FrameIndex)) {
NewMI = foldInlineAsmMemOperand(MI, Ops, FrameIndex, *this);
} else {
// Ask the target to do the actual folding.
NewMI = foldMemoryOperandImpl(MF, MI, Ops, MI, LoadMI, LIS);
Expand Down