[RFC][MC] Cache MCRegAliasIterator #93510

Conversation
AMDGPU has a lot of registers, almost 9000. Many of those registers have aliases. For instance, SGPR0 has a ton of aliases due to the presence of register tuples. It's even worse if you query the aliases of a register tuple itself. A large register tuple can have hundreds of aliases because it may include 16 registers, and each of those registers has its own tuples as well.

The current implementation of MCRegAliasIterator handles this poorly. In some extreme cases it can iterate (IIRC) 7000 more times than necessary, yielding duplicates over and over again and using a lot of expensive iterators.

This patch implements a cache for MCRegAliasIterator. It does the expensive part only once and then saves the result, so subsequent iterations over that register's aliases are just a map lookup. Furthermore, the cached data is uniqued (and sorted). This speeds up code both by making the iterator itself faster and by minimizing the number of loop iterations that users of the iterator perform.

I think this is beneficial for AMDGPU (some quick profiling showed a few seconds gained on a roughly 3-minute test build), but it may introduce some overhead on smaller targets where this iterator was already cheap. To avoid that, we might consider disabling the cache on targets with a small number of registers. We could pretty much restrict this cache to AMDGPU only by enabling it only if NumReg is above a certain threshold, like 1000.

I still need to do memory usage measurements, but I think this cache should take at most 10MB on AMDGPU if we were to query the aliases of all registers (which isn't common, I think). I don't think this is a concern, but if it is, we can always find ways to alleviate it, for instance by only caching after a certain number of hits or by clearing the cache at regular intervals in the backend.
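As a rough illustration of the caching scheme described above (compute the alias set once, then store it sorted and uniqued so later queries are a lookup), here is a self-contained toy sketch. The register data, names, and the plain `std::map` are invented for the example and are not LLVM's actual implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using PhysReg = uint16_t;

// Toy stand-in for the expensive uncached walk. In LLVM this would be the
// nested RegUnit/RegUnitRoot/SuperReg iteration, which can yield the same
// register many times; here we just return a duplicate-heavy list.
std::vector<PhysReg> computeAliasesSlow(PhysReg R) {
  if (R == 1)
    return {3, 2, 3, 2, 2}; // hypothetical aliases of register 1
  if (R == 2)
    return {1};
  return {};
}

class AliasCache {
  std::map<PhysReg, std::vector<PhysReg>> Cache;

public:
  const std::vector<PhysReg> &getAliasesOf(PhysReg R) {
    auto It = Cache.find(R);
    if (It != Cache.end())
      return It->second; // cache hit: no recomputation

    auto &Aliases = Cache[R];
    Aliases = computeAliasesSlow(R); // expensive part runs exactly once
    // Key property of the patch: the cached list is sorted and uniqued,
    // so loops over it do fewer iterations than the raw walk.
    std::sort(Aliases.begin(), Aliases.end());
    Aliases.erase(std::unique(Aliases.begin(), Aliases.end()),
                  Aliases.end());
    Aliases.shrink_to_fit();
    return Aliases;
  }
};
```

The real patch stores the cache inside MCRegisterInfo (`getCachedAliasesOf`) and hands the iterator an ArrayRef into it; the container choice is debated later in this review thread.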
@llvm/pr-subscribers-llvm-regalloc @llvm/pr-subscribers-mc Author: Pierre van Houtryve (Pierre-vh) Changes: (same as the description above.)
Full diff: https://github.com/llvm/llvm-project/pull/93510.diff 3 Files Affected:
diff --git a/llvm/include/llvm/MC/MCRegisterInfo.h b/llvm/include/llvm/MC/MCRegisterInfo.h
index af5be9186108a..79df17d33e420 100644
--- a/llvm/include/llvm/MC/MCRegisterInfo.h
+++ b/llvm/include/llvm/MC/MCRegisterInfo.h
@@ -187,6 +187,9 @@ class MCRegisterInfo {
DenseMap<MCRegister, int> L2SEHRegs; // LLVM to SEH regs mapping
DenseMap<MCRegister, int> L2CVRegs; // LLVM to CV regs mapping
+ mutable DenseMap<MCPhysReg, std::vector<MCPhysReg>> RegAliasesCache;
+ ArrayRef<MCPhysReg> getCachedAliasesOf(MCPhysReg R) const;
+
/// Iterator class that can traverse the differentially encoded values in
/// DiffLists. Don't use this class directly, use one of the adaptors below.
class DiffListIterator
@@ -263,6 +266,7 @@ class MCRegisterInfo {
friend class MCRegUnitIterator;
friend class MCRegUnitMaskIterator;
friend class MCRegUnitRootIterator;
+ friend class MCRegAliasIterator;
/// Initialize MCRegisterInfo, called by TableGen
/// auto-generated routines. *DO NOT USE*.
@@ -726,60 +730,122 @@ class MCRegUnitRootIterator {
/// MCRegAliasIterator enumerates all registers aliasing Reg. If IncludeSelf is
/// set, Reg itself is included in the list. This iterator does not guarantee
/// any ordering or that entries are unique.
+///
+/// This iterator can work in two modes: cached and uncached.
+///
+/// In uncached mode, this uses RegUnit/RegUnitRoot/SuperReg iterators to
+/// find all aliases. This is very expensive if the target (such as AMDGPU)
+/// has a lot of registers and register aliases. It can also cause us
+/// to iterate the same register many times over.
+///
+/// In cached mode, the iterator requests MCRegisterInfo for a cache.
+/// MCRegisterInfo then returns a cached list of sorted, uniqued
+/// aliases for that register. This allows the iterator to be faster
+/// past the first request, but also to iterate much less in some
+/// cases, giving a slight speedup to any code that's using it.
class MCRegAliasIterator {
private:
MCRegister Reg;
const MCRegisterInfo *MCRI;
- bool IncludeSelf;
+ bool IncludeSelf : 1;
+ bool IsCached : 1;
+
+ /// TODO: Should we dynamically allocate this? If we don't make caching
+ /// specific for each target, we probably should in order to keep the size of
+ /// this object under control. RegAliasIterator is currently 80 bytes.
+ struct UncachedData {
+ MCRegUnitIterator RI;
+ MCRegUnitRootIterator RRI;
+ MCSuperRegIterator SI;
+ };
- MCRegUnitIterator RI;
- MCRegUnitRootIterator RRI;
- MCSuperRegIterator SI;
+ struct CachedData {
+ ArrayRef<MCPhysReg> Aliases;
+ // Index into Aliases. The actual index is (Idx - IncludeSelf) because, when
+ // IncludeSelf is used, we use zero to represent self.
+ unsigned Idx;
+ };
+
+ union {
+ UncachedData Iters;
+ CachedData Cache;
+ };
public:
MCRegAliasIterator(MCRegister Reg, const MCRegisterInfo *MCRI,
- bool IncludeSelf)
- : Reg(Reg), MCRI(MCRI), IncludeSelf(IncludeSelf) {
+ bool IncludeSelf, bool UseCache = true)
+ : Reg(Reg), MCRI(MCRI), IncludeSelf(IncludeSelf) {
+
+ IsCached = UseCache;
+ if (IsCached) {
+ Cache.Aliases = MCRI->getCachedAliasesOf(Reg);
+ Cache.Idx = 0;
+ return;
+ }
+
// Initialize the iterators.
- for (RI = MCRegUnitIterator(Reg, MCRI); RI.isValid(); ++RI) {
- for (RRI = MCRegUnitRootIterator(*RI, MCRI); RRI.isValid(); ++RRI) {
- for (SI = MCSuperRegIterator(*RRI, MCRI, true); SI.isValid(); ++SI) {
- if (!(!IncludeSelf && Reg == *SI))
+ for (Iters.RI = MCRegUnitIterator(Reg, MCRI); Iters.RI.isValid();
+ ++Iters.RI) {
+ for (Iters.RRI = MCRegUnitRootIterator(*Iters.RI, MCRI);
+ Iters.RRI.isValid(); ++Iters.RRI) {
+ for (Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
+ Iters.SI.isValid(); ++Iters.SI) {
+ if (!(!IncludeSelf && Reg == *Iters.SI))
return;
}
}
}
}
- bool isValid() const { return RI.isValid(); }
+ bool isValid() const {
+ return IsCached ? (Cache.Idx < (Cache.Aliases.size() + IncludeSelf))
+ : Iters.RI.isValid();
+ }
MCRegister operator*() const {
- assert(SI.isValid() && "Cannot dereference an invalid iterator.");
- return *SI;
+ if (IsCached) {
+ if (IncludeSelf && (Cache.Idx == 0))
+ return Reg;
+ return Cache.Aliases[Cache.Idx - IncludeSelf];
+ }
+
+ assert(Iters.SI.isValid() && "Cannot dereference an invalid iterator.");
+ return *Iters.SI;
}
void advance() {
+ if (IsCached) {
+ ++Cache.Idx;
+ return;
+ }
+
// Assuming SI is valid.
- ++SI;
- if (SI.isValid()) return;
+ ++Iters.SI;
+ if (Iters.SI.isValid())
+ return;
- ++RRI;
- if (RRI.isValid()) {
- SI = MCSuperRegIterator(*RRI, MCRI, true);
+ ++Iters.RRI;
+ if (Iters.RRI.isValid()) {
+ Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
return;
}
- ++RI;
- if (RI.isValid()) {
- RRI = MCRegUnitRootIterator(*RI, MCRI);
- SI = MCSuperRegIterator(*RRI, MCRI, true);
+ ++Iters.RI;
+ if (Iters.RI.isValid()) {
+ Iters.RRI = MCRegUnitRootIterator(*Iters.RI, MCRI);
+ Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
}
}
MCRegAliasIterator &operator++() {
assert(isValid() && "Cannot move off the end of the list.");
- do advance();
- while (!IncludeSelf && isValid() && *SI == Reg);
+ if (IsCached)
+ advance();
+ else {
+ do
+ advance();
+ while (!IncludeSelf && isValid() && *Iters.SI == Reg);
+ }
return *this;
}
};
diff --git a/llvm/lib/MC/MCRegisterInfo.cpp b/llvm/lib/MC/MCRegisterInfo.cpp
index 334655616d8db..4ce9eb40547be 100644
--- a/llvm/lib/MC/MCRegisterInfo.cpp
+++ b/llvm/lib/MC/MCRegisterInfo.cpp
@@ -20,6 +20,26 @@
using namespace llvm;
+ArrayRef<MCPhysReg> MCRegisterInfo::getCachedAliasesOf(MCPhysReg R) const {
+ if (auto It = RegAliasesCache.find(R); It != RegAliasesCache.end())
+ return It->second;
+
+ // TODO: Should we have a DenseSet instead & then convert it
+ // to vector? Or even a BitVector that's then converted to a normal
+ // MCPhysReg vector?
+ auto &Aliases = RegAliasesCache[R];
+ for (MCRegAliasIterator It(R, this, /*IncludeSelf=*/false,
+ /*UseCache=*/false);
+ It.isValid(); ++It) {
+ Aliases.push_back(*It);
+ }
+
+ llvm::sort(Aliases);
+ Aliases.erase(unique(Aliases), Aliases.end());
+ Aliases.shrink_to_fit();
+ return Aliases;
+}
+
MCRegister
MCRegisterInfo::getMatchingSuperReg(MCRegister Reg, unsigned SubIdx,
const MCRegisterClass *RC) const {
diff --git a/llvm/test/CodeGen/ARM/constant-island-movwt.mir b/llvm/test/CodeGen/ARM/constant-island-movwt.mir
index 7d21a4e4875c3..16d2c478cf7ee 100644
--- a/llvm/test/CodeGen/ARM/constant-island-movwt.mir
+++ b/llvm/test/CodeGen/ARM/constant-island-movwt.mir
@@ -898,13 +898,13 @@ body: |
# CHECK-NEXT: CONSTPOOL_ENTRY 1, %const.0, 4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: bb.2.entry (align 2):
-# CHECK-NEXT: liveins: $d13, $s27, $r10, $r9, $r8, $s26, $d12, $s25, $s24,
-# CHECK-SAME: $d15, $s30, $s31, $d14, $s28, $s29, $lr, $r0, $d21,
-# CHECK-SAME: $r3, $q10, $d20, $d17, $r2, $d25, $q11, $d22, $d23,
-# CHECK-SAME: $r1, $q8, $d16, $s3, $q14, $d28, $d29, $d19, $s17,
-# CHECK-SAME: $d8, $s16, $r6, $r7, $r4, $q12, $q9, $d18, $s0, $q15,
-# CHECK-SAME: $d30, $d31, $r12, $s1, $d0, $d24, $s2, $d1, $q0, $s6,
-# CHECK-SAME: $d3, $d2, $s4, $q1, $s7, $s5, $d9, $s18, $s19, $q4
+# CHECK-NEXT: liveins: $d13, $s26, $r10, $r9, $r8, $s27, $d12, $s24, $s25,
+# CHECK-SAME: $d15, $s30, $s31, $d14, $s28, $s29, $lr, $d21, $r0,
+# CHECK-SAME: $q10, $r7, $d20, $d17, $r2, $d25, $q11, $d22, $d23,
+# CHECK-SAME: $s1, $q8, $d16, $s3, $q14, $d28, $d29, $d19, $s16,
+# CHECK-SAME: $r4, $d8, $r6, $r3, $q12, $s17, $q9, $d18, $s0, $d31,
+# CHECK-SAME: $q15, $d30, $r1, $d0, $r12, $d24, $s2, $d1, $q0, $s7,
+# CHECK-SAME: $s4, $d2, $s6, $q1, $s5, $d3, $d9, $s18, $s19, $q4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: $r5 = t2MOVi16 target-flags(arm-lo16) @.str.84, 14 /* CC::al */, $noreg
# CHECK-NEXT: $r5 = t2MOVTi16 $r5, target-flags(arm-hi16) @.str.84, 14 /* CC::al */, $noreg
The real fix should be to switch all uses of MCRegAliasIterator to use MCRegUnitIterator instead. No liveness infrastructure should operate in terms of aliases.

I'm not too familiar with reg units, but as far as I understand them, yes, that'd probably make more sense. However, I'm wondering how realistic it is to expect that it'll happen anytime soon. I can fix one or two cases myself, I think, but we have about 80 uses in llvm/lib. How do you feel about caching? Do you think it's a good addition nonetheless, or should we avoid it in favor of (hopefully, someday) fixing all the unnecessary RegAliasIterator uses?
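For readers unfamiliar with register units: the reviewer's point is that two registers overlap exactly when their register-unit sets intersect, so liveness code tracking units never needs to enumerate aliases at all. A toy sketch with invented data (not LLVM's API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using RegUnit = uint16_t;

// Each register is defined by a sorted list of the register units it covers;
// a tuple register simply covers the units of all its members.
struct Reg {
  std::vector<RegUnit> Units; // kept sorted
};

bool regsOverlap(const Reg &A, const Reg &B) {
  // A few binary searches over small sorted unit lists replace a walk
  // over every alias of A.
  return std::any_of(A.Units.begin(), A.Units.end(), [&](RegUnit U) {
    return std::binary_search(B.Units.begin(), B.Units.end(), U);
  });
}
```

In this model, a 16-register tuple and one of its members share a unit and therefore overlap, without either side ever materializing the hundreds of aliases the tuple has.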
/// MCRegisterInfo then returns a cached list of sorted, uniqued
/// aliases for that register. This allows the iterator to be faster
/// past the first request, but also to iterate much less in some
/// cases, giving a slight speedup to any code that's using it.
class MCRegAliasIterator {
I'm wondering if I should move the old implementation to a MCRegAliasIteratorUncached class, and then make MCRegAliasIterator the default cached implementation.

It makes things a bit cleaner (and makes the caching system easier to remove someday), but makes it very difficult to add a cache "killswitch" if we feel like one is necessary (like if we don't want caching on all targets).
@@ -187,6 +187,9 @@ class MCRegisterInfo {
DenseMap<MCRegister, int> L2SEHRegs; // LLVM to SEH regs mapping
DenseMap<MCRegister, int> L2CVRegs; // LLVM to CV regs mapping

mutable DenseMap<MCPhysReg, std::vector<MCPhysReg>> RegAliasesCache;
This could also just be a std::vector<std::unique_ptr<std::vector<MCPhysReg>>> where register IDs are the indexes into the vector. It'd use more space upfront (for AMDGPU, about 9000*8 bytes), but makes accesses a lot faster, and would be more memory efficient if we end up caching tons of aliases.
What is the total size of this map (worst case) for AMDGPU?
Realistically I don't know if a program could use all registers available. I don't think so.
Theoretically, there are 9000 registers and I'm not sure how much space DenseMap needs, I think it's a lot more than what std::vector would use
Theoretical minimum size of the worst case would be 8972 * 8972 * 2 bytes -> ~154MiB?

The upfront cost of a 70KiB vector version without the risk of a multiple-megabyte map seems preferable.
If I counted correctly, there will be 1'864'820 registers in all vectors in the map if every register is asked for its aliases. The number doesn't look horribly large, so I guess caching is a reasonable trade-off.

I think @perlfu did some work on speeding up MCRegAliasIterator. I've tried to tackle the same problem by removing uses of MCRegAliasIterator whenever I can.
RegUnits are the fundamental unit of liveness tracking in llvm. All of the generated register enums have to go through the awkward mapping to regunits, things would be better if the register representation looked more like regunit sets.
It's usually not too difficult to convert these uses. I've been suggesting this as an intern project for a while. There's some additional infrastructure changes that would make this easier, like moving block live ins and reserved registers to directly use regunits
Maybe it's fine for a stopgap.

It's what I think too, but as the saying goes, there's nothing as permanent as a temporary solution. I'm not comfortable landing this unless everyone is sure it'd be beneficial even if we end up removing most uses of the RegAlias iterator at some point. I personally think it is beneficial anyway because:
+1 for switching to regunit iterators
This is what I'd like to see, too. It is not obvious how to deal with e.g. the X86 EAX high sub-register though.
Agree on everything that was discussed. Just one comment on the caching (disclaimer: I haven't looked at the implementation): how stable is it from run to run? I want to make sure we do not create non-determinism.
The current implementation should be perfectly stable.

To clarify, do you agree that caching is valuable anyway (so this diff could land), but we should still make efforts to move away from using register aliases for liveness?
ping

Probably should test this on compile-time-tracker; touching this kind of stuff has been bad for X86 before.

gentle ping
@@ -187,6 +187,9 @@ class MCRegisterInfo {
DenseMap<MCRegister, int> L2SEHRegs; // LLVM to SEH regs mapping
DenseMap<MCRegister, int> L2CVRegs; // LLVM to CV regs mapping

mutable std::vector<std::vector<MCPhysReg>> RegAliasesCache;
note: I wanted to use a vector of unique_ptr, but it makes it too awkward to use. I can't even use reserve in that case, so I didn't do that indirection in the end.

The vector is resized exactly once upon init, and each inner vector is 3 pointers, so for AMDGPU we have 9000*24 bytes, about 216KB for this cache, which is worth it IMO.
You can reduce the size by using SmallVector<, 0> instead of std::vector. (Using SmallVector is generally always preferred in LLVM, even if you don't use the inline storage.)
You could use std::optional in place of unique_ptr.
Sentinel looks fine, too.
@nikic SmallVector is 32 bytes; it has one more pointer for some reason. Though it means I can use SmallVector<MCPhysReg, 4> and store small alias sets inline, so I think it's worth the size increase.
Ah, but SmallVector doesn't have a shrink_to_fit, so it may end up using quite a bit more memory. I don't think it's worth it then.
> @nikic SmallVector is 32 bytes; it has one more pointer for some reason.

That doesn't sound right. SmallVector should be 16 bytes large if the element type is 4 bytes... uh, I just saw that MCPhysReg is only 2 bytes, which increases SmallVector's size to 24 bytes. And of course there's no way to disable that heuristic. So yeah, SmallVector won't make things smaller here (but it also shouldn't be 32 bytes either...).

> Though it means I can use SmallVector<MCPhysReg, 4> and store small alias sets inline, so I think it's worth the size increase.

But this also means that you allocate those 4 inline elements for all of the registers that are not actually queried. It's probably not worthwhile in this context.
> SmallVector should be 16 bytes large if the element type is 4 bytes... uh, I just saw that MCPhysReg is only 2 bytes, which increases SmallVector's size to 24 bytes.

Why does it get bigger when the element type gets smaller? Is it worth fixing that?

This is because SmallVector<char> sometimes needs to hold more than 4GB of data. Use of 64-bit sizes should probably be limited to just that case and not also extend to SmallVector<uint16_t>...
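The size figures in this exchange can be sanity-checked directly. Note that container layout is an implementation detail of the standard library, not a guarantee of the C++ standard; the checks below hold for mainstream 64-bit implementations (libstdc++, libc++), where std::vector stores three pointers (begin/end/capacity):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// std::vector<MCPhysReg-like type> is three pointers, i.e. 24 bytes on a
// 64-bit target, regardless of the element type.
static_assert(sizeof(std::vector<uint16_t>) == 3 * sizeof(void *),
              "three-pointer std::vector layout (implementation detail)");

// The "9000*24 bytes, about 216KB" upfront cost quoted above for the
// vector-of-vectors cache.
constexpr std::size_t UpfrontBytes = 9000 * sizeof(std::vector<uint16_t>);
static_assert(UpfrontBytes == 216000, "matches the ~216KB figure");
```

llvm::SmallVector additionally carries inline-capacity machinery and, as noted in the comment above, uses 64-bit size/capacity fields for 2-byte elements, which is why it measured larger here.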