Skip to content

[SLP] The order of store chains needs to consider the size of the values. #101810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 7, 2024

Conversation

tcwzxx
Copy link
Member

@tcwzxx tcwzxx commented Aug 3, 2024

When store chains have the same value type ID and pointer type ID, they may mix different sizes of values, such as i8 and i64. This can lead to missed vectorization opportunities.

@llvmbot
Copy link
Member

llvmbot commented Aug 3, 2024

@llvm/pr-subscribers-llvm-transforms

Author: tcwzxx (tcwzxx)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/101810.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+8)
  • (added) llvm/test/Transforms/SLPVectorizer/X86/stores_mix_sizes.ll (+34)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 8d2ce6bad6af7..97fb6798adcfc 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -19125,6 +19125,14 @@ bool SLPVectorizerPass::vectorizeStoreChains(BoUpSLP &R) {
     if (V->getPointerOperandType()->getTypeID() >
         V2->getPointerOperandType()->getTypeID())
       return false;
+    if (V->getValueOperand()->getType()->getScalarSizeInBits() <
+        V2->getValueOperand()->getType()->getScalarSizeInBits()) {
+      return true;
+    }
+    if (V->getValueOperand()->getType()->getScalarSizeInBits() >
+        V2->getValueOperand()->getType()->getScalarSizeInBits()) {
+      return false;
+    }
     // UndefValues are compatible with all other values.
     if (isa<UndefValue>(V->getValueOperand()) ||
         isa<UndefValue>(V2->getValueOperand()))
diff --git a/llvm/test/Transforms/SLPVectorizer/X86/stores_mix_sizes.ll b/llvm/test/Transforms/SLPVectorizer/X86/stores_mix_sizes.ll
new file mode 100644
index 0000000000000..c571372e40d16
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/X86/stores_mix_sizes.ll
@@ -0,0 +1,34 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S --passes=slp-vectorizer -mtriple=x86_64-unknown-linux < %s | FileCheck %s
+
+define void @test(ptr %p) {
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[IDX1:%.*]] = getelementptr i8, ptr [[P]], i64 1
+; CHECK-NEXT:    [[IDX_64_9:%.*]] = getelementptr i64, ptr [[P]], i64 9
+; CHECK-NEXT:    store i64 1, ptr [[IDX_64_9]], align 8
+; CHECK-NEXT:    store <8 x i8> zeroinitializer, ptr [[IDX1]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %idx1 = getelementptr i8, ptr %p, i64 1
+  store i8 0, ptr %idx1, align 4
+  %idx.64.9 = getelementptr i64, ptr %p, i64 9
+  store i64 1, ptr %idx.64.9, align 8
+  %idx2 = getelementptr i8, ptr %p, i64 2
+  store i8 0, ptr %idx2, align 4
+  %idx3 = getelementptr i8, ptr %p, i64 3
+  store i8 0, ptr %idx3, align 4
+  %idx4 = getelementptr i8, ptr %p, i64 4
+  store i8 0, ptr %idx4, align 4
+  %idx5 = getelementptr i8, ptr %p, i64 5
+  store i8 0, ptr %idx5, align 4
+  %idx6 = getelementptr i8, ptr %p, i64 6
+  store i8 0, ptr %idx6, align 4
+  %idx7 = getelementptr i8, ptr %p, i64 7
+  store i8 0, ptr %idx7, align 4
+  %idx8 = getelementptr i8, ptr %p, i64 8
+  store i8 0, ptr %idx8, align 4
+  ret void
+}

@alexey-bataev
Copy link
Member

Add a description of the patch

@alexey-bataev
Copy link
Member

Add a test in a separate patch

@tcwzxx
Copy link
Member Author

tcwzxx commented Aug 3, 2024

Add a description of the patch

OK, thanks

@alexey-bataev
Copy link
Member

I don't see test update

@tcwzxx
Copy link
Member Author

tcwzxx commented Aug 5, 2024

I don't see test update

The existing test cases are fine.

There is a new pattern :

; RUN: opt -S --passes=slp-vectorizer -mtriple=x86_64-unknown-linux < %s | FileCheck %s
define void @test(ptr %p) {
entry:
    %idx1 = getelementptr i8, ptr %p, i64 1
    store i8 0, ptr %idx1, align 4
    %idx.64.9 = getelementptr i64, ptr %p, i64 9
    store i64 1, ptr %idx.64.9, align 8
    %idx2 = getelementptr i8, ptr %p, i64 2
    store i8 0, ptr %idx2, align 4
    %idx3 = getelementptr i8, ptr %p, i64 3
    store i8 0, ptr %idx3, align 4
    %idx4 = getelementptr i8, ptr %p, i64 4
    store i8 0, ptr %idx4, align 4
    %idx5 = getelementptr i8, ptr %p, i64 5
    store i8 0, ptr %idx5, align 4
    %idx6 = getelementptr i8, ptr %p, i64 6
    store i8 0, ptr %idx6, align 4
    %idx7 = getelementptr i8, ptr %p, i64 7
    store i8 0, ptr %idx7, align 4
    %idx8 = getelementptr i8, ptr %p, i64 8
    store i8 0, ptr %idx8, align 4
    ret void
}

Before the patch:

define void @test(ptr %p) {
entry:
    %idx1 = getelementptr i8, ptr %p, i64 1
    store i8 0, ptr %idx1, align 4
    %idx.64.9 = getelementptr i64, ptr %p, i64 9
    store i64 1, ptr %idx.64.9, align 8
    %idx2 = getelementptr i8, ptr %p, i64 2
    store <4 x i8> zeroinitializer, ptr %idx2, align 4
    %idx6 = getelementptr i8, ptr %p, i64 6
    store i8 0, ptr %idx6, align 4
    %idx7 = getelementptr i8, ptr %p, i64 7
    store i8 0, ptr %idx7, align 4
    %idx8 = getelementptr i8, ptr %p, i64 8
    store i8 0, ptr %idx8, align 4
    ret void
}

After the patch:

define void @test(ptr %p) {
entry:
    %idx1 = getelementptr i8, ptr %p, i64 1
    %idx.64.9 = getelementptr i64, ptr %p, i64 9
    store i64 1, ptr %idx.64.9, align 8
    store <8 x i8> zeroinitializer, ptr %idx1, align 4
    ret void
}

Add a test in a separate patch

I will commit the above test case in a different PR.

@alexey-bataev
Copy link
Member

I don't see test update

The existing test cases are fine.

There is a new pattern :

; RUN: opt -S --passes=slp-vectorizer -mtriple=x86_64-unknown-linux < %s | FileCheck %s
define void @test(ptr %p) {
entry:
    %idx1 = getelementptr i8, ptr %p, i64 1
    store i8 0, ptr %idx1, align 4
    %idx.64.9 = getelementptr i64, ptr %p, i64 9
    store i64 1, ptr %idx.64.9, align 8
    %idx2 = getelementptr i8, ptr %p, i64 2
    store i8 0, ptr %idx2, align 4
    %idx3 = getelementptr i8, ptr %p, i64 3
    store i8 0, ptr %idx3, align 4
    %idx4 = getelementptr i8, ptr %p, i64 4
    store i8 0, ptr %idx4, align 4
    %idx5 = getelementptr i8, ptr %p, i64 5
    store i8 0, ptr %idx5, align 4
    %idx6 = getelementptr i8, ptr %p, i64 6
    store i8 0, ptr %idx6, align 4
    %idx7 = getelementptr i8, ptr %p, i64 7
    store i8 0, ptr %idx7, align 4
    %idx8 = getelementptr i8, ptr %p, i64 8
    store i8 0, ptr %idx8, align 4
    ret void
}

Before the patch:

define void @test(ptr %p) {
entry:
    %idx1 = getelementptr i8, ptr %p, i64 1
    store i8 0, ptr %idx1, align 4
    %idx.64.9 = getelementptr i64, ptr %p, i64 9
    store i64 1, ptr %idx.64.9, align 8
    %idx2 = getelementptr i8, ptr %p, i64 2
    store <4 x i8> zeroinitializer, ptr %idx2, align 4
    %idx6 = getelementptr i8, ptr %p, i64 6
    store i8 0, ptr %idx6, align 4
    %idx7 = getelementptr i8, ptr %p, i64 7
    store i8 0, ptr %idx7, align 4
    %idx8 = getelementptr i8, ptr %p, i64 8
    store i8 0, ptr %idx8, align 4
    ret void
}

After the patch:

define void @test(ptr %p) {
entry:
    %idx1 = getelementptr i8, ptr %p, i64 1
    %idx.64.9 = getelementptr i64, ptr %p, i64 9
    store i64 1, ptr %idx.64.9, align 8
    store <8 x i8> zeroinitializer, ptr %idx1, align 4
    ret void
}

Add a test in a separate patch

I will commit the above test case in a different PR.

I don't see test update

The existing test cases are fine.

There is a new pattern :

; RUN: opt -S --passes=slp-vectorizer -mtriple=x86_64-unknown-linux < %s | FileCheck %s
define void @test(ptr %p) {
entry:
    %idx1 = getelementptr i8, ptr %p, i64 1
    store i8 0, ptr %idx1, align 4
    %idx.64.9 = getelementptr i64, ptr %p, i64 9
    store i64 1, ptr %idx.64.9, align 8
    %idx2 = getelementptr i8, ptr %p, i64 2
    store i8 0, ptr %idx2, align 4
    %idx3 = getelementptr i8, ptr %p, i64 3
    store i8 0, ptr %idx3, align 4
    %idx4 = getelementptr i8, ptr %p, i64 4
    store i8 0, ptr %idx4, align 4
    %idx5 = getelementptr i8, ptr %p, i64 5
    store i8 0, ptr %idx5, align 4
    %idx6 = getelementptr i8, ptr %p, i64 6
    store i8 0, ptr %idx6, align 4
    %idx7 = getelementptr i8, ptr %p, i64 7
    store i8 0, ptr %idx7, align 4
    %idx8 = getelementptr i8, ptr %p, i64 8
    store i8 0, ptr %idx8, align 4
    ret void
}

Before the patch:

define void @test(ptr %p) {
entry:
    %idx1 = getelementptr i8, ptr %p, i64 1
    store i8 0, ptr %idx1, align 4
    %idx.64.9 = getelementptr i64, ptr %p, i64 9
    store i64 1, ptr %idx.64.9, align 8
    %idx2 = getelementptr i8, ptr %p, i64 2
    store <4 x i8> zeroinitializer, ptr %idx2, align 4
    %idx6 = getelementptr i8, ptr %p, i64 6
    store i8 0, ptr %idx6, align 4
    %idx7 = getelementptr i8, ptr %p, i64 7
    store i8 0, ptr %idx7, align 4
    %idx8 = getelementptr i8, ptr %p, i64 8
    store i8 0, ptr %idx8, align 4
    ret void
}

After the patch:

define void @test(ptr %p) {
entry:
    %idx1 = getelementptr i8, ptr %p, i64 1
    %idx.64.9 = getelementptr i64, ptr %p, i64 9
    store i64 1, ptr %idx.64.9, align 8
    store <8 x i8> zeroinitializer, ptr %idx1, align 4
    ret void
}

Add a test in a separate patch

I will commit the above test case in a different PR.

Ok, fine, but these changes must be reflected in the tests. There are no test changes in this patch

@tcwzxx
Copy link
Member Author

tcwzxx commented Aug 5, 2024

Add a test in a separate patch

Ok, fine, but these changes must be reflected in the tests. There are no test changes in this patch

I feel confused. So, do you mean I should add the above test case in this patch?

The existing test cases do not change after the patch.

@alexey-bataev
Copy link
Member

You need to add the test(s) in the separate NFC patch, commit it, and then show in this patch that it updates/affects these new tests

@tcwzxx
Copy link
Member Author

tcwzxx commented Aug 5, 2024

You need to add the test(s) in the separate NFC patch, commit it, and then show in this patch that it updates/affects these new tests

Thank you for your patience. Is everything OK now?

@@ -19123,6 +19123,14 @@ bool SLPVectorizerPass::vectorizeStoreChains(BoUpSLP &R) {
if (V->getPointerOperandType()->getTypeID() >
V2->getPointerOperandType()->getTypeID())
return false;
if (V->getValueOperand()->getType()->getScalarSizeInBits() <
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also need to update the checks in AreCompatibleStores lambda

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to compare getTypeID() rather than size here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getTypeID has been compared above, but with the same TypeID, such as IntegerTyID, which can have different sizes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, getTypeID() above compared for pointer operands, which worked for typed pointers. You need to use getTypeID for value operand types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you miss lines 19114 to 19122, which compare the value operand types?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. Then there is a question, of why it did not work, because it should work correctly for compatible types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i8 and i64 are not compatible types, but they have the same Type ID, IntegerTyID

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, missed it.

Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

@tcwzxx tcwzxx merged commit b64ec3c into llvm:main Aug 7, 2024
5 of 7 checks passed
@tcwzxx tcwzxx deleted the slp branch August 17, 2024 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants