[WIP][X86] combineX86ShufflesRecursively - attempt to combine shuffles with larger types from EXTRACT_SUBVECTOR nodes #133947

Draft
wants to merge 1 commit into main

@RKSimon RKSimon (Collaborator) commented Apr 1, 2025

This replaces the rather limited combineX86ShuffleChainWithExtract function with handling for EXTRACT_SUBVECTOR nodes as we recurse down the shuffle chain, widening the shuffle mask to accommodate the larger value type.

This will mainly help AVX2/AVX512 cases with cross-lane shuffles, but it also helps collapse some cases where the same subvector has gotten reused in multiple lanes.
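As a rough illustration of the mask widening described above (a standalone sketch, not the code in this patch): assume a single-input shuffle whose input is EXTRACT_SUBVECTOR(Wide, ExtractIdx). The helper name `widenMaskThroughExtract` and its parameters are hypothetical, and -1 is used for undef mask elements. The widened mask indexes the wide source directly, so the extract no longer sits between the shuffle and its source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// -1 marks an undefined mask element.
constexpr int Undef = -1;

// Hypothetical helper, NOT the LLVM implementation.
// Mask indexes the narrow subvector EXTRACT_SUBVECTOR(Wide, ExtractIdx).
// Returns a mask with NumWideElts entries that indexes Wide directly:
// the low Mask.size() lanes produce the original shuffle result and the
// remaining lanes are left undef.
std::vector<int> widenMaskThroughExtract(const std::vector<int> &Mask,
                                         std::size_t NumWideElts,
                                         std::size_t ExtractIdx) {
  assert(ExtractIdx + Mask.size() <= NumWideElts &&
         "subvector must lie inside the wide vector");
  std::vector<int> WideMask(NumWideElts, Undef);
  for (std::size_t I = 0; I != Mask.size(); ++I) {
    if (Mask[I] < 0)
      continue; // undef stays undef
    assert(static_cast<std::size_t>(Mask[I]) < Mask.size() &&
           "single-input mask index out of range");
    // Rebase the index from the subvector onto the wide source.
    WideMask[I] = Mask[I] + static_cast<int>(ExtractIdx);
  }
  return WideMask;
}
```

For example, reversing the upper half of an 8-element vector (mask {3, 2, 1, 0}, ExtractIdx 4) widens to {7, 6, 5, 4, -1, -1, -1, -1}, which is a cross-lane shuffle of the full-width source.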

This exposed missing DemandedElts handling inside ISD::TRUNCATE nodes for ComputeNumSignBits.
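To show why DemandedElts matters for a truncate, here is a standalone sketch under simplifying assumptions (it does not use or reproduce the SelectionDAG API): the hypothetical helper `signBitsAfterTruncate` takes the known sign-bit count of each source element and a bitmask of demanded elements. Truncation drops the top SrcBits - DstBits bits, so only sign bits beyond that survive, and elements that are not demanded should not pessimise the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT the LLVM implementation.
// SrcSignBits[i] is the known number of sign bits of source element i
// (each element is SrcBits wide). Bit i of DemandedElts says whether
// element i of the truncated result is actually used.
unsigned signBitsAfterTruncate(const std::vector<unsigned> &SrcSignBits,
                               unsigned SrcBits, unsigned DstBits,
                               std::uint64_t DemandedElts) {
  assert(DstBits <= SrcBits && SrcSignBits.size() <= 64);
  if (DemandedElts == 0)
    return 1; // conservative answer when nothing is demanded
  unsigned Dropped = SrcBits - DstBits;
  unsigned Result = DstBits;
  for (std::size_t I = 0; I != SrcSignBits.size(); ++I) {
    if (((DemandedElts >> I) & 1) == 0)
      continue; // ignore elements nobody reads
    // Only sign bits beyond the dropped high bits survive truncation;
    // otherwise fall back to the trivial single sign bit.
    unsigned Kept = SrcSignBits[I] > Dropped ? SrcSignBits[I] - Dropped : 1;
    Result = std::min(Result, Kept);
  }
  return Result;
}
```

With 32-bit sources truncated to 16 bits and per-element sign bits {24, 1}, demanding only element 0 gives 8 sign bits, while demanding both elements gives 1.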

Fixes #143158

@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch 2 times, most recently from cac4c3b to b54089c Compare April 2, 2025 09:09
github-actions bot commented Apr 2, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch from b54089c to ed7b501 Compare April 2, 2025 09:23
RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Apr 2, 2025
…/dst sizes

Cleanup work for llvm#133947 - we need to handle VTRUNC nodes with large source vectors directly to allow us to widen the size of the shuffle combine

We currently discard these results in combineX86ShufflesRecursively anyhow as we don't allow inputs from getTargetShuffleInputs to be larger than the shuffle value type
RKSimon added a commit that referenced this pull request Apr 3, 2025
…/dst sizes (#134161)

Cleanup work for #133947 - we need to handle VTRUNC nodes with large
source vectors directly to allow us to widen the size of the shuffle
combine

We currently discard these results in combineX86ShufflesRecursively
anyhow as we don't allow inputs from getTargetShuffleInputs to be larger
than the shuffle value type
RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Apr 3, 2025
…TOR(SRC1)) matching behind common one use bitcast checks

No need to ignore one use checks for the INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) fold

Noticed while working on the llvm#133947 regressions
RKSimon added a commit that referenced this pull request Apr 3, 2025
…TOR(SRC1)) matching behind common one use bitcast checks (#134227)

No need to ignore one use checks for the INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) fold

Noticed while working on the #133947 regressions
RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Apr 3, 2025
RKSimon added a commit that referenced this pull request Apr 3, 2025
@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch 4 times, most recently from d575414 to db33ee9 Compare April 7, 2025 10:10
RKSimon added a commit that referenced this pull request Apr 7, 2025
…ffle operands. NFC.

Merge loops to peek through free insert_subvector / bitcasts / extract_subvector.

To keep this NFC I haven't reordered the peek throughs - this will be done in a future patch to help with #133947 regressions
@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch from db33ee9 to a71e9fb Compare April 8, 2025 16:53
RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Apr 9, 2025
…patterns should return bitcasted source values

Noticed while investigating llvm#133947 regressions - if we peek through bitcasts we can lose track of oneuse/combined nodes

Currently the same codegen, as combineX86ShufflesRecursively still peeks through the bitcasts itself, but we will soon handle this consistently as another part of llvm#133947
RKSimon added a commit that referenced this pull request Apr 9, 2025
…patterns should return bitcasted source values (#134993)

Noticed while investigating #133947 regressions - if we peek through
bitcasts we can lose track of oneuse/combined nodes in shuffle combining

Currently the same codegen as combineX86ShufflesRecursively still peeks
through the bitcasts itself, but we will soon handle this consistently
as another part of #133947
AllinLeeYL pushed a commit to AllinLeeYL/llvm-project that referenced this pull request Apr 10, 2025
…patterns should return bitcasted source values (llvm#134993)

Noticed while investigating llvm#133947 regressions - if we peek through
bitcasts we can lose track of oneuse/combined nodes in shuffle combining

Currently the same codegen as combineX86ShufflesRecursively still peeks
through the bitcasts itself, but we will soon handle this consistently
as another part of llvm#133947
@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch from a71e9fb to 96bdbb4 Compare April 10, 2025 17:00
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…patterns should return bitcasted source values (llvm#134993)

Noticed while investigating llvm#133947 regressions - if we peek through
bitcasts we can lose track of oneuse/combined nodes in shuffle combining

Currently the same codegen as combineX86ShufflesRecursively still peeks
through the bitcasts itself, but we will soon handle this consistently
as another part of llvm#133947
@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch 3 times, most recently from 3139775 to eec9df5 Compare April 28, 2025 13:54
@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch from eec9df5 to f0dd9c3 Compare May 15, 2025 12:48
@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch 3 times, most recently from 4b633b8 to a1ab605 Compare June 6, 2025 09:27
@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch from a1ab605 to 1d033a2 Compare June 11, 2025 12:29
@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch from 1d033a2 to 93bc1c2 Compare June 23, 2025 18:02
…s with larger types from EXTRACT_SUBVECTOR nodes

This replaces the rather limited combineX86ShuffleChainWithExtract function with handling for EXTRACT_SUBVECTOR nodes as we recurse down the shuffle chain, widening the shuffle mask to accommodate the larger value type.

This will mainly help AVX2/AVX512 cases with cross-lane shuffles, but it also helps collapse some cases where the same subvector has gotten reused in multiple lanes.

This exposed missing DemandedElts handling inside ISD::TRUNCATE nodes for ComputeNumSignBits.
@RKSimon RKSimon force-pushed the x86-combine-shuffles-extract branch from 93bc1c2 to 5867b13 Compare June 23, 2025 18:06
Development

Successfully merging this pull request may close these issues.

[X86] Merge combineX86ShuffleChainWithExtract into combineX86ShufflesRecursively
1 participant