Skip to content

[CodeGen][CodeLayout] Fix segfault on access to deleted block in MBP. #142357

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 23, 2025

Conversation

ivafanas
Copy link
Contributor

@ivafanas ivafanas commented Jun 2, 2025

Problem 1: There is a typo which reassigns BlockWorkList to EHPadWorkList on attempt to remove RemBB from work lists.

Problem 2: Chain->UnscheduledPredecessors == 0 is an incorrect way to check whether RemBB is enqueued or not. The root cause is a postponed deletion of WorkList from already scheduled blocks in selectBestCandidateBlock. Bug happens in the following scenario:

  • FunctionChain is being processed with non-zero UnscheduledPredecessors
  • Block B' is added to the BlockWorkList
  • Block B' is chosen as the best successor (selectBestSuccessor) for some another block and added into Chain
  • Block B' is removed by tail duplicator.

RemovalCallback erroneously won't erase B' from BlockWorkList, because UnscheduledPredecessors value of FunctionChain is not zero (and it is allowed to be non-zero).

Proposed solution is to always cleanup worklists on block deletion by tail duplicator.

@ivafanas
Copy link
Contributor Author

ivafanas commented Jun 2, 2025

We have caught the bug on the custom out-of-tree backend in a ~200Kb function of ~70 basic blocks with several loops, switch instructions etc.. Any attempt to reduce the test case results in bug disappearing.

I'm afraid, I'm not able to provide the reproducer / regression test for any in-tree backend in a reasonable time.

@ivafanas
Copy link
Contributor Author

ivafanas commented Jun 2, 2025

@arsenm , @weiguozhi

Could you please review the PR?

@weiguozhi
Copy link
Contributor

RemovalCallback erroneously won't erase B' from BlockWorkList, because UnscheduledPredecessors value of FunctionChain is not zero (and it is allowed to be non-zero).

From function markBlockSuccessors, a BB can be added to BlockWorkList only when UnscheduledPredecessors reaches 0. So how was B added to BlockWorkList in the first place?

@ivafanas
Copy link
Contributor Author

ivafanas commented Jun 2, 2025

From function markBlockSuccessors, a BB can be added to BlockWorkList only when UnscheduledPredecessors reaches 0. So how was B added to BlockWorkList in the first place?

Yes, it is a bit tricky:

  • When block B' is added to the work list, it belongs to the chain with zero UnscheduledPredecessors
  • When block B' is chosen as the best successor, it is merged into FunctionChain inside BlockChain::merge method. FunctionChain has non-zero UnscheduledPredecessor.

Work lists might contain blocks from chains with non-zero UnscheduledPredecessor. It happens for scheduled blocks which should be removed from work lists soon. However, postponed removal comes to play.

@ivafanas
Copy link
Contributor Author

ivafanas commented Jun 3, 2025

Some test in flang failed. Need to be investigated.

@ivafanas ivafanas force-pushed the dev/fix-segfault-in-block-placement branch from 501954d to 034458b Compare June 5, 2025 14:57
@ivafanas
Copy link
Contributor Author

ivafanas commented Jun 5, 2025

PR is ready for review.

Seems like flang test is flaky. I've failed to reproduce the problem locally and it disappeared after rerun.

@arsenm
Copy link
Contributor

arsenm commented Jun 6, 2025

Missing test

@ivafanas
Copy link
Contributor Author

ivafanas commented Jun 6, 2025

Missing test


We have caught the bug on the custom out-of-tree backend in a ~200Kb function of ~70 basic blocks with several loops, switch instructions etc.. Any attempt to reduce the test case results in bug disappearing.

I'm afraid, I'm not able to provide the reproducer / regression test for any in-tree backend in a reasonable time.


If test is a mandatory requirement for this PR, it is ok for me to reject PR.

@arsenm
Copy link
Contributor

arsenm commented Jun 6, 2025

Missing test

We have caught the bug on the custom out-of-tree backend in a ~200Kb function of ~70 basic blocks with several loops, switch instructions etc.. Any attempt to reduce the test case results in bug disappearing.
I'm afraid, I'm not able to provide the reproducer / regression test for any in-tree backend in a reasonable time.

If test is a mandatory requirement for this PR, it is ok for me to reject PR.

Did you try llvm-reduce on it?

@ivafanas
Copy link
Contributor Author

ivafanas commented Jun 6, 2025

Did you try llvm-reduce on it?

Yes. No effect.

I've spent ~1 day in different attempts to reduce the test case =)

@ivafanas
Copy link
Contributor Author

Ping.

Should PR be closed or merged?

@ivafanas
Copy link
Contributor Author

Ping.

It's ok for me to leave the fix in our customer's fork.
If test absence is a key problem, let's reject PR and let it be.

@ivafanas
Copy link
Contributor Author

Ok, I'll close the PR so as no response.

@ivafanas ivafanas closed this Jun 18, 2025
@arsenm arsenm reopened this Jun 18, 2025
@arsenm
Copy link
Contributor

arsenm commented Jun 18, 2025

Ok, I'll close the PR so as no response.

The volume of PRs is impossible to keep up with, you should not expect prompt replies

@arsenm
Copy link
Contributor

arsenm commented Jun 18, 2025

Did you try llvm-reduce on it?

Yes. No effect.

I've spent ~1 day in different attempts to reduce the test case =)

Did you try llvm-reduce on the MIR?

@arsenm arsenm merged commit d0e5d6f into llvm:main Jun 23, 2025
23 checks passed
miguelcsx pushed a commit to miguelcsx/llvm-project that referenced this pull request Jun 23, 2025
…llvm#142357)

Problem 1: There is a typo which reassigns `BlockWorkList` to
`EHPadWorkList` on attempt to remove `RemBB` from work lists.

Problem 2: `Chain->UnscheduledPredecessors == 0` is an incorrect way to
check whether `RemBB` is enqueued or not. The root cause is a postponed
deletion of `WorkList` from already scheduled blocks in
`selectBestCandidateBlock`. Bug happens in the following scenario:

* `FunctionChain` is being processed with non-zero
`UnscheduledPredecessors`
* Block `B'` is added to the `BlockWorkList`
* Block `B'` is chosen as the best successor (`selectBestSuccessor`) for
some another block and added into `Chain`
* Block `B'` is removed by tail duplicator.

`RemovalCallback` erroneously won't erase `B'` from `BlockWorkList`,
because `UnscheduledPredecessors` value of `FunctionChain` is not zero
(and it is allowed to be non-zero).

Proposed solution is to always cleanup worklists on block deletion by
tail duplicator.
Jaddyen pushed a commit to Jaddyen/llvm-project that referenced this pull request Jun 23, 2025
…llvm#142357)

Problem 1: There is a typo which reassigns `BlockWorkList` to
`EHPadWorkList` on attempt to remove `RemBB` from work lists.

Problem 2: `Chain->UnscheduledPredecessors == 0` is an incorrect way to
check whether `RemBB` is enqueued or not. The root cause is a postponed
deletion of `WorkList` from already scheduled blocks in
`selectBestCandidateBlock`. Bug happens in the following scenario:

* `FunctionChain` is being processed with non-zero
`UnscheduledPredecessors`
* Block `B'` is added to the `BlockWorkList`
* Block `B'` is chosen as the best successor (`selectBestSuccessor`) for
some another block and added into `Chain`
* Block `B'` is removed by tail duplicator.

`RemovalCallback` erroneously won't erase `B'` from `BlockWorkList`,
because `UnscheduledPredecessors` value of `FunctionChain` is not zero
(and it is allowed to be non-zero).

Proposed solution is to always cleanup worklists on block deletion by
tail duplicator.
anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
…llvm#142357)

Problem 1: There is a typo which reassigns `BlockWorkList` to
`EHPadWorkList` on attempt to remove `RemBB` from work lists.

Problem 2: `Chain->UnscheduledPredecessors == 0` is an incorrect way to
check whether `RemBB` is enqueued or not. The root cause is a postponed
deletion of `WorkList` from already scheduled blocks in
`selectBestCandidateBlock`. Bug happens in the following scenario:

* `FunctionChain` is being processed with non-zero
`UnscheduledPredecessors`
* Block `B'` is added to the `BlockWorkList`
* Block `B'` is chosen as the best successor (`selectBestSuccessor`) for
some another block and added into `Chain`
* Block `B'` is removed by tail duplicator.

`RemovalCallback` erroneously won't erase `B'` from `BlockWorkList`,
because `UnscheduledPredecessors` value of `FunctionChain` is not zero
(and it is allowed to be non-zero).

Proposed solution is to always cleanup worklists on block deletion by
tail duplicator.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants