Skip to content

[CodeExtractor] Refactor extractCodeRegion, fix alloca emission. #114419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 222 commits into from
Nov 12, 2024

Conversation

Meinersbur
Copy link
Member

@Meinersbur Meinersbur commented Oct 31, 2024

Reorganize the code into phases:

  • Analyze/normalize
  • Create extracted function prototype
  • Generate the new function's implementation
  • Generate call to new function
  • Connect call to original function's CFG

The motivation is #114669 to optionally clone the selected code region into the new function instead of moving it. The current structure made it difficult to add such functionality since there was no obvious place to do so, not made easier by some functions doing more than their name suggests. For instance, constructFunction modifies code outside the constructed function, but also function properties such as setPersonalityFn are derived somewhere else. Another example is emitCallAndSwitchStatement, which despite its name also inserts stores for output parameters.

Many operations also implicitly depend on the order they are applied which this patch tries to reduce. For instance, ExtractedFuncRetVals becomes the list exit blocks which also defines the return value when leaving via that block. It is computed early such that the new function's return instructions and the switch can be generated independently. Also, ExtractedFuncRetVals is combining the lists ExitBlocks and OldTargets which were not always kept consistent with each other or NumExitBlocks. The method recomputeExitBlocks() will update it when necessary.

The coding style partially contradict the current coding standard. For instance some local variable start with lower case letters. I updated some, but not all occurrences to make the diff match at least some lines as unchanged.

The patch D96854 introduced some confusion of function argument indexes this is fixed here as well, hence the patch is not NFC anymore. Tested in modified CodeExtractorTest.cpp. Patch D121061 introduced AllocationBlock, but not all allocas were inserted there.

Efectively includes the following fixes:

  1. ce73b16
  2. 4aaa925
  3. Missing allocas, still unfixed

Originally submitted as https://reviews.llvm.org/D115218

@Meinersbur Meinersbur requested a review from goldsteinn November 5, 2024 20:17
@Meinersbur Meinersbur changed the title [CodeExtractor] Refactor extractCodeRegion, fix parameter index confusion. [CodeExtractor] Refactor extractCodeRegion, alloca emission. Nov 5, 2024
@kiranchandramohan
Copy link
Contributor

Thanks @Meinersbur for this refactoring. Would it be OK to upstream this incrementally in small patches for ease of review? I have taken one small change from this patch and created #115078.
Note: This is only to demonstrate and express my willingness to help.

kiranchandramohan added a commit to kiranchandramohan/llvm-project that referenced this pull request Nov 5, 2024
This is one of the patches split-off from llvm#114419 for
ease of review.

Co-authored-by: Michael Kruse <[email protected]>
@Meinersbur Meinersbur changed the title [CodeExtractor] Refactor extractCodeRegion, alloca emission. [CodeExtractor] Refactor extractCodeRegion, fix alloca emission. Nov 6, 2024
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for answering my questions. In the absence of anyone else to review this patch, it looks good to me.

Thanks for taking the time to clean up this code and fix the longstanding bugs.

/// When there are 3 or more exit blocks, leaving the extracted function via
/// the first block it returns 0. When leaving via the second entry it
/// returns 1, etc.
SmallVector<BasicBlock *> SwitchCases;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this should be "ExitBlocks"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally named it SwitchCases to indicate that it is not just an unordered list of exist blocks, but the order is used to determine the extracted function's return values and the replacement switch statement. I am not married to the name (could also be ExtractedFunctionReturnValues), but I honestly think that ExitBlocks would be a misleading name.

Despite this, do you still prefer ExitBlocks?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just any name that doesn't imply it's only used by a switch instruction.

@kiranchandramohan
Copy link
Contributor

Thanks @Meinersbur for this refactoring. Would it be OK to upstream this incrementally in small patches for ease of review? I have taken one small change from this patch and created #115078. Note: This is only to demonstrate and express my willingness to help.

In a similiar spirit as #115078 we could trivially refactor applyFirstDebugLoc in an NFC patch. Next we could just correct emitCallAndSwitchStatement to do something like emitReplacerCall. However, I don't know whether this will be a worthwhile exercise. Given that you have an approval, would you prefer to go ahead and submit it? Or would you like me to help here with making smaller incremental refactoring?

@Meinersbur
Copy link
Member Author

@kiranchandramohan I will do refactorings like getSwitchType and applyFirstDebugLoc separately if requested, put consider that patch series significantly increases the churn of splitting, creating, updating, formatting, merging, and testing each individual PR.

In contrast I have no idea how emitReplacerCall could be isolated. The problem with the current state of CodeExtractor is that all the function calls are intermingled such that that emitCallAndSwitchStatement does more and at the same time less than emitReplacerCall does. Those things would need to moved to somewhere else and at the same time code that does things that emitReplacerCall does but not emitCallAndSwitchStatement deleted elsewhere. That is, the order in which things are done are neither consistent with the old order, nor with the refactored order, needing additional intermitted fixes (or intentional preservation of bugs) that would be eventually removed again, increasing the amount of work for reviewers. I am not sure I would even be able to get such an intermediate state working correctly.

Consider that one can review this code like new code by ignoring the previous state. If we are not able to do so we would never be able to add any new feature. Imagine I would add CodeExtractor2.cpp and in a follow-up redirect uses of CodeExtractor to CodeExtractor2.

@kiranchandramohan
Copy link
Contributor

@kiranchandramohan I will do refactorings like getSwitchType and applyFirstDebugLoc separately if requested, put consider that patch series significantly increases the churn of splitting, creating, updating, formatting, merging, and testing each individual PR.

In contrast I have no idea how emitReplacerCall could be isolated. The problem with the current state of CodeExtractor is that all the function calls are intermingled such that that emitCallAndSwitchStatement does more and at the same time less than emitReplacerCall does. Those things would need to moved to somewhere else and at the same time code that does things that emitReplacerCall does but not emitCallAndSwitchStatement deleted elsewhere. That is, the order in which things are done are neither consistent with the old order, nor with the refactored order, needing additional intermitted fixes (or intentional preservation of bugs) that would be eventually removed again, increasing the amount of work for reviewers. I am not sure I would even be able to get such an intermediate state working correctly.

Consider that one can review this code like new code by ignoring the previous state. If we are not able to do so we would never be able to add any new feature. Imagine I would add CodeExtractor2.cpp and in a follow-up redirect uses of CodeExtractor to CodeExtractor2.

I was thinking of the smaller refactoring approach since I thought unfamiliar reviewers would struggle with making sense of the huge change. But since @tblah has gone ahead and made sense of it all, this does not hold.

I agree that going ahead with this patch (as it is) is the best way forward and the pragmatic approach.

…into users/meinersbur/irbuilder-extract-refactor
@Meinersbur Meinersbur changed the base branch from main to users/meinersbur/irbuilder-extract-applyFirstDebugLoc November 7, 2024 18:42
Meinersbur added a commit that referenced this pull request Nov 8, 2024
Base automatically changed from users/meinersbur/irbuilder-extract-applyFirstDebugLoc to main November 8, 2024 14:27
@Meinersbur Meinersbur merged commit f6795e6 into main Nov 12, 2024
5 of 8 checks passed
@Meinersbur Meinersbur deleted the users/meinersbur/irbuilder-extract-refactor branch November 12, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants