
Code Size Reduction: Pass large loadable types by address instead of by value #8909


Closed
wants to merge 1 commit into from

Conversation

shajrawi

Radar: rdar://problem/28680453

Reduce the size of the text segment (the code size of the compiled output) by changing the way we generate LLVM IR for loadable types:

We do a SIL-level transformation, during IRGen, that, from a bird's-eye view, does the following:
for every function that takes a big type as an input, rewrite it to take a pointer to the big type instead.
$@convention(method) (BigStruct) turns into $@convention(method) (@in_constant BigStruct)
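As a rough sketch of what this rewrite looks like (hypothetical function and type names, simplified SIL — not taken from the patch itself):

```
// Before the pass: BigStruct is passed by value.
sil @takeBig : $@convention(method) (BigStruct) -> Int {
bb0(%0 : $BigStruct):
  // ... uses %0 directly
}

// After the pass: the same function takes its argument by address.
sil @takeBig : $@convention(method) (@in_constant BigStruct) -> Int {
bb0(%0 : $*BigStruct):
  %1 = load %0 : $*BigStruct   // loaded only where the value is actually needed
  // ... uses %1
}
```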

There are a few new SIL pieces to support this:

  • A new @in_constant parameter convention
  • New retain_value_addr and release_value_addr SIL instructions that take an address as input, load the value it points to, and perform retain_value and release_value on it respectively
  • New outlined functions that do retains, releases and memcpy of large values
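The semantics of the two address-taking refcounting instructions can be understood by expansion into existing instructions; a rough, hypothetical sketch:

```
// retain_value_addr %addr is morally equivalent to:
%v = load %addr : $*BigStruct
retain_value %v : $BigStruct

// release_value_addr %addr is morally equivalent to:
%v = load %addr : $*BigStruct
release_value %v : $BigStruct
```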

Please note:

  • For outlined memcpy, I currently create one function per large type in IRGen (when we encounter a copy_addr instruction). Reusing the value-witness function instead is less than ideal: it (needlessly) takes a %swift.type as a third parameter, and takes all of its parameters as i8* - requiring a bitcast at the caller (vs. at the callee in my outlined function)
  • Likewise, re-using the retain/releases from the value-witness table is less efficient than a direct call to a smaller function
  • We currently add new alloc_stack instructions and, in some cases, loads of the function arguments to make this work. The current implementation, while it has some peephole optimizations for the most painful cases, is still very conservative: we can improve this in the future, further reducing the code size
  • When a call targets a function that used to take a $BigType and now takes a $*BigType, there are cases where we create new alloc_stack and store instructions in order to call said function. As with the point above, the current implementation is very conservative and can be further optimized; this will be done in phases
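The call-site rewrite described in the last bullet can be sketched roughly like this (hypothetical names, simplified SIL):

```
// Before: the callee took $BigStruct by value.
%f = function_ref @takeBig : $@convention(method) (BigStruct) -> Int
%r = apply %f(%big) : $@convention(method) (BigStruct) -> Int

// After: materialize the value into a fresh stack slot and pass its address.
%tmp = alloc_stack $BigStruct
store %big to %tmp : $*BigStruct
%f = function_ref @takeBig : $@convention(method) (@in_constant BigStruct) -> Int
%r = apply %f(%tmp) : $@convention(method) (@in_constant BigStruct) -> Int
dealloc_stack %tmp : $*BigStruct
```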

The radar mentioned in this PR contains a test program called BigTypeProg: it contains a big struct with some loadable fields plus a class reference (creating a need for retain and release), and passes the struct around to different functions that access it.
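I don't have access to the radar, but a minimal program in the same spirit might look like the following sketch (all names here are made up; this is not the actual BigTypeProg source):

```swift
// Hypothetical reconstruction of a BigTypeProg-style benchmark.
final class Payload {
    var id: Int
    init(id: Int) { self.id = id }
}

// A struct large enough to exceed the by-value threshold, containing a
// class reference so that copying it requires retain/release traffic.
struct BigStruct {
    var a = 0, b = 1, c = 2, d = 3, e = 4, f = 5, g = 6, h = 7
    var payload = Payload(id: 42)
}

// Functions that take the struct by value at the source level; the IRGen
// pass rewrites them to take the argument by address behind the scenes.
func sumFields(_ s: BigStruct) -> Int {
    return s.a + s.b + s.c + s.d + s.e + s.f + s.g + s.h
}

func payloadID(_ s: BigStruct) -> Int {
    return s.payload.id
}

let big = BigStruct()
print(sumFields(big))   // 28
print(payloadID(big))   // 42
```

Each by-value call copies the struct, including a retain of `payload`, which is exactly the traffic the outlined retain/release/memcpy helpers target.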

Under -Onone we see a 3x binary-size reduction, and under -O a 2x reduction; the text segment shrinks by around 80% and 60%, respectively.

Last but not least: this PR is still a work in progress. There aren't nearly enough test cases yet, either for this huge change or for the newly added SIL instructions. There are also some minor code cleanups / refactorings still to be done in the new SIL-level transformation. The PR will be updated in the next couple of days to remedy that; I decided to open it now to allow people enough time to review it. It is in an almost-final condition and passes all the smoke tests on my machine. @aschwaighofer, @rjmccall, @jckarter: can you please start taking a look?

@shajrawi shajrawi requested a review from aschwaighofer April 21, 2017 03:05
@eeckstein
Contributor

nice work!

Some general comments:
*) You should split the whole big change into several commits: for example, one commit for the mangling extensions, another for adding the new instructions, one for the new pass, etc. That makes it easier to review.
*) Please include at least part of your detailed PR description in the commit messages as well, so that it's visible in the git log.
*) I think a change this big deserves a full test (not only a smoke test)

@shajrawi
Author

Thanks @eeckstein, I just started a new (clean) full test and smoke test. The previous one failed to build, either due to a new change in master or due to the incremental build process - I will investigate tomorrow...
Once I see that everything works I will abandon this PR / branch and split it into several commits / PRs per your suggestion.

@slavapestov
Contributor

I wonder if long-term we want to do something about these "IRGen SIL passes". Perhaps if SIL type lowering could calculate the sizes of things, they wouldn't need IRGen to be around and we could simplify things by keeping such indirect types as address-only all the way through the SIL pipeline.

It doesn't seem like a big stretch to refactor IRGen type lowering to not depend on most of the rest of IRGen, and move it into a separate library that SIL can use.

But you're also working on severely curtailing SIL's use of address-only types, so you'll probably have better ideas.

@aschwaighofer
Contributor

@slavapestov This would require introducing the notion of target platform into SIL.

The decision whether to be kept address only is currently tied to whether we will pass this type indirectly as per SwiftCC ABI. This decision is currently intimately tied to how the IRGen explosion schema (llvm types) is lowered to the SwiftCC (https://github.com/apple/swift-clang/blob/stable/include/clang/CodeGen/SwiftCallingConv.h, https://github.com/apple/swift/blob/master/lib/IRGen/GenCall.cpp#L299).

You could model this earlier, but that would require platform knowledge of type sizes before IRGen lowering, which has mostly been avoided as a design goal (beyond the standard library knowing the size of Int ...) [my interpretation; John or Joe, please correct me if this interpretation is wrong]

There are valid reasons why SIL optimization passes might want to convert by-address values into SSA values (SROA, value propagation, all the things that an SSA optimizer likes to do). One could say that the optimizer should do those only on demand, but the analysis of whether this is profitable is better expressed on SSA ... (the same reason we are moving toward representing address-only types as SSA values until the end of the pipeline)

@@ -54,7 +54,7 @@ const uint16_t VERSION_MAJOR = 0;
/// in source control, you should also update the comment to briefly
/// describe what change you made. The content of this comment isn't important;
/// it just ensures a conflict if two people change the module format.
-const uint16_t VERSION_MINOR = 335; // Last change: no type in objc method table
+const uint16_t VERSION_MINOR = 336; // Last change: no type in objc method table
Contributor

Quick flyby - also change the comment

@jckarter
Contributor

jckarter commented Apr 21, 2017

@slavapestov It makes sense to me to have a transitional phase where we prepare SIL for lowering to IR, since the alternatives are either to expose more low-level concerns in SIL or to make IRGen more complex. Most of the optimizer team's work this year has been toward making normal SIL higher-level, if anything.

@jrose-apple
Contributor

@gottesmm and I have talked idly about adding a "lowered" SIL phase in addition to "raw" and "canonical".

@shajrawi shajrawi force-pushed the loadable_pass_by_addr branch 4 times, most recently from ebd9688 to 521087f Compare April 22, 2017 06:39
@shajrawi shajrawi force-pushed the loadable_pass_by_addr branch 17 times, most recently from 2529296 to 9e8cf66 Compare April 27, 2017 00:04
@shajrawi shajrawi force-pushed the loadable_pass_by_addr branch from 535c503 to fe0fd57 Compare April 28, 2017 02:21
@atrick
Contributor

atrick commented Apr 28, 2017

@slavapestov @jrose-apple The "lowered" SIL stage was the plan of record last year. It's in tree as of my commit in January.

@atrick
Contributor

atrick commented Apr 28, 2017

Incidentally, a "lowered" SIL stage is much more problematic than raw/canonical stages because the representation of address-only is incompatible in both directions (in addition to only one side having platform knowledge).

@shajrawi shajrawi force-pushed the loadable_pass_by_addr branch 17 times, most recently from 5139657 to 4b1e7b6 Compare April 30, 2017 02:12
@shajrawi shajrawi force-pushed the loadable_pass_by_addr branch from 4b1e7b6 to d7a7530 Compare April 30, 2017 03:04
@shajrawi
Author

@swift-ci Please smoke test

@shajrawi
Author

@swift-ci Please test

@shajrawi
Author

@swift-ci Please Test Source Compatibility

1 similar comment
@shahmishal
Member

@swift-ci Please Test Source Compatibility

@shajrawi
Author

shajrawi commented May 1, 2017

Merged newer PR - closing this one

@shajrawi shajrawi closed this May 1, 2017
@shajrawi shajrawi deleted the loadable_pass_by_addr branch May 5, 2017 05:46
8 participants