Skip to content

⚠️Add a new TempRValue optimization to the CopyForwarding pass. #11350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

gottesmm
Copy link
Contributor

@gottesmm gottesmm commented Aug 4, 2017

This PR continues the work from #11328. I closed the other PR since I can not push to Andy's branch. Andy is on vacation so is unable to make that change now.

atrick and others added 4 commits August 3, 2017 20:41
This is a separate optimization that detects short-lived temporaries that can be
eliminated. This is necessary now that SILGen no longer performs basic RValue
forwarding in some cases.

SR-5508: Performance regression in benchmarks caused by removing SILGen peephole
for LoadExpr in +0 context
…ization.

This is a much more aggressive attempt at removing temporary "rvalues" inside
the CopyForwarding pass. It is now handles basic switch CFGs and does some
memory disambiguation.

This is still insufficient to handle the case that caused the worst benchmark
regressions, which are caused by UnfoldSequence. The problem is, that executes
an escaping closure during the temporary rvalue's lifetime. The only way around
that is to fully analyze both sides of the copy down through all uses to prove
they are completely nonescaping.

Note, the original intention of the CopyForwarding pass was a very quick Onone
peephole that did not require escape analysis or alias analysis.
This version successfully optimizes UnfoldSequence, which is the holy grail.

Unfortunately, there's a problem building the stdlib which seems to be a
use-after-free. Note that we need to remove copied from CopiedDefs when we
eliminate them, but it doesn't seem to be working properly.

Once that's fixed, we need to run the benchmarks.
The problem here is that previously the patch was removing the copy instruction
instead of the underlying alloc_stack from the def map. We track the
alloc_stack, not the def.
@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 4, 2017

The remaining crash can be caused via the following test case. To get it to crash, I needed to run with guard malloc and add -debug. One of the debug statements shows the crash.

sil_stage canonical

import Swift
import Builtin

// specialized Optional.map<A>(_:)
sil shared [serializable] @tmp123 : $@convention(method) (@owned @callee_owned (@in AnyObject) -> (@out _StringBuffer, @error Error), @in_guaranteed Optional<AnyObject>) -> (@owned Optional<_StringBuffer>, @error Error) {
// %0                                             // users: %16, %32, %23, %14, %3
// %1                                             // users: %6, %4
bb0(%0 : @owned $@callee_owned (@in AnyObject) -> (@out _StringBuffer, @error Error), %1 : @trivial $*Optional<AnyObject>):
  %2 = alloc_stack $Optional<_StringBuffer>, var, name "$return_value" // users: %33, %26, %25, %36, %19, %11
  debug_value %0 : $@callee_owned (@in AnyObject) -> (@out _StringBuffer, @error Error), let, name "transform", argno 1 // id: %3
  debug_value_addr %1 : $*Optional<AnyObject>, let, name "self", argno 2 // id: %4
  %5 = alloc_stack $Optional<AnyObject>           // users: %7, %35, %31, %21, %8, %6
  copy_addr %1 to [initialization] %5 : $*Optional<AnyObject> // id: %6
  switch_enum_addr %5 : $*Optional<AnyObject>, case #Optional.some!enumelt.1: bb1, case #Optional.none!enumelt: bb5 // id: %7

bb1:                                              // Preds: bb0
  %8 = unchecked_take_enum_data_addr %5 : $*Optional<AnyObject>, #Optional.some!enumelt.1 // user: %10
  %9 = alloc_stack $AnyObject, let, name "y"      // users: %30, %20, %15, %13, %10
  copy_addr [take] %8 to [initialization] %9 : $*AnyObject // id: %10
  %11 = init_enum_data_addr %2 : $*Optional<_StringBuffer>, #Optional.some!enumelt.1 // user: %16
  %12 = alloc_stack $AnyObject                    // users: %16, %29, %18, %13
  copy_addr %9 to [initialization] %12 : $*AnyObject // id: %13
  %0a = copy_value %0 : $@callee_owned (@in AnyObject) -> (@out _StringBuffer, @error Error)
  destroy_addr %9 : $*AnyObject                   // id: %15
  try_apply %0a(%11, %12) : $@callee_owned (@in AnyObject) -> (@out _StringBuffer, @error Error), normal bb2, error bb4 // id: %16

bb2(%17 : @trivial $()):                          // Preds: bb1
  dealloc_stack %12 : $*AnyObject                 // id: %18
  inject_enum_addr %2 : $*Optional<_StringBuffer>, #Optional.some!enumelt.1 // id: %19
  dealloc_stack %9 : $*AnyObject                  // id: %20
  dealloc_stack %5 : $*Optional<AnyObject>        // id: %21
  br bb3                                          // id: %22

bb3:                                              // Preds: bb5 bb2
  destroy_value %0 : $@callee_owned (@in AnyObject) -> (@out _StringBuffer, @error Error) // id: %23
  %24 = tuple ()
  %25 = load [take] %2 : $*Optional<_StringBuffer>       // user: %27
  dealloc_stack %2 : $*Optional<_StringBuffer>    // id: %26
  return %25 : $Optional<_StringBuffer>           // id: %27

// %28                                            // user: %34
bb4(%28 : @owned $Error):                         // Preds: bb1
  dealloc_stack %12 : $*AnyObject                 // id: %29
  dealloc_stack %9 : $*AnyObject                  // id: %30
  dealloc_stack %5 : $*Optional<AnyObject>        // id: %31
  destroy_value %0 : $@callee_owned (@in AnyObject) -> (@out _StringBuffer, @error Error) // id: %32
  dealloc_stack %2 : $*Optional<_StringBuffer>    // id: %33
  throw %28 : $Error                              // id: %34

bb5:                                              // Preds: bb0
  dealloc_stack %5 : $*Optional<AnyObject>        // id: %35
  inject_enum_addr %2 : $*Optional<_StringBuffer>, #Optional.none!enumelt // id: %36
  br bb3                                          // id: %37
} // end sil function '_T0Sq3mapqd__Sgqd__xKcKlFyXl_s13_StringBufferVTgq5'

@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 4, 2017

Given that test case, I can get a ToT compiler to crash. This makes me think that with the changes in this PR that we are hitting an already existing copy forwarding correctness bug that is unrelated to Andy's change.

@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 4, 2017

Smaller test case:

sil_stage canonical

import Swift
import Builtin

sil @user : $@convention(thin) (@in AnyObject) -> ()

sil @tmp223 : $@convention(thin) (@in Optional<AnyObject>) -> () {
bb0(%0 : @trivial $*Optional<AnyObject>):
  %1 = unchecked_take_enum_data_addr %0 : $*Optional<AnyObject>, #Optional.some!enumelt.1
  %2 = alloc_stack $AnyObject
  copy_addr [take] %1 to [initialization] %2 : $*AnyObject

  %12 = alloc_stack $AnyObject
  copy_addr %2 to [initialization] %12 : $*AnyObject
  destroy_addr %2 : $*AnyObject
  %13 = function_ref @user : $@convention(thin) (@in AnyObject) -> ()
  apply %13(%12) : $@convention(thin) (@in AnyObject) -> ()
  dealloc_stack %12 : $*AnyObject
  dealloc_stack %2 : $*AnyObject

  %9999 = tuple()
  return %9999 : $()
}

@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 4, 2017

We blow up while processing %2.

…ating it. Otherwise when we are emitting debug statements via -debug, we get a use after free.
@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 4, 2017

The problem here is that CopyForwarding without Andy's patch tries to log that it is removing an instruction, after it has already erased the instruction. Easy 1 line fix that is unrelated to Andy's patch. Building again.

@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 4, 2017

Not sure why my build is doing a rebuild. While I am waiting, grabbing food real quick.

@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 4, 2017

This is the cause of the other crasher:

// DoubleWidth.high.getter
sil [serialized] @_T0s11DoubleWidthV4highxfg : $@convention(method) <Base where Base : FixedWidthInteger, Base.Words : Collection, Base.Magnitude.Words : Collection> (@in_guaranteed DoubleWidth<Base>) -> @out Base {
// %0                                             // user: %7
// %1                                             // users: %3, %2
bb0(%0 : @trivial $*Base, %1 : @trivial $*DoubleWidth<Base>):
  debug_value_addr %1 : $*DoubleWidth<Base>, let, name "self", argno 1 // id: %2
  %3 = struct_element_addr %1 : $*DoubleWidth<Base>, #DoubleWidth._storage // user: %5
  %4 = alloc_stack $Base                          // users: %8, %7, %6
  %5 = tuple_element_addr %3 : $*(high: Base, low: Base.Magnitude), 0 // user: %6
  copy_addr %5 to [initialization] %4 : $*Base    // id: %6
  copy_addr [take] %4 to [initialization] %0 : $*Base // id: %7
  dealloc_stack %4 : $*Base                       // id: %8
  %9 = tuple ()                                   // user: %10
  return %9 : $()                                 // id: %10
} // end sil function '_T0s11DoubleWidthV4highxfg'

@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 4, 2017

Like I thought the NRVO input values have been modified by eliminating the temporary. This makes the separate pass (or at least finding the copies in a separate loop rather than doing one large computation a safer change).

*** Optimizing the module (Pass List Pipeline) *** 
Copy Forwarding in Func _T0s11DoubleWidthV4highxfg
  Skipping Def:   %5 = tuple_element_addr %3 : $*(high: Base, low: Base.Magnitude), 0 // user: %6
    not an argument or local var!
Checking read only temp  copy_addr %5 to [initialization] %4 : $*Base    // id: %6
Eliminating temp RValue copy  copy_addr %5 to [initialization] %4 : $*Base    // id: %6
NRVO eliminates copy  copy_addr %4 to [initialization] %0 : $*Base    // id: %5

Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file /Volumes/Files/work/solon/llvm/include/llvm/Support/Casting.h, line 241.
0  sil-opt                  0x0000000105114de8 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  sil-opt                  0x0000000105113da6 llvm::sys::RunSignalHandlers() + 86
2  sil-opt                  0x00000001051153a9 SignalHandler(int) + 361
3  libsystem_platform.dylib 0x00007fffb54bfb3a _sigtramp + 26
4  libsystem_platform.dylib 000000000000000000 _sigtramp + 1253311712
5  libsystem_c.dylib        0x00007fffb5344420 abort + 129
6  libsystem_c.dylib        0x00007fffb530b893 basename_r + 0
7  sil-opt                  0x00000001018ec304 std::__1::enable_if<!(is_simple_type<swift::SILValue>::value), llvm::cast_retty<swift::AllocStackInst, swift::SILValue const>::ret_type>::type llvm::cast<swift::AllocStackInst, swift::SILValue>(swift::SILValue const&) + 100
8  sil-opt                  0x00000001018eb0a7 performNRVO(swift::CopyAddrInst*) + 135
9  sil-opt                  0x00000001018eaa6a (anonymous namespace)::CopyForwardingPass::run() + 1274
10 sil-opt                  0x000000010172aa4c swift::SILPassManager::runPassOnFunction(swift::SILFunctionTransform*, swift::SILFunction*) + 2428
11 sil-opt                  0x000000010172fcee swift::SILPassManager::runFunctionPasses(llvm::ArrayRef<swift::SILFunctionTransform*>) + 2126
12 sil-opt                  0x0000000101732146 swift::SILPassManager::runOneIteration() + 870
13 sil-opt                  0x0000000100f1ee65 swift::SILPassManager::execute() + 21
14 sil-opt                  0x0000000100f1d4bc swift::SILPassManager::executePassPipelinePlan(swift::SILPassPipelinePlan const&) + 556
15 sil-opt                  0x0000000100f0fffb runCommandLineSelectedPasses(swift::SILModule*, swift::irgen::IRGenModule*) + 683
16 sil-opt                  0x0000000100f0d324 main + 11460
17 libdyld.dylib            0x00007fffb52b0235 start + 1
18 libdyld.dylib            0x0000000000000008 start + 1255472596
Segmentation fault: 11

@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 4, 2017

I am going to be at the doctor for an hour or so, I will be back in a bit.

@gottesmm
Copy link
Contributor Author

gottesmm commented Aug 5, 2017

Erik extracted this optimization into its own pass to by pass the issue that we ran into here. He is doing it in #11361. Closing this PR.

@gottesmm gottesmm closed this Aug 5, 2017
@gottesmm gottesmm deleted the andy-rvalue-fix branch August 5, 2017 01:50
@atrick
Copy link
Contributor

atrick commented Aug 5, 2017

Thanks for debugging. I should have isolated the new optimization sooner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants