
ARCEntryPointBuilder: Don't mark the call instructions as tail-calls #23476


Merged
merged 1 commit into swiftlang:master on Mar 22, 2019

Conversation

shajrawi

rdar://problem/48833545

From the LLVM Language Reference regarding tail/musttail: "Both markers imply that the callee does not access allocas from the caller."

Swift's LLVMARCContract just marks all the calls it creates as tail calls, without any analysis or check of whether we are allowed to do that. This created an interesting runtime crash that was a pain to debug. Story time:

I traced a runtime crash back to Swift's LLVMARCContract, but could not grok why the transformation there was wrong: we replaced two consecutive _swift_bridgeObjectRelease(x) calls with a single _swift_bridgeObjectRelease_n(x, 2), which is a perfectly valid thing to do.
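At the IR level the contraction itself is simple. A sketch of the before/after (illustrative IR with simplified types, not the literal pass output):

	; before contraction: two consecutive releases of the same object
	call void @swift_bridgeObjectRelease(i8* %x)
	call void @swift_bridgeObjectRelease(i8* %x)

	; after contraction: one merged call, equivalent on its own
	call void @swift_bridgeObjectRelease_n(i8* %x, i32 2)

The merge is sound; the problem, as it turned out, was the tail marker the pass also attached to the new call.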

I noticed that the new call was marked as a tail call; disabling that portion of the pass "solved" the runtime crash, but I wanted to understand why:

This code worked:

	pushq	$2
	popq	%rsi
	movq	-168(%rbp), %rdi
	callq	_swift_bridgeObjectRelease_n
	leaq	-40(%rbp), %rsp
	popq	%rbx
	popq	%r12
	popq	%r13
	popq	%r14
	popq	%r15
	popq	%rbp
	retq

While this version crashed further on during the run:

	movq	-168(%rbp), %rdi
	leaq	-40(%rbp), %rsp
	popq	%rbx
	popq	%r12
	popq	%r13
	popq	%r14
	popq	%r15
	popq	%rbp
	jmp	_swift_bridgeObjectRelease_n

As you can see, the call is the last thing we do before returning, so nothing appeared out of the ordinary at first…

Dumping the heap object at the release basic block looked perfectly fine: the ref count was 2 and all the fields looked valid.

However, by the time we reached the callee the value had been modified: dumping it showed it had changed somewhere along the way, which did not make any sense.

Setting up a memory watchpoint on the heap object and/or its reference count did not get us anywhere: the watchpoint triggered on unrelated code in the entry to the callee.

I then realized what's going on. Here's an amusing reproducer that you can check out in LLDB:

Experiment 1:
Set a breakpoint at leaq -40(%rbp), %rsp
Dump the heap object - it looks good

Experiment 2:
Rerun the same test with a small modification:
Set a breakpoint at popq %rbx (the instruction after the leaq; do not set a breakpoint at the leaq itself)
Dump the heap object - it looks bad!

So what is going on there? The SIL Optimizer changed an alloc_ref instruction into an alloc_ref [stack], which is a perfectly valid thing to do.
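For context, a sketch of that SIL transformation (simplified, with a hypothetical class $Node; the exact spelling of the paired stack deallocation has varied across Swift versions):

	// before stack promotion: the object lives on the heap
	%obj = alloc_ref $Node
	// ...
	strong_release %obj : $Node

	// after stack promotion: the object lives in the caller's frame
	%obj = alloc_ref [stack] $Node
	// ...
	strong_release %obj : $Node
	dealloc_ref [stack] %obj : $Node

The promotion is valid precisely because the object does not outlive the frame, and outliving the frame is exactly what a tail call to the release function would require.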

However, this means we allocated the heap object on the stack, and then tail-called into the Swift runtime with said object, after having already modified the stack pointer in the caller's epilogue.

So why does experiment 2 show garbage? We've updated the stack pointer, and it just so happens that the object now sits beyond the red zone on the stack. When the breakpoint is hit (the OS passes control back to LLDB), the system is perfectly allowed to clobber the memory where the heap object used to reside.

Note: I then realized something even more concerning, which we were lucky not to have hit so far. Not only did we fail to check whether we are allowed to mark a call as 'tail' in this situation (which could be considered a corner case, since the marker would have been fine had we not promoted the object from heap to stack), but we marked ALL the call instructions created in this pass as tail calls, even when they are not the last thing that occurs in the calling function! Looking at the LLVMPasses/contract.ll test case, which is modified in this PR, we see some scary checks that are just wrong: we check that a call in the middle of the function is marked 'tail', then check the rest of the function in CHECK-NEXT lines, knowing full well that the new 'tail call' is not the last thing that should execute in the caller.
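To illustrate, the shape of the check in question looks roughly like this (a sketch, not the literal contract.ll content):

	; CHECK: tail call void @swift_bridgeObjectRelease_n
	; CHECK-NEXT: <more instructions in the same function>

Any CHECK-NEXT line matching further instructions after the "tail" call is a giveaway that the call being checked for is not actually in tail position.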

@shajrawi
Author

@swift-ci Please test

@shajrawi
Author

@eeckstein, @atrick, @gottesmm, and @mikeash - thought you might enjoy this write-up; can one of you please review?

@compnerd
Member

So, one of the interesting things about this is that tail is not really a tail call as far as LLVM is concerned. Removing the tail is fine, and I don't really expect to see any material difference in the release build. It may make the debug build ever so slightly slower/larger, but then again, that is the debug build. However, correctness is preferable in all cases.

@shajrawi
Author

@compnerd You'd be breaking the LLVM language rules if you mark it as tail call - see here: https://www.llvm.org/docs/LangRef.html#id320

Both markers imply that the callee does not access allocas from the caller.

What we have in this pass is calls marked as tail wherein the callee accesses allocas from the caller.

@compnerd
Member

@shajrawi - right, I think that the restriction on the alloca is newer (since I ran into some similar problems with ARC even!) and I don't remember it being there at that point. But the tail there definitely seems wrong with RR functions for ARC. I brought up the point about tail being only a hint since you pointed out that there were calls marked with tail in the middle of the function.

Contributor

@atrick left a comment


Great work! I agree with your changes since there's no explanation anywhere as to why the tail call attribute was used. As a follow up, it certainly would be possible to use tail calls when we know the object outlives the frame. What are the benchmark results?

@compnerd
Member

@atrick - I believe that is a legacy of the behaviour of LLVM from ages ago, where without the tail it wouldn't really try to do the TCO, and with ObjC ARC you really want the TCO so you can possibly do the autorelease elision. I think that behaviour was copied over into Swift - but I suspect that @gottesmm may remember the real reason.

@shajrawi
Author

@compnerd Ah, I apologize, you are obviously correct that it will be treated as a hint in the middle of the function. My point when saying that was: the test case was not meaningful, just confusing. It makes no sense to see a 'tail call' in the middle of a function in this particular hand-written test case, even if LLVM will later ignore it as a hint.

@shajrawi
Author

@swift-ci Please Benchmark

@swift-ci
Contributor

Performance: -Onone

TEST                       OLD     NEW     DELTA   RATIO
Regression
StrComplexWalk             8220    8950    +8.9%   0.92x (?)
Improvement
StringToDataLargeUnicode   15000   13750   -8.3%   1.09x (?)

How to read the data
The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

Contributor

@eeckstein left a comment


nice work!

@shajrawi shajrawi merged commit dcf7a10 into swiftlang:master Mar 22, 2019
shajrawi pushed a commit that referenced this pull request Mar 22, 2019
ARCEntryPointBuilder: Don't mark the call instructions as tail-calls
@aschwaighofer
Contributor

@atrick It made sense to mark reference counting operations as tail in a world where objects were always allocated on the heap. This code almost certainly predates us promoting objects to the stack.

@atrick
Contributor

atrick commented Mar 22, 2019

My point is that it's never ok to make unverified assumptions about the rest of the compiler, and if the optimization was important it should have been commented.

@shajrawi shajrawi deleted the no_tail branch March 22, 2019 17:20