Skip to content

Use getMetadataLayout offset instead of manual gep #29204

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 22, 2020

Conversation

zoecarver
Copy link
Contributor

This patch updates getOffsetForIndex to use StructMetadataLayout::getFieldOffset instead of emitting a swift type metadata call. In some cases, this generates smaller/faster assembly because there is now only an alloc and gep instead of also having the previous swift type metadata calls.

All the IRGen tests succeed but I'm not very familiar with this part of the codebase so, there may be things I am missing (tests, other methods that need to be updated, etc.).

@compnerd
Copy link
Member

This looks right to me. Thanks for the cleanup!

@compnerd
Copy link
Member

@swift-ci please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - e19f40a

@zoecarver
Copy link
Contributor Author

@compnerd thanks!

I'll update the broken tests. Could you also run the benchmarks?

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - e19f40a

@zoecarver
Copy link
Contributor Author

I spent a bit of time debugging this. At least some of the tests are failing because TestSuite seems to be "broken." I'm not familiar with the standard library unit test framework, but the function _childProcess splits an input string and expects that there are at least two elements in that array. Somehow, this change made it so that there are no longer two elements in the input.

Anyway, I'll keep debugging, but I'm curious if anyone has ideas as to the source of the issue.

@compnerd
Copy link
Member

@swift-ci please benchmark

@zoecarver
Copy link
Contributor Author

I originally misdiagnosed the problem. I'm still not quite sure what the issue is but, somehow the standard library was miscompiled.

Here are a few examples of failures, both these fail when being jitted but not compiled. The following triggers the assertion: Can't form Range with upperBound < lowerBound (even when I switch the bounds).

let foo = (0 ..< 10).map { $0 }

And the following hits a bad access in _swift_retain_.

struct Foo { var x : Int }
let mirror = Mirror(reflecting: Foo(x: 0))

Still trying to find a reproducer that doesn't make use of the miscompiled standard library.

@jckarter
Copy link
Contributor

You could try having the compiler generate both the old and new code patterns, and insert an assertion function call that the offsets match, to help track down where the behavior is changing.

@zoecarver
Copy link
Contributor Author

@jckarter thanks for the suggestion! I tried doing something similar. I dumped the LLVM IR modules and looked at a diff. I couldn't spot any issues and they both compiled and ran fine. It seems the issue is only when they're executed by the JIT. Is there a way to give the swift JIT an LLVM IR file?

@jckarter
Copy link
Contributor

IIRC you can pass the frontend -use-jit to force it into JIT mode, which could let you inspect whether changes in IR generation are affecting behavior. I don't think there's a way that's already implemented to feed an IR file back into the JIT, but you could probably hack one up.

@jckarter
Copy link
Contributor

What I was thinking was that you could change the compiler to emit a runtime assertion that the two codegen paths produce equivalent results, which you could then use to run the test suite and hopefully catch a trap near the point where the runtime behavior diverges.

@zoecarver
Copy link
Contributor Author

@jckarter thanks, the -use-jit did produce a binary that errored and I was able to track down the issue to the following bit of IR:

define i32 @main(i32, i8**) {
  %8 = alloca swifterror %swift.error*, align 8
  store %swift.error* null, %swift.error** %8, align 8
  ; ...
  %40 = call swiftcc %swift.bridge* @"$sSlsE3mapySayqd__Gqd__7ElementQzKXEKlF"(i8* bitcast (void (%TSi*, %TSi*, %swift.refcounted*, %swift.error**)* @"$sS2is5Error_pIgydzo_S2isAA_pIegnrzo_TRTA" to i8*), %swift.opaque* %32, %swift.type* %37, %swift.type* @"$sSiN", i8** %38, %swift.opaque* noalias nocapture swiftself %39, %swift.error** noalias nocapture swifterror dereferenceable(8) %8)
  %41 = load %swift.error*, %swift.error** %8, align 8
  %42 = icmp ne %swift.error* %41, null
  br i1 %42, label %bar, label %foo

foo:
  ; ...
  ret i32 0

bar:
  store %swift.error* null, %swift.error** %8, align 8
  call void @llvm.stackrestore(i8* %30)
  unreachable
}

The program errors because unreachable is reached meaning Collection.map updated %8.

I'll try adding Collection.map to my source file to see if I can get further.

Sorry, but I still don't quite understand what you are suggesting with the runtime assertion. How would you suggest I test the codegen paths?

@jckarter
Copy link
Contributor

jckarter commented Jan 17, 2020

For example, you could emit the old way, take the result, emit the new way using getMetadataLayout, take that result, then emit a call void noreturn @__swift_assert_equal(i8* %old, i8* %new) afterward, and write a small function in the runtime that aborts if the values aren't equal. Sounds like you might've already tracked down the issue, though.

@zoecarver
Copy link
Contributor Author

@jckarter ahh, I see! That's a really good idea. Now I see what you mean. I'll try that.

@zoecarver
Copy link
Contributor Author

zoecarver commented Jan 20, 2020

The assertion was extremely helpful when debugging this.

@jckarter What is StructMetadataLayout::getFieldOffsetVectorOffset() supposed to return? For the below example, it returns 40 for LazyMapSequence._transform which seems odd to me.

Example (from the standard library):

public func asHex<T : FixedWidthInteger>(_ x: T) -> String {
  return "0x\(String(x, radix: 16))"
}

public func asHex<S : Sequence>(_ x: S) -> String
  where
  S.Element : FixedWidthInteger {
  return "[ \(x.lazy.map { asHex($0) }.joined(separator: ", ")) ]"
}

Edit: I should add that the old code returned 16.

@jckarter
Copy link
Contributor

jckarter commented Jan 21, 2020

getFieldOffsetVectorOffset returns the offset within the metadata record at which the field offset vector begins. You would still need to add the index of the particular field you're looking for to that base offset, and load the offset from the metadata record at that indexed offset, to get the field offset for a specific generic instance.

In the case of LazyMapSequence, the type layout is:

public struct LazyMapSequence<Base: Sequence, Element> {
  internal var _base: Base
  internal let _transform: (Base.Element) -> Element
}

so a metadata record layout is going to be like this:

struct LazyMapSequenceMetadata {
  // Common value type metadata header
  uintptr_t kind = MetadataKind::Struct;
  TypeContextDescriptor *descriptor = &`type context descriptor for LazyMapSequence`;
  
  // Generic arguments
  Metadata *Base;
  Metadata *Element;
  WitnessTable *Base_Sequence;

  // Start of field offsets, where getFieldOffsetVectorOffset refers to
  uint32_t _base, _transform;
}

so 40 looks to be right for getFieldOffsetVectorOffset to return. The offset for _transform would have to be loaded at metadata offset 44, since it's the second field.

@zoecarver
Copy link
Contributor Author

zoecarver commented Jan 21, 2020

@jckarter thanks a lot for that explanation. Very helpful.

I understand the problem now, I thought that I could get the _transform offset directly from StructMetadataLayout, but in reality, I have to keep the call to emitTypeMetadataRefForLayout and load it from there.

Anyway, I'll update the code later today, which should fix it. Thanks again for all the help/explaining.

@zoecarver
Copy link
Contributor Author

zoecarver commented Jan 22, 2020

All fixed. The tests are passing now.

@jckarter thanks again for the help.

Edit: could someone re-trigger the bots?

@jckarter
Copy link
Contributor

@swift-ci Please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - e19f40a

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - e19f40a

@zoecarver
Copy link
Contributor Author

This patch eliminates one GEP instruction in the llvm-ir. Do you think it would be worth running the benchmarks too? I know the original patch had code-size improvements but, I think those may not exist anymore (now that the codegen is correct).

@jckarter
Copy link
Contributor

@swift-ci Please benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
RandomShuffleLCG2 416 464 +11.5% 0.90x (?)
ArrayPlusEqualFiveElementCollection 4255 4736 +11.3% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
Calculator 155 136 -12.3% 1.14x (?)
Chars2 3400 3050 -10.3% 1.11x (?)
NSStringConversion.LongUTF8 320 292 -8.7% 1.10x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
ArrayLiteral2 102 117 +14.7% 0.87x (?)
ArrayPlusEqualFiveElementCollection 4218 4736 +12.3% 0.89x (?)
RemoveWhereMoveInts 18 20 +11.1% 0.90x
DistinctClassFieldAccesses 180 197 +9.4% 0.91x
IterateData 870 948 +9.0% 0.92x (?)
ArrayInClass 845 915 +8.3% 0.92x (?)
Set.isStrictSubset.Int.Empty 49 53 +8.2% 0.92x (?)
Hanoi 2220 2400 +8.1% 0.93x
ArraySetElement 262 283 +8.0% 0.93x
 
Improvement OLD NEW DELTA RATIO
Calculator 155 137 -11.6% 1.13x
ObjectiveCBridgeStubToNSDate2 1070 950 -11.2% 1.13x (?)
PrefixWhileCountableRangeLazy 18 16 -11.1% 1.12x (?)
Chars2 3400 3050 -10.3% 1.11x

Code size: -Osize

Performance: -Onone

Improvement OLD NEW DELTA RATIO
DataSubscriptMedium 58 54 -6.9% 1.07x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@zoecarver
Copy link
Contributor Author

Intresting-- those results are not at all what I would have expected. The Osize performance regressions are especially unexpected. I wish there was also a table for Onone code size. Anyway, at least there's a small improvement in one of the Onone benchmarks.

Should I be worried about any of the regressions? Is there any way they're incorrect?

@jckarter
Copy link
Contributor

The performance numbers all look like noise to me. I wouldn't worry about them. I think this is good to go.

@jckarter jckarter merged commit ea1e219 into swiftlang:master Jan 22, 2020
@zoecarver
Copy link
Contributor Author

zoecarver commented Jan 22, 2020

Sounds good. Thanks for all the help with this patch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants