add dummy fields to trees to test cache locality #3133
test performance please |
performance test scheduled: 1 job(s) in queue, 0 running. |
Performance test finished successfully: Visit http://dotty-bench.epfl.ch/3133 to see the changes. Benchmarks is based on merge(s) with master |
Very very strange. Can you try to have half of the fields before |
FWIW, you can use http://openjdk.java.net/projects/code-tools/jol/ to check the layout of objects on the JVM. |
test performance please |
performance test scheduled: 1 job(s) in queue, 0 running. |
Here's how to use jol:
$ wget http://central.maven.org/maven2/org/openjdk/jol/jol-cli/0.8/jol-cli-0.8-full.jar
$ java -jar jol-cli-0.8-full.jar internals -cp $HOME/.ivy2/cache/org.scala-sbt/interface/jars/interface-0.13.15.jar:$HOME/.ivy2/cache/org.scala-lang/scala-library/jars/scala-library-2.12.3.jar:interfaces/target/dotty-interfaces-0.4.0-bin-SNAPSHOT.jar:out/bootstrap/dotty-library-bootstrapped/scala-0.4/classes:out/bootstrap/dotty-compiler-bootstrapped/scala-0.4/classes 'dotty.tools.dotc.ast.Trees$Ident'
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Instantiated the sample instance via public dotty.tools.dotc.ast.Trees$Ident(dotty.tools.dotc.core.Names$Name)
dotty.tools.dotc.ast.Trees$Ident object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 74 a0 01 f8 (01110100 10100000 00000001 11111000) (-134111116)
12 4 (alignment/padding gap)
16 8 long Positioned.curPos -4503599627370495
24 4 int Tree.myUniqueId 3
28 4 dotty.tools.dotc.util.Attachment.Link Tree.next null
32 4 java.lang.Object Tree.myTpe null
36 4 dotty.tools.dotc.core.Names.Name Ident.name null
Instance size: 40 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total |
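To make the arithmetic behind jol's summary explicit, here is a minimal sketch that recomputes the instance size and space losses from the offset/size columns printed above. The class name `IdentLayoutSketch` and its helpers are hypothetical and are not part of jol or dotty; the rows are copied by hand from the dump, and the 8-byte alignment is the one jol reports for this VM.

```java
public class IdentLayoutSketch {
    // Object alignment reported by jol for this VM ("Objects are 8 bytes aligned").
    static final int ALIGNMENT = 8;

    // {offset, size} pairs copied by hand from the jol dump of Trees$Ident.
    static final int[][] ROWS = {
        {0, 4}, {4, 4}, {8, 4}, // object header (3 x 4 bytes)
        {16, 8},                // long Positioned.curPos
        {24, 4},                // int Tree.myUniqueId
        {28, 4},                // Tree.next (compressed reference)
        {32, 4},                // Tree.myTpe (compressed reference)
        {36, 4}                 // Ident.name (compressed reference)
    };

    // Round n up to the next multiple of alignment.
    static int alignUp(int n, int alignment) {
        return (n + alignment - 1) / alignment * alignment;
    }

    // Offset just past the last listed row.
    static int endOfFields(int[][] rows) {
        int[] last = rows[rows.length - 1];
        return last[0] + last[1];
    }

    // Bytes lost to gaps between consecutive rows (jol's "internal" loss).
    static int internalLoss(int[][] rows) {
        int loss = 0;
        int end = 0;
        for (int[] row : rows) {
            loss += row[0] - end;
            end = row[0] + row[1];
        }
        return loss;
    }

    // Total instance size after rounding up to the object alignment.
    static int instanceSize(int[][] rows) {
        return alignUp(endOfFields(rows), ALIGNMENT);
    }

    // Bytes lost to padding after the last field (jol's "external" loss).
    static int externalLoss(int[][] rows) {
        return instanceSize(rows) - endOfFields(rows);
    }

    public static void main(String[] args) {
        System.out.println("Instance size: " + instanceSize(ROWS) + " bytes");
        System.out.println("Space losses: " + internalLoss(ROWS) + " bytes internal + "
                + externalLoss(ROWS) + " bytes external");
    }
}
```

The 4 internal bytes come from the gap at offset 12: the `long` field `Positioned.curPos` is placed at offset 16, the next multiple of 8 after the 12-byte header.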
The output:
|
Performance test finished successfully: Visit http://dotty-bench.epfl.ch/3133 to see the changes. Benchmarks is based on merge(s) with master |
test performance please |
performance test scheduled: 1 job(s) in queue, 0 running. |
Performance test finished successfully: Visit http://dotty-bench.epfl.ch/3133/ to see the changes. Benchmarks is based on merge(s) with master |
The perf statistics show that the change indeed has little effect on the cache miss rate. With padding
Without padding
|
The ~6% L1d cache miss rate corresponds to the bench values in @DarkDimius's MiniPhase paper. The branch miss rate is also low, 2.3-2.5% (no such data in the MiniPhase paper). The current bench machine doesn't support measuring last-level-cache misses: |
The Given that Cite Miniphase Paper (section 5.3):
|