-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[BOLT] Set call to continuation count in pre-aggregated profile #109486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BOLT] Set call to continuation count in pre-aggregated profile #109486
Conversation
Created using spr 1.3.4
@llvm/pr-subscribers-bolt Author: Amir Ayupov (aaupov) ChangesAdd a test reproducing one source of profile discontinuity: missing Test Plan: bin/llvm-lit -a tools/bolt/test/X86/callcont-fallthru.test Patch is 32.43 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109486.diff 3 Files Affected:
diff --git a/bolt/test/X86/Inputs/callcont-fallthru.preagg b/bolt/test/X86/Inputs/callcont-fallthru.preagg
new file mode 100644
index 00000000000000..0b5f344540573a
--- /dev/null
+++ b/bolt/test/X86/Inputs/callcont-fallthru.preagg
@@ -0,0 +1,21 @@
+B ffffffff81e01006 401194 8 0
+B 401180 401199 98482 96
+B 401199 401166 99542 0
+B 401177 401130 102776 0
+B 401135 40117c 103204 0
+B 401186 40118b 1022983 0
+B 401194 40117c 1021645 1
+F 40117c 401135 1161
+F 40117c 401180 92267
+F 40118b 401194 991002
+F 40117c 401186 968072
+F 40118b 401186 11468
+F 401130 401135 100015
+F 401166 401177 96992
+F 401199 401199 96168
+F 40117c ffffffff81e01006 7
+F 401199 401180 1140
+F 401194 ffffffff81e01006 1
+F 40117c 401194 11522
+F 401166 401199 1151
+F 401130 401177 1154
diff --git a/bolt/test/X86/Inputs/callcont-fallthru.yaml b/bolt/test/X86/Inputs/callcont-fallthru.yaml
new file mode 100644
index 00000000000000..a1f8417d1e217d
--- /dev/null
+++ b/bolt/test/X86/Inputs/callcont-fallthru.yaml
@@ -0,0 +1,889 @@
+--- !ELF
+FileHeader:
+ Class: ELFCLASS64
+ Data: ELFDATA2LSB
+ Type: ET_EXEC
+ Machine: EM_X86_64
+ Entry: 0x401040
+ProgramHeaders:
+ - Type: PT_PHDR
+ Flags: [ PF_R ]
+ VAddr: 0x400040
+ Align: 0x8
+ - Type: PT_INTERP
+ Flags: [ PF_R ]
+ FirstSec: .interp
+ LastSec: .interp
+ VAddr: 0x400318
+ - Type: PT_LOAD
+ Flags: [ PF_R ]
+ FirstSec: .interp
+ LastSec: .rela.plt
+ VAddr: 0x400000
+ Align: 0x1000
+ - Type: PT_LOAD
+ Flags: [ PF_X, PF_R ]
+ FirstSec: .init
+ LastSec: .fini
+ VAddr: 0x401000
+ Align: 0x1000
+ - Type: PT_LOAD
+ Flags: [ PF_R ]
+ FirstSec: .rodata
+ LastSec: .eh_frame
+ VAddr: 0x402000
+ Align: 0x1000
+ - Type: PT_LOAD
+ Flags: [ PF_W, PF_R ]
+ FirstSec: .init_array
+ LastSec: .bss
+ VAddr: 0x403DE8
+ Align: 0x1000
+ - Type: PT_DYNAMIC
+ Flags: [ PF_W, PF_R ]
+ FirstSec: .dynamic
+ LastSec: .dynamic
+ VAddr: 0x403DF8
+ Align: 0x8
+ - Type: PT_NOTE
+ Flags: [ PF_R ]
+ FirstSec: .note.gnu.property
+ LastSec: .note.gnu.property
+ VAddr: 0x400338
+ Align: 0x8
+ - Type: PT_NOTE
+ Flags: [ PF_R ]
+ FirstSec: .note.gnu.build-id
+ LastSec: .note.ABI-tag
+ VAddr: 0x400358
+ Align: 0x4
+ - Type: PT_GNU_PROPERTY
+ Flags: [ PF_R ]
+ FirstSec: .note.gnu.property
+ LastSec: .note.gnu.property
+ VAddr: 0x400338
+ Align: 0x8
+ - Type: PT_GNU_EH_FRAME
+ Flags: [ PF_R ]
+ FirstSec: .eh_frame_hdr
+ LastSec: .eh_frame_hdr
+ VAddr: 0x402010
+ Align: 0x4
+ - Type: PT_GNU_STACK
+ Flags: [ PF_W, PF_R ]
+ Align: 0x10
+ - Type: PT_GNU_RELRO
+ Flags: [ PF_R ]
+ FirstSec: .init_array
+ LastSec: .got
+ VAddr: 0x403DE8
+Sections:
+ - Name: .interp
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC ]
+ Address: 0x400318
+ AddressAlign: 0x1
+ Content: 2F6C696236342F6C642D6C696E75782D7838362D36342E736F2E3200
+ - Name: .note.gnu.property
+ Type: SHT_NOTE
+ Flags: [ SHF_ALLOC ]
+ Address: 0x400338
+ AddressAlign: 0x8
+ Notes:
+ - Name: GNU
+ Desc: 028000C0040000000300000000000000
+ Type: NT_GNU_PROPERTY_TYPE_0
+ - Name: .note.gnu.build-id
+ Type: SHT_NOTE
+ Flags: [ SHF_ALLOC ]
+ Address: 0x400358
+ AddressAlign: 0x4
+ Notes:
+ - Name: GNU
+ Desc: A77EA471B9AAA21E180E5FD02A0A0B2E4AB643E9
+ Type: NT_PRPSINFO
+ - Name: .note.ABI-tag
+ Type: SHT_NOTE
+ Flags: [ SHF_ALLOC ]
+ Address: 0x40037C
+ AddressAlign: 0x4
+ Notes:
+ - Name: GNU
+ Desc: '00000000030000000200000000000000'
+ Type: NT_VERSION
+ - Name: .gnu.hash
+ Type: SHT_GNU_HASH
+ Flags: [ SHF_ALLOC ]
+ Address: 0x4003A0
+ Link: .dynsym
+ AddressAlign: 0x8
+ Header:
+ SymNdx: 0x1
+ Shift2: 0x0
+ BloomFilter: [ 0x0 ]
+ HashBuckets: [ 0x0 ]
+ HashValues: [ ]
+ - Name: .dynsym
+ Type: SHT_DYNSYM
+ Flags: [ SHF_ALLOC ]
+ Address: 0x4003C0
+ Link: .dynstr
+ AddressAlign: 0x8
+ - Name: .dynstr
+ Type: SHT_STRTAB
+ Flags: [ SHF_ALLOC ]
+ Address: 0x400450
+ AddressAlign: 0x1
+ - Name: .gnu.version
+ Type: SHT_GNU_versym
+ Flags: [ SHF_ALLOC ]
+ Address: 0x4004CE
+ Link: .dynsym
+ AddressAlign: 0x2
+ Entries: [ 0, 2, 1, 1, 3, 1 ]
+ - Name: .gnu.version_r
+ Type: SHT_GNU_verneed
+ Flags: [ SHF_ALLOC ]
+ Address: 0x4004E0
+ Link: .dynstr
+ AddressAlign: 0x8
+ Dependencies:
+ - Version: 1
+ File: libc.so.6
+ Entries:
+ - Name: GLIBC_2.2.5
+ Hash: 157882997
+ Flags: 0
+ Other: 3
+ - Name: GLIBC_2.34
+ Hash: 110530996
+ Flags: 0
+ Other: 2
+ - Name: .rela.dyn
+ Type: SHT_RELA
+ Flags: [ SHF_ALLOC ]
+ Address: 0x400510
+ Link: .dynsym
+ AddressAlign: 0x8
+ Relocations:
+ - Offset: 0x403FC8
+ Symbol: __libc_start_main
+ Type: R_X86_64_GLOB_DAT
+ - Offset: 0x403FD0
+ Symbol: _ITM_deregisterTMCloneTable
+ Type: R_X86_64_GLOB_DAT
+ - Offset: 0x403FD8
+ Symbol: __gmon_start__
+ Type: R_X86_64_GLOB_DAT
+ - Offset: 0x403FE0
+ Symbol: _ITM_registerTMCloneTable
+ Type: R_X86_64_GLOB_DAT
+ - Name: .rela.plt
+ Type: SHT_RELA
+ Flags: [ SHF_ALLOC, SHF_INFO_LINK ]
+ Address: 0x400570
+ Link: .dynsym
+ AddressAlign: 0x8
+ Info: .got.plt
+ Relocations:
+ - Offset: 0x404000
+ Symbol: atoi
+ Type: R_X86_64_JUMP_SLOT
+ - Name: .init
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
+ Address: 0x401000
+ AddressAlign: 0x4
+ Offset: 0x1000
+ Content: F30F1EFA4883EC08488B05C92F00004885C07402FFD04883C408C3
+ - Name: .plt
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
+ Address: 0x401020
+ AddressAlign: 0x10
+ EntSize: 0x10
+ Content: FF35CA2F0000FF25CC2F00000F1F4000FF25CA2F00006800000000E9E0FFFFFF
+ - Name: .text
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
+ Address: 0x401040
+ AddressAlign: 0x10
+ Content: F30F1EFA31ED4989D15E4889E24883E4F050544531C031C948C7C740114000FF15632F0000F4662E0F1F840000000000F30F1EFAC3662E0F1F84000000000090488D3D892F0000488D05822F00004839F87415488B05362F00004885C07409FFE00F1F8000000000C30F1F8000000000488D3D592F0000488D35522F00004829FE4889F048C1EE3F48C1F8034801C648D1FE7414488B05052F00004885C07408FFE0660F1F440000C30F1F8000000000F30F1EFA803D152F0000007513554889E5E87AFFFFFFC605032F0000015DC390C366662E0F1F8400000000000F1F4000F30F1EFAEB8A662E0F1F840000000000554889E55DC3662E0F1F840000000000554889E54883EC20C745FC00000000897DF8488975F0488B45F0488B7808E8CDFEFFFF8945EC837DEC000F842E000000C745E80A000000E8B4FFFFFF837DE8000F8413000000E9000000008B45E883C0FF8945E8E9E3FFFFFFE9C8FFFFFF31C04883C4205DC3
+ - Name: .fini
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
+ Address: 0x4011A8
+ AddressAlign: 0x4
+ Content: F30F1EFA4883EC084883C408C3
+ - Name: .rodata
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC ]
+ Address: 0x402000
+ AddressAlign: 0x8
+ Offset: 0x2000
+ Content: '01000200000000000000000000000000'
+ - Name: .eh_frame_hdr
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC ]
+ Address: 0x402010
+ AddressAlign: 0x4
+ Content: 011B033B340000000500000010F0FFFF7800000030F0FFFF5000000060F0FFFF6400000020F1FFFFA000000030F1FFFFC0000000
+ - Name: .eh_frame
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC ]
+ Address: 0x402048
+ AddressAlign: 0x8
+ Content: 1400000000000000017A5200017810011B0C070890010000100000001C000000D8EFFFFF26000000004407101000000030000000F4EFFFFF0500000000000000240000004400000090EFFFFF20000000000E10460E184A0F0B770880003F1A3B2A332422000000001C0000006C00000078F0FFFF0600000000410E108602430D06410C07080000001C0000008C00000068F0FFFF6600000000410E108602430D0602610C0708000000000000
+ - Name: .init_array
+ Type: SHT_INIT_ARRAY
+ Flags: [ SHF_WRITE, SHF_ALLOC ]
+ Address: 0x403DE8
+ AddressAlign: 0x8
+ EntSize: 0x8
+ Offset: 0x2DE8
+ Content: '2011400000000000'
+ - Name: .fini_array
+ Type: SHT_FINI_ARRAY
+ Flags: [ SHF_WRITE, SHF_ALLOC ]
+ Address: 0x403DF0
+ AddressAlign: 0x8
+ EntSize: 0x8
+ Content: F010400000000000
+ - Name: .dynamic
+ Type: SHT_DYNAMIC
+ Flags: [ SHF_WRITE, SHF_ALLOC ]
+ Address: 0x403DF8
+ Link: .dynstr
+ AddressAlign: 0x8
+ Entries:
+ - Tag: DT_NEEDED
+ Value: 0x18
+ - Tag: DT_INIT
+ Value: 0x401000
+ - Tag: DT_FINI
+ Value: 0x4011A8
+ - Tag: DT_INIT_ARRAY
+ Value: 0x403DE8
+ - Tag: DT_INIT_ARRAYSZ
+ Value: 0x8
+ - Tag: DT_FINI_ARRAY
+ Value: 0x403DF0
+ - Tag: DT_FINI_ARRAYSZ
+ Value: 0x8
+ - Tag: DT_GNU_HASH
+ Value: 0x4003A0
+ - Tag: DT_STRTAB
+ Value: 0x400450
+ - Tag: DT_SYMTAB
+ Value: 0x4003C0
+ - Tag: DT_STRSZ
+ Value: 0x7E
+ - Tag: DT_SYMENT
+ Value: 0x18
+ - Tag: DT_DEBUG
+ Value: 0x0
+ - Tag: DT_PLTGOT
+ Value: 0x403FE8
+ - Tag: DT_PLTRELSZ
+ Value: 0x18
+ - Tag: DT_PLTREL
+ Value: 0x7
+ - Tag: DT_JMPREL
+ Value: 0x400570
+ - Tag: DT_RELA
+ Value: 0x400510
+ - Tag: DT_RELASZ
+ Value: 0x60
+ - Tag: DT_RELAENT
+ Value: 0x18
+ - Tag: DT_VERNEED
+ Value: 0x4004E0
+ - Tag: DT_VERNEEDNUM
+ Value: 0x1
+ - Tag: DT_VERSYM
+ Value: 0x4004CE
+ - Tag: DT_NULL
+ Value: 0x0
+ - Tag: DT_NULL
+ Value: 0x0
+ - Tag: DT_NULL
+ Value: 0x0
+ - Tag: DT_NULL
+ Value: 0x0
+ - Tag: DT_NULL
+ Value: 0x0
+ - Tag: DT_NULL
+ Value: 0x0
+ - Name: .got
+ Type: SHT_PROGBITS
+ Flags: [ SHF_WRITE, SHF_ALLOC ]
+ Address: 0x403FC8
+ AddressAlign: 0x8
+ EntSize: 0x8
+ Content: '0000000000000000000000000000000000000000000000000000000000000000'
+ - Name: .got.plt
+ Type: SHT_PROGBITS
+ Flags: [ SHF_WRITE, SHF_ALLOC ]
+ Address: 0x403FE8
+ AddressAlign: 0x8
+ EntSize: 0x8
+ Content: F83D400000000000000000000000000000000000000000003610400000000000
+ - Name: .data
+ Type: SHT_PROGBITS
+ Flags: [ SHF_WRITE, SHF_ALLOC ]
+ Address: 0x404008
+ AddressAlign: 0x1
+ Content: '00000000'
+ - Name: .tm_clone_table
+ Type: SHT_PROGBITS
+ Flags: [ SHF_WRITE, SHF_ALLOC ]
+ Address: 0x404010
+ AddressAlign: 0x8
+ - Name: .bss
+ Type: SHT_NOBITS
+ Flags: [ SHF_WRITE, SHF_ALLOC ]
+ Address: 0x404010
+ AddressAlign: 0x1
+ Size: 0x8
+ - Name: .comment
+ Type: SHT_PROGBITS
+ Flags: [ SHF_MERGE, SHF_STRINGS ]
+ AddressAlign: 0x1
+ EntSize: 0x1
+ Content: 4743433A2028474E55292031312E352E302032303234303731392028526564204861742031312E352E302D3229004743433A2028474E55292031332E332E312032303234303631312028526564204861742031332E332E312D322900636C616E672076657273696F6E2031382E312E38202843656E744F532031382E312E382D332E656C392900
+ - Name: .gnu.build.attributes
+ Type: SHT_NOTE
+ Address: 0x406018
+ AddressAlign: 0x4
+ Notes:
+ - Name: "GA$\x013a1"
+ Desc: '40104000000000006610400000000000'
+ Type: NT_GNU_BUILD_ATTRIBUTE_OPEN
+ - Name: "GA$\x013a1"
+ Desc: '75104000000000007510400000000000'
+ Type: NT_GNU_BUILD_ATTRIBUTE_OPEN
+ - Name: "GA$\x013a1"
+ Desc: '00104000000000001610400000000000'
+ Type: NT_GNU_BUILD_ATTRIBUTE_OPEN
+ - Name: "GA$\x013a1"
+ Desc: A811400000000000B011400000000000
+ Type: NT_GNU_BUILD_ATTRIBUTE_OPEN
+ - Name: "GA$\x013a1"
+ Desc: '80104000000000002611400000000000'
+ Type: NT_GNU_BUILD_ATTRIBUTE_OPEN
+ - Name: "GA$\x013a1"
+ Desc: A611400000000000A611400000000000
+ Type: NT_GNU_BUILD_ATTRIBUTE_OPEN
+ - Name: "GA$\x013a1"
+ Desc: A611400000000000A611400000000000
+ Type: NT_GNU_BUILD_ATTRIBUTE_OPEN
+ - Name: "GA$\x013a1"
+ Desc: 16104000000000001B10400000000000
+ Type: NT_GNU_BUILD_ATTRIBUTE_OPEN
+ - Name: "GA$\x013a1"
+ Desc: B011400000000000B511400000000000
+ Type: NT_GNU_BUILD_ATTRIBUTE_OPEN
+ - Name: .rela.init
+ Type: SHT_RELA
+ Flags: [ SHF_INFO_LINK ]
+ Link: .symtab
+ AddressAlign: 0x8
+ Info: .init
+ Relocations:
+ - Offset: 0x40100B
+ Symbol: __gmon_start__
+ Type: R_X86_64_REX_GOTPCRELX
+ Addend: -4
+ - Name: .rela.text
+ Type: SHT_RELA
+ Flags: [ SHF_INFO_LINK ]
+ Link: .symtab
+ AddressAlign: 0x8
+ Info: .text
+ Relocations:
+ - Offset: 0x40105B
+ Symbol: main
+ Type: R_X86_64_32S
+ - Offset: 0x401061
+ Symbol: '__libc_start_main@GLIBC_2.34'
+ Type: R_X86_64_GOTPCRELX
+ Addend: -4
+ - Offset: 0x401083
+ Symbol: .tm_clone_table
+ Type: R_X86_64_PC32
+ Addend: -4
+ - Offset: 0x40108A
+ Symbol: __TMC_END__
+ Type: R_X86_64_PC32
+ Addend: -4
+ - Offset: 0x401096
+ Symbol: _ITM_deregisterTMCloneTable
+ Type: R_X86_64_REX_GOTPCRELX
+ Addend: -4
+ - Offset: 0x4010B3
+ Symbol: .tm_clone_table
+ Type: R_X86_64_PC32
+ Addend: -4
+ - Offset: 0x4010BA
+ Symbol: __TMC_END__
+ Type: R_X86_64_PC32
+ Addend: -4
+ - Offset: 0x4010D7
+ Symbol: _ITM_registerTMCloneTable
+ Type: R_X86_64_REX_GOTPCRELX
+ Addend: -4
+ - Offset: 0x4010F6
+ Symbol: .bss
+ Type: R_X86_64_PC32
+ Addend: -5
+ - Offset: 0x401108
+ Symbol: .bss
+ Type: R_X86_64_PC32
+ Addend: -5
+ - Offset: 0x40115F
+ Symbol: 'atoi@GLIBC_2.2.5'
+ Type: R_X86_64_PLT32
+ Addend: -4
+ - Offset: 0x401178
+ Symbol: foo
+ Type: R_X86_64_PLT32
+ Addend: -4
+ - Name: .rela.eh_frame
+ Type: SHT_RELA
+ Flags: [ SHF_INFO_LINK ]
+ Link: .symtab
+ AddressAlign: 0x8
+ Info: .eh_frame
+ Relocations:
+ - Offset: 0x402068
+ Symbol: .text
+ Type: R_X86_64_PC32
+ - Offset: 0x40207C
+ Symbol: .text
+ Type: R_X86_64_PC32
+ Addend: 48
+ - Offset: 0x4020B8
+ Symbol: .text
+ Type: R_X86_64_PC32
+ Addend: 240
+ - Offset: 0x4020D8
+ Symbol: .text
+ Type: R_X86_64_PC32
+ Addend: 256
+ - Name: .rela.init_array
+ Type: SHT_RELA
+ Flags: [ SHF_INFO_LINK ]
+ Link: .symtab
+ AddressAlign: 0x8
+ Info: .init_array
+ Relocations:
+ - Offset: 0x403DE8
+ Symbol: .text
+ Type: R_X86_64_64
+ Addend: 224
+ - Name: .rela.fini_array
+ Type: SHT_RELA
+ Flags: [ SHF_INFO_LINK ]
+ Link: .symtab
+ AddressAlign: 0x8
+ Info: .fini_array
+ Relocations:
+ - Offset: 0x403DF0
+ Symbol: .text
+ Type: R_X86_64_64
+ Addend: 176
+ - Name: .rela.gnu.build.attributes
+ Type: SHT_RELA
+ Flags: [ SHF_INFO_LINK ]
+ Link: .symtab
+ AddressAlign: 0x8
+ Info: .gnu.build.attributes
+ Relocations:
+ - Offset: 0x40602C
+ Symbol: .text
+ Type: R_X86_64_64
+ - Offset: 0x406034
+ Symbol: .text
+ Type: R_X86_64_64
+ Addend: 38
+ - Offset: 0x406050
+ Symbol: .text
+ Type: R_X86_64_64
+ Addend: 53
+ - Offset: 0x406058
+ Symbol: .text
+ Type: R_X86_64_64
+ Addend: 53
+ - Offset: 0x406074
+ Symbol: .init
+ Type: R_X86_64_64
+ - Offset: 0x40607C
+ Symbol: .init
+ Type: ...
[truncated]
|
It's hard to understand the test case without knowing the context. Is there any way we can make it more clear? |
Created using spr 1.3.4
You can test this locally with the following command:git-clang-format --diff 76a52db1edbd681058c291da0314af24b42925a3 9b3db9b2f14d9c05d1ebee0383cd57ba6b22fba0 --extensions cpp,h -- bolt/include/bolt/Core/BinaryFunction.h bolt/include/bolt/Profile/DataAggregator.h bolt/lib/Profile/DataAggregator.cpp View the diff from clang-format here.diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index 967499063b..697cac9fbc 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -833,8 +833,8 @@ bool DataAggregator::doBranch(uint64_t From, uint64_t To, uint64_t Count,
};
uint64_t ToOrig = To;
- auto [FromFunc, IsReturn] = handleAddress(From, /*IsFrom*/true);
- auto [ToFunc, IsCallCont] = handleAddress(To, /*IsFrom*/false);
+ auto [FromFunc, IsReturn] = handleAddress(From, /*IsFrom*/ true);
+ auto [ToFunc, IsCallCont] = handleAddress(To, /*IsFrom*/ false);
if (!FromFunc && !ToFunc)
return false;
@@ -1824,7 +1824,7 @@ void DataAggregator::processPreAggregated() {
switch (AggrEntry.EntryType) {
case AggregatedLBREntry::BRANCH:
doBranch(AggrEntry.From.Offset, AggrEntry.To.Offset, AggrEntry.Count,
- AggrEntry.Mispreds, /*IsPreagg*/true);
+ AggrEntry.Mispreds, /*IsPreagg*/ true);
break;
case AggregatedLBREntry::FT:
case AggregatedLBREntry::FT_EXTERNAL_ORIGIN: {
|
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Amir, what's the status here, do you plan to make this only kick in for preaggregated profile, or are you done equalizing both cases by removing the adjustment in line 906?
bolt/lib/Profile/DataAggregator.cpp
Outdated
@@ -903,24 +956,6 @@ DataAggregator::getFallthroughsInTrace(BinaryFunction &BF, | |||
if (!FromBB || !ToBB) | |||
return std::nullopt; | |||
|
|||
// Adjust FromBB if the first LBR is a return from the last instruction in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference to aid in review the code being removed here comes from 5599c01
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4
Performance testing shows it's better to restrict return profile conversion to pre-aggregated profile only. I'll update the description. |
LGTM. Thanks for working on this! |
Created using spr 1.3.4
Created using spr 1.3.4
@maksfb – I've dropped the check that the preceding instruction is a call. Performance is neutral. |
Created using spr 1.3.4
Created using spr 1.3.4
The test consumes pre-aggregated profile directly. We extract more information out of it with llvm/llvm-project#109486, which changes the optimized binary.
llvm#109683 identified an issue with pre-aggregated profile where a call to continuation fallthrough edge count is missing (profile discontinuity). This issue only affects pre-aggregated profile but not perf data since LBR stack has the necessary information to determine if the trace (fall- through) starts at call continuation, whereas pre-aggregated fallthrough lacks this information. The solution is to look at branch records in pre-aggregated profiles that correspond to returns and assign counts to call to continuation fallthrough: - BranchFrom is in another function or DSO, - BranchTo may be a call continuation site: - not an entry point/landing pad. Note that we can't directly check if BranchFrom corresponds to a return instruction if it's in external DSO. Keep call continuation handling for perf data (`getFallthroughsInTrace`) [1] as-is due to marginally better performance. The difference is that return-converted call to continuation fallthrough is slightly more frequent than other fallthroughs since the former only requires one LBR address while the latter need two that belong to the profiled binary. Hence return-converted fallthroughs have larger "weight" which affects code layout. [1] `DataAggregator::getFallthroughsInTrace` https://github.com/llvm/llvm-project/blob/fea18afeed39fe4435d67eee1834f0f34b23013d/bolt/lib/Profile/DataAggregator.cpp#L906-L915 Test Plan: added callcont-fallthru.s Reviewers: maksfb, ayermolo, ShatianWang, dcci Reviewed By: maksfb, ShatianWang Pull Request: llvm#109486
#109683 identified an issue with pre-aggregated profile where a call to
continuation fallthrough edge count is missing (profile discontinuity).
This issue only affects pre-aggregated profile but not perf data since
LBR stack has the necessary information to determine if the trace (fall-
through) starts at call continuation, whereas pre-aggregated fallthrough
lacks this information.
The solution is to look at branch records in pre-aggregated profiles
that correspond to returns and assign counts to call to continuation
fallthrough:
Note that we can't directly check if BranchFrom corresponds to a return
instruction if it's in external DSO.
Keep call continuation handling for perf data (
getFallthroughsInTrace
)[1] as-is due to marginally better performance. The difference is that
return-converted call to continuation fallthrough is slightly more
frequent than other fallthroughs since the former only requires one LBR
address while the latter need two that belong to the profiled binary.
Hence return-converted fallthroughs have larger "weight" which affects
code layout.
[1]
DataAggregator::getFallthroughsInTrace
llvm-project/bolt/lib/Profile/DataAggregator.cpp
Lines 906 to 915 in fea18af
Test Plan: added callcont-fallthru.s