Skip to content

[BOLT] Set call to continuation count in pre-aggregated profile #109486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Nov 8, 2024

Conversation

aaupov
Copy link
Contributor

@aaupov aaupov commented Sep 20, 2024

#109683 identified an issue with pre-aggregated profile where a call to
continuation fallthrough edge count is missing (profile discontinuity).

This issue only affects pre-aggregated profile but not perf data since
LBR stack has the necessary information to determine if the trace (fall-
through) starts at call continuation, whereas pre-aggregated fallthrough
lacks this information.

The solution is to look at branch records in pre-aggregated profiles
that correspond to returns and assign counts to call to continuation
fallthrough:

  • BranchFrom is in another function or DSO,
  • BranchTo may be a call continuation site:
    • not an entry point/landing pad.

Note that we can't directly check if BranchFrom corresponds to a return
instruction if it's in external DSO.

Keep call continuation handling for perf data (getFallthroughsInTrace)
[1] as-is due to marginally better performance. The difference is that
return-converted call to continuation fallthrough is slightly more
frequent than other fallthroughs since the former only requires one LBR
address while the latter need two that belong to the profiled binary.
Hence return-converted fallthroughs have larger "weight" which affects
code layout.

[1] DataAggregator::getFallthroughsInTrace

// Adjust FromBB if the first LBR is a return from the last instruction in
// the previous block (that instruction should be a call).
if (From == FromBB->getOffset() && !BF.containsAddress(FirstLBR.From) &&
!FromBB->isEntryPoint() && !FromBB->isLandingPad()) {
const BinaryBasicBlock *PrevBB =
BF.getLayout().getBlock(FromBB->getIndex() - 1);
if (PrevBB->getSuccessor(FromBB->getLabel())) {
const MCInst *Instr = PrevBB->getLastNonPseudoInstr();
if (Instr && BC.MIB->isCall(*Instr))
FromBB = PrevBB;

Test Plan: added callcont-fallthru.s

Created using spr 1.3.4
@llvmbot
Copy link
Member

llvmbot commented Sep 20, 2024

@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)

Changes

Add a test reproducing one source of profile discontinuity: missing
call->call continuation fallthrough count when using pre-aggregated
profile.

Test Plan: bin/llvm-lit -a tools/bolt/test/X86/callcont-fallthru.test


Patch is 32.43 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109486.diff

3 Files Affected:

  • (added) bolt/test/X86/Inputs/callcont-fallthru.preagg (+21)
  • (added) bolt/test/X86/Inputs/callcont-fallthru.yaml (+889)
  • (added) bolt/test/X86/callcont-fallthru.test (+9)
diff --git a/bolt/test/X86/Inputs/callcont-fallthru.preagg b/bolt/test/X86/Inputs/callcont-fallthru.preagg
new file mode 100644
index 00000000000000..0b5f344540573a
--- /dev/null
+++ b/bolt/test/X86/Inputs/callcont-fallthru.preagg
@@ -0,0 +1,21 @@
+B ffffffff81e01006 401194 8 0
+B 401180 401199 98482 96
+B 401199 401166 99542 0
+B 401177 401130 102776 0
+B 401135 40117c 103204 0
+B 401186 40118b 1022983 0
+B 401194 40117c 1021645 1
+F 40117c 401135 1161
+F 40117c 401180 92267
+F 40118b 401194 991002
+F 40117c 401186 968072
+F 40118b 401186 11468
+F 401130 401135 100015
+F 401166 401177 96992
+F 401199 401199 96168
+F 40117c ffffffff81e01006 7
+F 401199 401180 1140
+F 401194 ffffffff81e01006 1
+F 40117c 401194 11522
+F 401166 401199 1151
+F 401130 401177 1154
diff --git a/bolt/test/X86/Inputs/callcont-fallthru.yaml b/bolt/test/X86/Inputs/callcont-fallthru.yaml
new file mode 100644
index 00000000000000..a1f8417d1e217d
--- /dev/null
+++ b/bolt/test/X86/Inputs/callcont-fallthru.yaml
@@ -0,0 +1,889 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_EXEC
+  Machine:         EM_X86_64
+  Entry:           0x401040
+ProgramHeaders:
+  - Type:            PT_PHDR
+    Flags:           [ PF_R ]
+    VAddr:           0x400040
+    Align:           0x8
+  - Type:            PT_INTERP
+    Flags:           [ PF_R ]
+    FirstSec:        .interp
+    LastSec:         .interp
+    VAddr:           0x400318
+  - Type:            PT_LOAD
+    Flags:           [ PF_R ]
+    FirstSec:        .interp
+    LastSec:         .rela.plt
+    VAddr:           0x400000
+    Align:           0x1000
+  - Type:            PT_LOAD
+    Flags:           [ PF_X, PF_R ]
+    FirstSec:        .init
+    LastSec:         .fini
+    VAddr:           0x401000
+    Align:           0x1000
+  - Type:            PT_LOAD
+    Flags:           [ PF_R ]
+    FirstSec:        .rodata
+    LastSec:         .eh_frame
+    VAddr:           0x402000
+    Align:           0x1000
+  - Type:            PT_LOAD
+    Flags:           [ PF_W, PF_R ]
+    FirstSec:        .init_array
+    LastSec:         .bss
+    VAddr:           0x403DE8
+    Align:           0x1000
+  - Type:            PT_DYNAMIC
+    Flags:           [ PF_W, PF_R ]
+    FirstSec:        .dynamic
+    LastSec:         .dynamic
+    VAddr:           0x403DF8
+    Align:           0x8
+  - Type:            PT_NOTE
+    Flags:           [ PF_R ]
+    FirstSec:        .note.gnu.property
+    LastSec:         .note.gnu.property
+    VAddr:           0x400338
+    Align:           0x8
+  - Type:            PT_NOTE
+    Flags:           [ PF_R ]
+    FirstSec:        .note.gnu.build-id
+    LastSec:         .note.ABI-tag
+    VAddr:           0x400358
+    Align:           0x4
+  - Type:            PT_GNU_PROPERTY
+    Flags:           [ PF_R ]
+    FirstSec:        .note.gnu.property
+    LastSec:         .note.gnu.property
+    VAddr:           0x400338
+    Align:           0x8
+  - Type:            PT_GNU_EH_FRAME
+    Flags:           [ PF_R ]
+    FirstSec:        .eh_frame_hdr
+    LastSec:         .eh_frame_hdr
+    VAddr:           0x402010
+    Align:           0x4
+  - Type:            PT_GNU_STACK
+    Flags:           [ PF_W, PF_R ]
+    Align:           0x10
+  - Type:            PT_GNU_RELRO
+    Flags:           [ PF_R ]
+    FirstSec:        .init_array
+    LastSec:         .got
+    VAddr:           0x403DE8
+Sections:
+  - Name:            .interp
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x400318
+    AddressAlign:    0x1
+    Content:         2F6C696236342F6C642D6C696E75782D7838362D36342E736F2E3200
+  - Name:            .note.gnu.property
+    Type:            SHT_NOTE
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x400338
+    AddressAlign:    0x8
+    Notes:
+      - Name:            GNU
+        Desc:            028000C0040000000300000000000000
+        Type:            NT_GNU_PROPERTY_TYPE_0
+  - Name:            .note.gnu.build-id
+    Type:            SHT_NOTE
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x400358
+    AddressAlign:    0x4
+    Notes:
+      - Name:            GNU
+        Desc:            A77EA471B9AAA21E180E5FD02A0A0B2E4AB643E9
+        Type:            NT_PRPSINFO
+  - Name:            .note.ABI-tag
+    Type:            SHT_NOTE
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x40037C
+    AddressAlign:    0x4
+    Notes:
+      - Name:            GNU
+        Desc:            '00000000030000000200000000000000'
+        Type:            NT_VERSION
+  - Name:            .gnu.hash
+    Type:            SHT_GNU_HASH
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x4003A0
+    Link:            .dynsym
+    AddressAlign:    0x8
+    Header:
+      SymNdx:          0x1
+      Shift2:          0x0
+    BloomFilter:     [ 0x0 ]
+    HashBuckets:     [ 0x0 ]
+    HashValues:      [  ]
+  - Name:            .dynsym
+    Type:            SHT_DYNSYM
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x4003C0
+    Link:            .dynstr
+    AddressAlign:    0x8
+  - Name:            .dynstr
+    Type:            SHT_STRTAB
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x400450
+    AddressAlign:    0x1
+  - Name:            .gnu.version
+    Type:            SHT_GNU_versym
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x4004CE
+    Link:            .dynsym
+    AddressAlign:    0x2
+    Entries:         [ 0, 2, 1, 1, 3, 1 ]
+  - Name:            .gnu.version_r
+    Type:            SHT_GNU_verneed
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x4004E0
+    Link:            .dynstr
+    AddressAlign:    0x8
+    Dependencies:
+      - Version:         1
+        File:            libc.so.6
+        Entries:
+          - Name:            GLIBC_2.2.5
+            Hash:            157882997
+            Flags:           0
+            Other:           3
+          - Name:            GLIBC_2.34
+            Hash:            110530996
+            Flags:           0
+            Other:           2
+  - Name:            .rela.dyn
+    Type:            SHT_RELA
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x400510
+    Link:            .dynsym
+    AddressAlign:    0x8
+    Relocations:
+      - Offset:          0x403FC8
+        Symbol:          __libc_start_main
+        Type:            R_X86_64_GLOB_DAT
+      - Offset:          0x403FD0
+        Symbol:          _ITM_deregisterTMCloneTable
+        Type:            R_X86_64_GLOB_DAT
+      - Offset:          0x403FD8
+        Symbol:          __gmon_start__
+        Type:            R_X86_64_GLOB_DAT
+      - Offset:          0x403FE0
+        Symbol:          _ITM_registerTMCloneTable
+        Type:            R_X86_64_GLOB_DAT
+  - Name:            .rela.plt
+    Type:            SHT_RELA
+    Flags:           [ SHF_ALLOC, SHF_INFO_LINK ]
+    Address:         0x400570
+    Link:            .dynsym
+    AddressAlign:    0x8
+    Info:            .got.plt
+    Relocations:
+      - Offset:          0x404000
+        Symbol:          atoi
+        Type:            R_X86_64_JUMP_SLOT
+  - Name:            .init
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x401000
+    AddressAlign:    0x4
+    Offset:          0x1000
+    Content:         F30F1EFA4883EC08488B05C92F00004885C07402FFD04883C408C3
+  - Name:            .plt
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x401020
+    AddressAlign:    0x10
+    EntSize:         0x10
+    Content:         FF35CA2F0000FF25CC2F00000F1F4000FF25CA2F00006800000000E9E0FFFFFF
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x401040
+    AddressAlign:    0x10
+    Content:         F30F1EFA31ED4989D15E4889E24883E4F050544531C031C948C7C740114000FF15632F0000F4662E0F1F840000000000F30F1EFAC3662E0F1F84000000000090488D3D892F0000488D05822F00004839F87415488B05362F00004885C07409FFE00F1F8000000000C30F1F8000000000488D3D592F0000488D35522F00004829FE4889F048C1EE3F48C1F8034801C648D1FE7414488B05052F00004885C07408FFE0660F1F440000C30F1F8000000000F30F1EFA803D152F0000007513554889E5E87AFFFFFFC605032F0000015DC390C366662E0F1F8400000000000F1F4000F30F1EFAEB8A662E0F1F840000000000554889E55DC3662E0F1F840000000000554889E54883EC20C745FC00000000897DF8488975F0488B45F0488B7808E8CDFEFFFF8945EC837DEC000F842E000000C745E80A000000E8B4FFFFFF837DE8000F8413000000E9000000008B45E883C0FF8945E8E9E3FFFFFFE9C8FFFFFF31C04883C4205DC3
+  - Name:            .fini
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    Address:         0x4011A8
+    AddressAlign:    0x4
+    Content:         F30F1EFA4883EC084883C408C3
+  - Name:            .rodata
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x402000
+    AddressAlign:    0x8
+    Offset:          0x2000
+    Content:         '01000200000000000000000000000000'
+  - Name:            .eh_frame_hdr
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x402010
+    AddressAlign:    0x4
+    Content:         011B033B340000000500000010F0FFFF7800000030F0FFFF5000000060F0FFFF6400000020F1FFFFA000000030F1FFFFC0000000
+  - Name:            .eh_frame
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    Address:         0x402048
+    AddressAlign:    0x8
+    Content:         1400000000000000017A5200017810011B0C070890010000100000001C000000D8EFFFFF26000000004407101000000030000000F4EFFFFF0500000000000000240000004400000090EFFFFF20000000000E10460E184A0F0B770880003F1A3B2A332422000000001C0000006C00000078F0FFFF0600000000410E108602430D06410C07080000001C0000008C00000068F0FFFF6600000000410E108602430D0602610C0708000000000000
+  - Name:            .init_array
+    Type:            SHT_INIT_ARRAY
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x403DE8
+    AddressAlign:    0x8
+    EntSize:         0x8
+    Offset:          0x2DE8
+    Content:         '2011400000000000'
+  - Name:            .fini_array
+    Type:            SHT_FINI_ARRAY
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x403DF0
+    AddressAlign:    0x8
+    EntSize:         0x8
+    Content:         F010400000000000
+  - Name:            .dynamic
+    Type:            SHT_DYNAMIC
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x403DF8
+    Link:            .dynstr
+    AddressAlign:    0x8
+    Entries:
+      - Tag:             DT_NEEDED
+        Value:           0x18
+      - Tag:             DT_INIT
+        Value:           0x401000
+      - Tag:             DT_FINI
+        Value:           0x4011A8
+      - Tag:             DT_INIT_ARRAY
+        Value:           0x403DE8
+      - Tag:             DT_INIT_ARRAYSZ
+        Value:           0x8
+      - Tag:             DT_FINI_ARRAY
+        Value:           0x403DF0
+      - Tag:             DT_FINI_ARRAYSZ
+        Value:           0x8
+      - Tag:             DT_GNU_HASH
+        Value:           0x4003A0
+      - Tag:             DT_STRTAB
+        Value:           0x400450
+      - Tag:             DT_SYMTAB
+        Value:           0x4003C0
+      - Tag:             DT_STRSZ
+        Value:           0x7E
+      - Tag:             DT_SYMENT
+        Value:           0x18
+      - Tag:             DT_DEBUG
+        Value:           0x0
+      - Tag:             DT_PLTGOT
+        Value:           0x403FE8
+      - Tag:             DT_PLTRELSZ
+        Value:           0x18
+      - Tag:             DT_PLTREL
+        Value:           0x7
+      - Tag:             DT_JMPREL
+        Value:           0x400570
+      - Tag:             DT_RELA
+        Value:           0x400510
+      - Tag:             DT_RELASZ
+        Value:           0x60
+      - Tag:             DT_RELAENT
+        Value:           0x18
+      - Tag:             DT_VERNEED
+        Value:           0x4004E0
+      - Tag:             DT_VERNEEDNUM
+        Value:           0x1
+      - Tag:             DT_VERSYM
+        Value:           0x4004CE
+      - Tag:             DT_NULL
+        Value:           0x0
+      - Tag:             DT_NULL
+        Value:           0x0
+      - Tag:             DT_NULL
+        Value:           0x0
+      - Tag:             DT_NULL
+        Value:           0x0
+      - Tag:             DT_NULL
+        Value:           0x0
+      - Tag:             DT_NULL
+        Value:           0x0
+  - Name:            .got
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x403FC8
+    AddressAlign:    0x8
+    EntSize:         0x8
+    Content:         '0000000000000000000000000000000000000000000000000000000000000000'
+  - Name:            .got.plt
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x403FE8
+    AddressAlign:    0x8
+    EntSize:         0x8
+    Content:         F83D400000000000000000000000000000000000000000003610400000000000
+  - Name:            .data
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x404008
+    AddressAlign:    0x1
+    Content:         '00000000'
+  - Name:            .tm_clone_table
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x404010
+    AddressAlign:    0x8
+  - Name:            .bss
+    Type:            SHT_NOBITS
+    Flags:           [ SHF_WRITE, SHF_ALLOC ]
+    Address:         0x404010
+    AddressAlign:    0x1
+    Size:            0x8
+  - Name:            .comment
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         4743433A2028474E55292031312E352E302032303234303731392028526564204861742031312E352E302D3229004743433A2028474E55292031332E332E312032303234303631312028526564204861742031332E332E312D322900636C616E672076657273696F6E2031382E312E38202843656E744F532031382E312E382D332E656C392900
+  - Name:            .gnu.build.attributes
+    Type:            SHT_NOTE
+    Address:         0x406018
+    AddressAlign:    0x4
+    Notes:
+      - Name:            "GA$\x013a1"
+        Desc:            '40104000000000006610400000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            '75104000000000007510400000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            '00104000000000001610400000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            A811400000000000B011400000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            '80104000000000002611400000000000'
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            A611400000000000A611400000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            A611400000000000A611400000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            16104000000000001B10400000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+      - Name:            "GA$\x013a1"
+        Desc:            B011400000000000B511400000000000
+        Type:            NT_GNU_BUILD_ATTRIBUTE_OPEN
+  - Name:            .rela.init
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .init
+    Relocations:
+      - Offset:          0x40100B
+        Symbol:          __gmon_start__
+        Type:            R_X86_64_REX_GOTPCRELX
+        Addend:          -4
+  - Name:            .rela.text
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .text
+    Relocations:
+      - Offset:          0x40105B
+        Symbol:          main
+        Type:            R_X86_64_32S
+      - Offset:          0x401061
+        Symbol:          '__libc_start_main@GLIBC_2.34'
+        Type:            R_X86_64_GOTPCRELX
+        Addend:          -4
+      - Offset:          0x401083
+        Symbol:          .tm_clone_table
+        Type:            R_X86_64_PC32
+        Addend:          -4
+      - Offset:          0x40108A
+        Symbol:          __TMC_END__
+        Type:            R_X86_64_PC32
+        Addend:          -4
+      - Offset:          0x401096
+        Symbol:          _ITM_deregisterTMCloneTable
+        Type:            R_X86_64_REX_GOTPCRELX
+        Addend:          -4
+      - Offset:          0x4010B3
+        Symbol:          .tm_clone_table
+        Type:            R_X86_64_PC32
+        Addend:          -4
+      - Offset:          0x4010BA
+        Symbol:          __TMC_END__
+        Type:            R_X86_64_PC32
+        Addend:          -4
+      - Offset:          0x4010D7
+        Symbol:          _ITM_registerTMCloneTable
+        Type:            R_X86_64_REX_GOTPCRELX
+        Addend:          -4
+      - Offset:          0x4010F6
+        Symbol:          .bss
+        Type:            R_X86_64_PC32
+        Addend:          -5
+      - Offset:          0x401108
+        Symbol:          .bss
+        Type:            R_X86_64_PC32
+        Addend:          -5
+      - Offset:          0x40115F
+        Symbol:          'atoi@GLIBC_2.2.5'
+        Type:            R_X86_64_PLT32
+        Addend:          -4
+      - Offset:          0x401178
+        Symbol:          foo
+        Type:            R_X86_64_PLT32
+        Addend:          -4
+  - Name:            .rela.eh_frame
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .eh_frame
+    Relocations:
+      - Offset:          0x402068
+        Symbol:          .text
+        Type:            R_X86_64_PC32
+      - Offset:          0x40207C
+        Symbol:          .text
+        Type:            R_X86_64_PC32
+        Addend:          48
+      - Offset:          0x4020B8
+        Symbol:          .text
+        Type:            R_X86_64_PC32
+        Addend:          240
+      - Offset:          0x4020D8
+        Symbol:          .text
+        Type:            R_X86_64_PC32
+        Addend:          256
+  - Name:            .rela.init_array
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .init_array
+    Relocations:
+      - Offset:          0x403DE8
+        Symbol:          .text
+        Type:            R_X86_64_64
+        Addend:          224
+  - Name:            .rela.fini_array
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .fini_array
+    Relocations:
+      - Offset:          0x403DF0
+        Symbol:          .text
+        Type:            R_X86_64_64
+        Addend:          176
+  - Name:            .rela.gnu.build.attributes
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .gnu.build.attributes
+    Relocations:
+      - Offset:          0x40602C
+        Symbol:          .text
+        Type:            R_X86_64_64
+      - Offset:          0x406034
+        Symbol:          .text
+        Type:            R_X86_64_64
+        Addend:          38
+      - Offset:          0x406050
+        Symbol:          .text
+        Type:            R_X86_64_64
+        Addend:          53
+      - Offset:          0x406058
+        Symbol:          .text
+        Type:            R_X86_64_64
+        Addend:          53
+      - Offset:          0x406074
+        Symbol:          .init
+        Type:            R_X86_64_64
+      - Offset:          0x40607C
+        Symbol:          .init
+        Type:           ...
[truncated]

@maksfb
Copy link
Contributor

maksfb commented Sep 21, 2024

It's hard to understand the test case without knowing the context. Is there any way we can make it more clear?

@aaupov aaupov changed the title [BOLT][test] Add callcont-fallthru.test [BOLT] Record return profile as call to continuation fallthrough Sep 21, 2024
Copy link

github-actions bot commented Sep 21, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 76a52db1edbd681058c291da0314af24b42925a3 9b3db9b2f14d9c05d1ebee0383cd57ba6b22fba0 --extensions cpp,h -- bolt/include/bolt/Core/BinaryFunction.h bolt/include/bolt/Profile/DataAggregator.h bolt/lib/Profile/DataAggregator.cpp
View the diff from clang-format here.
diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index 967499063b..697cac9fbc 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -833,8 +833,8 @@ bool DataAggregator::doBranch(uint64_t From, uint64_t To, uint64_t Count,
   };
 
   uint64_t ToOrig = To;
-  auto [FromFunc, IsReturn] = handleAddress(From, /*IsFrom*/true);
-  auto [ToFunc, IsCallCont] = handleAddress(To, /*IsFrom*/false);
+  auto [FromFunc, IsReturn] = handleAddress(From, /*IsFrom*/ true);
+  auto [ToFunc, IsCallCont] = handleAddress(To, /*IsFrom*/ false);
   if (!FromFunc && !ToFunc)
     return false;
 
@@ -1824,7 +1824,7 @@ void DataAggregator::processPreAggregated() {
     switch (AggrEntry.EntryType) {
     case AggregatedLBREntry::BRANCH:
       doBranch(AggrEntry.From.Offset, AggrEntry.To.Offset, AggrEntry.Count,
-               AggrEntry.Mispreds, /*IsPreagg*/true);
+               AggrEntry.Mispreds, /*IsPreagg*/ true);
       break;
     case AggregatedLBREntry::FT:
     case AggregatedLBREntry::FT_EXTERNAL_ORIGIN: {

Created using spr 1.3.4
@aaupov aaupov changed the title [BOLT] Record return profile as call to continuation fallthrough [BOLT] Extract call continuation traces from pre-aggregated profile Oct 2, 2024
Copy link
Contributor

@rafaelauler rafaelauler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Amir, what's the status here, do you plan to make this only kick in for preaggregated profile, or are you done equalizing both cases by removing the adjustment in line 906?

@@ -903,24 +956,6 @@ DataAggregator::getFallthroughsInTrace(BinaryFunction &BF,
if (!FromBB || !ToBB)
return std::nullopt;

// Adjust FromBB if the first LBR is a return from the last instruction in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference to aid in review the code being removed here comes from 5599c01

@aaupov
Copy link
Contributor Author

aaupov commented Oct 27, 2024

Hi Amir, what's the status here, do you plan to make this only kick in for preaggregated profile, or are you done equalizing both cases by removing the adjustment in line 906?

Performance testing shows it's better to restrict return profile conversion to pre-aggregated profile only. I'll update the description.

@aaupov aaupov changed the title [BOLT] Extract call continuation traces from pre-aggregated profile [BOLT] Set call to continuation count in pre-aggregated profile Oct 28, 2024
@ShatianWang
Copy link
Contributor

LGTM. Thanks for working on this!

@aaupov aaupov requested a review from ShatianWang October 29, 2024 19:44
Created using spr 1.3.4
Created using spr 1.3.4
@aaupov
Copy link
Contributor Author

aaupov commented Nov 7, 2024

@maksfb – I've dropped the check that the preceding instruction is a call. Performance is neutral.

Created using spr 1.3.4
Created using spr 1.3.4
@aaupov aaupov merged commit 74e6478 into main Nov 8, 2024
5 of 6 checks passed
@aaupov aaupov deleted the users/aaupov/spr/bolttest-add-callcont-fallthrutest branch November 8, 2024 00:20
aaupov added a commit to rafaelauler/bolt-tests that referenced this pull request Nov 8, 2024
The test consumes pre-aggregated profile directly. We extract more
information out of it with
llvm/llvm-project#109486, which changes the
optimized binary.
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
llvm#109683 identified an issue with pre-aggregated profile where a call to
continuation fallthrough edge count is missing (profile discontinuity).

This issue only affects pre-aggregated profile but not perf data since
LBR stack has the necessary information to determine if the trace (fall-
through) starts at call continuation, whereas pre-aggregated fallthrough
lacks this information.

The solution is to look at branch records in pre-aggregated profiles
that correspond to returns and assign counts to call to continuation
fallthrough:
- BranchFrom is in another function or DSO,
- BranchTo may be a call continuation site:
  - not an entry point/landing pad.

Note that we can't directly check if BranchFrom corresponds to a return
instruction if it's in external DSO.

Keep call continuation handling for perf data (`getFallthroughsInTrace`)
[1] as-is due to marginally better performance. The difference is that
return-converted call to continuation fallthrough is slightly more
frequent than other fallthroughs since the former only requires one LBR
address while the latter need two that belong to the profiled binary.
Hence return-converted fallthroughs have larger "weight" which affects
code layout.

[1] `DataAggregator::getFallthroughsInTrace`
https://github.com/llvm/llvm-project/blob/fea18afeed39fe4435d67eee1834f0f34b23013d/bolt/lib/Profile/DataAggregator.cpp#L906-L915

Test Plan: added callcont-fallthru.s

Reviewers: maksfb, ayermolo, ShatianWang, dcci

Reviewed By: maksfb, ShatianWang

Pull Request: llvm#109486
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants