Skip to content

[RISCV] Let -data-sections also work on sbss/sdata sections #87040

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 23, 2024

Conversation

KaiYG
Copy link
Contributor

@KaiYG KaiYG commented Mar 29, 2024

Add an unique suffix to .sbss/.sdata if -fdata-sections.
Without assigning an unique .sbss/.sdata section to each symbols, a linker may not be able to remove unused part when gc-section since all used and unused symbols are all mixed in the same .sbss/.sdata section.
I believe this also matches the behavior of gcc.

KaiYG added 2 commits March 29, 2024 15:05
…sections

Copied from sdata-limit-8.ll and update the CHECKs, otherwise it matches
any .sbss/.sdata regardless having a suffix or not.
Without assigning an unique .sbss/.sdata section to each symbols, a linker
may not be able to remove unused part when gc-section since all used and
unused symbols are all mixed in the same .sbss/.sdata section.
I believe this also matches the behavior of gcc.
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Mar 29, 2024

@llvm/pr-subscribers-backend-risc-v

Author: KaiWeng (KaiYG)

Changes

Add an unique suffix to .sbss/.sdata if -fdata-sections.
Without assigning an unique .sbss/.sdata section to each symbols, a linker may not be able to remove unused part when gc-section since all used and unused symbols are all mixed in the same .sbss/.sdata section.
I believe this also matches the behavior of gcc.


Full diff: https://github.com/llvm/llvm-project/pull/87040.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetObjectFile.cpp (+30-4)
  • (added) llvm/test/CodeGen/RISCV/sdata-sections.ll (+17)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetObjectFile.cpp b/llvm/lib/Target/RISCV/RISCVTargetObjectFile.cpp
index e7c1a7e5d8bca1..b13df905249ab5 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetObjectFile.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetObjectFile.cpp
@@ -104,10 +104,36 @@ bool RISCVELFTargetObjectFile::isGlobalInSmallSection(
 MCSection *RISCVELFTargetObjectFile::SelectSectionForGlobal(
     const GlobalObject *GO, SectionKind Kind, const TargetMachine &TM) const {
   // Handle Small Section classification here.
-  if (Kind.isBSS() && isGlobalInSmallSection(GO, TM))
-    return SmallBSSSection;
-  if (Kind.isData() && isGlobalInSmallSection(GO, TM))
-    return SmallDataSection;
+  if (isGlobalInSmallSection(GO, TM)) {
+    // Emit a unique section for sdata/sbss when -fdata-section.
+    bool EmitUniquedSection = TM.getDataSections();
+
+    if (Kind.isBSS()) {
+      if (EmitUniquedSection) {
+        StringRef Prefix(".sbss");
+        SmallString<128> Name(Prefix);
+        Name.append(".");
+        Name.append(GO->getName());
+        return getContext().getELFSection(Name.str(), ELF::SHT_NOBITS,
+                                          ELF::SHF_WRITE | ELF::SHF_ALLOC);
+      }
+
+      return SmallBSSSection;
+    }
+
+    if (Kind.isData()) {
+      if (EmitUniquedSection) {
+        StringRef Prefix(".sdata");
+        SmallString<128> Name(Prefix);
+        Name.append(".");
+        Name.append(GO->getName());
+        return getContext().getELFSection(Name.str(), ELF::SHT_PROGBITS,
+                                          ELF::SHF_WRITE | ELF::SHF_ALLOC);
+      }
+
+      return SmallDataSection;
+    }
+  }
 
   // Otherwise, we work the same as ELF.
   return TargetLoweringObjectFileELF::SelectSectionForGlobal(GO, Kind, TM);
diff --git a/llvm/test/CodeGen/RISCV/sdata-sections.ll b/llvm/test/CodeGen/RISCV/sdata-sections.ll
new file mode 100644
index 00000000000000..175e6d744843a8
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/sdata-sections.ll
@@ -0,0 +1,17 @@
+; RUN: llc -mtriple=riscv32 -data-sections < %s | FileCheck -check-prefix=RV32 %s
+; RUN: llc -mtriple=riscv64 -data-sections < %s | FileCheck -check-prefix=RV64 %s
+
+; Append an unique name to each sdata/sbss section when -data-section.
+
+@v = dso_local global i32 0, align 4
+@r = dso_local global i64 7, align 8
+
+; SmallDataLimit set to 8, so we expect @v will be put in sbss
+; and @r will be put in sdata.
+!llvm.module.flags = !{!0}
+!0 = !{i32 8, !"SmallDataLimit", i32 8}
+
+; RV32:    .section        .sbss.v,"aw"
+; RV32:    .section        .sdata.r,"aw"
+; RV64:    .section        .sbss.v,"aw"
+; RV64:    .section        .sdata.r,"aw"

@KaiYG
Copy link
Contributor Author

KaiYG commented Jun 12, 2024

Ping.
I think this patch will save code size on both bfd/lld linker, and a similar patch has been done on the Hexagon backend.

Copy link
Member

@kito-cheng kito-cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

@jonathonpenix
Copy link
Contributor

Are there any outstanding concerns on this patch? We recently saw this issue and I was wondering if it would be ok to merge this at this point. Thanks!

@topperc
Copy link
Collaborator

topperc commented Aug 15, 2024

Do we need to update this code in lld/ELF/Writer.cpp to handle the suffix?

  if (config->emachine == EM_RISCV) {                                            
    // .sdata and .sbss are placed closer to make GP relaxation more profitable  
    // and match GNU ld.                                                         
    StringRef name = osec.name;                                                  
    if (name == ".sdata" || (osec.type == SHT_NOBITS && name != ".sbss"))        
      rank |= 1;                                                                 
  } 

Probably should be a separate patch.

@lenary
Copy link
Member

lenary commented Aug 15, 2024

@topperc yeah, I was thinking that too - I'll take a closer look.

lenary added a commit to lenary/llvm-project that referenced this pull request Aug 15, 2024
RISC-V GCC with `-fdata-sections` will emit `.sbss.<name>` and
`.sdata.<name>` sections for small data items. We intend to add the same
behaviour to Clang/LLVM in llvm#87040.

This change ensures that any section starting `.sbss` is put in an
output section called `.sbss`, and the same for `.sdata`, so that the
RISC-V specific code for allocating an output order for `.sbss` and
`.sdata` sections can apply.

I did not add a `.sdata.rel.ro` prefix to the list as clang/llvm does
not seem to create these at all. Whether it should be or not is a
different matter.
Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we do more work with sdata/sbss I think we should consider
#83093

or at least ensure regular non-baremetal targets don't get weird non-zero -msmall-data-limit=

lenary added a commit to lenary/llvm-project that referenced this pull request Aug 16, 2024
RISC-V GCC with `-fdata-sections` will emit `.sbss.<name>`,
`.srodata.<name>`, and `.sdata.<name>` sections for small data items of
different kinds. Clang/LLVM already emits `.srodata.*` sections, and we
intend to emit the other two section name patterns in llvm#87040.

This change ensures that any input sections starting `.sbss` are
combined into one output section called `.sbss`, and the same
respectively for `.srodata` and `.sdata`. This also allows the existing
RISC-V specific code for determining an output order for `.sbss` and
`.sdata` sections to apply to placing the sections.
lenary added a commit to lenary/llvm-project that referenced this pull request Aug 19, 2024
RISC-V GCC with `-fdata-sections` will emit `.sbss.<name>`,
`.srodata.<name>`, and `.sdata.<name>` sections for small data items of
different kinds. Clang/LLVM already emits `.srodata.*` sections, and we
intend to emit the other two section name patterns in llvm#87040.

This change ensures that any input sections starting `.sbss` are
combined into one output section called `.sbss`, and the same
respectively for `.srodata` and `.sdata`. This also allows the existing
RISC-V specific code for determining an output order for `.sbss` and
`.sdata` sections to apply to placing the sections.
lenary added a commit that referenced this pull request Aug 19, 2024
RISC-V GCC with `-fdata-sections` will emit `.sbss.<name>`,
`.srodata.<name>`, and `.sdata.<name>` sections for small data items of
different kinds. Clang/LLVM already emits `.srodata.*` sections, and we
intend to emit the other two section name patterns in #87040.

This change ensures that any input sections starting `.sbss` are
combined into one output section called `.sbss`, and the same
respectively for `.srodata` and `.sdata`. This also allows the existing
RISC-V specific code for determining an output order for `.sbss` and
`.sdata` sections to apply to placing the sections.
@petrhosek
Copy link
Member

There's a change in semantics which I think is undesirable due to https://github.com/llvm/llvm-project/pull/87040/files#diff-16406b22032eaeb81527fc5cf1938dd0981c926fcad2f5b2e9a9f9db7d31f03cR79-R82

For example, if you have the following:

__attribute__((section(".bss"))) int foo = 0;

foo will end up in .bss even when -fdata-sections is set.

However, if you have the following:

__attribute__((section(".sbss"))) int foo = 0;

foo will end up in .sbss.foo after this change and I think that's undesirable.

If a symbol has an explicit section, we should honor that. This could be achieved simply by always returning false if https://github.com/llvm/llvm-project/pull/87040/files#diff-16406b22032eaeb81527fc5cf1938dd0981c926fcad2f5b2e9a9f9db7d31f03cL76 is true.

@KaiYG
Copy link
Contributor Author

KaiYG commented Aug 20, 2024

There's a change in semantics which I think is undesirable due to https://github.com/llvm/llvm-project/pull/87040/files#diff-16406b22032eaeb81527fc5cf1938dd0981c926fcad2f5b2e9a9f9db7d31f03cR79-R82

Good catch. But I patched this issue in another way, as I think isGlobalInSmallSection should still return true for explicit sdata/sbss sections, and EmitUniquedSection should be set to false instead.

Copy link
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-looked at this code and there are still issues with globals with explicit sections

Copy link
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lenary
Copy link
Member

lenary commented Aug 22, 2024

@MaskRay ping?

@lenary
Copy link
Member

lenary commented Aug 23, 2024

@KaiYG right now, if i try to squash merge, the commit will use your @users.noreply.github.com email address, which LLVM doesn't want in the commit history - please can you follow the instructions here: Setting your commit email address on GitHub to change it to your andestech email? Then I will squash/merge.

@KaiYG
Copy link
Contributor Author

KaiYG commented Aug 23, 2024

@lenary thanks for the heads up! I've changed my email settings and it should show my public email now.

@lenary lenary merged commit 4d348f7 into llvm:main Aug 23, 2024
8 checks passed
@lenary
Copy link
Member

lenary commented Aug 23, 2024

Thanks, that did the right thing :)

Copy link

@KaiYG Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

cjdb pushed a commit to cjdb/llvm-project that referenced this pull request Aug 23, 2024
Add an unique suffix to .sbss/.sdata if -fdata-sections.
Without assigning an unique .sbss/.sdata section to each symbols, a
linker may not be able to remove unused part when gc-section since all
used and unused symbols are all mixed in the same .sbss/.sdata section.
I believe this also matches the behavior of gcc.
@lenary
Copy link
Member

lenary commented Aug 27, 2024

I forgot that a question about backporting this change came up in the RISC-V sync-up. I think we're too late to backport this to 19.x, as it isn't a bugfix, it's an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants