Skip to content

[NFC] [clang] Use std::string instead of StringRef to reduce stack usage #114285

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

chinmaydd
Copy link
Contributor

Comparisons between StringRef and string literals are lowered by MSVC to happen on the stack. All string literals are allocated unique stack slots increasing the overall stack usage of calculateAttributeSpellingListIndex. Use of std::string forces allocation of a string type per literal, reducing the overall stack usage of the function.

Comparisons between StringRef and string literals are lowered by MSVC to happen on the stack.
All string literals are allocated unique stack slots increasing the overall stack usage of
calculateAttributeSpellingListIndex. Use of std::string forces allocation of a string type per
literal, reducing the overall stack usage of the function.

Change-Id: I7ac39fc96ab2e559318413fa26ef304c3fb0bd6e
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Oct 30, 2024
@chinmaydd chinmaydd requested a review from lamb-j October 30, 2024 18:23
@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2024

@llvm/pr-subscribers-clang

Author: Chinmay Deshpande (chinmaydd)

Changes

Comparisons between StringRef and string literals are lowered by MSVC to happen on the stack. All string literals are allocated unique stack slots increasing the overall stack usage of calculateAttributeSpellingListIndex. Use of std::string forces allocation of a string type per literal, reducing the overall stack usage of the function.


Full diff: https://github.com/llvm/llvm-project/pull/114285.diff

1 Files Affected:

  • (modified) clang/lib/Basic/Attributes.cpp (+4-2)
diff --git a/clang/lib/Basic/Attributes.cpp b/clang/lib/Basic/Attributes.cpp
index 867d241a2cf847..0083a5ae0a55fb 100644
--- a/clang/lib/Basic/Attributes.cpp
+++ b/clang/lib/Basic/Attributes.cpp
@@ -157,8 +157,10 @@ unsigned AttributeCommonInfo::calculateAttributeSpellingListIndex() const {
   // Both variables will be used in tablegen generated
   // attribute spell list index matching code.
   auto Syntax = static_cast<AttributeCommonInfo::Syntax>(getSyntax());
-  StringRef Scope = normalizeAttrScopeName(getScopeName(), Syntax);
-  StringRef Name = normalizeAttrName(getAttrName(), Scope, Syntax);
+  // We use std::string instead of StringRef to prevent local stack 
+  // allocation of literal strings for comparison.
+  const std::string Scope = normalizeAttrScopeName(getScopeName(), Syntax);
+  const std::string Name = normalizeAttrName(getAttrName(), Scope, Syntax);
 
 #include "clang/Sema/AttrSpellingListIndex.inc"
 }

@chinmaydd chinmaydd requested a review from arsenm October 30, 2024 18:24
Copy link

github-actions bot commented Oct 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see how or why this would introduce stackusage. std::string temporaries should be avoided whenever possible

@kparzysz
Copy link
Contributor

Don't see how or why this would introduce stackusage. std::string temporaries should be avoided whenever possible

The code that's included afterwards contains a lot of comparisons with string literals:

if (Name == "aarch64_sve_pcs" && getSyntax() == AttributeCommonInfo::AS_GNU && Scope == "")
    return 0;
if (Name == "aarch64_sve_pcs" && getSyntax() == AttributeCommonInfo::AS_CXX11 && Scope == "clang")
    return 1;
if (Name == "aarch64_sve_pcs" && getSyntax() == AttributeCommonInfo::AS_C23 && Scope == "clang")
    return 2;

IIUC, @chinmaydd claims that MSVC allocates these literals on the stack, and they take up more space than a std::string would.

@arsenm
Copy link
Contributor

arsenm commented Oct 30, 2024

That sounds like MSVC's problem to solve. Why does the amount of stack size matter

@AaronBallman
Copy link
Collaborator

That sounds like MSVC's problem to solve. Why does the amount of stack size matter

Yeah, I'd like to know what the benefit is -- StringRef seems like the correct type to use, not std::string, IMO. Is this showing up in a performance profile or something?

@kparzysz
Copy link
Contributor

kparzysz commented Oct 30, 2024

I was able to see the internal bug report. This shows up when some code on Windows runs clang (via a library call), and it ends up crashing due to stack overflow. The crash happens in this function, in "__chkstk" to be exact.

Edit: I'm afraid that reducing stack usage in this function will only delay the problem until later. I don't know if the code can be set up to have more stack space allocated, but I think that this would be the proper thing to do.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want to work around MSVC's bug, you could change tablegen to emit the string literals as constant globals / StringLiteral

@erichkeane
Copy link
Collaborator

If you really want to work around MSVC's bug, you could change tablegen to emit the string literals as constant globals / StringLiteral

I spent a bit thinking about this, and this function is incredibly wasteful. It ends up re-doing a ton of work that is ALREADY done. It was convenient at the time to write it that way, but with a little intelligence at tablegen, we can simplify this function a million-times.

Some things we can do (and should do all):
1- We can calculate the scope, so we don't actually compare those. We're checking against "clang" and "" constantly, lets pre-calculate that into some sort of enum. There are only so many valid values here (and, IIRC, this has already been checked to be valid), so we can switch on those easily. PERHAPS another tablegen file to make sure we get all the normalized ones, but pretty trivial.

2- we could reduce the number of comparisons against the name:
At that point, it ends up being:

if (Name == "aarch64_sve_pcs") {
  if (getSyntax() == AttributeCommonInfo::AS_GNU && calculatedScope == Scope::None)
     return 0;
if (getSyntax() == AttributeCommonInfo::AS_CXX11 && calculatedScope == Scope::Clang)
    return 1;
if (getSyntax() == AttributeCommonInfo::AS_C23 && calculatedScope == Clang)
    return 2;

3- STOP being dummies and re-doing all the work that the trie in getAttrKind already did! We already know that the spelling is one-of-a-couple for each of the variants, so the actual 'does this string match this other one' is useless, we just need to know which it is in a list. MOST (including the one in the example above), all are the same, so those comparisons are dumb/wasted. Lets see if we can remove those.

For cases where there are multiple spellings, we can compare length! For example:
1st I found where spelling matters, is AT_ArgumentWithTypeTag:, which is either argument_with_type_tag or pointer_with_type_tag. At THAT point, we could just do Name.length() == 22 and Name.length() == 21.

We already know it is ONE of those two, and just need to figure out which.

IN SUMMARY: I am convinced somewhat we can just teach tablegen to be 'smarter' about this file and save ourselves ~100% of those string compares and replace them with numeric comparisons, or enum comparisons.

The result is something that is:
A- acceptable to the community/maintainers
B- sidesteps/never has this problem again for MSVC
C- Improves performance.

@erichkeane
Copy link
Collaborator

If you really want to work around MSVC's bug, you could change tablegen to emit the string literals as constant globals / StringLiteral

I spent a bit thinking about this, and this function is incredibly wasteful. It ends up re-doing a ton of work that is ALREADY done. It was convenient at the time to write it that way, but with a little intelligence at tablegen, we can simplify this function a million-times.

Some things we can do (and should do all): 1- We can calculate the scope, so we don't actually compare those. We're checking against "clang" and "" constantly, lets pre-calculate that into some sort of enum. There are only so many valid values here (and, IIRC, this has already been checked to be valid), so we can switch on those easily. PERHAPS another tablegen file to make sure we get all the normalized ones, but pretty trivial.

2- we could reduce the number of comparisons against the name: At that point, it ends up being:

if (Name == "aarch64_sve_pcs") {
  if (getSyntax() == AttributeCommonInfo::AS_GNU && calculatedScope == Scope::None)
     return 0;
if (getSyntax() == AttributeCommonInfo::AS_CXX11 && calculatedScope == Scope::Clang)
    return 1;
if (getSyntax() == AttributeCommonInfo::AS_C23 && calculatedScope == Clang)
    return 2;

3- STOP being dummies and re-doing all the work that the trie in getAttrKind already did! We already know that the spelling is one-of-a-couple for each of the variants, so the actual 'does this string match this other one' is useless, we just need to know which it is in a list. MOST (including the one in the example above), all are the same, so those comparisons are dumb/wasted. Lets see if we can remove those.

For cases where there are multiple spellings, we can compare length! For example: 1st I found where spelling matters, is AT_ArgumentWithTypeTag:, which is either argument_with_type_tag or pointer_with_type_tag. At THAT point, we could just do Name.length() == 22 and Name.length() == 21.

We already know it is ONE of those two, and just need to figure out which.

IN SUMMARY: I am convinced somewhat we can just teach tablegen to be 'smarter' about this file and save ourselves ~100% of those string compares and replace them with numeric comparisons, or enum comparisons.

The result is something that is: A- acceptable to the community/maintainers B- sidesteps/never has this problem again for MSVC C- Improves performance.

BTW, @chinmaydd : Please let me know if this is something you're able to do. I might be able to free up some cycles later this week/next week to put effort into this if not.

@chinmaydd
Copy link
Contributor Author

chinmaydd commented Oct 30, 2024

Thank you everyone for the review.

On my local setup, this function occupies close to 185KB of local stack (MSVC 14.38.33130), which is really wasteful. Like @kparzysz highlighted, we observed a crash in __chkstk() which is inserted by MSVC in the prologue when a function requests for more than 8KB (page size on x64) of stack space for local variables.

Another such function we observed was TargetLibraryInfoImpl::addVectorizableFunctionsFromVecLib. Its stack usage was fixed earlier this year (#86829).

MSVC does not provide a flag such as -fstack-usage, but I was able to perform some basic binary analysis to identify that close to 800 functions use more than 8KB of stack and around 14 use more than 64KB. I feel like in the longer term, we should look at reducing this since the clang driver may be invoked in-process with limited stack space available for the executing thread.

@erichkeane thanks for taking the time to suggest a clear way forward. I'll write a patch that attempts to achieve the same. I'll close this PR for now.

@chinmaydd chinmaydd closed this Oct 30, 2024
@erichkeane
Copy link
Collaborator

Thank you everyone for the review.

On my local setup, this function occupies close to 185KB of local stack (MSVC 14.38.33130), which is really wasteful. Like @kparzysz highlighted, we observed a crash in __chkstk() which is inserted by MSVC in the prologue when a function requests for more than 8KB (page size on x64) of stack space for local variables.

Another such function we observed was TargetLibraryInfoImpl::addVectorizableFunctionsFromVecLib. Its stack usage was fixed earlier this year (#86829).

MSVC does not provide a flag such as -fstack-usage, but I was able to perform some basic binary analysis to identify that close to 800 functions use more than 8KB of stack and around 14 use more than 64KB. I feel like in the longer term, we should look at reducing this since the clang driver may be invoked in-process with limited stack space available for the executing thread.

@erichkeane thanks for taking the time to suggest a clear way forward. I'll write a patch that attempts to achieve the same. I'll close this PR for now.

Great, let me know if you need help/advice! I thought about it at lunch and had some thoughts:

1- The Scope comparison is a little finnicky/requires a lot of work probably to reduce to an integral, though I think we only have 5. (blank, clang, gnu, msvc, omp), so some level of checking there should be reasonable.

2- A LOT of our attributes have only 1 unique flattenedspelling (that is, only 1 variant to check), so we can optimize for that by just doing a return 0 (which is the break behavior anyway).

3- A LOT of our attributes only have 1 unique Spelling to check, so you can skip checking the strings entirely. This leaves you with just the Scope and Syntax checks.

4- I don't really see any/many that have duplicates of sizes (I see 5 that have ANY duplicate sizes), so this case just doing a .size is probably sufficient, in addition to Scope and Syntax.

5- OF the very few that are rest, we have 2 options: a) We just revert to string checks. b) We try to get clever.
For 'b', we have to check each length, THEN the index of the first 'different' character between it and every other name of that size (so up to N - 1 comparisons, imagine you have aaaa, aaca, and abaa. You'd have to check character 2 and 3 for each (2 comparisons + size). We likely will just need N - 1. I'd love to get here, but understand if you'd prefer to just drop to a.

6- The function that is responsible for this file is EmitClangAttrSpellingListIndex in ClangAttrEmitter.cpp. It is all really well contained, so the only thing you really need to care about is FlattenedSpelling.

7- PLEASE when you go for review, put in a comment/gist/etc, the contents of the .inc file to make sure this is doing everything we want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants