-
Notifications
You must be signed in to change notification settings - Fork 787
[sycl-post-link] Refactor cloned modules processing #4828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sycl-post-link] Refactor cloned modules processing #4828
Conversation
Temporary moved to draft mode to check with users that new solution doesn't break result applications functionality. |
012fa96
to
05a4b07
Compare
ValueToValueMapTy map is not fully release on destruction the data that is allocated on CloneModule call. Also split modules may utilize tens of megabytes and their amount may be significant. It all may lead to overflow of RAM memory. Rewrite module splitting in a way that each split module is processed and immediately saved into a file without temporary collecting all modules in RAM. Signed-off-by: Mikhail Lychkov <[email protected]>
05a4b07
to
5d80d3b
Compare
Signed-off-by: Mikhail Lychkov <[email protected]>
Signed-off-by: Mikhail Lychkov <[email protected]>
std::string ResModuleFile{}; | ||
bool CanReuseInputModule = !SyclAndEsimdCode && !IsEsimd && | ||
!IsLLVMUsedRemoved && !SpecConstsMet && | ||
(!DoSplit || GlobalsSet.size() <= 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last condition is equal to ResultModules.size() == 1
in previous implementation.
I'd like just to clarify if this is correct to reuse input module when split is requested and GlobalsSet.size() == 1
. This means that split scope is global, or split scope is per_source and all entry points are in 1 source file, or split is per_kernel and there is only 1 entry point. Anyway in such a situation CloneModule is called and some code optimizations are performed during splitting.
May we reuse input module in this case or this case should be excluded from check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May we reuse input module in this case or this case should be excluded from check?
I think it is ok to reuse input module in this case, because absolutely the same situation as one of you described here can happen with the current implementation as well - and we will reuse input module in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then there are 2 concerns:
- If we can reuse input module when splitting is used and GlobalsSet.size() == 1 then splitting is useless because the result is not saved. We might save resources and just use input module instead of split one.
- The results when we can reuse input module and when for example llvm.used or spec const is added into the same input module will differ a lot because in the first case we will split module and just drop the result reusing input module, but in second case we will split module, optimize it and save into a file.
// This map is shared between calls to fix memory leaks after its deletion and | ||
// reduce number of allocations and deallocations of memory for Module objects | ||
// during llvm::CloneModule call. | ||
static ValueToValueMapTy VMap; | ||
|
||
// During cloning of Module ValueMap tries to reuse already created old to new | ||
// Value mappings and asserts if a new Value pointer is nullptr. It may happen | ||
// after one cloned module is destroyed and we try to reuse ValueToValueMapTy | ||
// object for the next module to clone. | ||
// Proper solution is to improve CloneModule to support multiple clonings from | ||
// one source Module. | ||
// Current workaround is to drop all records that contain nullptr values. | ||
for (auto It = VMap.begin(), End = VMap.end(); It != End;) { | ||
if (It->second) | ||
++It; | ||
else | ||
VMap.erase(It++); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still looks like a hack/workaround to me. I don't want to block the commit with this, but it would be great if we find root cause of ValueToValueMapTy
not being deallocated properly to check if there is a more elegant solution to the problem
Signed-off-by: Mikhail Lychkov <[email protected]>
Signed-off-by: Mikhail Lychkov <[email protected]>
/summary:run |
Signed-off-by: Mikhail Lychkov <[email protected]>
Signed-off-by: Mikhail Lychkov <[email protected]>
/summary:run |
/summary:run |
Looks like test failure is a flaky known issue with llvm_test_suite_sycl/reduction_placeholder test abort after pass on L0 GPU target. |
Does loosing sensitive data lead to incorrect results/behavior in some cases? If so I don't think we should add this option. |
using TableFiles = std::map<StringRef, string_vector>; | ||
|
||
TableFiles processOneModule(std::unique_ptr<Module> M, bool IsEsimd, | ||
bool SyclAndEsimdCode) { | ||
struct SplitModeInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is quite hard to reverse engineer the overall algorithm/design. Can it be discussed online?
How much memory saving ReduceMemUsage brings?
The patch is reorganized in a set of newer PRs to simplify review process. |
ValueToValueMapTy map is not fully release on destruction the data that is
allocated on CloneModule call. Also split modules may utilize tens of megabytes
and their amount may be significant. It all may lead to overflow of RAM memory.
Rewrite module splitting in a way that each split module is processed and
immediately saved into a file without temporary collecting all modules in RAM.
Also try to share one ValueToValueMapTy instance for all cloned modules. It
may reduce memory utilization, but in some cases it also may loose sensitive data.
Add an option "-reduce-memory-usage" that turns on value map sharing mode.
This mode is off by default.
Signed-off-by: Mikhail Lychkov [email protected]