llama : reduce compile time and binary size #9712
For me, with a 13900K, this reduces the build time from 23s to 17s.
With VS2022, it reduces the build time from 6 minutes to 15 seconds.
Huh, I wasn't aware the build was so slow with VS. Nice work @ngxson
Nice, that's a surprise, given that the difference on CI is not that large (most Windows builds are reduced by 1 minute). Small note: I'm wondering if we could use the same technique to reduce CUDA build time and size. Just an idea though; I'm not very familiar with CUDA programming, so I can't do much.
The CI uses VS2019; this only affected VS2022. The issue with the CUDA build time is related to the number of kernels that have to be built: there are a lot of them, and each one is usually built for several architectures.
* llama : speed up compile time * fix build * fix build (2)
Some small modifications that reduce the compile time of libllama and its binary size, without compromising the maintainability and readability of the code.
Result
Tested on: MacBook Pro M3 Max
master:
PR:
How does it work?
Given this example:
Compile on https://godbolt.org/ with -O3 and look at the result:
* static const std::map is constructed at runtime (via _Rb_tree_insert_and_rebalance). This produces a lot more instructions, which slows down both compilation and startup (at runtime).
* static const std::string is constructed at runtime (via std::__cxx11::basic_string).
* std::initializer_list is compiled into a static table (more visible with -O0), which compiles very fast.
Without some big std::map, std::vector or std::string being constructed at runtime, in theory, we also speed up the application boot time (although the gain is too minor to notice).
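The example referred to above did not survive on this page. As a rough stand-in, the two styles being compared might look like the sketch below. Every name in it (demo_arch, DEMO_NAMES_MAP, DEMO_NAMES_TABLE and the lookup helpers) is hypothetical and is not taken from llama.cpp; it only illustrates the general idea of replacing a map of strings with a table of plain entries.

```cpp
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical enum, for illustration only.
enum demo_arch { DEMO_ARCH_A, DEMO_ARCH_B, DEMO_ARCH_UNKNOWN };

// Style 1: a static map with std::string values. The map and each string
// have non-trivial constructors, so the compiler typically emits startup
// code that builds the tree node by node.
static const std::map<demo_arch, std::string> DEMO_NAMES_MAP = {
    { DEMO_ARCH_A, "arch-a" },
    { DEMO_ARCH_B, "arch-b" },
};

// Style 2: plain entries holding only an enum and a string literal pointer.
struct demo_entry {
    demo_arch    arch;
    const char * name;
};

// The backing array contains only constants, so it can be placed in a
// static table instead of being built by generated code.
static const std::initializer_list<demo_entry> DEMO_NAMES_TABLE = {
    { DEMO_ARCH_A, "arch-a" },
    { DEMO_ARCH_B, "arch-b" },
};

// Lookup through the map (logarithmic in the number of entries).
const char * demo_name_from_map(demo_arch arch) {
    const auto it = DEMO_NAMES_MAP.find(arch);
    return it == DEMO_NAMES_MAP.end() ? "unknown" : it->second.c_str();
}

// Lookup through the table (linear scan, fine for small tables).
const char * demo_name_from_table(demo_arch arch) {
    for (const auto & entry : DEMO_NAMES_TABLE) {
        if (entry.arch == arch) {
            return entry.name;
        }
    }
    return "unknown";
}
```

Both helpers return the same names. The trade-off in this sketch is that the table lookup is a linear scan, which is usually acceptable for a handful of entries but is an assumption worth checking for larger tables.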