
llama : reduce compile time and binary size #9712


Merged: 3 commits, Oct 2, 2024

Conversation

ngxson (Collaborator) commented on Oct 2, 2024

Some small modifications that reduce the compile time of libllama and the binary size, without compromising the maintainability and readability of the code.

Result

Tested on: MacBook Pro M3 Max

master:

make clean && time make libllama.so -j
# make libllama.so -j  35.35s user 1.26s system 247% cpu 14.812 total
# output file size = 1512712 bytes

PR:

make clean && time make libllama.so -j
# make libllama.so -j  31.38s user 1.15s system 339% cpu 9.582 total
# output file size = 1296776 bytes

How it works

Given this example:

#include <map>
#include <string>

static const std::map<int, int> TEST = {
    {1, 123},
};

static const std::initializer_list<std::pair<int, int>> unicode_map_lowercase = {
    {0x000041, 0x000061},
    {0x000042, 0x000062},
};

static const std::string STR = "abcdef";

// Force lookups into both tables so their initialization shows up in the generated code.
int square(int num) {
    return TEST.at(1) + unicode_map_lowercase.begin()[0].first;
}

Compile on https://godbolt.org/ with -O3 and look at the result:

  • static const std::map is constructed at runtime (via _Rb_tree_insert_and_rebalance). This produces a lot more instructions, which slows down both compilation and startup (at runtime)
  • static const std::string is also constructed at runtime (via std::__cxx11::basic_string)
  • On the other hand, std::initializer_list is compiled into a static table (more visible with -O0), which compiles very fast

Without big std::map, std::vector, or std::string objects being constructed at runtime, in theory we also speed up application boot time (although the difference is too small to notice). A sketch of the replacement pattern is shown below.
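As a minimal sketch of the pattern (illustrative only, not the exact change made in this PR; lowercase_table and lowercase_lookup are made-up names), a runtime-constructed std::map lookup can be replaced with a static table of pairs plus a small search helper:

#include <algorithm>
#include <cstdint>
#include <initializer_list>
#include <utility>

// Plain static table: aggregate data the compiler can emit directly,
// so nothing needs to be constructed when the program starts.
static const std::initializer_list<std::pair<uint32_t, uint32_t>> lowercase_table = {
    {0x000041, 0x000061},
    {0x000042, 0x000062},
};

// Small helper standing in for std::map::at(); assumes the table is sorted by key.
static uint32_t lowercase_lookup(uint32_t cp) {
    auto it = std::lower_bound(
        lowercase_table.begin(), lowercase_table.end(), std::make_pair(cp, uint32_t(0)),
        [](const std::pair<uint32_t, uint32_t> & a, const std::pair<uint32_t, uint32_t> & b) {
            return a.first < b.first;
        });
    return (it != lowercase_table.end() && it->first == cp) ? it->second : cp;
}

For example, lowercase_lookup(0x41) returns 0x61 via a binary search over read-only data, with no global constructor involved.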

slaren (Member) left a comment


For me, with a 13900K, this reduces the build time from 23s to 17s.

ngxson merged commit a39ab21 into ggml-org:master on Oct 2, 2024
53 checks passed
slaren (Member) commented on Oct 2, 2024

With VS2022, this reduces the build time from 6 minutes to 15 seconds.

ggerganov (Member) commented
Huh, I wasn't aware the build was so slow with VS. Nice work @ngxson

ngxson (Collaborator, Author) commented on Oct 2, 2024

Nice, that's surprising given that the difference on CI is not that big (most Windows builds are reduced by about 1 minute).

Small note: LLM_TENSOR_NAMES is also one of the big objects that could benefit from the same optimization (with a small amount of complexity added to LLM_TN). However, in my test the reduction was too small (just ~100 ms or so), so I did not include that change in this PR.
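As a purely hypothetical sketch of that idea (the enum values, struct, and function names below are placeholders, not the actual llama.cpp definitions), the nested lookup could be flattened into a static table with a small search in the name helper:

// Hypothetical placeholders; not the real llama.cpp enums or helpers.
enum llm_arch_t   { ARCH_LLAMA, ARCH_FALCON };
enum llm_tensor_t { TENSOR_TOKEN_EMBD, TENSOR_OUTPUT };

struct tensor_name_entry {
    llm_arch_t   arch;
    llm_tensor_t tensor;
    const char * name;
};

// Flat static table instead of a nested std::map: plain aggregate data,
// so no runtime construction is required.
static const tensor_name_entry tensor_names[] = {
    { ARCH_LLAMA,  TENSOR_TOKEN_EMBD, "token_embd" },
    { ARCH_LLAMA,  TENSOR_OUTPUT,     "output"     },
    { ARCH_FALCON, TENSOR_TOKEN_EMBD, "token_embd" },
};

// The "small complexity" added to the name helper: a linear scan over the table.
static const char * tensor_name(llm_arch_t arch, llm_tensor_t tensor) {
    for (const auto & e : tensor_names) {
        if (e.arch == arch && e.tensor == tensor) {
            return e.name;
        }
    }
    return "unknown";
}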

I'm wondering if we could use the same technique to reduce the CUDA build time and size. Just an idea though; I'm not very familiar with CUDA programming, so I can't do much here.

slaren (Member) commented on Oct 2, 2024

The CI uses VS2019; this only affected VS2022.

The issue with the CUDA build time is related to the number of kernels that have to be built: there are a lot of them, and each one is usually built for several architectures.

dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request on Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 18, 2024

Each referenced commit carries the same message:

* llama : speed up compile time
* fix build
* fix build (2)