
llama : reduce compile time and binary size #9712


Merged: 3 commits, Oct 2, 2024

Conversation

ngxson (Collaborator) commented on Oct 2, 2024

Some small modifications that reduce the compile time of libllama and the binary size, without compromising the maintainability and readability of the code.

Result

Tested on: MacBook Pro M3 Max

master:

make clean && time make libllama.so -j
# make libllama.so -j  35.35s user 1.26s system 247% cpu 14.812 total
# output file size = 1512712 bytes

PR:

make clean && time make libllama.so -j
# make libllama.so -j  31.38s user 1.15s system 339% cpu 9.582 total
# output file size = 1296776 bytes

How it works

Given this example:

#include <map>
#include <string>

static const std::map<int, int> TEST = {
    {1, 123},
};

static const std::initializer_list<std::pair<int, int>> unicode_map_lowercase = {
    {0x000041, 0x000061},
    {0x000042, 0x000062},
};

static const std::string STR = "abcdef";

// Force lookups into both tables so their initialization shows up in the generated code.
int square(int num) {
    return TEST.at(1) + unicode_map_lowercase.begin()[0].first;
}

Compile on https://godbolt.org/ with -O3 and look at the result:

  • static const std::map is constructed at runtime (via _Rb_tree_insert_and_rebalance). This produces a lot more instructions, which slows down both compilation and startup (at runtime)
  • static const std::string is also constructed at runtime (via std::__cxx11::basic_string)
  • On the other hand, std::initializer_list is compiled into a static table (more visible with -O0), which compiles very fast

Without big std::map, std::vector, or std::string objects being constructed at runtime, in theory we also speed up application boot time (although the difference is too small to notice). A sketch of the replacement pattern is shown below.
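As a minimal sketch of the pattern (illustrative only, not the exact change made in this PR; lowercase_table and lowercase_lookup are made-up names), a runtime-constructed std::map lookup can be replaced with a static table of pairs plus a small search helper:

#include <algorithm>
#include <cstdint>
#include <initializer_list>
#include <utility>

// Plain static table: aggregate data the compiler can emit directly,
// so nothing needs to be constructed when the program starts.
static const std::initializer_list<std::pair<uint32_t, uint32_t>> lowercase_table = {
    {0x000041, 0x000061},
    {0x000042, 0x000062},
};

// Small helper standing in for std::map::at(); assumes the table is sorted by key.
static uint32_t lowercase_lookup(uint32_t cp) {
    auto it = std::lower_bound(
        lowercase_table.begin(), lowercase_table.end(), std::make_pair(cp, uint32_t(0)),
        [](const std::pair<uint32_t, uint32_t> & a, const std::pair<uint32_t, uint32_t> & b) {
            return a.first < b.first;
        });
    return (it != lowercase_table.end() && it->first == cp) ? it->second : cp;
}

For example, lowercase_lookup(0x41) returns 0x61 via a binary search over read-only data, with no global constructor involved.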

slaren (Member) left a comment


For me, with a 13900K, this reduces the build time from 23s to 17s.

ngxson merged commit a39ab21 into ggml-org:master on Oct 2, 2024
53 checks passed
slaren (Member) commented on Oct 2, 2024

With VS2022, this reduces the build time from 6 minutes to 15 seconds.

ggerganov (Member) commented
Huh, I wasn't aware the build was so slow with VS. Nice work @ngxson

ngxson (Collaborator, Author) commented on Oct 2, 2024

Nice, that's surprising given that the difference on CI is not that big (most Windows builds are reduced by about 1 minute).

Small note: LLM_TENSOR_NAMES is also one of the big objects that could benefit from the same optimization (with a small amount of complexity added to LLM_TN). However, in my test the reduction was too small (just ~100 ms or so), so I did not include that change in this PR.
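As a purely hypothetical sketch of that idea (the enum values, struct, and function names below are placeholders, not the actual llama.cpp definitions), the nested lookup could be flattened into a static table with a small search in the name helper:

// Hypothetical placeholders; not the real llama.cpp enums or helpers.
enum llm_arch_t   { ARCH_LLAMA, ARCH_FALCON };
enum llm_tensor_t { TENSOR_TOKEN_EMBD, TENSOR_OUTPUT };

struct tensor_name_entry {
    llm_arch_t   arch;
    llm_tensor_t tensor;
    const char * name;
};

// Flat static table instead of a nested std::map: plain aggregate data,
// so no runtime construction is required.
static const tensor_name_entry tensor_names[] = {
    { ARCH_LLAMA,  TENSOR_TOKEN_EMBD, "token_embd" },
    { ARCH_LLAMA,  TENSOR_OUTPUT,     "output"     },
    { ARCH_FALCON, TENSOR_TOKEN_EMBD, "token_embd" },
};

// The "small complexity" added to the name helper: a linear scan over the table.
static const char * tensor_name(llm_arch_t arch, llm_tensor_t tensor) {
    for (const auto & e : tensor_names) {
        if (e.arch == arch && e.tensor == tensor) {
            return e.name;
        }
    }
    return "unknown";
}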

I'm wondering if we could use the same technique to reduce the CUDA build time and size. Just an idea though; I'm not very familiar with CUDA programming, so I can't do much here.

slaren (Member) commented on Oct 2, 2024

The CI uses VS2019; this only affected VS2022.

The issue with the CUDA build time is related to the number of kernels that have to be built: there are a lot of them, and each one is usually built for several architectures.

dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request on Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Nov 18, 2024

Each referenced commit carries the same message:

* llama : speed up compile time
* fix build
* fix build (2)