Support calling mlock() on loaded model data on Linux and macOS #453

Merged
merged 3 commits on Mar 24, 2023

Conversation

comex (Contributor) commented Mar 24, 2023

This is enabled by a new --mlock command line option.

Using mlock() disables swapping and memory compression for the model data. Doing so can be useful on systems where the model takes up a large fraction of system RAM. In my experience, macOS is quite eager to start compressing llama.cpp's memory, which then makes it halt for a few seconds while it decompresses, even with a model that uses "only" 25GB out of 32GB.

Of course, this comes at the cost of forcing the system to swap or compress other processes' memory instead, so it needs to be used with care and shouldn't be enabled by default.

In theory it should be possible to support this on Windows as well using VirtualLock(), but I'm not much of a Windows user.
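For reference, a minimal sketch of the general pattern being described (this is not the actual llama.cpp loading code; the mmap-based loading and file handling here are illustrative only):

```c
// Illustrative only: map a model file read-only, then try to pin it in RAM
// with mlock() so the pages can be neither swapped out nor compressed.
// On failure (e.g. ENOMEM or EPERM due to RLIMIT_MEMLOCK), fall back to an
// unlocked mapping instead of aborting.
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model-file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    size_t size = (size_t)st.st_size;

    void *addr = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    // The interesting part: lock the whole mapping in physical memory.
    if (mlock(addr, size) != 0) {
        perror("warning: mlock failed, continuing without locking");
    }

    // ... use the model data at `addr` ...

    munlock(addr, size);   // optional: munmap/exit releases the lock anyway
    munmap(addr, size);
    close(fd);
    return 0;
}
```

On Windows, VirtualLock() plays roughly the same role, but the limits and failure modes differ.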

comex and others added 2 commits March 23, 2023 20:08
ggerganov merged commit 563cdc3 into ggml-org:master on Mar 24, 2023
jon-chuang (Contributor) commented Apr 26, 2023

Just curious: if I load two models that are mlocked, such that their total memory exceeds my system memory, what would the behaviour be? Would this be an OOM?

Also, what is the cleanup behaviour? If llama.cpp exits, will there be an munlock()? What if my program exits prematurely, e.g. via Ctrl-C?
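
For what it's worth, per the mlock(2) man page (this is general POSIX behaviour, not specific to how llama.cpp handles it): a lock request that exceeds RLIMIT_MEMLOCK or what the kernel is willing to pin normally fails with ENOMEM rather than silently succeeding, and all of a process's memory locks are released automatically when the process terminates, including on an unclean exit such as Ctrl-C, so an explicit munlock() is not strictly required. A small sketch of the error-handling pattern (the buffer size and setup are illustrative only):

```c
// Illustrative only: attempt to lock an anonymous mapping and report
// failure instead of crashing. Exceeding the lock limit typically yields
// -1 with errno == ENOMEM (or EPERM for an unprivileged process); the
// process is not killed just for asking.
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 16u * 1024 * 1024;   // 16 MiB demo buffer
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    if (mlock(buf, len) != 0) {
        fprintf(stderr, "mlock failed: %s\n", strerror(errno));
    } else {
        printf("locked %zu bytes\n", len);
        munlock(buf, len);            // optional: exit releases all locks anyway
    }
    munmap(buf, len);
    return 0;
}
```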
