Replies: 1 comment
-
I forgot that every layer in the model is traversed equally, so the access count for every layer should be the same.
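The point above can be sketched with a toy counting loop (this is a hypothetical model of decoding, not llama.cpp internals): in a standard decoder-only transformer, generating each token runs every layer exactly once, so per-layer access counts can never differ.

```python
from collections import Counter

# Per-layer "access" counts, as the profiling idea proposes to collect.
counts = Counter()

def decode(n_layers: int, n_tokens: int) -> None:
    # Autoregressive decoding: each generated token passes through
    # every layer exactly once, in order.
    for _ in range(n_tokens):
        for layer in range(n_layers):
            counts[layer] += 1

decode(n_layers=32, n_tokens=100)

# All layers end up with identical counts, so a frequency-based
# offloading policy has no signal to prefer one layer over another.
assert len(set(counts.values())) == 1
```

Since every count is identical, choosing which layers to keep on the GPU by access frequency degenerates to an arbitrary choice.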
-
I found that llama.cpp always offloads the first n layers to the GPU.
Is there any way to offload the most frequently used layers to the GPU instead, for better performance?
For example, we could run the model on the CPU to count how many times each layer is accessed and write the result to a file. Then llama.cpp could use this file to decide which layers should stay on the GPU for the theoretically best performance.