
Commit 9fe6eb4

danbev authored and ggerganov committed
llama : add early return for empty range (ggml-org#8327)
* llama : add early return for empty range

This commit adds an early return to the llama_kv_cache_seq_add and
llama_kv_cache_seq_div functions.

The motivation for adding this is to avoid looping over the cache when the
range is empty. I ran into this when using the self-extend feature in
main.cpp.

Signed-off-by: Daniel Bevenius <[email protected]>

* llama : add static_cast to fix CI warning/error

This commit attempts to fix the following warning/error:

```console
src/llama.cpp:7271:31: error: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Werror=sign-compare]
 7271 |                         if (i < hparams.n_layer_dense_lead) {
      |                             ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This can be reproduced locally by setting -Wsign-compare in the Makefile.

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! llama : add early return for empty range

Remove the setting of cache.head to 0 when the range is empty.

Signed-off-by: Daniel Bevenius <[email protected]>

* Update src/llama.cpp

---------

Signed-off-by: Daniel Bevenius <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
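The sign-compare error quoted in the commit message comes from comparing a signed `int` with a `uint32_t` field. The sketch below shows one way a `static_cast` can make the comparison well-typed. It is an illustration only: `hparams_sketch` and `count_dense_lead_layers` are hypothetical names, and the diff on this page does not show which operand the actual change casts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the real hparams struct; only the field named
// in the warning is modeled here.
struct hparams_sketch {
    uint32_t n_layer_dense_lead = 2;
};

// Counts how many layer indices in [0, n_layer) fall below n_layer_dense_lead.
// The loop index is a signed int and the field is unsigned, so a bare
// `i < hp.n_layer_dense_lead` would trigger -Wsign-compare.
int count_dense_lead_layers(const hparams_sketch & hp, int n_layer) {
    // The cast below is only safe for non-negative indices.
    assert(n_layer >= 0);
    int count = 0;
    for (int i = 0; i < n_layer; ++i) {
        if (static_cast<uint32_t>(i) < hp.n_layer_dense_lead) {
            ++count;
        }
    }
    return count;
}
```

Casting the signed index is reasonable here because the loop guarantees `i >= 0`; if the index could be negative, casting the unsigned operand to a wider signed type would be the safer choice.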
1 parent 275c463 commit 9fe6eb4

1 file changed: +4 -0 lines changed

src/llama.cpp

Lines changed: 4 additions & 0 deletions
@@ -3261,6 +3261,8 @@ static void llama_kv_cache_seq_add(
 
     if (p0 < 0) p0 = 0;
     if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max();
+    // If there is no range then return early to avoid looping over the cache.
+    if (p0 == p1) return;
 
     if (cache.recurrent) {
         // for Mamba-like models, only the pos needs to be shifted
@@ -3305,6 +3307,8 @@ static void llama_kv_cache_seq_div(
         int d) {
     if (p0 < 0) p0 = 0;
     if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max();
+    // If there is no range then return early to avoid looping over the cache.
+    if (p0 == p1) return;
 
     if (cache.recurrent) {
         // for Mamba-like models, only the pos needs to be changed
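
The effect of the added check can be illustrated with a simplified, self-contained sketch. Everything here is hypothetical: `toy_cache`, `toy_seq_add`, and the `cells_visited` counter are not part of llama.cpp, the range is assumed to be half-open `[p0, p1)`, and `pos_t` is assumed to be a signed 32-bit integer standing in for `llama_pos`. Only the bound normalization and the `p0 == p1` early return mirror the diff above.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

using pos_t = int32_t; // assumed stand-in for llama_pos

// Hypothetical, heavily simplified cache: one position per cell.
struct toy_cache {
    std::vector<pos_t> cell_pos;
    int cells_visited = 0; // instrumentation for this sketch only
};

// Shifts every position in [p0, p1) by delta.
// Negative bounds are treated as open-ended, as in the diff.
void toy_seq_add(toy_cache & cache, pos_t p0, pos_t p1, pos_t delta) {
    if (p0 < 0) p0 = 0;
    if (p1 < 0) p1 = std::numeric_limits<pos_t>::max();
    // Empty range: no cell can match, so skip the scan entirely.
    if (p0 == p1) return;

    for (pos_t & pos : cache.cell_pos) {
        ++cache.cells_visited;
        if (pos >= p0 && pos < p1) {
            pos += delta;
        }
    }
}
```

Calling `toy_seq_add(cache, 2, 2, 10)` returns before touching any cell, whereas without the check the loop would visit every cell and change nothing. The same reasoning applies to a divide variant that rescales positions instead of shifting them.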
