Commit ce652b7

Some improvements to loading the session with --prompt-cache
1. Currently the --seed parameter is ignored when loading a cached prompt. However, a very common use case is to save a prompt and then make several generation attempts with different seeds.
2. When loading a cached prompt from a session, you have to specify the prompt again. Even worse, if you forget to enter a prompt, your cached prompt gets overwritten by the blank one.
1 parent 2e6cd4b commit ce652b7
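
As a rough illustration of point 1 (a sketch, not code from this commit): llama_load_session_file restores the saved context state, including the RNG, which is why a --seed passed on reload used to be ignored. The hypothetical helper load_cached_prompt below shows the load-then-reseed pattern the diff adds, using the llama.h calls of this era.

#include <string>
#include <vector>
#include "llama.h"

// Hypothetical helper (not in the commit) showing the load-then-reseed pattern.
static bool load_cached_prompt(llama_context * ctx, const std::string & path_session,
                               int seed, std::vector<llama_token> & session_tokens) {
    session_tokens.resize(llama_n_ctx(ctx));
    size_t n_token_count_out = 0;
    if (!llama_load_session_file(ctx, path_session.c_str(), session_tokens.data(),
                                 session_tokens.capacity(), &n_token_count_out)) {
        return false; // no usable session file
    }
    session_tokens.resize(n_token_count_out);
    // Loading restored the RNG state saved with the session; re-apply the
    // user's seed so each run can sample differently.
    if (seed != -1) {
        llama_set_rng_seed(ctx, seed);
    }
    return true;
}

With this, a prompt cached once can be replayed under several --seed values and each run produces a different generation.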

File tree: 1 file changed

examples/main/main.cpp

Lines changed: 13 additions & 3 deletions
@@ -134,8 +134,6 @@ int main(int argc, char ** argv) {
         return 0;
     }
 
-    // Add a space in front of the first character to match OG llama tokenizer behavior
-    params.prompt.insert(0, 1, ' ');
 
     std::string path_session = params.path_prompt_cache;
     std::vector<llama_token> session_tokens;
@@ -155,6 +153,9 @@
                 return 1;
             }
             session_tokens.resize(n_token_count_out);
+            if (params.seed != -1) {
+                llama_set_rng_seed(ctx, params.seed);
+            }
 
             fprintf(stderr, "%s: loaded a session with prompt size of %d tokens\n", __func__, (int) session_tokens.size());
         } else {
@@ -163,7 +164,16 @@
     }
 
     // tokenize the prompt
-    auto embd_inp = ::llama_tokenize(ctx, params.prompt, true);
+    std::vector<llama_token> embd_inp;
+
+    if (params.prompt.size() > 0 || session_tokens.size() == 0) {
+        // Add a space in front of the first character to match OG llama tokenizer behavior
+        params.prompt.insert(0, 1, ' ');
+
+        embd_inp = ::llama_tokenize(ctx, params.prompt, true);
+    } else {
+        embd_inp = session_tokens;
+    }
 
     const int n_ctx = llama_n_ctx(ctx);
 
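And for point 2, a hedged standalone sketch of the fallback added in the last hunk; choose_input is a hypothetical helper, and ::llama_tokenize is the wrapper from examples/common.h of this era.

#include <string>
#include <vector>
#include "common.h"   // ::llama_tokenize wrapper
#include "llama.h"

// Hypothetical helper mirroring the new main.cpp logic: tokenize only when a
// prompt was actually given (or there is nothing cached to fall back on);
// otherwise reuse the cached tokens, so an empty -p no longer overwrites the cache.
static std::vector<llama_token> choose_input(llama_context * ctx, std::string & prompt,
                                             const std::vector<llama_token> & session_tokens) {
    if (prompt.size() > 0 || session_tokens.size() == 0) {
        // Add a space in front of the first character to match OG llama tokenizer behavior
        prompt.insert(0, 1, ' ');
        return ::llama_tokenize(ctx, prompt, true);
    }
    return session_tokens; // no new prompt: continue from the cached one
}

Note that the leading-space insertion moved inside the tokenize branch, so it only runs when there is actually a prompt to tokenize.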