Commit 84a4481

cli : auto activate conversation mode if chat template is available (#11214)
* cli : auto activate conversation mode if chat template is detected
* add warn on bad template
* update readme (writing with the help of chatgpt)
* update readme (2)
* do not activate -cnv for non-instruct models
1 parent 39509fb commit 84a4481
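For orientation, the behavioral change boils down to one decision rule: conversation mode now defaults to an "auto" state that is resolved from the availability of a chat template, while `-cnv` and `-no-cnv` force it on or off. Below is a minimal standalone sketch of that rule; the `COMMON_CONVERSATION_MODE_*` names come from this commit, but `resolve_conversation_mode` is a hypothetical helper written only for illustration, not part of llama.cpp.

```cpp
#include <cstdio>

// enum introduced by this commit in common/common.h
enum common_conversation_mode {
    COMMON_CONVERSATION_MODE_DISABLED = 0,
    COMMON_CONVERSATION_MODE_ENABLED  = 1,
    COMMON_CONVERSATION_MODE_AUTO     = 2,
};

// hypothetical helper mirroring the resolution done early in examples/main/main.cpp
static common_conversation_mode resolve_conversation_mode(common_conversation_mode requested,
                                                          bool has_chat_template) {
    if (requested != COMMON_CONVERSATION_MODE_AUTO) {
        return requested; // -cnv or -no-cnv was given explicitly
    }
    // auto: enable only when a chat template is available (built-in or --chat-template)
    return has_chat_template ? COMMON_CONVERSATION_MODE_ENABLED
                             : COMMON_CONVERSATION_MODE_DISABLED;
}

int main() {
    // instruct model with a built-in template, no flags -> conversation mode on
    std::printf("%d\n", resolve_conversation_mode(COMMON_CONVERSATION_MODE_AUTO, true));
    // base model without a template, no flags -> conversation mode off
    std::printf("%d\n", resolve_conversation_mode(COMMON_CONVERSATION_MODE_AUTO, false));
    // -no-cnv always wins, template or not
    std::printf("%d\n", resolve_conversation_mode(COMMON_CONVERSATION_MODE_DISABLED, true));
    return 0;
}
```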

4 files changed: +75 -36 lines changed

README.md

Lines changed: 21 additions & 17 deletions
@@ -245,6 +245,8 @@ The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](htt
 - [Trending](https://huggingface.co/models?library=gguf&sort=trending)
 - [LLaMA](https://huggingface.co/models?sort=trending&search=llama+gguf)
 
+You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from Hugging Face by using this CLI argument: `-hf <user>/<model>[:quant]`
+
 After downloading a model, use the CLI tools to run it locally - see below.
 
 `llama.cpp` requires the model to be stored in the [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) file format. Models in other data formats can be converted to GGUF using the `convert_*.py` Python scripts in this repo.
@@ -263,21 +265,12 @@ To learn more about model quantization, [read this documentation](examples/quant
 #### A CLI tool for accessing and experimenting with most of `llama.cpp`'s functionality.
 
 - <details open>
-    <summary>Run simple text completion</summary>
-
-    ```bash
-    llama-cli -m model.gguf -p "I believe the meaning of life is" -n 128
-
-    # I believe the meaning of life is to find your own truth and to live in accordance with it. For me, this means being true to myself and following my passions, even if they don't align with societal expectations. I think that's what I love about yoga – it's not just a physical practice, but a spiritual one too. It's about connecting with yourself, listening to your inner voice, and honoring your own unique journey.
-    ```
-
-    </details>
-
-- <details>
     <summary>Run in conversation mode</summary>
 
+    Models with a built-in chat template will automatically activate conversation mode. If this doesn't occur, you can manually enable it by adding `-cnv` and specifying a suitable chat template with `--chat-template NAME`
+
     ```bash
-    llama-cli -m model.gguf -p "You are a helpful assistant" -cnv
+    llama-cli -m model.gguf
 
     # > hi, who are you?
     # Hi there! I'm your helpful assistant! I'm an AI-powered chatbot designed to assist and provide information to users like you. I'm here to help answer your questions, provide guidance, and offer support on a wide range of topics. I'm a friendly and knowledgeable AI, and I'm always happy to help with anything you need. What's on your mind, and how can I assist you today?
@@ -289,17 +282,28 @@ To learn more about model quantization, [read this documentation](examples/quant
     </details>
 
 - <details>
-    <summary>Run with custom chat template</summary>
+    <summary>Run in conversation mode with custom chat template</summary>
 
     ```bash
-    # use the "chatml" template
-    llama-cli -m model.gguf -p "You are a helpful assistant" -cnv --chat-template chatml
+    # use the "chatml" template (use -h to see the list of supported templates)
+    llama-cli -m model.gguf -cnv --chat-template chatml
 
     # use a custom template
-    llama-cli -m model.gguf -p "You are a helpful assistant" -cnv --in-prefix 'User: ' --reverse-prompt 'User:'
+    llama-cli -m model.gguf -cnv --in-prefix 'User: ' --reverse-prompt 'User:'
     ```
 
-    [Supported templates](https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template)
+    </details>
+
+- <details>
+    <summary>Run simple text completion</summary>
+
+    To disable conversation mode explicitly, use `-no-cnv`
+
+    ```bash
+    llama-cli -m model.gguf -p "I believe the meaning of life is" -n 128 -no-cnv
+
+    # I believe the meaning of life is to find your own truth and to live in accordance with it. For me, this means being true to myself and following my passions, even if they don't align with societal expectations. I think that's what I love about yoga – it's not just a physical practice, but a spiritual one too. It's about connecting with yourself, listening to your inner voice, and honoring your own unique journey.
+    ```
 
     </details>
 

common/arg.cpp

Lines changed: 12 additions & 8 deletions
@@ -777,15 +777,19 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
     ).set_examples({LLAMA_EXAMPLE_MAIN, LLAMA_EXAMPLE_SERVER}));
     add_opt(common_arg(
         {"-cnv", "--conversation"},
-        string_format(
-            "run in conversation mode:\n"
-            "- does not print special tokens and suffix/prefix\n"
-            "- interactive mode is also enabled\n"
-            "(default: %s)",
-            params.conversation ? "true" : "false"
-        ),
+        "run in conversation mode:\n"
+        "- does not print special tokens and suffix/prefix\n"
+        "- interactive mode is also enabled\n"
+        "(default: auto enabled if chat template is available)",
+        [](common_params & params) {
+            params.conversation_mode = COMMON_CONVERSATION_MODE_ENABLED;
+        }
+    ).set_examples({LLAMA_EXAMPLE_MAIN}));
+    add_opt(common_arg(
+        {"-no-cnv", "--no-conversation"},
+        "force disable conversation mode (default: false)",
         [](common_params & params) {
-            params.conversation = true;
+            params.conversation_mode = COMMON_CONVERSATION_MODE_DISABLED;
         }
     ).set_examples({LLAMA_EXAMPLE_MAIN}));
     add_opt(common_arg(
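The parser change follows the usual tri-state pattern: instead of one boolean that starts at `false`, both flags write into a single enum field whose default is `COMMON_CONVERSATION_MODE_AUTO`, so later code can tell "the user said nothing" apart from "the user passed `-no-cnv`". Here is a rough sketch of that pattern outside the `common_arg` machinery; the `argv` loop and `parse_conversation_flags` are hypothetical, only the flag names and enum values come from the diff above.

```cpp
#include <cstdio>
#include <cstring>

enum common_conversation_mode {
    COMMON_CONVERSATION_MODE_DISABLED = 0,
    COMMON_CONVERSATION_MODE_ENABLED  = 1,
    COMMON_CONVERSATION_MODE_AUTO     = 2,
};

struct cli_params {
    // default is AUTO, matching the new common_params::conversation_mode default
    common_conversation_mode conversation_mode = COMMON_CONVERSATION_MODE_AUTO;
};

// hypothetical minimal parser; the real code registers lambdas via add_opt(common_arg(...))
static void parse_conversation_flags(int argc, char ** argv, cli_params & params) {
    for (int i = 1; i < argc; i++) {
        if (!std::strcmp(argv[i], "-cnv") || !std::strcmp(argv[i], "--conversation")) {
            params.conversation_mode = COMMON_CONVERSATION_MODE_ENABLED;   // force on
        } else if (!std::strcmp(argv[i], "-no-cnv") || !std::strcmp(argv[i], "--no-conversation")) {
            params.conversation_mode = COMMON_CONVERSATION_MODE_DISABLED;  // force off
        }
        // any other argument leaves the AUTO default untouched
    }
}

int main(int argc, char ** argv) {
    cli_params params;
    parse_conversation_flags(argc, argv, params);
    // prints 2 (AUTO) when neither flag is given
    std::printf("conversation_mode = %d\n", (int) params.conversation_mode);
    return 0;
}
```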

common/common.h

Lines changed: 8 additions & 1 deletion
@@ -103,6 +103,12 @@ enum dimre_method {
     DIMRE_METHOD_MEAN,
 };
 
+enum common_conversation_mode {
+    COMMON_CONVERSATION_MODE_DISABLED = 0,
+    COMMON_CONVERSATION_MODE_ENABLED  = 1,
+    COMMON_CONVERSATION_MODE_AUTO     = 2,
+};
+
 // sampling parameters
 struct common_params_sampling {
     uint32_t seed = LLAMA_DEFAULT_SEED; // the seed used to initialize llama_sampler
@@ -275,7 +281,6 @@ struct common_params {
     bool special = false; // enable special token output
     bool interactive = false; // interactive mode
     bool interactive_first = false; // wait for user input immediately
-    bool conversation = false; // conversation mode (does not print special tokens and suffix/prefix)
     bool prompt_cache_all = false; // save user input and generations to prompt cache
     bool prompt_cache_ro = false; // open the prompt cache read-only and do not update it
 
@@ -301,6 +306,8 @@ struct common_params {
     ggml_type cache_type_k = GGML_TYPE_F16; // KV cache data type for the K
     ggml_type cache_type_v = GGML_TYPE_F16; // KV cache data type for the V
 
+    common_conversation_mode conversation_mode = COMMON_CONVERSATION_MODE_AUTO;
+
     // multimodal models (see examples/llava)
     std::string mmproj = ""; // path to multimodal projector // NOLINT
     std::vector<std::string> image; // path to image file(s)
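One detail worth noting about the new enum: `COMMON_CONVERSATION_MODE_DISABLED` is 0, so once the AUTO state has been resolved (which main.cpp does before any of these checks run), the pre-existing boolean-style conditions such as `if (params.conversation_mode)` keep their old meaning without further changes. A tiny sketch of that implicit conversion, purely illustrative:

```cpp
#include <cassert>

enum common_conversation_mode {
    COMMON_CONVERSATION_MODE_DISABLED = 0,
    COMMON_CONVERSATION_MODE_ENABLED  = 1,
    COMMON_CONVERSATION_MODE_AUTO     = 2,
};

int main() {
    common_conversation_mode mode = COMMON_CONVERSATION_MODE_DISABLED;
    assert(!mode);  // unscoped enum converts to int: 0 is falsy, like the old bool
    mode = COMMON_CONVERSATION_MODE_ENABLED;
    assert(mode);   // non-zero is truthy
    // note: AUTO (2) is also truthy, which is why main.cpp resolves AUTO to
    // ENABLED/DISABLED before the first `if (params.conversation_mode)` check
    return 0;
}
```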

examples/main/main.cpp

Lines changed: 34 additions & 10 deletions
@@ -30,6 +30,8 @@
 #pragma warning(disable: 4244 4267) // possible loss of data
 #endif
 
+static const char * DEFAULT_SYSTEM_MESSAGE = "You are a helpful assistant";
+
 static llama_context ** g_ctx;
 static llama_model ** g_model;
 static common_sampler ** g_smpl;
@@ -204,8 +206,24 @@ int main(int argc, char ** argv) {
         LOG_WRN("%s: model was trained on only %d context tokens (%d specified)\n", __func__, n_ctx_train, n_ctx);
     }
 
+    // auto enable conversation mode if chat template is available
+    const bool has_chat_template = !common_get_builtin_chat_template(model).empty() || !params.chat_template.empty();
+    if (params.conversation_mode == COMMON_CONVERSATION_MODE_AUTO) {
+        if (has_chat_template) {
+            LOG_INF("%s: chat template is available, enabling conversation mode (disable it with -no-cnv)\n", __func__);
+            params.conversation_mode = COMMON_CONVERSATION_MODE_ENABLED;
+        } else {
+            params.conversation_mode = COMMON_CONVERSATION_MODE_DISABLED;
+        }
+    }
+
+    // in case user force-activate conversation mode (via -cnv) without proper chat template, we show a warning
+    if (params.conversation_mode && !has_chat_template) {
+        LOG_WRN("%s: chat template is not available or is not supported. This may cause the model to output suboptimal responses\n", __func__);
+    }
+
     // print chat template example in conversation mode
-    if (params.conversation) {
+    if (params.conversation_mode) {
         if (params.enable_chat_template) {
             LOG_INF("%s: chat template example:\n%s\n", __func__, common_chat_format_example(model, params.chat_template).c_str());
         } else {
@@ -252,8 +270,10 @@ int main(int argc, char ** argv) {
     std::vector<llama_token> embd_inp;
 
     {
-        auto prompt = (params.conversation && params.enable_chat_template && !params.prompt.empty())
-            ? chat_add_and_format(model, chat_msgs, "system", params.prompt) // format the system prompt in conversation mode
+        auto prompt = (params.conversation_mode && params.enable_chat_template)
+            // format the system prompt in conversation mode (fallback to default if empty)
+            ? chat_add_and_format(model, chat_msgs, "system", params.prompt.empty() ? DEFAULT_SYSTEM_MESSAGE : params.prompt)
+            // otherwise use the prompt as is
             : params.prompt;
         if (params.interactive_first || !params.prompt.empty() || session_tokens.empty()) {
             LOG_DBG("tokenize the prompt\n");
@@ -327,7 +347,7 @@ int main(int argc, char ** argv) {
         params.n_keep += add_bos; // always keep the BOS token
     }
 
-    if (params.conversation) {
+    if (params.conversation_mode) {
         params.interactive_first = true;
     }
 
@@ -451,7 +471,11 @@ int main(int argc, char ** argv) {
 #if defined (__unix__) || (defined (__APPLE__) && defined (__MACH__)) || defined (_WIN32)
         LOG_INF( " - Press Ctrl+C to interject at any time.\n");
 #endif
-        LOG_INF( "%s\n", control_message);
+        LOG_INF( "%s", control_message);
+        if (params.conversation_mode && params.enable_chat_template && params.prompt.empty()) {
+            LOG_INF( " - Using default system message. To change it, set a different value via -p PROMPT or -f FILE argument.\n");
+        }
+        LOG_INF("\n");
 
         is_interacting = params.interactive_first;
     }
@@ -763,15 +787,15 @@ int main(int argc, char ** argv) {
             }
 
             // if current token is not EOG, we add it to current assistant message
-            if (params.conversation) {
+            if (params.conversation_mode) {
                 const auto id = common_sampler_last(smpl);
                 assistant_ss << common_token_to_piece(ctx, id, false);
             }
 
            if (n_past > 0 && is_interacting) {
                 LOG_DBG("waiting for user input\n");
 
-                if (params.conversation) {
+                if (params.conversation_mode) {
                     LOG("\n> ");
                 }
 
@@ -781,7 +805,7 @@ int main(int argc, char ** argv) {
                 }
 
                 std::string buffer;
-                if (!params.input_prefix.empty() && !params.conversation) {
+                if (!params.input_prefix.empty() && !params.conversation_mode) {
                     LOG_DBG("appending input prefix: '%s'\n", params.input_prefix.c_str());
                     LOG("%s", params.input_prefix.c_str());
                 }
@@ -805,7 +829,7 @@ int main(int argc, char ** argv) {
                 // Entering a empty line lets the user pass control back
                 if (buffer.length() > 1) {
                     // append input suffix if any
-                    if (!params.input_suffix.empty() && !params.conversation) {
+                    if (!params.input_suffix.empty() && !params.conversation_mode) {
                         LOG_DBG("appending input suffix: '%s'\n", params.input_suffix.c_str());
                         LOG("%s", params.input_suffix.c_str());
                     }
@@ -818,7 +842,7 @@ int main(int argc, char ** argv) {
                         string_process_escapes(buffer);
                     }
 
-                    bool format_chat = params.conversation && params.enable_chat_template;
+                    bool format_chat = params.conversation_mode && params.enable_chat_template;
                     std::string user_inp = format_chat
                         ? chat_add_and_format(model, chat_msgs, "user", std::move(buffer))
                         : std::move(buffer);
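Putting the main.cpp hunks together, the runtime flow is: detect whether a chat template exists (built-in or via `--chat-template`), resolve the AUTO state, warn if the user forced `-cnv` without a usable template, and fall back to `DEFAULT_SYSTEM_MESSAGE` when conversation mode is active but no `-p`/`-f` prompt was given. Below is a condensed standalone sketch of that flow; `has_chat_template` and `pick_system_prompt` are stand-in stubs invented here in place of the real llama.cpp/common APIs, and only the enum values and `DEFAULT_SYSTEM_MESSAGE` come from the commit itself.

```cpp
#include <cstdio>
#include <string>

enum common_conversation_mode {
    COMMON_CONVERSATION_MODE_DISABLED = 0,
    COMMON_CONVERSATION_MODE_ENABLED  = 1,
    COMMON_CONVERSATION_MODE_AUTO     = 2,
};

static const char * DEFAULT_SYSTEM_MESSAGE = "You are a helpful assistant"; // as added in main.cpp

// stand-in for the common_get_builtin_chat_template(model) / params.chat_template checks
static bool has_chat_template(bool builtin_template, const std::string & chat_template_arg) {
    return builtin_template || !chat_template_arg.empty();
}

// stand-in for the system-prompt fallback in the embd_inp setup
static std::string pick_system_prompt(common_conversation_mode mode, const std::string & user_prompt) {
    if (mode == COMMON_CONVERSATION_MODE_ENABLED && user_prompt.empty()) {
        return DEFAULT_SYSTEM_MESSAGE;
    }
    return user_prompt;
}

int main() {
    common_conversation_mode mode = COMMON_CONVERSATION_MODE_AUTO; // default from common_params
    const bool tmpl = has_chat_template(/*builtin_template=*/true, /*chat_template_arg=*/"");

    // resolve AUTO based on template availability
    if (mode == COMMON_CONVERSATION_MODE_AUTO) {
        mode = tmpl ? COMMON_CONVERSATION_MODE_ENABLED : COMMON_CONVERSATION_MODE_DISABLED;
    }
    // if conversation mode was forced without a usable template, warn
    if (mode && !tmpl) {
        std::fprintf(stderr, "warning: no usable chat template, responses may be suboptimal\n");
    }

    // with an empty -p prompt, conversation mode uses the default system message
    std::printf("system prompt: %s\n", pick_system_prompt(mode, "").c_str());
    return 0;
}
```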
