
Commit 2cc5044

ochafik authored and nopperl committed
Improve usability of --model-url & related flags (ggml-org#6930)
* args: default --model to models/ + filename from --model-url or --hf-file (or else legacy models/7B/ggml-model-f16.gguf)
* args: main & server now call gpt_params_handle_model_default
* args: define DEFAULT_MODEL_PATH + update cli docs
* curl: check url of previous download (.json metadata w/ url, etag & lastModified)
* args: fix update to quantize-stats.cpp
* curl: support legacy .etag / .lastModified companion files
* curl: rm legacy .etag file support
* curl: reuse regex across headers callback calls
* curl: unique_ptr to manage lifecycle of curl & outfile
* curl: nit: no need for multiline regex flag
* curl: update failed test (model file collision) + gitignore *.gguf.json
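The `curl: unique_ptr to manage lifecycle of curl & outfile` item refers to RAII cleanup of the two download handles. The actual code lives in the common.cpp diff below, which is not rendered; a minimal sketch of the pattern (the `download_file` name is hypothetical and error handling is elided):

```cpp
#include <cstdio>
#include <memory>

#include <curl/curl.h>

// Sketch only: both handles are released on every return path, including
// early errors, without explicit curl_easy_cleanup()/fclose() calls.
static bool download_file(const char * url, const char * path) {
    std::unique_ptr<CURL, decltype(&curl_easy_cleanup)> curl(curl_easy_init(), &curl_easy_cleanup);
    if (!curl) {
        return false;
    }

    std::unique_ptr<FILE, decltype(&fclose)> outfile(fopen(path, "wb"), &fclose);
    if (!outfile) {
        return false;
    }

    curl_easy_setopt(curl.get(), CURLOPT_URL, url);
    // libcurl's default write callback writes straight into the FILE*
    curl_easy_setopt(curl.get(), CURLOPT_WRITEDATA, outfile.get());

    return curl_easy_perform(curl.get()) == CURLE_OK;
}
```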
1 parent 397863e · commit 2cc5044

7 files changed: +144, −133 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -2,6 +2,7 @@
 *.a
 *.so
 *.gguf
+*.gguf.json
 *.bin
 *.exe
 *.dll
```
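The new pattern ignores the metadata companion files this commit writes next to each downloaded model (see `curl: check url of previous download` in the commit message). The exact layout is not shown in this diff; per the commit message it records the url, etag and lastModified of the previous download, so a plausible (hypothetical) `some-model.gguf.json` would look like:

```json
{
  "url": "https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf",
  "etag": "\"<ETag header reported by the server>\"",
  "lastModified": "<Last-Modified header reported by the server>"
}
```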

common/common.cpp

Lines changed: 132 additions & 128 deletions
Large diffs are not rendered by default.
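This file contains the new `gpt_params_handle_model_default` implementation, which the rendered diffs below only declare and call. A minimal sketch of its behavior, inferred from the commit message and the updated server `--help` text (the `hf_file` field name and the `filename_from` helper are assumptions, not the real code):

```cpp
#include <string>

#define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"

// Minimal stand-in for the real gpt_params (common.h) with just the fields
// this sketch needs; hf_file is assumed from the --hf-file flag.
struct gpt_params {
    std::string model;
    std::string model_url;
    std::string hf_file;
};

// Hypothetical helper: filename portion after the last '/'.
static std::string filename_from(const std::string & path_or_url) {
    const size_t pos = path_or_url.find_last_of('/');
    return pos == std::string::npos ? path_or_url : path_or_url.substr(pos + 1);
}

void gpt_params_handle_model_default(gpt_params & params) {
    if (!params.model.empty()) {
        return; // an explicit --model path wins
    }
    if (!params.model_url.empty()) {
        params.model = "models/" + filename_from(params.model_url); // models/ + url filename
    } else if (!params.hf_file.empty()) {
        params.model = "models/" + filename_from(params.hf_file);   // models/ + hf filename
    } else {
        params.model = DEFAULT_MODEL_PATH;                          // legacy default
    }
}
```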

common/common.h

Lines changed: 5 additions & 1 deletion
```diff
@@ -31,6 +31,8 @@
     fprintf(stderr, "%s: built with %s for %s\n", __func__, LLAMA_COMPILER, LLAMA_BUILD_TARGET); \
 } while(0)
 
+#define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"
+
 // build info
 extern int LLAMA_BUILD_NUMBER;
 extern char const *LLAMA_COMMIT;
@@ -92,7 +94,7 @@ struct gpt_params {
     // sampling parameters
     struct llama_sampling_params sparams;
 
-    std::string model = "models/7B/ggml-model-f16.gguf"; // model path
+    std::string model = ""; // model path
     std::string model_draft = ""; // draft model for speculative decoding
     std::string model_alias = "unknown"; // model alias
     std::string model_url = ""; // model url to download
@@ -171,6 +173,8 @@ struct gpt_params {
     std::vector<std::string> image; // path to image file(s)
 };
 
+void gpt_params_handle_model_default(gpt_params & params);
+
 bool parse_kv_override(const char * data, std::vector<llama_model_kv_override> & overrides);
 
 bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params);
```

examples/main/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -66,7 +66,7 @@ main.exe -m models\7B\ggml-model.bin --ignore-eos -n -1 --random-prompt
 
 In this section, we cover the most commonly used options for running the `main` program with the LLaMA models:
 
-- `-m FNAME, --model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.bin`).
+- `-m FNAME, --model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.gguf`; inferred from `--model-url` if set).
 - `-mu MODEL_URL --model-url MODEL_URL`: Specify a remote http url to download the file (e.g https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf).
 - `-i, --interactive`: Run the program in interactive mode, allowing you to provide input directly and receive real-time responses.
 - `-ins, --instruct`: Run the program in instruction mode, which is particularly useful when working with Alpaca models.
```
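With this change an explicit `-m` is no longer required alongside `--model-url`; the download should land under `models/` using the URL's filename. An illustrative invocation (not from the README):

```sh
# --model is inferred as models/ggml-model-q4_0.gguf
./main -mu https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf -p "Hello"
```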

examples/quantize-stats/quantize-stats.cpp

Lines changed: 1 addition & 1 deletion
```diff
@@ -23,7 +23,7 @@
 #endif
 
 struct quantize_stats_params {
-    std::string model = "models/7B/ggml-model-f16.gguf";
+    std::string model = DEFAULT_MODEL_PATH;
     bool verbose = false;
     bool per_layer_stats = false;
     bool print_histogram = false;
```

examples/server/server.cpp

Lines changed: 3 additions & 1 deletion
```diff
@@ -2353,7 +2353,7 @@ static void server_print_usage(const char * argv0, const gpt_params & params, co
         printf(" disable KV offload\n");
     }
     printf(" -m FNAME, --model FNAME\n");
-    printf(" model path (default: %s)\n", params.model.c_str());
+    printf(" model path (default: models/$filename with filename from --hf-file or --model-url if set, otherwise %s)\n", DEFAULT_MODEL_PATH);
     printf(" -mu MODEL_URL, --model-url MODEL_URL\n");
     printf(" model download url (default: unused)\n");
     printf(" -hfr REPO, --hf-repo REPO\n");
@@ -2835,6 +2835,8 @@ static void server_params_parse(int argc, char ** argv, server_params & sparams,
         }
     }
 
+    gpt_params_handle_model_default(params);
+
     if (!params.kv_overrides.empty()) {
         params.kv_overrides.emplace_back();
         params.kv_overrides.back().key[0] = 0;
```

examples/server/tests/features/embeddings.feature

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@ Feature: llama.cpp server
   Background: Server startup
     Given a server listening on localhost:8080
     And   a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
-    And   a model file ggml-model-f16.gguf
+    And   a model file bert-bge-small.gguf
     And   a model alias bert-bge-small
     And   42 as server seed
     And   2 slots
```
