
Commit a1666aa

Merge branch 'master' into xsn/fix_lora

2 parents f6d090d + f1948f1

34 files changed: +9406 -73 lines

Makefile

Lines changed: 18 additions & 0 deletions
@@ -14,6 +14,7 @@ BUILD_TARGETS = \
     llama-finetune \
     llama-gbnf-validator \
     llama-gguf \
+    llama-gguf-hash \
     llama-gguf-split \
     llama-gritlm \
     llama-imatrix \
@@ -1178,6 +1179,23 @@ llama-gguf: examples/gguf/gguf.cpp \
     $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
     $(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
+examples/gguf-hash/deps/sha1/sha1.o: \
+    examples/gguf-hash/deps/sha1/sha1.c
+    $(CC) $(CFLAGS) -Iexamples/gguf-hash/deps -c $< -o $@
+
+examples/gguf-hash/deps/xxhash/xxhash.o: \
+    examples/gguf-hash/deps/xxhash/xxhash.c
+    $(CC) $(CFLAGS) -Iexamples/gguf-hash/deps -c $< -o $@
+
+examples/gguf-hash/deps/sha256/sha256.o: \
+    examples/gguf-hash/deps/sha256/sha256.c
+    $(CC) $(CFLAGS) -Iexamples/gguf-hash/deps -c $< -o $@
+
+llama-gguf-hash: examples/gguf-hash/gguf-hash.cpp examples/gguf-hash/deps/sha1/sha1.o examples/gguf-hash/deps/xxhash/xxhash.o examples/gguf-hash/deps/sha256/sha256.o\
+    $(OBJ_ALL)
+    $(CXX) $(CXXFLAGS) -Iexamples/gguf-hash/deps -c $< -o $(call GET_OBJ_FILE, $<)
+    $(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
+
 llama-gguf-split: examples/gguf-split/gguf-split.cpp \
     $(OBJ_ALL)
     $(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
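
These rules compile the bundled sha1, xxhash, and sha256 sources as plain C objects and link them, together with the usual common objects, into a new `llama-gguf-hash` binary. A minimal usage sketch follows; the hash-selection flag and model path are assumptions for illustration, not something shown in this diff:

    # build only the new target
    make llama-gguf-hash
    # hash a model file (flag name and path are placeholders; see the example's own docs)
    ./llama-gguf-hash --xxh64 ./models/my-model.gguf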

README.md

Lines changed: 25 additions & 19 deletions
@@ -131,6 +131,7 @@ Typically finetunes of the base models below are supported as well.
 - Zig: [deins/llama.cpp.zig](https://github.com/Deins/llama.cpp.zig)
 - Flutter/Dart: [netdur/llama_cpp_dart](https://github.com/netdur/llama_cpp_dart)
 - PHP (API bindings and features built on top of llama.cpp): [distantmagic/resonance](https://github.com/distantmagic/resonance) [(more info)](https://github.com/ggerganov/llama.cpp/pull/6326)
+- Guile Scheme: [guile_llama_cpp](https://savannah.nongnu.org/projects/guile-llama-cpp)
 
 **UI:**
 
@@ -391,28 +392,21 @@ The `grammars/` folder contains a handful of sample grammars. To write your own,
 
 For authoring more complex JSON grammars, you can also check out https://grammar.intrinsiclabs.ai/, a browser app that lets you write TypeScript interfaces which it compiles to GBNF grammars that you can save for local use. Note that the app is built and maintained by members of the community, please file any issues or FRs on [its repo](http://github.com/intrinsiclabsai/gbnfgen) and not this one.
 
-### Obtaining and using the Facebook LLaMA 2 model
+## Build
 
-- Refer to [Facebook's LLaMA download page](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) if you want to access the model data.
-- Alternatively, if you want to save time and space, you can download already converted and quantized models from [TheBloke](https://huggingface.co/TheBloke), including:
-  - [LLaMA 2 7B base](https://huggingface.co/TheBloke/Llama-2-7B-GGUF)
-  - [LLaMA 2 13B base](https://huggingface.co/TheBloke/Llama-2-13B-GGUF)
-  - [LLaMA 2 70B base](https://huggingface.co/TheBloke/Llama-2-70B-GGUF)
-  - [LLaMA 2 7B chat](https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF)
-  - [LLaMA 2 13B chat](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF)
-  - [LLaMA 2 70B chat](https://huggingface.co/TheBloke/Llama-2-70B-chat-GGUF)
+Please refer to [Build llama.cpp locally](./docs/build.md)
 
-### Seminal papers and background on the models
+## Supported backends
 
-If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
-- LLaMA:
-  - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
-  - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
-- GPT-3
-  - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
-- GPT-3.5 / InstructGPT / ChatGPT:
-  - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
-  - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
+| Backend | Target devices |
+| --- | --- |
+| [Metal](./docs/build.md#metal-build) | Apple Silicon |
+| [BLAS](./docs/build.md#blas-build) | All |
+| [BLIS](./docs/backend/BLIS.md) | All |
+| [SYCL](./docs/backend/SYCL.md) | Intel and Nvidia GPU |
+| [CUDA](./docs/build.md#cuda) | Nvidia GPU |
+| [hipBLAS](./docs/build.md#hipblas) | AMD GPU |
+| [Vulkan](./docs/build.md#vulkan) | GPU |
 
 ## Tools
 
@@ -460,3 +454,15 @@ To learn more how to measure perplexity using llama.cpp, [read this documentatio
 - [Build on Android](./docs/android.md)
 - [Performance troubleshooting](./docs/token_generation_performance_tips.md)
 - [GGML tips & tricks](https://github.com/ggerganov/llama.cpp/wiki/GGML-Tips-&-Tricks)
+
+**Seminal papers and background on the models**
+
+If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
+- LLaMA:
+  - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
+  - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
+- GPT-3
+  - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+- GPT-3.5 / InstructGPT / ChatGPT:
+  - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
+  - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)

common/common.cpp

Lines changed: 37 additions & 7 deletions
@@ -190,6 +190,12 @@ int32_t cpu_get_num_math() {
 // CLI argument parsing
 //
 
+void gpt_params_handle_hf_token(gpt_params & params) {
+    if (params.hf_token.empty() && std::getenv("HF_TOKEN")) {
+        params.hf_token = std::getenv("HF_TOKEN");
+    }
+}
+
 void gpt_params_handle_model_default(gpt_params & params) {
     if (!params.hf_repo.empty()) {
         // short-hand to avoid specifying --hf-file -> default it to --model
@@ -237,6 +243,8 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) {
 
     gpt_params_handle_model_default(params);
 
+    gpt_params_handle_hf_token(params);
+
     if (params.escape) {
         string_process_escapes(params.prompt);
         string_process_escapes(params.input_prefix);
@@ -652,6 +660,14 @@ bool gpt_params_find_arg(int argc, char ** argv, const std::string & arg, gpt_pa
         params.model_url = argv[i];
         return true;
     }
+    if (arg == "-hft" || arg == "--hf-token") {
+        if (++i >= argc) {
+            invalid_param = true;
+            return true;
+        }
+        params.hf_token = argv[i];
+        return true;
+    }
     if (arg == "-hfr" || arg == "--hf-repo") {
         CHECK_ARG
         params.hf_repo = argv[i];
@@ -1576,6 +1592,7 @@ void gpt_params_print_usage(int /*argc*/, char ** argv, const gpt_params & param
     options.push_back({ "*", "-mu, --model-url MODEL_URL", "model download url (default: unused)" });
     options.push_back({ "*", "-hfr, --hf-repo REPO", "Hugging Face model repository (default: unused)" });
     options.push_back({ "*", "-hff, --hf-file FILE", "Hugging Face model file (default: unused)" });
+    options.push_back({ "*", "-hft, --hf-token TOKEN", "Hugging Face access token (default: value from HF_TOKEN environment variable)" });
 
     options.push_back({ "retrieval" });
     options.push_back({ "retrieval", " --context-file FNAME", "file to load context from (repeat to specify multiple files)" });
@@ -2015,9 +2032,9 @@ std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_par
     llama_model * model = nullptr;
 
     if (!params.hf_repo.empty() && !params.hf_file.empty()) {
-        model = llama_load_model_from_hf(params.hf_repo.c_str(), params.hf_file.c_str(), params.model.c_str(), mparams);
+        model = llama_load_model_from_hf(params.hf_repo.c_str(), params.hf_file.c_str(), params.model.c_str(), params.hf_token.c_str(), mparams);
     } else if (!params.model_url.empty()) {
-        model = llama_load_model_from_url(params.model_url.c_str(), params.model.c_str(), mparams);
+        model = llama_load_model_from_url(params.model_url.c_str(), params.model.c_str(), params.hf_token.c_str(), mparams);
     } else {
         model = llama_load_model_from_file(params.model.c_str(), mparams);
     }
@@ -2200,7 +2217,7 @@ static bool starts_with(const std::string & str, const std::string & prefix) {
     return str.rfind(prefix, 0) == 0;
 }
 
-static bool llama_download_file(const std::string & url, const std::string & path) {
+static bool llama_download_file(const std::string & url, const std::string & path, const std::string & hf_token) {
 
     // Initialize libcurl
     std::unique_ptr<CURL, decltype(&curl_easy_cleanup)> curl(curl_easy_init(), &curl_easy_cleanup);
@@ -2215,6 +2232,15 @@ static bool llama_download_file(const std::string & url, const std::string & pat
     curl_easy_setopt(curl.get(), CURLOPT_URL, url.c_str());
     curl_easy_setopt(curl.get(), CURLOPT_FOLLOWLOCATION, 1L);
 
+    // Check if hf-token or bearer-token was specified
+    if (!hf_token.empty()) {
+        std::string auth_header = "Authorization: Bearer ";
+        auth_header += hf_token.c_str();
+        struct curl_slist *http_headers = NULL;
+        http_headers = curl_slist_append(http_headers, auth_header.c_str());
+        curl_easy_setopt(curl.get(), CURLOPT_HTTPHEADER, http_headers);
+    }
+
 #if defined(_WIN32)
     // CURLSSLOPT_NATIVE_CA tells libcurl to use standard certificate store of
     // operating system. Currently implemented under MS-Windows.
@@ -2410,14 +2436,15 @@ static bool llama_download_file(const std::string & url, const std::string & pat
 struct llama_model * llama_load_model_from_url(
         const char * model_url,
         const char * path_model,
+        const char * hf_token,
         const struct llama_model_params & params) {
     // Basic validation of the model_url
     if (!model_url || strlen(model_url) == 0) {
         fprintf(stderr, "%s: invalid model_url\n", __func__);
         return NULL;
     }
 
-    if (!llama_download_file(model_url, path_model)) {
+    if (!llama_download_file(model_url, path_model, hf_token)) {
         return NULL;
     }
 
@@ -2465,14 +2492,14 @@ struct llama_model * llama_load_model_from_url(
         // Prepare download in parallel
         std::vector<std::future<bool>> futures_download;
         for (int idx = 1; idx < n_split; idx++) {
-            futures_download.push_back(std::async(std::launch::async, [&split_prefix, &split_url_prefix, &n_split](int download_idx) -> bool {
+            futures_download.push_back(std::async(std::launch::async, [&split_prefix, &split_url_prefix, &n_split, hf_token](int download_idx) -> bool {
                 char split_path[PATH_MAX] = {0};
                 llama_split_path(split_path, sizeof(split_path), split_prefix, download_idx, n_split);
 
                 char split_url[LLAMA_CURL_MAX_URL_LENGTH] = {0};
                 llama_split_path(split_url, sizeof(split_url), split_url_prefix, download_idx, n_split);
 
-                return llama_download_file(split_url, split_path);
+                return llama_download_file(split_url, split_path, hf_token);
             }, idx));
         }
 
@@ -2491,6 +2518,7 @@ struct llama_model * llama_load_model_from_hf(
         const char * repo,
         const char * model,
         const char * path_model,
+        const char * hf_token,
         const struct llama_model_params & params) {
     // construct hugging face model url:
     //
@@ -2506,14 +2534,15 @@ struct llama_model * llama_load_model_from_hf(
     model_url += "/resolve/main/";
     model_url += model;
 
-    return llama_load_model_from_url(model_url.c_str(), path_model, params);
+    return llama_load_model_from_url(model_url.c_str(), path_model, hf_token, params);
 }
 
 #else
 
 struct llama_model * llama_load_model_from_url(
         const char * /*model_url*/,
         const char * /*path_model*/,
+        const char * /*hf_token*/,
         const struct llama_model_params & /*params*/) {
     fprintf(stderr, "%s: llama.cpp built without libcurl, downloading from an url not supported.\n", __func__);
     return nullptr;
@@ -2523,6 +2552,7 @@ struct llama_model * llama_load_model_from_hf(
         const char * /*repo*/,
         const char * /*model*/,
         const char * /*path_model*/,
+        const char * /*hf_token*/,
         const struct llama_model_params & /*params*/) {
     fprintf(stderr, "%s: llama.cpp built without libcurl, downloading from Hugging Face not supported.\n", __func__);
     return nullptr;
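
Taken together, these common.cpp changes resolve the token in one place (an explicit `--hf-token` value wins; otherwise `gpt_params_handle_hf_token` falls back to the `HF_TOKEN` environment variable) and `llama_download_file` then attaches it to every download request as a standard `Authorization: Bearer` header. Below is a stripped-down sketch of the same libcurl pattern outside llama.cpp; the URL is a placeholder and the build command assumes libcurl development headers are installed:

    // bearer_fetch.cpp - minimal illustration of the header logic added above.
    // Build (assumption): g++ bearer_fetch.cpp -lcurl -o bearer_fetch
    #include <cstdio>
    #include <cstdlib>
    #include <string>
    #include <curl/curl.h>

    int main() {
        // Same fallback as gpt_params_handle_hf_token(): use HF_TOKEN when no explicit token is given.
        std::string token;
        if (const char * env = std::getenv("HF_TOKEN")) {
            token = env;
        }

        CURL * curl = curl_easy_init();
        if (!curl) {
            return 1;
        }
        curl_easy_setopt(curl, CURLOPT_URL, "https://huggingface.co/");  // placeholder URL
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);

        // Attach the token as a bearer header only when one is available,
        // mirroring the !hf_token.empty() check in llama_download_file().
        struct curl_slist * headers = NULL;
        if (!token.empty()) {
            std::string auth_header = "Authorization: Bearer " + token;
            headers = curl_slist_append(headers, auth_header.c_str());
            curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        }

        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK) {
            fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));
        }

        curl_slist_free_all(headers);  // safe on NULL; the list must stay alive until the transfer finishes
        curl_easy_cleanup(curl);
        return res == CURLE_OK ? 0 : 1;
    }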

common/common.h

Lines changed: 4 additions & 2 deletions
@@ -108,6 +108,7 @@ struct gpt_params {
     std::string model_draft = ""; // draft model for speculative decoding
     std::string model_alias = "unknown"; // model alias
     std::string model_url = ""; // model url to download
+    std::string hf_token = ""; // HF token
     std::string hf_repo = ""; // HF repo
     std::string hf_file = ""; // HF file
     std::string prompt = "";
@@ -256,6 +257,7 @@ struct gpt_params {
     bool spm_infill = false; // suffix/prefix/middle pattern for infill
 };
 
+void gpt_params_handle_hf_token(gpt_params & params);
 void gpt_params_handle_model_default(gpt_params & params);
 
 bool gpt_params_parse_ex (int argc, char ** argv, gpt_params & params);
@@ -311,8 +313,8 @@ std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_par
 struct llama_model_params llama_model_params_from_gpt_params (const gpt_params & params);
 struct llama_context_params llama_context_params_from_gpt_params(const gpt_params & params);
 
-struct llama_model * llama_load_model_from_url(const char * model_url, const char * path_model, const struct llama_model_params & params);
-struct llama_model * llama_load_model_from_hf(const char * repo, const char * file, const char * path_model, const struct llama_model_params & params);
+struct llama_model * llama_load_model_from_url(const char * model_url, const char * path_model, const char * hf_token, const struct llama_model_params & params);
+struct llama_model * llama_load_model_from_hf(const char * repo, const char * file, const char * path_model, const char * hf_token, const struct llama_model_params & params);
 
 // Batch utils
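
With `hf_token` added to both declarations, callers pass the token (possibly an empty string) straight through to the download path. A hypothetical call site against the new signatures, not part of this commit: repo, file, and local path are placeholders, and `llama_model_default_params()` is the standard helper from llama.h:

    // Hypothetical example using the updated declarations above.
    #include "common.h"
    #include "llama.h"
    #include <cstdlib>

    llama_model * load_example_model() {
        llama_model_params mparams = llama_model_default_params();
        const char * token = std::getenv("HF_TOKEN");       // may be NULL
        return llama_load_model_from_hf(
            "some-org/some-model-GGUF",                      // hf_repo (placeholder)
            "some-model-Q4_K_M.gguf",                        // hf_file (placeholder)
            "./models/some-model-Q4_K_M.gguf",               // local path to store the download
            token ? token : "",                              // hf_token, empty if unset
            mparams);
    }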
