
Commit daeaeb1

Author: Olivier Chafik
Merge remote-tracking branch 'origin/master' into bins
2 parents 5265c15 + fd5ea0f

30 files changed: +1118 -492 lines
Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+- Self Reported Review Complexity:
+- [ ] Review Complexity : Low
+- [ ] Review Complexity : Medium
+- [ ] Review Complexity : High
+- [ ] I have read the [contributing guidelines](CONTRIBUTING.md)

.github/workflows/server.yml

Lines changed: 2 additions & 4 deletions

@@ -16,11 +16,9 @@ on:
     branches:
       - master
     paths: ['.github/workflows/server.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'examples/server/**.*']
-  pull_request_target:
+  pull_request:
     types: [opened, synchronize, reopened]
     paths: ['.github/workflows/server.yml', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'examples/server/**.*']
-  schedule:
-    - cron: '2 4 * * *'
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref || github.run_id }}

@@ -115,7 +113,7 @@ jobs:
   server-windows:
-    runs-on: windows-latest
+    runs-on: windows-2019
 
     steps:
       - name: Clone

CONTRIBUTING.md

Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+# Contributing Guidelines
+
+## Checklist
+
+* Make sure your PR follows the [coding guidelines](https://github.com/ggerganov/llama.cpp/blob/master/README.md#coding-guidelines)
+* Test your changes using the commands in the [`tests`](tests) folder. For instance, running the `./tests/test-backend-ops` command tests different backend implementations of the GGML library
+* Execute [the full CI locally on your machine](ci/README.md) before publishing
+
+## PR formatting
+
+* Please rate the complexity of your PR (i.e. `Review Complexity : Low`, `Review Complexity : Medium`, `Review Complexity : High`). This makes it easier for maintainers to triage the PRs.
+  - The PR template has a series of review complexity checkboxes `[ ]` that you can mark as `[X]` for your convenience. Refer to [About task lists](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/about-task-lists) for more information.
+* If the pull request only contains documentation changes (e.g., updating READMEs, adding new wiki pages), please add `[no ci]` to the commit title. This will skip unnecessary CI checks and help reduce build times.
+* When squashing multiple commits on merge, use the following format for your commit title: `<module> : <commit title> (#<issue_number>)`. For example: `utils : Fix typo in utils.py (#1234)`

README.md

Lines changed: 0 additions & 29 deletions

@@ -53,7 +53,6 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
         <li><a href="#quantization">Quantization</a></li>
         <li><a href="#interactive-mode">Interactive mode</a></li>
         <li><a href="#constrained-output-with-grammars">Constrained output with grammars</a></li>
-        <li><a href="#instruct-mode">Instruct mode</a></li>
         <li><a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a></li>
         <li><a href="#seminal-papers-and-background-on-the-models">Seminal papers and background on the models</a></li>
         <li><a href="#perplexity-measuring-model-quality">Perplexity (measuring model quality)</a></li>

@@ -769,34 +768,6 @@ The `grammars/` folder contains a handful of sample grammars. To write your own,
 
 For authoring more complex JSON grammars, you can also check out https://grammar.intrinsiclabs.ai/, a browser app that lets you write TypeScript interfaces which it compiles to GBNF grammars that you can save for local use. Note that the app is built and maintained by members of the community, please file any issues or FRs on [its repo](https://github.com/intrinsiclabsai/gbnfgen) and not this one.
 
-### Instruct mode
-
-1. First, download and place the `ggml` model into the `./models` folder
-2. Run the `main` tool like this:
-
-```
-./examples/alpaca.sh
-```
-
-Sample run:
-
-```
-== Running in interactive mode. ==
- - Press Ctrl+C to interject at any time.
- - Press Return to return control to LLaMA.
- - If you want to submit another line, end your input in '\'.
-
-Below is an instruction that describes a task. Write a response that appropriately completes the request.
-
-> How many letters are there in the English alphabet?
-There 26 letters in the English Alphabet
-> What is the most common way of transportation in Amsterdam?
-The majority (54%) are using public transit. This includes buses, trams and metros with over 100 lines throughout the city which make it very accessible for tourists to navigate around town as well as locals who commute by tram or metro on a daily basis
-> List 5 words that start with "ca".
-cadaver, cauliflower, cabbage (vegetable), catalpa (tree) and Cailleach.
->
-```
-
 ### Obtaining and using the Facebook LLaMA 2 model
 
 - Refer to [Facebook's LLaMA download page](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) if you want to access the model data.

common/common.cpp

Lines changed: 12 additions & 8 deletions

@@ -200,19 +200,13 @@ void gpt_params_handle_model_default(gpt_params & params) {
             }
             params.hf_file = params.model;
         } else if (params.model.empty()) {
-            std::string cache_directory = fs_get_cache_directory();
-            const bool success = fs_create_directory_with_parents(cache_directory);
-            if (!success) {
-                throw std::runtime_error("failed to create cache directory: " + cache_directory);
-            }
-            params.model = cache_directory + string_split(params.hf_file, '/').back();
+            params.model = fs_get_cache_file(string_split(params.hf_file, '/').back());
         }
     } else if (!params.model_url.empty()) {
         if (params.model.empty()) {
             auto f = string_split(params.model_url, '#').front();
             f = string_split(f, '?').front();
-            f = string_split(f, '/').back();
-            params.model = "models/" + f;
+            params.model = fs_get_cache_file(string_split(f, '/').back());
         }
     } else if (params.model.empty()) {
         params.model = DEFAULT_MODEL_PATH;

@@ -2279,6 +2273,16 @@ std::string fs_get_cache_directory() {
     return ensure_trailing_slash(cache_directory);
 }
 
+std::string fs_get_cache_file(const std::string & filename) {
+    GGML_ASSERT(filename.find(DIRECTORY_SEPARATOR) == std::string::npos);
+    std::string cache_directory = fs_get_cache_directory();
+    const bool success = fs_create_directory_with_parents(cache_directory);
+    if (!success) {
+        throw std::runtime_error("failed to create cache directory: " + cache_directory);
+    }
+    return cache_directory + filename;
+}
+
 
 //
 // Model utils
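
The new `fs_get_cache_file()` helper centralizes the create-directory-then-join logic that was previously duplicated at each call site. As a minimal sketch of the same pattern in Python (illustrative only; it assumes a `~/.cache/llama.cpp` directory, whereas the real C++ helper resolves a platform-specific path via `fs_get_cache_directory()`):

```python
import os
from pathlib import Path

def fs_get_cache_directory() -> str:
    # Stand-in for the real helper, which resolves a platform-specific
    # cache path and returns it with a trailing separator.
    return os.path.join(str(Path.home()), ".cache", "llama.cpp") + os.sep

def fs_get_cache_file(filename: str) -> str:
    # Mirrors the GGML_ASSERT above: a bare filename, no path separators.
    assert os.sep not in filename
    cache_directory = fs_get_cache_directory()
    # Equivalent of fs_create_directory_with_parents(); raises OSError
    # where the C++ version throws std::runtime_error.
    Path(cache_directory).mkdir(parents=True, exist_ok=True)
    return cache_directory + filename

# As at the call sites above, the last component of an --hf-file value
# or a model URL becomes the cached filename:
print(fs_get_cache_file("ggml-model-q4_0.gguf".split("/")[-1]))
```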

common/common.h

Lines changed: 1 addition & 0 deletions

@@ -277,6 +277,7 @@ bool fs_validate_filename(const std::string & filename);
 bool fs_create_directory_with_parents(const std::string & path);
 
 std::string fs_get_cache_directory();
+std::string fs_get_cache_file(const std::string & filename);
 
 //
 // Model utils

convert-hf-to-gguf.py

Lines changed: 22 additions & 20 deletions

@@ -47,11 +47,12 @@ class Model:
     _model_classes: dict[str, type[Model]] = {}
 
     dir_model: Path
-    ftype: int
+    ftype: gguf.LlamaFileType
     is_big_endian: bool
     endianess: gguf.GGUFEndian
     use_temp_file: bool
     lazy: bool
+    model_name: str | None
     part_names: list[str]
     is_safetensors: bool
     hparams: dict[str, Any]

@@ -64,7 +65,7 @@ class Model:
     # subclasses should define this!
     model_arch: gguf.MODEL_ARCH
 
-    def __init__(self, dir_model: Path, ftype: gguf.LlamaFileType, fname_out: Path, is_big_endian: bool, use_temp_file: bool, eager: bool):
+    def __init__(self, dir_model: Path, ftype: gguf.LlamaFileType, fname_out: Path, is_big_endian: bool, use_temp_file: bool, eager: bool, model_name: str | None):
         if type(self) is Model:
             raise TypeError(f"{type(self).__name__!r} should not be directly instantiated")
         self.dir_model = dir_model

@@ -73,10 +74,11 @@ def __init__(self, dir_model: Path, ftype: gguf.LlamaFileType, fname_out: Path,
         self.endianess = gguf.GGUFEndian.BIG if is_big_endian else gguf.GGUFEndian.LITTLE
         self.use_temp_file = use_temp_file
         self.lazy = not eager
-        self.part_names = Model.get_model_part_names(self.dir_model, ".safetensors")
+        self.model_name = model_name
+        self.part_names = Model.get_model_part_names(self.dir_model, "model", ".safetensors")
         self.is_safetensors = len(self.part_names) > 0
         if not self.is_safetensors:
-            self.part_names = Model.get_model_part_names(self.dir_model, ".bin")
+            self.part_names = Model.get_model_part_names(self.dir_model, "pytorch_model", ".bin")
         self.hparams = Model.load_hparams(self.dir_model)
         self.block_count = self.find_hparam(["n_layers", "num_hidden_layers", "n_layer"])
         self.tensor_map = gguf.get_tensor_name_map(self.model_arch, self.block_count)

@@ -94,7 +96,7 @@ def __init__(self, dir_model: Path, ftype: gguf.LlamaFileType, fname_out: Path,
             ftype_lw: str = ftype_up.lower()
             # allow templating the file name with the output ftype, useful with the "auto" ftype
             self.fname_out = fname_out.parent / fname_out.name.format(ftype_lw, outtype=ftype_lw, ftype=ftype_lw, OUTTYPE=ftype_up, FTYPE=ftype_up)
-        self.gguf_writer = gguf.GGUFWriter(self.fname_out, gguf.MODEL_ARCH_NAMES[self.model_arch], endianess=self.endianess, use_temp_file=self.use_temp_file)
+        self.gguf_writer = gguf.GGUFWriter(path=None, arch=gguf.MODEL_ARCH_NAMES[self.model_arch], endianess=self.endianess, use_temp_file=self.use_temp_file)
 
     @classmethod
     def __init_subclass__(cls):

@@ -182,7 +184,7 @@ def map_tensor_name(self, name: str, try_suffixes: Sequence[str] = (".weight", "
         return new_name
 
     def set_gguf_parameters(self):
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
         self.gguf_writer.add_block_count(self.block_count)
 
         if (n_ctx := self.find_hparam(["max_position_embeddings", "n_ctx"], optional=True)) is not None:

@@ -324,21 +326,21 @@ def write_tensors(self):
 
     def write(self):
         self.write_tensors()
-        self.gguf_writer.write_header_to_file()
+        self.gguf_writer.write_header_to_file(self.fname_out)
         self.gguf_writer.write_kv_data_to_file()
         self.gguf_writer.write_tensors_to_file(progress=True)
         self.gguf_writer.close()
 
     def write_vocab(self):
-        self.gguf_writer.write_header_to_file()
+        self.gguf_writer.write_header_to_file(self.fname_out)
         self.gguf_writer.write_kv_data_to_file()
         self.gguf_writer.close()
 
     @staticmethod
-    def get_model_part_names(dir_model: Path, suffix: str) -> list[str]:
+    def get_model_part_names(dir_model: Path, prefix: str, suffix: str) -> list[str]:
         part_names: list[str] = []
         for filename in os.listdir(dir_model):
-            if filename.endswith(suffix):
+            if filename.startswith(prefix) and filename.endswith(suffix):
                 part_names.append(filename)
 
         part_names.sort()

@@ -665,7 +667,7 @@ class GPTNeoXModel(Model):
     def set_gguf_parameters(self):
         block_count = self.hparams["num_hidden_layers"]
 
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
        self.gguf_writer.add_context_length(self.hparams["max_position_embeddings"])
         self.gguf_writer.add_embedding_length(self.hparams["hidden_size"])
         self.gguf_writer.add_block_count(block_count)

@@ -798,7 +800,7 @@ def set_vocab(self):
 
     def set_gguf_parameters(self):
         block_count = self.hparams["n_layers"]
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
         self.gguf_writer.add_context_length(self.hparams["max_seq_len"])
         self.gguf_writer.add_embedding_length(self.hparams["d_model"])
         self.gguf_writer.add_block_count(block_count)

@@ -850,7 +852,7 @@ def set_gguf_parameters(self):
             raise ValueError("gguf: can not find ctx length parameter.")
 
         self.gguf_writer.add_file_type(self.ftype)
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
         self.gguf_writer.add_source_hf_repo(hf_repo)
         self.gguf_writer.add_tensor_data_layout("Meta AI original pth")
         self.gguf_writer.add_context_length(ctx_length)

@@ -887,7 +889,7 @@ def set_gguf_parameters(self):
         else:
             raise ValueError("gguf: can not find ctx length parameter.")
 
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
         self.gguf_writer.add_source_hf_repo(hf_repo)
         self.gguf_writer.add_tensor_data_layout("Meta AI original pth")
         self.gguf_writer.add_context_length(ctx_length)

@@ -1010,7 +1012,7 @@ def set_gguf_parameters(self):
         else:
             raise ValueError("gguf: can not find ctx length parameter.")
 
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
         self.gguf_writer.add_source_hf_repo(hf_repo)
         self.gguf_writer.add_tensor_data_layout("Meta AI original pth")
         self.gguf_writer.add_context_length(ctx_length)

@@ -1206,7 +1208,7 @@ def set_gguf_parameters(self):
         hparams = self.hparams
         block_count = hparams["num_hidden_layers"]
 
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
         self.gguf_writer.add_context_length(hparams["max_position_embeddings"])
         self.gguf_writer.add_embedding_length(hparams["hidden_size"])
         self.gguf_writer.add_block_count(block_count)

@@ -1681,7 +1683,7 @@ class GPT2Model(Model):
     model_arch = gguf.MODEL_ARCH.GPT2
 
     def set_gguf_parameters(self):
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
         self.gguf_writer.add_block_count(self.hparams["n_layer"])
         self.gguf_writer.add_context_length(self.hparams["n_ctx"])
         self.gguf_writer.add_embedding_length(self.hparams["n_embd"])

@@ -2248,7 +2250,7 @@ def set_gguf_parameters(self):
         hparams = self.hparams
         block_count = hparams["num_hidden_layers"]
 
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
         self.gguf_writer.add_context_length(hparams["max_position_embeddings"])
         self.gguf_writer.add_embedding_length(hparams["hidden_size"])
         self.gguf_writer.add_block_count(block_count)

@@ -2348,7 +2350,7 @@ def set_gguf_parameters(self):
         # Fail early for models which don't have a block expansion factor of 2
         assert d_inner == 2 * d_model
 
-        self.gguf_writer.add_name(self.dir_model.name)
+        self.gguf_writer.add_name(self.dir_model.name if self.model_name is None else self.model_name)
         self.gguf_writer.add_context_length(2**20) # arbitrary value; for those who use the default
         self.gguf_writer.add_embedding_length(d_model)
         self.gguf_writer.add_feed_forward_length(0) # unused, but seemingly required when loading

@@ -2852,7 +2854,7 @@ def main() -> None:
         logger.error(f"Model {hparams['architectures'][0]} is not supported")
         sys.exit(1)
 
-    model_instance = model_class(dir_model, ftype_map[args.outtype], fname_out, args.bigendian, args.use_temp_file, args.no_lazy)
+    model_instance = model_class(dir_model, ftype_map[args.outtype], fname_out, args.bigendian, args.use_temp_file, args.no_lazy, args.model_name)
 
     logger.info("Set model parameters")
     model_instance.set_gguf_parameters()
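
Two related changes run through this file: `GGUFWriter` is now constructed with `path=None` and receives the output path later via `write_header_to_file(self.fname_out)`, so no file is created until writing actually starts; and `get_model_part_names()` now filters on a prefix as well as a suffix. A minimal sketch of the stricter matching, assuming a typical Hugging Face checkpoint directory that also contains a stray `training_args.bin`:

```python
import os
import tempfile

def get_model_part_names(dir_model: str, prefix: str, suffix: str) -> list[str]:
    # Same filtering logic as the updated static method above.
    part_names: list[str] = []
    for filename in os.listdir(dir_model):
        if filename.startswith(prefix) and filename.endswith(suffix):
            part_names.append(filename)
    part_names.sort()
    return part_names

with tempfile.TemporaryDirectory() as dir_model:
    for name in ("pytorch_model-00001-of-00002.bin",
                 "pytorch_model-00002-of-00002.bin",
                 "training_args.bin"):
        open(os.path.join(dir_model, name), "w").close()
    # The old suffix-only check would also have returned "training_args.bin".
    print(get_model_part_names(dir_model, "pytorch_model", ".bin"))
    # -> ['pytorch_model-00001-of-00002.bin', 'pytorch_model-00002-of-00002.bin']
```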

examples/alpaca.sh

Lines changed: 0 additions & 19 deletions
This file was deleted.

examples/gpt4all.sh

Lines changed: 0 additions & 15 deletions
This file was deleted.
