
Commit d0bbcc6

Release docs proofreading

1 parent: 2f9f94a

File tree

2 files changed: +12, -10 lines


examples/models/phi-3-mini-lora/README.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ To see how you can use the model exported for training in a fully involved finet
 - `./examples/models/phi-3-mini-lora/install_requirements.sh`

 ### Step 3: Export and run the model
-1. Export the inferenace and training models to ExecuTorch.
+1. Export the inference and training models to ExecuTorch.
 ```
 python export_model.py
 ```
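For reference, `export_model.py` produces the `.pte` file consumed at runtime. Below is a minimal sketch of the generic ExecuTorch export flow, assuming a toy module and a placeholder file name; it is not the actual contents of `export_model.py`.

```python
import torch
from executorch.exir import to_edge

# Placeholder module standing in for the real inference/training models.
class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

example_inputs = (torch.randn(1, 8),)

# Capture the graph with torch.export, then lower it to an ExecuTorch program.
exported = torch.export.export(TinyModel(), example_inputs)
program = to_edge(exported).to_executorch()

# Serialize the program to a .pte file (file name here is a placeholder).
with open("model.pte", "wb") as f:
    f.write(program.buffer)
```

The resulting `.pte` is what the runtime later loads and executes.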

extension/llm/README.md

Lines changed: 11 additions & 9 deletions
@@ -2,8 +2,9 @@ This subtree contains libraries and utils of running generative AI, including La
 Below is a list of sub folders.
 ## export
 Model preparation codes are in _export_ folder. The main entry point is the _LLMEdgeManager_ class. It hosts a _torch.nn.Module_, with a list of methods that can be used to prepare the LLM model for ExecuTorch runtime.
-Note that ExecuTorch supports two [quantization APIs](https://pytorch.org/docs/stable/quantization.html#quantization-api-summary): eager mode quantization (aka source transform based quantization), and PyTorch 2 Export based quantization (aka pt2e quantization).
-Typical methods include:
+Note that ExecuTorch supports two [quantization APIs](https://pytorch.org/docs/stable/quantization.html#quantization-api-summary): eager mode quantization (aka source transform based quantization) and PyTorch 2 Export based quantization (aka pt2e quantization).
+
+Commonly used methods in this class include:
 - _set_output_dir_: where users want to save the exported .pte file.
 - _to_dtype_: override the data type of the module.
 - _source_transform_: execute a series of source transform passes. Some transform passes include
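A rough sketch of how the methods named in this hunk might be chained; the import path, constructor arguments, and exact signatures are assumptions, not the verified API (the real definitions live in the _export_ folder):

```python
import torch
from executorch.extension.llm.export.builder import LLMEdgeManager  # import path assumed

def prepare(model: torch.nn.Module) -> None:
    manager = LLMEdgeManager(model=model)  # hosts a torch.nn.Module; ctor args assumed
    manager.set_output_dir("./out")        # where the exported .pte file is saved
    manager.to_dtype(torch.float16)        # override the data type of the module
    manager.source_transform([])           # execute a series of source transform passes
```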
@@ -19,7 +20,7 @@ Typical methods include:
 
 Some usage of LLMEdgeManager can be found in executorch/examples/models/llama2, and executorch/examples/models/llava.
 
-When the .pte file is exported and saved, we can prepare a load and run it in a runner.
+When the .pte file is exported and saved, we can load and run it in a runner (see below).
 
 ## tokenizer
 Currently, we support two types of tokenizers: sentencepiece and Tiktoken.
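Before wiring up the C++ runner mentioned in this hunk, a saved `.pte` can be smoke-tested from Python. A sketch assuming the commonly used pybindings entry point; the module path may vary by build:

```python
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch  # path assumed

module = _load_for_executorch("model.pte")       # load the serialized program
outputs = module.forward((torch.randn(1, 8),))   # inputs must match those used at export
print(outputs)
```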
@@ -28,20 +29,21 @@ Currently, we support two types of tokenizers: sentencepiece and Tiktoken.
 - _tokenizer.py_: rewrite a sentencepiece tokenizer model to a serialization format that the runtime can load.
 - In C++:
 - _tokenizer.h_: a simple tokenizer interface. Actual tokenizer classes can be implemented based on this. In this folder, we provide two tokenizer implementations:
-  - _bpe_tokenizer_. We need the rewritten version of tokenizer artifact (refer to _tokenizer.py_ above), for bpe tokenizer to work.
-  - _tiktokern_. It's for llama3 and llama3.1.
+  - _bpe_tokenizer_. Note: we need the rewritten version of tokenizer artifact (refer to _tokenizer.py_ above), for bpe tokenizer to work.
+  - _tiktoken_. For llama3 and llama3.1.
 
 ## sampler
 A sampler class in C++ to sample the logistics given some hyperparameters.
 
 ## custom_ops
-It hosts a custom sdpa operator. This sdpa operator implements CPU flash attention, it avoids copies by taking the kv cache as one of the arguments to this custom operator.
-- _sdpa_with_kv_cache.py_, _op_sdpa_aot.cpp_: custom op definition in PyTorch with C++ registration.
-- _op_sdpa.cpp_: the optimized operator implementation and registration of _sdpa_with_kv_cache.out_.
+Contains custom op, such as:
+- custom sdpa: implements CPU flash attention and avoids copies by taking the kv cache as one of its arguments.
+  - _sdpa_with_kv_cache.py_, _op_sdpa_aot.cpp_: custom op definition in PyTorch with C++ registration.
+  - _op_sdpa.cpp_: the optimized operator implementation and registration of _sdpa_with_kv_cache.out_.
 
 ## runner
 It hosts the libary components used in a C++ llm runner. Currently, it hosts _stats.h_ on runtime status like token numbers and latency.
 
-With the components above, an actual runner can be built for a model or a series of models. An exmaple is in //executorch/examples/models/llama2/runner, where a C++ runner code is built to run Llama 2, 3, 3.1 and other models using the same architecture.
+With the components above, an actual runner can be built for a model or a series of models. An example is in //executorch/examples/models/llama2/runner, where a C++ runner code is built to run Llama 2, 3, 3.1 and other models using the same architecture.
 
 Usages can also be found in the [torchchat repo](https://github.com/pytorch/torchchat/tree/main/runner).
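To illustrate conceptually what the sampler section in this hunk describes (the real implementation is the C++ class), here is a Python sketch of temperature plus top-p sampling over a logits vector; the hyperparameter names and defaults are illustrative assumptions:

```python
import torch

def sample(logits: torch.Tensor, temperature: float = 0.8, top_p: float = 0.9) -> int:
    # Pick the next token id from a 1-D logits vector.
    if temperature <= 0:
        return int(torch.argmax(logits))            # greedy decoding
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    mask = cumulative - sorted_probs > top_p        # drop tokens outside the top-p nucleus
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum()              # renormalize the kept mass
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return int(sorted_ids[choice])
```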
