Commit ab85a84

Added docs
1 parent 73b8583 commit ab85a84

2 files changed, +9 -2 lines changed

src/llama-sampling.cpp

Lines changed: 0 additions & 1 deletion
@@ -1013,7 +1013,6 @@ static const char * llama_sampler_temp_ext_name(const struct llama_sampler * /*s
 
 static void llama_sampler_temp_ext_apply(struct llama_sampler * smpl, llama_token_data_array * cur_p) {
     const auto * ctx = (llama_sampler_temp_ext *) smpl->ctx;
-
     if (ctx->delta > 0) {
         const float min_temp = std::max(0.0f, ctx->temp - ctx->delta);
         const float max_temp = ctx->temp + ctx->delta;

tools/main/README.md

Lines changed: 9 additions & 1 deletion
@@ -301,7 +301,15 @@ Example usage: `--xtc-probability 0.5 --xtc-threshold 0.1`
 - `--smoothing-factor N`: Set the smoothing factor for smoothing / quadratic sampling (default: 0.0).
 - `--smoothing-curve N`: Set the cubic transformation curve for smoothing / quadratic sampling (default: 1.0).
 
-Smoothing / Quadratic Sampling is a sampler that modifies the probability of each token instead of removing tokens, similar to what temperature does. (TODO: finish this part)
+(Source: https://github.com/ggml-org/llama.cpp/pull/6445)
+
+Smoothing / Quadratic Sampling, as described in the [original PR](https://github.com/ggml-org/llama.cpp/pull/6445), is a sampler that changes the probability distribution of tokens in a non-linear fashion. This sampler does not remove any tokens; instead, it adjusts the original logit score of each token based on its distance from the topmost logit, using a quadratic transformation. It can be viewed as an alternative to Temperature that scales differently while still punishing extreme outlier tokens.
+
+By transforming token logits non-linearly, the sampler avoids biasing toward the topmost token when a group of similarly probable tokens sits at the top, which creates more variance. Higher values of `smoothing factor` result in more deterministic output, while lower values boost the creativity of the model. `smoothing factor` values of 0.2-0.3 are generally thought to be good for creative writing. Note that a `smoothing factor` of `0.0` disables the sampler completely.
+
+`smoothing curve` is a second hyperparameter that adds a cubic transformation on top of the original quadratic one, and can help make lower `smoothing factor` values work if the curve is set higher. A `smoothing curve` value of `1.0` is equivalent to using just the quadratic transformation.
+
+This sampler is not mutually exclusive with Temperature; the two can be used together.
 
 ### Top-nσ Sampling
