
readme : add link to Autopen under UIs #11684


Merged
1 commit merged into ggml-org:master on Feb 6, 2025

Conversation

blackhole89
Contributor

Sorry that after all these months of being AWOL this triviality is all I come back with, but I'd be grateful if my little project could be added to the README list.

Autopen is a graphical text editor that uses llama.cpp to tokenize the buffer on the fly, score the buffer, visualise token logits and allow you to switch back and forth between different possible completions at any point. There's a demo video here. I hope the criteria for inclusion are met, as I'm stating the dependency prominently.
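
For readers unfamiliar with the API, here is a minimal sketch (not Autopen's actual code) of the kind of llama.cpp usage described above: tokenize a buffer, decode it, and read back the logits. API names and signatures vary between llama.cpp versions, so this may need adjusting to your headers:

```cpp
// A sketch, not Autopen's actual code: the llama.cpp calls an editor like
// this builds on. Several of these functions have since been renamed, so
// treat this as illustrative only.
#include "llama.h"

#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    llama_backend_init();

    llama_model * model = llama_load_model_from_file("model.gguf", llama_model_default_params());
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context * ctx = llama_new_context_with_model(model, llama_context_default_params());

    // tokenize the current editor buffer on the fly
    const char * buf = "The quick brown fox";
    std::vector<llama_token> tokens(1024);
    const int n = llama_tokenize(model, buf, (int) strlen(buf),
                                 tokens.data(), (int) tokens.size(),
                                 /*add_special*/ true, /*parse_special*/ false);
    tokens.resize(n);

    // score the buffer: decode the whole token run in one batch
    if (llama_decode(ctx, llama_batch_get_one(tokens.data(), n)) != 0) {
        fprintf(stderr, "decode failed\n");
        return 1;
    }

    // logits for the last position: one score per vocabulary entry, which is
    // what gets visualised and used to rank alternative completions
    const float * logits = llama_get_logits(ctx);
    printf("first logit of last position: %f\n", logits[0]);

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```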

(I think I might also have found some subtle issues (in numerical stability, snapshot save/restore, and/or the CUDA kernels) in the process of working on it, where different sequences of decode calls that should give identical results don't, but I'm still working on isolating the exact conditions in a way that is actionable.)

Autopen (https://github.com/blackhole89/autopen) is a graphical text editor that uses llama.cpp to tokenize the buffer on the fly, score the buffer, visualise token logits and allow you to switch back and forth between different possible completions at any point. It hopefully meets the criteria for inclusion, as the dependency on llama.cpp is stated prominently.
slaren merged commit c3db048 into ggml-org:master on Feb 6, 2025
2 checks passed
ggerganov
Member

> I think I might also have found some subtle issues (in numerical stability, snapshot save/restore, and/or the CUDA kernels) in the process of working on it, where different sequences of decode calls that should give identical results don't, but I'm still working on isolating the exact conditions in a way that is actionable.

Indeed, there are some subtle effects that lead to differences when processing the same input with different batch sizes (see for example #7745). It's not clear how to make this fully deterministic. There are at least two points in the computation that are problematic in this regard (a toy illustration of the underlying effect follows the list):

  • The max computation over the sequence length in the softmax operator of the attention
  • The KQ*V reduction over the sequence length at the end of the attention
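
Both points come down to floating-point non-associativity: a different batch size changes the order in which these reductions are evaluated, and float addition rounds differently under different groupings. A self-contained toy illustration (plain C++, no llama.cpp):

```cpp
// Demonstrates why different batch sizes can change results: float addition
// is not associative, so splitting the same reduction at a different point
// yields a (slightly) different sum. The attention reductions listed above
// behave the same way when the sequence is processed in different chunkings.
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);

    std::vector<float> v(10000);
    for (float & x : v) x = dist(rng);

    // one pass over the whole sequence
    float whole = 0.0f;
    for (float x : v) whole += x;

    // same values, reduced in two chunks ("batch sizes" 7000 and 3000)
    float a = 0.0f, b = 0.0f;
    for (size_t i = 0; i < 7000; ++i)        a += v[i];
    for (size_t i = 7000; i < v.size(); ++i) b += v[i];
    const float chunked = a + b;

    // the difference is generally non-zero
    printf("whole   = %.9g\nchunked = %.9g\ndiff    = %.3g\n",
           whole, chunked, whole - chunked);
    return 0;
}
```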

blackhole89
Contributor Author

blackhole89 commented Feb 7, 2025

Ah, good to know that it's a known problem. I've observed discrepancies on the order of 0.5 logit units (usually less), though my use case may be a bit pathological, since I frequently re-evaluate the same run of ~10 tokens (the window between two snapshots, with default params, in my program) batched up differently, in a context where two choices suddenly swapping order is quite conspicuous.
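
For anyone wanting to reproduce this, a comparison of that kind is straightforward to wire up: evaluate the same token run under two decode schedules, then diff the resulting logit vectors. A hypothetical helper (illustrative, not Autopen's code):

```cpp
// Hypothetical helper: given logits for the same position computed under two
// different decode schedules, report the largest discrepancy and whether the
// top-ranked candidate flipped between them.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static int argmax(const std::vector<float> & v, int skip = -1) {
    int best = -1;
    for (int i = 0; i < (int) v.size(); ++i) {
        if (i == skip) continue;
        if (best < 0 || v[i] > v[best]) best = i;
    }
    return best;
}

static void compare_logits(const std::vector<float> & a, const std::vector<float> & b) {
    float max_diff = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        max_diff = std::max(max_diff, std::fabs(a[i] - b[i]));
    }
    const int a1 = argmax(a);
    const int a2 = argmax(a, a1); // runner-up in run A
    const int b1 = argmax(b);

    printf("max |logit diff| = %f\n", max_diff);
    if (a1 != b1) {
        printf("top choice flipped (%d vs %d); margin in run A was only %f\n",
               a1, b1, a[a1] - a[a2]);
    }
}

int main() {
    // dummy stand-ins for llama_get_logits() output from two schedules;
    // a sub-0.1 perturbation is enough to swap two closely ranked tokens
    const std::vector<float> a = { 1.20f, 1.15f, -0.30f };
    const std::vector<float> b = { 1.14f, 1.18f, -0.30f };
    compare_logits(a, b);
    return 0;
}
```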

Thanks for putting in my link!
