how to make llama.cpp keep talking forever as in writing a book and a way to save it? #1852
-
Total newbie here, not sure how the parameters work, though I've downloaded and used the uncensored Vicuna 13B and it's working great (except that it only uses half of my CPU cores, which feels like a waste).
Replies: 3 comments
-
You can get a partial answer by reading my answers here: #1838
You can't really do that. These models mostly have a context limit of 2048 tokens, and even if you use tricks to get it to keep generating past that, you're not really going to get something very coherent.
Run the program with the -t (--threads) option to set how many threads it uses, e.g. -t 12 to use 12 cores.
Different models have different prompt formats. Sometimes ignoring the prompt format produces better output (probably mainly for creative stuff, not Q&A). You'll need to experiment, or look for discussion forums and other examples and see the techniques people use.

Here's an example I used to demonstrate prompting Guanaco-65B to write a little story: https://gist.github.com/KerfuffleV2/46689e097d8b8a6b3a5d6ffc39ce7acd
Edit: Oops, I actually linked the wrong thing there. That's normal LLaMA-65B (which produced a much worse result than Guanaco). Here's the correct link, although the prompt is the same: https://gist.github.com/KerfuffleV2/4ead8be7204c4b0911c3f3183e8a320c#file-2_guanaco-65b-ggmlv3-q4_k_m-md (one thing to note is that Guanaco actually has a specific format you're supposed to use for prompting, but I ignored it here.)

One trick is writing tags for your content; that can guide the model toward producing the type of content you want.
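For example (purely illustrative, the tags and layout here are made up, not a format any particular model requires), a tag-style prompt for a story might look like:

Tags: fantasy, adventure, lighthearted
Title: The Clockwork Lighthouse
Story:

The model will then tend to continue with a story matching those tags.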
I'm not really qualified to answer that one. llama.cpp does have some vestigial training stuff, but as far as I know it's not really suitable for training large models. Training also requires much more hardware and time than just using the model to generate stuff. You basically need a really beefy GPU like a 3090 with 24 GB VRAM, or you could maybe rent compute to do training.
-
@KerfuffleV2 thx for the informative feedback!
Does anyone know of any trick to make it "coherent" too? I can write programs. I'm really looking forward to training LLaMA, though I'm not sure how to get this done, but a 3090 with 24GB VRAM... I wonder if it's possible to do this on pure CPU. I'm more willing to get more CPU power and >128GB of RAM than invest in a GPU (because GPUs get obsolete faster than CPUs, and I have more use for a CPU than 24/7 GPU usage, I guess).

I've tried 7B, 13B and 30B, and I'm surprised that the greater the "B", the better it is! The jump in quality is like 20 IQ points each time.

P.S.: Right now I'm limited to running 30B, and after running 30B I can't stop wondering how WuDao 2.0 with 1 trillion parameters would feel, and how to get it running on CPU. I have an RTX 2060 that's basically obsolete and really useless for me now.
-
Basically, have it write your story in chunks that fit within the context, and write a prompt for each chunk with whatever information is needed to write that part. You can't have it just write a long story by itself.

Another thing to keep in mind is that these local models are generally much less powerful than something like ChatGPT and have a much smaller maximum context size as well, so have realistic expectations. The main advantage of local models right now is that they're under your full control and private.

I think someone actually did use ChatGPT to write a book (with some hand-holding). They did get it done, but it wasn't a good book or anything.
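Since you mentioned you can write programs, here's a minimal sketch of that chunking idea. It's purely illustrative: the ./main path, the model filename, the outline, and the crude "summary so far" handling are all placeholders I made up; the -m/-p/-n flags are standard llama.cpp options, but everything else is just to show the shape of the loop.

```python
# Sketch: write a long story in context-sized chunks by re-prompting per chunk.
# Assumptions (not from the original discussion): llama.cpp's "main" binary is at
# ./main, and the model path and outline below are placeholders.
import subprocess

MODEL = "models/your-model.ggmlv3.q4_K_M.bin"  # placeholder path

def generate(prompt: str, n_tokens: int = 512) -> str:
    """Run one generation with llama.cpp's main binary (-m model, -p prompt, -n tokens)."""
    result = subprocess.run(
        ["./main", "-m", MODEL, "-p", prompt, "-n", str(n_tokens)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

outline = [
    "Chapter 1: the hero leaves the village",
    "Chapter 2: the journey through the mountains",
    "Chapter 3: the confrontation at the tower",
]

summary_so_far = ""
chapters = []
for section in outline:
    # Each chunk gets a fresh prompt that fits in the context window:
    # a short summary of what came before, plus instructions for this part.
    prompt = (
        f"Summary of the story so far: {summary_so_far}\n"
        f"Write the next part of the story. {section}.\n\n"
    )
    text = generate(prompt)
    chapters.append(text)
    # In practice you'd summarise `text` properly (by hand or with another
    # model call) before folding it into the running summary; this just keeps
    # the outline line as a stand-in.
    summary_so_far += " " + section

# Saving the result is then just writing the chunks to a file.
with open("story.txt", "w") as f:
    f.write("\n\n".join(chapters))
```

The point is only that the "memory" between chunks lives in the prompt you build each time, not in the model, so you control what gets carried forward.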
Training on CPU just doesn't work that well, because GPUs tend to have very high memory bandwidth and are highly parallel, both of which are things that benefit LLM training (and inference). You're better off renting time on someone else's powerful GPU for training than trying to do it on CPU.
Well, it's not that surprising that a frog is smarter than an ant and a dog is smarter than a frog. :)
The models you can run locally top out at about 65B parameters (there are some larger research models, but they actually don't even perform that well in comparison). Talk to ChatGPT 4 if you want to know what talking to an actually big model feels like (but I'm pretty sure it's substantially less than 1T params).