how to make llama.cpp keep talking forever as in writing a book and a way to save it? #1852
-
Total newbie here, not sure how the parameters work, though I've downloaded and used the uncensored Vicuna 13B and it's working great (except that it only uses half of my CPU cores, which feels like a waste).
Replies: 3 comments
-
You can get a partial answer by reading my answers here: #1838
You can't really do that. These models mostly have a context limit of 2048 tokens, and even if you use tricks to get it to keep generating past that, you're not really going to get something very coherent.
Run the program with the -t (--threads) option to set how many threads it uses, e.g. -t 12 to use 12 cores.
Different models have different prompt formats. Sometimes ignoring the prompt format produces better output (probably mainly for creative stuff, not Q&A). You'll need to experiment, or look for discussion forums and other examples and see the techniques people use.

Here's an example I used to demonstrate prompting Guanaco-65B to write a little story: https://gist.github.com/KerfuffleV2/46689e097d8b8a6b3a5d6ffc39ce7acd
Edit: Oops, I actually linked the wrong thing there. That's normal LLaMA-65B (which produced a much worse result than Guanaco). Here's the correct link, although the prompt is the same: https://gist.github.com/KerfuffleV2/4ead8be7204c4b0911c3f3183e8a320c#file-2_guanaco-65b-ggmlv3-q4_k_m-md (one thing to note is that Guanaco actually has a specific format you're supposed to use for prompting, but I ignored it here.)

One trick is writing tags for your content; that can guide the model toward producing the type of content you want.
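For example (purely illustrative, the tags and layout here are made up, not a format any particular model requires), a tag-style prompt for a story might look like:

Tags: fantasy, adventure, lighthearted
Title: The Clockwork Lighthouse
Story:

The model will then tend to continue with a story matching those tags.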
I'm not really qualified to answer that one. llama.cpp does have some vestigial training stuff, but as far as I know it's not really suitable for training large models. Training also requires much more hardware and time than just using the model to generate stuff. You basically need a really beefy GPU like a 3090 with 24 GB VRAM, or you could maybe rent compute to do training.
-
@KerfuffleV2 thx for the informative feedback!
Does anyone know of any trick to make it "coherent" too? I can write programs. I'm really looking forward to training LLaMA, though I'm not sure how to get this done, but a 3090 with 24GB VRAM... I wonder if it's possible to do this on pure CPU. I'm more willing to get more CPU power and >128GB of RAM than invest in a GPU (because GPUs get obsolete faster than CPUs, and I have more use for a CPU than 24/7 GPU usage, I guess).

I've tried 7B, 13B and 30B, and I'm surprised that the greater the "B", the better it is! The jump in quality is like 20 IQ points each time.

P.S.: Right now I'm limited to running 30B, and after running 30B I can't stop wondering how WuDao 2.0 with 1 trillion parameters would feel, and how to get it running on CPU. I have an RTX 2060 that's basically obsolete and really useless for me now.
-
Basically, have it write your story in chunks that fit within the context, and write a prompt for each chunk with whatever information is needed to write that part. You can't have it just write a long story by itself.

Another thing to keep in mind is that these local models are generally much less powerful than something like ChatGPT and have a much smaller maximum context size as well, so have realistic expectations. The main advantage of local models right now is that they're under your full control and private.

I think someone actually did use ChatGPT to write a book (with some hand-holding). They did get it done, but it wasn't a good book or anything.
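Since you mentioned you can write programs, here's a minimal sketch of that chunking idea. It's purely illustrative: the ./main path, the model filename, the outline, and the crude "summary so far" handling are all placeholders I made up; the -m/-p/-n flags are standard llama.cpp options, but everything else is just to show the shape of the loop.

```python
# Sketch: write a long story in context-sized chunks by re-prompting per chunk.
# Assumptions (not from the original discussion): llama.cpp's "main" binary is at
# ./main, and the model path and outline below are placeholders.
import subprocess

MODEL = "models/your-model.ggmlv3.q4_K_M.bin"  # placeholder path

def generate(prompt: str, n_tokens: int = 512) -> str:
    """Run one generation with llama.cpp's main binary (-m model, -p prompt, -n tokens)."""
    result = subprocess.run(
        ["./main", "-m", MODEL, "-p", prompt, "-n", str(n_tokens)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

outline = [
    "Chapter 1: the hero leaves the village",
    "Chapter 2: the journey through the mountains",
    "Chapter 3: the confrontation at the tower",
]

summary_so_far = ""
chapters = []
for section in outline:
    # Each chunk gets a fresh prompt that fits in the context window:
    # a short summary of what came before, plus instructions for this part.
    prompt = (
        f"Summary of the story so far: {summary_so_far}\n"
        f"Write the next part of the story. {section}.\n\n"
    )
    text = generate(prompt)
    chapters.append(text)
    # In practice you'd summarise `text` properly (by hand or with another
    # model call) before folding it into the running summary; this just keeps
    # the outline line as a stand-in.
    summary_so_far += " " + section

# Saving the result is then just writing the chunks to a file.
with open("story.txt", "w") as f:
    f.write("\n\n".join(chapters))
```

The point is only that the "memory" between chunks lives in the prompt you build each time, not in the model, so you control what gets carried forward.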
Training on CPU just doesn't work that well, because GPUs tend to have very high memory bandwidth and are highly parallel, both of which are things that benefit LLM training (and inference). You're better off renting time on someone else's powerful GPU for training than trying to do it on CPU.
Well, it's not that surprising that a frog is smarter than an ant and a dog is smarter than a frog. :)
The models you can run locally top out at about 65B parameters (there are some larger research models, but they actually don't even perform that well in comparison). Talk to ChatGPT 4 if you want to know what talking to an actually big model feels like (but I'm pretty sure it's substantially less than 1T params).