Cuda + Docker is generating only infinite text. #12328
Unanswered
Wandering-Magi asked this question in Q&A
- OS: Arch Linux
- Build: Docker + llama.cpp:full-cuda
- Built from this guide.
I just started messing around with AI this week, so forgive me for not already knowing all of the words.
I got the basic build working the other night and got it running with llama-cli. The generation speed was pretty slow, so I wanted to use graphics acceleration to get some real speed. But it was working, so I know it's possible.
As usual, Nvidia on Arch continues to be the bane of my existence, but that's neither here nor there. I set up CUDA, installed the toolkit, installed docker, got everything configured and working. A fun learning experience.
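To confirm the GPU is actually visible inside Docker after setting up the NVIDIA Container Toolkit, a sanity check like the following should work (the CUDA image tag here is just an example, not tied to my setup):

```shell
# If the NVIDIA Container Toolkit is configured correctly, nvidia-smi
# run inside the container should list the host GPU.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```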
If I run it using the example text, it generates text about making a website. Cool, it works. The catch is getting the damn thing to pause for user input, which the example won't do, but a few extra flags should change that.
This is the current command I'm using. All I'm trying to do right now is to get it into conversation mode.
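Roughly, the invocation looks like the following. The model path and image tag are placeholders rather than my exact command, and `-cnv` is the llama-cli flag that should enable conversation mode:

```shell
# Placeholder model path and image tag -- adjust for your setup.
# -ngl 99 offloads all layers to the GPU; -cnv asks llama-cli for
# conversation mode, which should pause for user input each turn.
docker run --gpus all -v /path/to/models:/models \
  ghcr.io/ggml-org/llama.cpp:full-cuda \
  --run -m /models/model.gguf -ngl 99 -cnv
```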
For the record, I have tried both `-p` and `-sys`, just to see if it would help at all. Same problem.

You can see at the end there, I'm using Ctrl+C to interject, but it just won't stop. I have tried this at various lengths into the generation. After three presses it finally interrupts, and I go and use `kill N` to shut it down.

If I try to run it in `--server --port 8080` mode, I can't access the server from my browser at http://127.0.0.1:8080. I have also tried `--host 0.0.0.0`, as suggested on other threads, but that doesn't work either.

In short, I'm at the end of my ability to figure my way through this. I feel like I'm one step away from figuring this out, but I'm facing a niche case in a new and rapidly developing field.
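From what I've pieced together, the server problem may need two fixes at once, since inside a container 127.0.0.1 is the container's own loopback, not the host's. A sketch of what that would look like (paths and image tag are placeholders):

```shell
# Two things appear to be needed for the server to be reachable from
# the host browser:
#   1. --host 0.0.0.0 so the server listens on all container interfaces
#      (not just the container-local loopback)
#   2. docker's -p 8080:8080 to publish the container port to the host
docker run --gpus all -p 8080:8080 -v /path/to/models:/models \
  ghcr.io/ggml-org/llama.cpp:full-cuda \
  --server -m /models/model.gguf --host 0.0.0.0 --port 8080
```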