
[Help] Trying to run a local Story telling model with KoboldCpp (kbin.social)

submitted 1 year ago* (last edited 1 year ago) by darkeox@kbin.social to c/localllama@sh.itjust.works


Hi,

Just like the title says:

I'm trying to run:

https://huggingface.co/TheBloke/WizardLM-Uncensored-SuperCOT-StoryTelling-30B-SuperHOT-8K-GGML

With:

koboldcpp:v1.43 using HIPBLAS on a 7900XTX / Arch Linux

Running with:

--stream --unbantokens --threads 8 --usecublas normal

I get very limited output with lots of repetition.

Illustration

I mostly didn't touch the default settings:

Settings

Does anyone know how I can make things run better?

EDIT: Sorry for multiple posts, Fediverse bugged out.


[–] darkeox@kbin.social 2 points 1 year ago (1 children)

MythoMax looks nice, but I'm using it in story mode and it seems to have trouble progressing once it has reached the max token count. It appears stuck:

Generating (1 / 512 tokens)
(EOS token triggered!)
Time Taken - Processing:4.8s (9ms/T), Generation:0.0s (1ms/T), Total:4.8s (0.2T/s)
Output:

And then it stops when I try to prompt it to continue the story.

[–] rufus@discuss.tchncs.de 1 points 1 year ago* (last edited 1 year ago) (1 children)

That is correct behaviour. At some point it'll decide this is the text you requested and follow it up with an EOS token. You need to either suppress that token and force it to generate endlessly (with your --unbantokens you activate the EOS token and this behaviour), or manually add something and hit 'generate' again. For example, just a line break after the text often does the trick for me.
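To make the EOS behaviour a bit more concrete, here is a toy sketch in Python. It is not how KoboldCpp is implemented; `ranked_candidates`, `generate` and the tiny "story" are made up for illustration. It only shows the idea: if EOS is allowed and is the most likely next token, generation stops (possibly with empty output, like in the log above); if EOS is banned, the next-best token is taken instead and generation runs to the token limit.

```python
EOS = "<eos>"  # stand-in for the model's end-of-sequence token


def ranked_candidates(step):
    """Toy 'model': return candidate next tokens, most likely first."""
    story = ["The", "dragon", "slept", "."]
    if step < len(story):
        return [story[step], EOS]
    # Once the story is told, the toy model considers EOS most likely.
    return [EOS, "And"]


def generate(max_tokens, ban_eos, start_step=0):
    """Greedy generation loop that stops early when EOS is picked."""
    out = []
    for step in range(start_step, start_step + max_tokens):
        candidates = ranked_candidates(step)
        if ban_eos:
            # Banning EOS removes it from the candidates entirely.
            candidates = [t for t in candidates if t != EOS]
        token = candidates[0]
        if token == EOS:
            break  # the model decided the text is finished
        out.append(token)
    return out


if __name__ == "__main__":
    print(generate(8, ban_eos=False))                # stops after the story
    print(generate(8, ban_eos=False, start_step=4))  # EOS immediately, empty output
    print(generate(8, ban_eos=True))                 # keeps going to the limit
```

The second call mirrors the "Generating (1 / 512 tokens) (EOS token triggered!)" situation: the model thinks the text is already complete, so the very first token it picks is EOS.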

I can take a screenshot tomorrow.

Edit: Also, your rope config doesn't seem correct for a SuperHOT model. And your prompt from the screenshot isn't what I'd expect when dealing with a WizardLM model. I'll see if I can reproduce your issues and write a few more words tomorrow.

Edit2: Notes:

I think SuperHOT means linear scale. So for an 8k LLaMA1: --contextsize 8192 --ropeconfig 0.25 10000
No --unbantokens if you don't want it to stop
WizardLM prompt format is: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.USER: Who are you? ASSISTANT: I am WizardLM.......
SuperCOT prompt format is: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n\n\n### Input:\n\n\n### Response:\n
Storywrite is probably plain stories. But idk.
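As a rough sketch of the two prompt formats in the notes above, they could be assembled like this in Python. The function names are hypothetical, and where exactly the instruction and input text go inside the SuperCOT template is my assumption (the note only shows the empty template).

```python
WIZARDLM_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

SUPERCOT_HEADER = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request."
)


def wizardlm_prompt(user_message):
    """Build a single-turn WizardLM-style prompt ending at the assistant's turn."""
    return f"{WIZARDLM_SYSTEM} USER: {user_message} ASSISTANT:"


def supercot_prompt(instruction, context=""):
    """Build a SuperCOT-style prompt.

    Assumption: the instruction and input text sit directly after their
    '### ...:' header lines.
    """
    return (
        f"{SUPERCOT_HEADER}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{context}\n\n"
        f"### Response:\n"
    )


if __name__ == "__main__":
    print(wizardlm_prompt("Who are you?"))
    print(supercot_prompt("Continue the story.", "Once upon a time..."))
```

With empty arguments, `supercot_prompt("", "")` reduces to the bare template string quoted in the note.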

The chosen model is kind of badly documented. And a bit older. I'm not sure if it's the best choice.

Edit3: I've put this in better words and made another comment including screenshots and my workflow.

[–] micheal65536@lemmy.micheal65536.duckdns.org 2 points 1 year ago

Yeah, I think you need to set the contextsize and ropeconfig. Documentation isn't completely clear and in some places sort of implies that it should be autodetected based on the model when using a recent version, but the first thing I would try is setting these explicitly as this definitely looks like an encoding issue.
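For reference, the 0.25 in the --ropeconfig suggestion from the notes can be worked out with a few lines of Python. This is only a sketch under the assumption that SuperHOT uses linear scaling, so the factor is the trained context divided by the target context, and that LLaMA1 was trained with a 2048-token context. The helper names are hypothetical; only the --contextsize and --ropeconfig flags come from this thread.

```python
def linear_rope_scale(trained_ctx, target_ctx):
    """Linear RoPE scale factor: trained context divided by target context."""
    return trained_ctx / target_ctx


def koboldcpp_rope_args(target_ctx, trained_ctx=2048, rope_base=10000):
    """Assemble the context-related arguments as a list of strings.

    Assumptions: trained_ctx=2048 (LLaMA1) and an unchanged base of 10000.
    """
    scale = linear_rope_scale(trained_ctx, target_ctx)
    return [
        "--contextsize", str(target_ctx),
        "--ropeconfig", f"{scale:g}", str(rope_base),
    ]


if __name__ == "__main__":
    # 2048 / 8192 = 0.25 for an 8k SuperHOT model on a LLaMA1 base
    print(" ".join(koboldcpp_rope_args(8192)))
```

Setting both values explicitly means you don't have to rely on whatever autodetection the installed version does or doesn't perform.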