Thanks a lot for your input. It's a lot to digest, but it's very detailed, which is what I need.
I run Koboldcpp in a container.
What I ended up doing, which semi-worked, is:
--model "/app/models/mythomax-l2-13b.ggmlv3.q5_0.bin" --port 80 --stream --unbantokens --threads 8 --contextsize 4096 --useclblas 0 0
In the Koboldcpp UI, I set the max response tokens to 512, switched to Instruction/response mode, and kept prompting with "continue the writing", using the MythoMax model.
But I'll be re-checking your way of doing it, because the SuperCOT model's story writing seemed less streamlined and of higher quality.
Don't be sorry, you're being very helpful. Thanks a lot.
I finally replicated your config:
localhost/koboldcpp:v1.43 --port 80 --threads 4 --contextsize 8192 --useclblas 0 0 --smartcontext --ropeconfig 1.0 32000 --stream "/app/models/mythomax-l2-kimiko-v2-13b.Q5_K_M.gguf"
And I got satisfying results! The performance of LLaMA2 really is nice to have here as well.
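In case it helps anyone else running this in a container, here is a rough sketch of how the line above could be wrapped into a full command. The image name, model path and Koboldcpp flags are the ones from my config; everything else is an assumption: that the runtime is podman, that the image's entrypoint passes the arguments straight to Koboldcpp, and the host-side values (HOST_MODELS and HOST_PORT are hypothetical names I made up for the sketch). It only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Sketch only: prints a container command for the config above, does not run it.
# From the post: image name, model path, and all Koboldcpp flags.
# Assumptions: podman as the runtime, the -p/-v options, and the host-side values.
set -eu

IMAGE="localhost/koboldcpp:v1.43"
MODEL="/app/models/mythomax-l2-kimiko-v2-13b.Q5_K_M.gguf"
HOST_MODELS="${HOST_MODELS:-$HOME/models}"  # assumed host directory holding the .gguf file
HOST_PORT="${HOST_PORT:-8080}"              # assumed host port mapped to container port 80

kobold_cmd() {
  # Emit every argument separated by a space, then a newline.
  printf '%s ' podman run --rm \
    -p "${HOST_PORT}:80" \
    -v "${HOST_MODELS}:/app/models:ro" \
    "$IMAGE" \
    --port 80 --threads 4 --contextsize 8192 \
    --useclblas 0 0 --smartcontext \
    --ropeconfig 1.0 32000 --stream \
    "$MODEL"
  printf '\n'
}

kobold_cmd
```

If the printed command looks right for your setup, paste it into a terminal to start the container. The UI should then be reachable on whatever host port you mapped.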