
Things Flux Does Poorly? (old.reddit.com)

submitted 2 months ago by bot@lemmit.online to c/stablediffusion@lemmit.online


This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/ArmadstheDoom on 2024-08-22 13:05:19+00:00.

So, I love Flux. I think it's a huge advancement. But I've noticed that there are things it seems to just... not know how to do well? It's entirely possible that I don't know how to prompt, but these are things I've noticed, and I figure that at the very worst, I'll learn how to fix these problems, and maybe other people can share their own issues.

There's a weird attachment to hands in pockets? When generating people, the model just loves to put their hands in their pockets, unprompted. I think this might be a way to avoid having to generate hands, but it's really noticeable.
Emotions are really hard to generate? Some it seems to understand, like 'happy' or 'laughing', but others it just doesn't seem to understand at all, like 'surprised' in my experience. It's really hard to generate expressions such as 'confused' or 'worried' because it just doesn't seem to parse them well?
Styles are kind of hit and miss? LoRAs can solve this, though I've found that it can be kind of picky about whether or not it uses LoRAs in the first place. It doesn't seem to understand, say, the difference between a comic art style and an anime art style, and will often conflate the two.
The lack of a negative prompt means that it's really hard to dissuade the model from things you don't want. For example, I noticed that a lot of images had earrings and watches and wristbands unprompted. But putting things like 'no earrings' in the prompt only made it add them more, and made the prompt longer.
It seems to only really grasp the first 75 tokens or so? One thing I've noticed is that while the training prompts are usually very long, in practice it often ignores prompt tokens towards the end of the prompt, and that moving them ahead in the prompt makes them more likely to be accepted.
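
For the last point, here's a minimal sketch of the "move it ahead in the prompt" workaround, assuming prompts are written as comma-separated phrases. The names `rough_token_count` and `front_load` are made up for illustration, and the count is a crude word-and-punctuation estimate, not the model's real tokenizer, so treat the 75 budget as a ballpark only.

```python
import re


def rough_token_count(text: str) -> int:
    """Crude estimate: each word and each punctuation mark counts as one token.

    Assumption: this is NOT the model's tokenizer; real counts are usually higher.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def front_load(prompt: str, priority: list[str], budget: int = 75):
    """Move phrases containing a priority keyword to the front of the prompt.

    Returns the reordered prompt and the phrases that start at or past the
    rough token budget, i.e. the ones most at risk of being ignored.
    """
    phrases = [p.strip() for p in prompt.split(",") if p.strip()]

    def is_priority(phrase: str) -> bool:
        return any(key.lower() in phrase.lower() for key in priority)

    # Priority phrases first; original order is kept within each group.
    ordered = [p for p in phrases if is_priority(p)]
    ordered += [p for p in phrases if not is_priority(p)]

    past_budget = []
    used = 0
    for phrase in ordered:
        if used >= budget:
            past_budget.append(phrase)
        used += rough_token_count(phrase) + 1  # +1 for the separating comma

    return ", ".join(ordered), past_budget


if __name__ == "__main__":
    prompt = "a photo of a woman in a park, autumn leaves, surprised expression"
    reordered, at_risk = front_load(prompt, ["surprised"])
    print(reordered)
    print("possibly ignored:", at_risk)
```

If the second list comes back non-empty, those are the phrases to trim or move up. Whether this actually helps will depend on the prompt, so it's worth comparing outputs with and without the reordering.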

Those are the ones off the top of my head, maybe you have others or fixes? I feel like there's so much hype around what it can do that there's not as much discussion of the things it doesn't do well or how to make certain things easier.
