This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/ZTPlayz on 2024-08-23 21:32:51+00:00.


Overall I've been very impressed by the FLUX models, and I was testing something that was bothering me with SD models: prompt following.

So I came up with 10 short but hard prompts, and it seems that while dev generates better-looking images, it doesn't follow the prompt as well as schnell.

I also tested other models in the process. I wanted to try Midjourney, but didn't feel like paying for a subscription when FLUX exists. If anyone would like to do it, feel free to share the results :)

All images are 1024x1024.

Models used (a rough code sketch of the two FLUX setups follows this list):

FLUX1 schnell fp8, 4 steps, no guidance control, euler, no negative prompt

FLUX1 dev fp8, 50 steps, no guidance control, euler, no negative prompt

Juggernaut X RunDiffusion, 20 steps, CFG 8, DPM++ SDE, no negative prompt

Unstable Diffuser v11 + RunDiffusion, 35 steps, CFG 4, DPM++ 3M SDE, no negative prompt

Pony Realism v2.1 Main + VAE, 30 steps, CFG 7, DPM++ SDE, positive: +"score_9, score_8_up, score_7_up" negative: "score_4, score_5, score_6, rating_explicit"

1/10 (I didn't make a grid for this one, mostly women in images)

DALL-E 3 (Bing Image Creator)

DALL-E 3 (gpt-4o), prompted with the user message 'prompt: "image prompt"', all prompts automatically modified by gpt-4o

Ideogram V2, default free settings, 1:1 aspect ratio, Magic Prompt on
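
For anyone who wants to reproduce the two FLUX setups locally, here is a minimal sketch using the Hugging Face diffusers library. It's an approximation rather than my exact workflow: I ran fp8 checkpoints (in my own workflow), while the sketch loads the standard black-forest-labs repos in bf16, and the example prompt is just a placeholder, not one of the 10 test prompts.

```python
import torch
from diffusers import FluxPipeline

PROMPT = "a red cube balanced on top of a blue sphere"  # placeholder prompt

# FLUX.1 schnell: 4 steps, 1024x1024; guidance is effectively unused for schnell,
# so it is set to 0.0 here.
schnell = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
schnell.enable_model_cpu_offload()  # optional, saves VRAM on smaller GPUs
image = schnell(
    prompt=PROMPT,
    num_inference_steps=4,
    guidance_scale=0.0,
    height=1024,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_schnell.png")

# FLUX.1 dev: 50 steps, 1024x1024; "no guidance control" above means the embedded
# guidance is left at the pipeline default, so no guidance_scale override here.
# Note: the dev repo is gated on the Hub and requires accepting the license.
dev = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
dev.enable_model_cpu_offload()
image = dev(
    prompt=PROMPT,
    num_inference_steps=50,
    height=1024,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_dev.png")
```

The default scheduler diffusers ships for these pipelines is a flow-match Euler sampler, which should roughly correspond to the "euler" setting listed above; no negative prompt is passed, matching the settings.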

Disclaimers:

  • I took the first image for every single generation, even if multiple were generated, so some luck is involved.
  • I used a very simple workflow for the models: no guidance with FLUX, no negative prompt. I am aware that parameters can be adjusted to produce better images that also follow the prompt more closely.
  • Some tools (like gpt-4o DALL-E 3) automatically modify the prompt, which some might find unfair, but I disagree: it's automatic, I don't ask for it, and it gives better results, so it's part of the generation process.
  • These are very simple and short prompts. I am aware that FLUX models prefer detailed, natural-language prompts, which is not the case here. However, most of them should be easy for a human to understand, which is the ultimate goal of these AIs imo.
  • Following the prompt isn't all that matters: making a good image is important too, but this test only measures how well the image follows the prompt, based on my view of the images, which is subjective.
  • This testing is actually biased against the FLUX1 dev model :), since I initially tried 20 prompts and kept the 10 that dev couldn't generate properly. However, I still think the schnell model follows prompts better (compared to dev, with my settings) based on other tests.
  • The 10 prompts that I used were generated by gpt-4o, so there could be bias there, since it may write prompts that it understands well.

This was to show that while FLUX is an amazing upgrade over what we had, and possibly the best model right now in some categories, it still has its limitations. It's still amazing ofc but we all know that.
