StableDiffusion


/r/StableDiffusion is back open after the protest of Reddit killing open API access, which will bankrupt app developers, hamper moderation, and...

926
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/afinalsin on 2024-08-22 09:18:39+00:00.

927
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/MarcS- on 2024-08-21 22:04:09+00:00.


Hi everyone,

I've been running comparisons of several new models with standardized prompts. Usually I focus on models I can run on my local machine (since I favor open software), but I decided I could use some free generations on Ideogram to test their latest 2.0 model, which they claim is better than Flux and DALL-E. I couldn't run my whole library of prompts before running out of free credits, but I hope the five prompts I tested will be of interest to you before deciding whether it's worth paying for a subscription to their online generation service.

Prompt #1: the positional prompt, which you can compare to Flux and AuraFlow here:

"a blue cylinder in the center of the image, with a red sphere at the left, a green square at the right, a purple smiling sun on the top of the image and a severed foot at the bottom"

The idea here is to test whether Ideogram 2.0 is SOTA at adhering to a prompt with several items clearly positioned relative to each other.

Here are the four results I got, not cherrypicked:

It's very good. Arguably the smiling sun isn't at the very top every time, but it's at least in the top third of the image each time, so I'd say it passes this test. AuraFlow did as well, but then, it is the SOTA model for prompt adherence (version 0.2). Aesthetics are bad in both cases, but I won't grade aesthetics here, as the result is pretty surrealist anyway. If we were to nitpick, I could say that the foot only looks severed, rather than attached to the cylinder, in 3 images out of 4.

Prompt #2: A complex description.

Here I compared several models with the Shinto monk prompt.

"In the inner court of a grand Greek temple, majestic columns rise towards the sky, framing the scene with ancient elegance. At the center, a Shinto monk, dressed in traditional white and orange robes with intricate patterns, is levitating in the lotus position, floating serenely above a blazing fire. The flames dance and flicker, casting a warm, ethereal glow on the monk's peaceful expression. His hands are gently resting on his knees, with beads of a prayer necklace hanging loosely from his fingers. At the opposite end of the court, an anthropomorphical lion, regal and powerful, is bowing deeply. The lion, with a mane of golden fur and wearing an ornate, ceremonial chest plate, exudes a sense of reverence and respect. Its tail is curled gracefully around its body, and its eyes are closed in solemn devotion. Surrounding the court, ancient statues and carvings of Greek deities look down, their expressions solemn and timeless. The sky above is a serene blue, with the light of the setting sun casting long shadows and a warm, golden hue across the scene, highlighting the unique fusion of cultures and the mystical ambiance of the moment."

This prompt has 20 different items to rate, so each generation gets a mark out of 20; I averaged the first 4 generations.

Misses "hands on knees", he doesn't hold the prayer beads in hands, the lion isn't anthropomorphic, not bowing particularly, mane isn't really fiery, his tail isn't curled around his body, admittedly his eyes are half-closed so I'll count it as right, no statues of greek gods, no serene blue sky. 12 out 20.

No lotus position; no prayer beads, so none attached to the hands; lion not anthropomorphic; mane doesn't seem golden either; tail not around the body. That's a 14 (but the monk's position is a big drawback).

Horrible monk... Misses the same as before, plus the orange and white robes and the intricate patterns, with a demerit for the artifacted monk... 11/20.

Misses the court of the temple (he's in front of a temple), the placement of the prayer bead necklace, the anthropomorphic lion (fur admittedly golden here), the tail curled around the body, and the statues of Greek gods. 15/20.

The average is 13/20. AuraFlow scored 15/20. Prompt adherence is good, but not stellar. Still, out of a few generations, some can get quite close to the intended image.
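For transparency, the average is just the mean of the four per-generation marks listed above:

```python
scores = [12, 14, 11, 15]         # marks out of 20 for the four generations
print(sum(scores) / len(scores))  # 13.0 -> reported as 13/20
```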

Prompt #3: the pirate lady

A woman wearing 18th-century attire is positioned on all fours, facing the viewer, on a wooden table in a lively pirate tavern. She is dressed in a traditional colonial-style dress, with a corset bodice, lace-trimmed neckline, and flowing skirts. The fabric of her dress is rich and textured, featuring a deep burgundy color with intricate embroidery and gold accents. Her hair is styled in loose curls, cascading around her face, and she wears a tricorn hat adorned with feathers and ribbons. The tavern itself is bustling with activity. The background is filled with wooden beams, barrels, and rustic furniture, typical of a pirate tavern. The atmosphere is dimly lit by flickering lanterns and candles, casting warm, golden light throughout the room. Various pirates and patrons can be seen in the background, engaged in animated conversations, drinking from tankards, and playing cards. The woman's expression is confident and mischievous, her eyes meeting the viewer's gaze directly. Her posture, though unusual for the setting, conveys a sense of boldness and command. The table beneath her is cluttered with tankards, maps, and scattered coins, adding to the chaotic and adventurous ambiance of the pirate tavern.

Another scene that is described in great detail to reflect the image I have in mind. I won't count items, as the goal was to see if we could get a woman on all fours in a non-sexual context.

Ideogram fails; #3 is the best, but she's at most leaning on the table, not on all fours on it. Also, the table isn't cluttered with tankards, maps, and coins. The model focused on the 1girl, not on the scene's overall composition. Flux did better, despite also missing the on-all-fours part.

Prompt #4: the submarine ruins

Compare here:

"Beneath the tranquil surface of a crystal-clear ocean, an ancient temple lies half-submerged, its majestic architecture eroded but still grand. The temple is a marvel, with columns covered in intricate carvings of sea creatures and mythical beings. Soft, blue light filters down from above, illuminating the scene with a serene glow. Merfolk, with their shimmering scales and flowing hair, glide gracefully around the temple, guarding its secrets. Giant kelp sway gently in the current, and schools of colorful fish dart through the water, adding vibrant splashes of color. An adventuring party, equipped with magical diving suits that emit a soft glow, explores the temple. They are fascinated by the glowing runes and ancient artifacts they find, evidence of a long-lost civilization. One member, a wizard, reaches out to touch a glowing orb, while another, a rogue, carefully inspects a mural depicting a great battle under the sea."

Actually, Ideogram did pretty well on this one, especially on the intricate carvings of sea creatures on the columns, which are the most elaborate of any model I tried. On the other hand, it drops the ball mid-prompt, with the party of adventurers barely present, not interacting as they should, and lacking the magical diving suits. It is, however, the prettiest set of images generated, so it has that going for it.

And finally, a short prompt to let the magic prompt feature shine: "a breathtaking views of the Garden Dome, orbiting Uranus, with people taking a coffee break".

Not Uranus, no garden-y thing. The garden dome could be on an asteroid, so I won't count it against Ideogram.

Not very garden-y either. Als...


Content cut off. Read original on https://old.reddit.com/r/StableDiffusion/comments/1ey2ffa/ideogram_20_prompt_adherence_and_aesthetics_test/

928
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/johnffreeman on 2024-08-21 17:46:26+00:00.


I've just heard that SD 3.1 is about to be released, with adjusted licensing. More information soon. We will see...

Edit: for those asking for the source, this information was emailed to me by a Stability.ai employee I've been in contact with for some time.

Also, a note: you don't have to downvote my post if you're done with Stability.ai; I'm just sharing some relevant SD-related news. We all love Flux, but there are still other things happening.

929
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeFurkan on 2024-08-22 02:02:12+00:00.

Original Title: FLUX LoRA Rank 128 Kohya SS GUI Training: For 8-bit training Torch 2.4 upgrade dropped VRAM usage from 23 GB to 17 GB and for 16-bit training 44 GB to 27 GB - step speed improved from 8.4 second to 4.4 second (RTX A6000) - 1024x1024px - results unknown hopefully tomorrow

930
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Hopeful_Letterhead92 on 2024-08-21 18:15:10+00:00.

931
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeFurkan on 2024-08-22 01:06:44+00:00.

932
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/beighto on 2024-08-21 17:52:03+00:00.


Don't read the fix, skip to the edits below.

~~The problem originated in commit b09c24e, when Illyasviel introduced the fp16_fix. You can fix the fix by editing the latest commit (31bed67 as of 8/21/24):~~

~~From backend/nn/flux.py, remove these lines:~~

```python
from backend.utils import fp16_fix
txt = fp16_fix(txt)
x = fp16_fix(x)
fp16_fix(x)
```

~~From backend/utils.py, remove this function block:~~

```python
def fp16_fix(x):
    # An interesting trick to avoid fp16 overflow
    # Source: issues/1114
    # Related: https://github.com/comfyanonymous/ComfyUI/blob/f1d6cef71c70719cc3ed45a2455a4e5ac910cd5e/comfy/ldm/flux/layers.py#L180
    if x.dtype == torch.float16:
        return x.clip(-16384.0, 16384.0)
    return x
```

~~That's it! I went from 36s/it @ 1024x768 to 13s/it with nf4, 14s/it with Q4 gguf, and 14s/it with Q8. Hopefully this will get removed or fixed in future releases to save us GPU-poor folk.~~

~~I tried to find a fix for this in ComfyUI as well, but that one is broken from the start.~~

~~Edit: I'm having trouble recreating this from the latest commit. It might need the pip requirements from the aadc0f0 commit, upgrading from there. Has anybody else had any luck with this fix?~~

Edit 2: Illyasviel has been busy today. It looks like he fixed the issue without removing the fp16_fix. Per commit notes:

> change some dtype behaviors based on community feedbacks
>
> only influence old devices like 1080/70/60/50. please remove cmd flags if you are on 1080/70/60/50 and previously used many cmd flags to tune performance

So take those flags off. I'm getting 20s/it now. Going to keep trying for that 14s/it again with the latest commit.

Edit 3: ComfyUI fixed theirs too! Per commit notes:

```
commit a60620dcea1302ef5c7f555e5e16f70b39c234ef (HEAD -> master, origin/master, origin/HEAD)
Author: comfyanonymous <comfyanonymous@protonmail.com>
Date:   Wed Aug 21 16:38:26 2024 -0400

    Fix slow performance on 10 series Nvidia GPUs.

commit 015f73dc4941ae6e01e01b934368f031c7fa8b8d
Author: comfyanonymous <comfyanonymous@protonmail.com>
Date:   Wed Aug 21 16:17:15 2024 -0400

    Try a different type of flux fp16 fix.
```

I'm getting 20s/it on Comfy too. What a day for updates!

933
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/20yroldentrepreneur on 2024-08-21 22:27:42+00:00.


Using a LoRA trained on my likeness:

2000 steps

10 self-captioned selfies, 5 full body shots

3 hours to train

FLUX is extremely good at prompt adherence and natural language prompting. We now live in a future where we never have to dress up for photoshoots again. RIP fashion photographers.

934
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/DumeSleigher on 2024-08-21 18:38:09+00:00.

935
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/EldrichArchive on 2024-08-21 18:11:34+00:00.

936
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Lumpy-Breakfast3295 on 2024-08-21 17:48:56+00:00.

937
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeraRalaz on 2024-08-21 17:33:35+00:00.

938
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/andreac75 on 2024-08-21 16:48:54+00:00.


I have started uploading LoRAs to Civitai again, hoping that what happened with SDXL will not happen again (I have added a specific disclaimer). I hope you like them.

939
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Total-Resort-3120 on 2024-08-21 16:13:41+00:00.

940
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/danamir_ on 2024-08-21 13:58:13+00:00.


Following the various tests on CFG & FLUX (like this one, for example), I was wondering if I could use the same trick as with SDXL: switching settings mid-rendering in ComfyUI by passing the latent between two SamplerCustomAdvanced nodes.

The answer is a resounding yes. You can set the CFG to any value you want, limit it to the first few steps to harvest the benefit of greater prompt adherence (and optionally use the negative prompt, to a certain extent), and only suffer the cost of doubled rendering time for those few steps.

single rendering vs. split rendering

Single rendering, CFG 1

Split rendering at 4 steps, CFG 2

The increased CFG adds details but, depending on the prompt, can look too contrasted. This can be somewhat balanced by lowering the Guidance. You can push the CFG much higher; 4 and 5 can give interesting results.

The single rendering (72s):

100%|█████| 18/18 [01:12<00:00,  4.02s/it]

Versus the split rendering (88s):

100%|█████| 4/4   [00:32<00:00,  8.04s/it]
100%|█████| 14/14 [00:56<00:00,  4.04s/it]
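For those curious what the split does under the hood, here is a minimal sketch of the principle in plain Python. This is not the actual ComfyUI node graph: `model`, `cond`, and `uncond` are hypothetical stand-ins for a FLUX denoiser and its conditionings, and the loop is a bare Euler sampler.

```python
import torch

def cfg_denoise(model, x: torch.Tensor, sigma: torch.Tensor,
                cond, uncond, cfg: float) -> torch.Tensor:
    # At CFG 1.0 the unconditional pass contributes nothing, so we can
    # skip it entirely: one model call instead of two.
    if cfg == 1.0:
        return model(x, sigma, cond)
    pos = model(x, sigma, cond)     # conditional prediction
    neg = model(x, sigma, uncond)   # negative / unconditional prediction
    return neg + cfg * (pos - neg)  # classic CFG combination

def sample_split(model, x, sigmas, cond, uncond, split_step=4, cfg_hi=2.0):
    # High CFG (double cost) for the first `split_step` steps, then CFG 1.0
    # (single cost) for the rest, passing the latent straight through.
    for i in range(len(sigmas) - 1):
        cfg = cfg_hi if i < split_step else 1.0
        denoised = cfg_denoise(model, x, sigmas[i], cond, uncond, cfg)
        d = (x - denoised) / sigmas[i]           # derivative estimate
        x = x + d * (sigmas[i + 1] - sigmas[i])  # Euler step
    return x
```

With 18 steps and a split at step 4, this gives exactly the cost profile of the logs above: 4 steps at roughly double the per-step time, then 14 steps at normal speed.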

Here is the full workflow: Danamir Flux v14.json

[edit]: A simplified version of the workflow, with two outputs to compare the two rendering methods: Flux Danamir Split v15.json

[edit2]: It has been brought to my attention that there may be a way to do just that without the need for a full set of nodes; to be tested.

Note that this workflow was used to test many things, so you'll also find in it: checkpoint, UNet, and CLIP loaders (including GGUF & NF4), an upscale pass (optionally tiled), a second pass with an SDXL model at base resolution or upscaled, and detailers for both FLUX and SDXL, supporting any detectors.

941
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeFurkan on 2024-08-21 16:20:12+00:00.

942
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/zeekwithz on 2024-08-21 13:42:48+00:00.

943
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Philosopher_Jazzlike on 2024-08-21 12:56:26+00:00.

944
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/askin-gm on 2024-08-21 11:17:36+00:00.

945
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/MewnCat on 2024-08-21 11:06:50+00:00.

946
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/FinetunersAI on 2024-08-21 10:45:34+00:00.

947
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/lkewis on 2024-08-21 07:56:16+00:00.

948
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/dal_mac on 2024-08-21 05:48:11+00:00.


I spent a couple of days trying to adapt my upscale workflows to FLUX: hi-res fix (img2img), Ultimate SD Upscale (tiled), SUPIR, etc. All gave major problems or were far too difficult to get looking right (for example, tiled upscaling causes hallucinations and seams above ~0.35 denoise, while anything lower than ~0.3 creates artifacts and/or reduces detail). I couldn't find any combination of parameters that gave me results like XL.

Then I realized... why do I need to upscale? FLUX can generate my target resolution natively in the first pass, with very little detriment to composition.

Now I'm generating directly at ~1600px+. Quality and composition don't suffer at all, and outputs are incredibly clear and detailed (I'm a stickler for fine detail). Then it only needs a pass through a GAN upscaler to reach 4k+.
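For anyone generating through diffusers rather than a node UI, trying this takes only a couple of lines. A minimal sketch, assuming the public FLUX.1-dev weights and diffusers >= 0.30; the prompt, step count, and guidance value are placeholders, not the author's settings:

```python
import torch
from diffusers import FluxPipeline

# Load FLUX.1-dev in bf16; CPU offloading keeps VRAM usage manageable.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Generate at the target resolution directly: no hi-res fix, no tiling.
# FLUX packs latents in 2x2 patches, so keep height/width multiples of 16.
image = pipe(
    "portrait photo, natural window light, fine skin and fabric detail",
    height=1600,
    width=1600,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_native_1600.png")
```

A plain GAN upscaler (Real-ESRGAN or similar) can then take the 1600px output to 4k+ without another diffusion pass.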

Not only does it save time over my old workflows, it looks even better too. It also takes no more VRAM than 1024px (don't ask me how). The only downside is that you have to wait a lot longer for generation previews to show up before you can decide whether to cancel and try again.

I can't post comparisons since changing size completely changes the image. But it takes no effort to try it real quick if you are struggling to get high res. Hope this helps someone!

949
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/lordoflaziness on 2024-08-21 05:21:10+00:00.


It’s outttt! Let’s see how much RAM we will need. Also, Flux Face ID when?!

950
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/digitaljohn on 2024-08-21 01:01:24+00:00.
