StableDiffusion


/r/StableDiffusion is back open after the protest of Reddit killing open API access, which will bankrupt app developers, hamper moderation, and...

926
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/afinalsin on 2024-08-22 09:18:39+00:00.

927
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/MarcS- on 2024-08-21 22:04:09+00:00.


Hi everyone,

I've been running comparisons of several new models with standardized prompts. Usually I focus on models I can run on my local machine (since I favor open software), but I decided I could use some free generations on Ideogram to test their latest 2.0 model, which they claim is better than Flux and DALL-E. I couldn't run my whole library of prompts before running out of free credits, but I hope the five prompts I tested will be of interest to you before deciding whether it's worth paying for a subscription to their online generation service.

Prompt #1: the positional prompt, which you can compare to Flux and AuraFlow here:

"a blue cylinder in the center of the image, with a red sphere at the left, a green square at the right, a purple smiling sun on the top of the image and a severed foot at the bottom"

The idea here is to test whether Ideogram 2.0 is SOTA at adhering to a prompt with several items clearly positioned relative to each other.

Here are the four results I got, not cherrypicked:

It's very good. Arguably the smiling sun isn't at the very top every time, but it's at least in the top third of the image each time, so I'd say it passes this test. AuraFlow did as well, but then, it is the SOTA model for prompt adherence (version 0.2). Aesthetics are bad in both cases, but I won't grade aesthetics here, as the result is pretty surrealist anyway. If we were to nitpick, I could say that the foot only looks severed, rather than attached to the cylinder, in 3 images out of 4.

Prompt #2: A complex description.

Here I compared several models with the Shinto monk prompt.

"In the inner court of a grand Greek temple, majestic columns rise towards the sky, framing the scene with ancient elegance. At the center, a Shinto monk, dressed in traditional white and orange robes with intricate patterns, is levitating in the lotus position, floating serenely above a blazing fire. The flames dance and flicker, casting a warm, ethereal glow on the monk's peaceful expression. His hands are gently resting on his knees, with beads of a prayer necklace hanging loosely from his fingers. At the opposite end of the court, an anthropomorphical lion, regal and powerful, is bowing deeply. The lion, with a mane of golden fur and wearing an ornate, ceremonial chest plate, exudes a sense of reverence and respect. Its tail is curled gracefully around its body, and its eyes are closed in solemn devotion. Surrounding the court, ancient statues and carvings of Greek deities look down, their expressions solemn and timeless. The sky above is a serene blue, with the light of the setting sun casting long shadows and a warm, golden hue across the scene, highlighting the unique fusion of cultures and the mystical ambiance of the moment."

This prompt has 20 different items to rate, so each generation gets a mark out of 20; I averaged the first 4 generations.

Misses "hands on knees", he doesn't hold the prayer beads in hands, the lion isn't anthropomorphic, not bowing particularly, mane isn't really fiery, his tail isn't curled around his body, admittedly his eyes are half-closed so I'll count it as right, no statues of greek gods, no serene blue sky. 12 out 20.

No lotus position; no prayer beads, so none attached to the hands; lion not anthropomorphic; mane doesn't seem golden either; tail not around the body. That's a 14 (but the monk's position is a big drawback).

Horrible monk... Misses the same as before, plus the orange and white robes and the intricate patterns, with a demerit for the artifacted monk... 11/20.

Misses the court of the temple (he's in front of a temple), the placement of the prayer bead necklace, the anthropomorphic lion (fur admittedly golden here), the tail curled around the body, and the statues of Greek gods. 15/20.

The average is 13/20. AuraFlow scored 15/20. Prompt adherence is good, but not stellar. Still, out of a few generations, some can get quite close to the intended image.
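For transparency, the average is just the mean of the four per-generation marks listed above:

```python
scores = [12, 14, 11, 15]         # marks out of 20 for the four generations
print(sum(scores) / len(scores))  # 13.0 -> reported as 13/20
```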

Prompt #3: the pirate lady

A woman wearing 18th-century attire is positioned on all fours, facing the viewer, on a wooden table in a lively pirate tavern. She is dressed in a traditional colonial-style dress, with a corset bodice, lace-trimmed neckline, and flowing skirts. The fabric of her dress is rich and textured, featuring a deep burgundy color with intricate embroidery and gold accents. Her hair is styled in loose curls, cascading around her face, and she wears a tricorn hat adorned with feathers and ribbons. The tavern itself is bustling with activity. The background is filled with wooden beams, barrels, and rustic furniture, typical of a pirate tavern. The atmosphere is dimly lit by flickering lanterns and candles, casting warm, golden light throughout the room. Various pirates and patrons can be seen in the background, engaged in animated conversations, drinking from tankards, and playing cards. The woman's expression is confident and mischievous, her eyes meeting the viewer's gaze directly. Her posture, though unusual for the setting, conveys a sense of boldness and command. The table beneath her is cluttered with tankards, maps, and scattered coins, adding to the chaotic and adventurous ambiance of the pirate tavern.

Another scene that is described in great detail to reflect the image I have in mind. I won't count items, as the goal was to see if we could get a woman on all fours in a non-sexual context.

Ideogram fails; #3 is the best, but she's at most leaning on the table, not on all fours on it. Also, the table isn't cluttered with tankards, maps, and coins. The model focused on the 1girl, not on the scene's overall composition. Flux did better, despite also missing the on-all-fours part.

Prompt #4: the submarine ruins

Compare here:

"Beneath the tranquil surface of a crystal-clear ocean, an ancient temple lies half-submerged, its majestic architecture eroded but still grand. The temple is a marvel, with columns covered in intricate carvings of sea creatures and mythical beings. Soft, blue light filters down from above, illuminating the scene with a serene glow. Merfolk, with their shimmering scales and flowing hair, glide gracefully around the temple, guarding its secrets. Giant kelp sway gently in the current, and schools of colorful fish dart through the water, adding vibrant splashes of color. An adventuring party, equipped with magical diving suits that emit a soft glow, explores the temple. They are fascinated by the glowing runes and ancient artifacts they find, evidence of a long-lost civilization. One member, a wizard, reaches out to touch a glowing orb, while another, a rogue, carefully inspects a mural depicting a great battle under the sea."

Actually, Ideogram did pretty well on this one, especially on the intricate carvings of sea creatures on the columns, which are the most elaborate of any model I tried. On the other hand, it drops the ball mid-prompt, with the party of adventurers barely present, not interacting as they should, and lacking the magical diving suits. It is, however, the prettiest set of images generated, so it has that going for it.

And finally, a short prompt to let the magic prompt feature shine: "a breathtaking views of the Garden Dome, orbiting Uranus, with people taking a coffee break".

Not Uranus, no garden-y thing. The garden dome could be on an asteroid, so I won't count it against Ideogram.

Not very garden-y either. Als...


Content cut off. Read original on https://old.reddit.com/r/StableDiffusion/comments/1ey2ffa/ideogram_20_prompt_adherence_and_aesthetics_test/

928
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/johnffreeman on 2024-08-21 17:46:26+00:00.


I've just heard that SD 3.1 is about to be released, with adjusted licensing. More information soon. We will see...

Edit: for those asking for the source, this information was emailed to me by a Stability.ai employee I've been in contact with for some time.

Also, a note: you don't have to downvote my post if you're done with Stability.ai; I'm just sharing some relevant SD-related news. We all love Flux, but there are still other things happening.

929
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeFurkan on 2024-08-22 02:02:12+00:00.

Original Title: FLUX LoRA Rank 128 Kohya SS GUI Training: For 8-bit training Torch 2.4 upgrade dropped VRAM usage from 23 GB to 17 GB and for 16-bit training 44 GB to 27 GB - step speed improved from 8.4 second to 4.4 second (RTX A6000) - 1024x1024px - results unknown hopefully tomorrow

930
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Hopeful_Letterhead92 on 2024-08-21 18:15:10+00:00.

931
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeFurkan on 2024-08-22 01:06:44+00:00.

932
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/beighto on 2024-08-21 17:52:03+00:00.


Don't read the fix, skip to the edits below.

~~The problem originated in commit b09c24e, when Illyasviel introduced the fp16_fix. You can fix the fix by editing the latest commit (31bed67 as of 8/21/24):~~

~~From backend/nn/flux.py, remove these lines:~~

```python
from backend.utils import fp16_fix
txt = fp16_fix(txt)
x = fp16_fix(x)
fp16_fix(x)
```

~~From backend/utils.py, remove this function block:~~

```python
def fp16_fix(x):
    # An interesting trick to avoid fp16 overflow
    # Source: issues/1114
    # Related: https://github.com/comfyanonymous/ComfyUI/blob/f1d6cef71c70719cc3ed45a2455a4e5ac910cd5e/comfy/ldm/flux/layers.py#L180
    if x.dtype == torch.float16:
        return x.clip(-16384.0, 16384.0)
    return x
```

~~That's it! I went from 36s/it @ 1024x768 to 13s/it with nf4, 14s/it with Q4 gguf, and 14s/it with Q8. Hopefully this will get removed or fixed in future releases to save us GPU-poor folk.~~

~~I tried to find a fix for this in ComfyUI as well, but that one is broken from the start.~~

~~Edit: I'm having trouble recreating this from the latest commit. It might need the pip requirements from the aadc0f0 commit, upgrading from there. Has anybody else had any luck with this fix?~~

Edit 2: Illyasviel has been busy today. It looks like he fixed the issue without removing the fp16_fix. Per commit notes:

> change some dtype behaviors based on community feedbacks
>
> only influence old devices like 1080/70/60/50. please remove cmd flags if you are on 1080/70/60/50 and previously used many cmd flags to tune performance

So take those flags off. I'm getting 20s/it now. Going to keep trying for that 14s/it again with the latest commit.

Edit 3: ComfyUI fixed theirs too! Per commit notes:

```
commit a60620dcea1302ef5c7f555e5e16f70b39c234ef (HEAD -> master, origin/master, origin/HEAD)
Author: comfyanonymous <comfyanonymous@protonmail.com>
Date:   Wed Aug 21 16:38:26 2024 -0400

    Fix slow performance on 10 series Nvidia GPUs.

commit 015f73dc4941ae6e01e01b934368f031c7fa8b8d
Author: comfyanonymous <comfyanonymous@protonmail.com>
Date:   Wed Aug 21 16:17:15 2024 -0400

    Try a different type of flux fp16 fix.
```

I'm getting 20s/it on Comfy too. What a day for updates!

933
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/20yroldentrepreneur on 2024-08-21 22:27:42+00:00.


Using a LoRA trained on my likeness:

2000 steps

10 self-captioned selfies, 5 full body shots

3 hours to train

FLUX is extremely good at prompt adherence and natural language prompting. We now live in a future where we never have to dress up for photoshoots again. RIP fashion photographers.

934
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/DumeSleigher on 2024-08-21 18:38:09+00:00.

935
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/EldrichArchive on 2024-08-21 18:11:34+00:00.

936
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Lumpy-Breakfast3295 on 2024-08-21 17:48:56+00:00.

937
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeraRalaz on 2024-08-21 17:33:35+00:00.

938
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/andreac75 on 2024-08-21 16:48:54+00:00.


I have started uploading LoRAs to Civitai again, hoping that what happened with SDXL will not happen again (I have added a specific disclaimer). I hope you like them.

939
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Total-Resort-3120 on 2024-08-21 16:13:41+00:00.

940
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/danamir_ on 2024-08-21 13:58:13+00:00.


Following the various tests on CFG & FLUX (like this one, for example), I was wondering if I could use the same trick as with SDXL: switching settings mid-rendering in ComfyUI by passing the latent between two SamplerCustomAdvanced nodes.

The answer is a resounding yes. You can set the CFG to any value you want, limit it to the first few steps to harvest the benefit of greater prompt adherence (and optionally use the negative prompt, to a certain extent), and only suffer the cost of doubled rendering time for those few steps.

single rendering vs. split rendering

Single rendering, CFG 1

Split rendering at 4 steps, CFG 2

The increased CFG adds details but, depending on the prompt, can look too contrasted. This can be somewhat balanced by lowering the Guidance. You can push the CFG much higher; 4 and 5 can give interesting results.

The single rendering (72s):

100%|█████| 18/18 [01:12<00:00,  4.02s/it]

Versus the split rendering (88s):

100%|█████| 4/4   [00:32<00:00,  8.04s/it]
100%|█████| 14/14 [00:56<00:00,  4.04s/it]
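For those curious what the split does under the hood, here is a minimal sketch of the principle in plain Python. This is not the actual ComfyUI node graph: `model`, `cond`, and `uncond` are hypothetical stand-ins for a FLUX denoiser and its conditionings, and the loop is a bare Euler sampler.

```python
import torch

def cfg_denoise(model, x: torch.Tensor, sigma: torch.Tensor,
                cond, uncond, cfg: float) -> torch.Tensor:
    # At CFG 1.0 the unconditional pass contributes nothing, so we can
    # skip it entirely: one model call instead of two.
    if cfg == 1.0:
        return model(x, sigma, cond)
    pos = model(x, sigma, cond)     # conditional prediction
    neg = model(x, sigma, uncond)   # negative / unconditional prediction
    return neg + cfg * (pos - neg)  # classic CFG combination

def sample_split(model, x, sigmas, cond, uncond, split_step=4, cfg_hi=2.0):
    # High CFG (double cost) for the first `split_step` steps, then CFG 1.0
    # (single cost) for the rest, passing the latent straight through.
    for i in range(len(sigmas) - 1):
        cfg = cfg_hi if i < split_step else 1.0
        denoised = cfg_denoise(model, x, sigmas[i], cond, uncond, cfg)
        d = (x - denoised) / sigmas[i]           # derivative estimate
        x = x + d * (sigmas[i + 1] - sigmas[i])  # Euler step
    return x
```

With 18 steps and a split at step 4, this gives exactly the cost profile of the logs above: 4 steps at roughly double the per-step time, then 14 steps at normal speed.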

Here is the full workflow: Danamir Flux v14.json

[edit]: A simplified version of the workflow, with two outputs to compare the two rendering methods: Flux Danamir Split v15.json

[edit2]: It has been brought to my attention that there may be a way to do just that without the need for a full set of nodes; to be tested.

Note that this workflow was used to test many things, so you'll also find in it: checkpoint, UNet, and CLIP loaders (including GGUF & NF4), an upscale pass (optionally tiled), a second pass with an SDXL model at base resolution or upscaled, and detailers for both FLUX and SDXL, supporting any detectors.

941
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeFurkan on 2024-08-21 16:20:12+00:00.

942
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/zeekwithz on 2024-08-21 13:42:48+00:00.

943
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Philosopher_Jazzlike on 2024-08-21 12:56:26+00:00.

944
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/askin-gm on 2024-08-21 11:17:36+00:00.

945
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/MewnCat on 2024-08-21 11:06:50+00:00.

946
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/FinetunersAI on 2024-08-21 10:45:34+00:00.

947
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/lkewis on 2024-08-21 07:56:16+00:00.

948
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/dal_mac on 2024-08-21 05:48:11+00:00.


I spent a couple of days trying to adapt my upscale workflows to FLUX: hi-res fix (img2img), Ultimate SD Upscale (tiled), SUPIR, etc. All gave major problems or were far too difficult to get looking right (for example, tiled upscaling causes hallucinations and seams above ~0.35 denoise, while anything lower than ~0.3 creates artifacts and/or reduces detail). I couldn't find any combination of parameters that gave me results like XL.

Then I realized... why do I need to upscale? FLUX can generate my target resolution natively in the first pass, with very little detriment to composition.

Now I'm generating directly at ~1600px+. Quality and composition don't suffer at all, and outputs are incredibly clear and detailed (I'm a stickler for fine detail). Then it only needs a pass through a GAN upscaler to reach 4k+.
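For anyone generating through diffusers rather than a node UI, trying this takes only a couple of lines. A minimal sketch, assuming the public FLUX.1-dev weights and diffusers >= 0.30; the prompt, step count, and guidance value are placeholders, not the author's settings:

```python
import torch
from diffusers import FluxPipeline

# Load FLUX.1-dev in bf16; CPU offloading keeps VRAM usage manageable.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Generate at the target resolution directly: no hi-res fix, no tiling.
# FLUX packs latents in 2x2 patches, so keep height/width multiples of 16.
image = pipe(
    "portrait photo, natural window light, fine skin and fabric detail",
    height=1600,
    width=1600,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_native_1600.png")
```

A plain GAN upscaler (Real-ESRGAN or similar) can then take the 1600px output to 4k+ without another diffusion pass.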

Not only does it save time over my old workflows, it looks even better too. It also takes no more VRAM than 1024px (don't ask me how). The only downside is that you have to wait a lot longer for generation previews to show up before you can decide whether to cancel and try again.

I can't post comparisons since changing size completely changes the image. But it takes no effort to try it real quick if you are struggling to get high res. Hope this helps someone!

949
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/lordoflaziness on 2024-08-21 05:21:10+00:00.


It’s outttt! Let’s see how much RAM we will need. Also, Flux Face ID when?!

950
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/digitaljohn on 2024-08-21 01:01:24+00:00.
