StableDiffusion

401
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/zazaoo19 on 2024-09-20 04:01:52+00:00.

402

The original was posted on /r/stablediffusion by /u/theninjacongafas on 2024-09-20 11:38:06+00:00.

403

The original was posted on /r/stablediffusion by /u/mrfofr on 2024-09-20 10:14:56+00:00.

404

The original was posted on /r/stablediffusion by /u/R34vspec on 2024-09-20 05:47:21+00:00.

405

The original was posted on /r/stablediffusion by /u/4-r-r-o-w on 2024-09-20 05:49:04+00:00.

406

The original was posted on /r/stablediffusion by /u/dewarrn1 on 2024-09-20 02:44:52+00:00.

407

The original was posted on /r/stablediffusion by /u/FoxBenedict on 2024-09-20 04:50:34+00:00.


An astonishing paper was released a couple of days ago showing a revolutionary new image-generation paradigm. It's a multimodal model with a built-in LLM and a vision model that gives you unbelievable control through prompting. You can give it an image of a subject and tell it to put that subject in a certain scene, and you can do that with multiple subjects — no need to train a LoRA or any of that. You can prompt it to edit part of an image, or to produce an image with the same pose as a reference image, without the need for a ControlNet. The possibilities are so mind-boggling that, frankly, I'm having a hard time believing this could be possible.

They are planning to release the source code "soon". I simply cannot wait. This is on a completely different level from anything we've seen.

408

The original was posted on /r/stablediffusion by /u/ZootAllures9111 on 2024-09-20 02:27:39+00:00.

Original Title: FYI if you're using something like JoyCaption to caption images: Kohya does not support actual newline characters between paragraphs, it stops parsing the file after the first one it hits, your caption text needs to be separated only by spaces between words (meaning just one long paragraph)


I noticed this was the case a while ago and figured I'd point it out. You can confirm it by comparing the metadata stored in a LoRA file against captions that contained newlines: for a given image, any text after the first newline simply won't be present in that metadata.
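If your captioner emits multi-paragraph text, one workaround is to collapse the newlines into spaces before training. A minimal sketch (the `flatten_caption` helper is hypothetical, not part of Kohya or JoyCaption):

```python
import re

def flatten_caption(text: str) -> str:
    # Replace each run of newlines (plus surrounding spaces) with a single
    # space, so the caption becomes one long paragraph that Kohya will parse
    # in full instead of stopping at the first newline.
    return re.sub(r"\s*\n\s*", " ", text).strip()

# Example: a two-paragraph JoyCaption-style output
caption = "A portrait of a woman.\n\nShe is standing in a garden."
print(flatten_caption(caption))
# A portrait of a woman. She is standing in a garden.
```

Running this over your caption .txt files before a training run sidesteps the truncation without changing the caption content.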

409

The original was posted on /r/stablediffusion by /u/flyingdickins on 2024-09-19 22:37:03+00:00.

410

The original was posted on /r/stablediffusion by /u/an303042 on 2024-09-19 21:05:45+00:00.

411

The original was posted on /r/stablediffusion by /u/371830 on 2024-09-19 17:53:26+00:00.


After using Flux for over a month now, I'm curious what your combo is for best image quality. Since I only started local image generation last month (I was an occasional MJ user before), it's been pretty much constant learning. One of the things that took me a while to realize is that it's not just the choice of model that matters, but also all the other bits like the CLIP model, text encoder, sampler, etc., so I thought I'd share this — maybe other newbies will find it useful.

Here is my current best-quality setup (photorealistic). I have 24 GB of VRAM, but I think it will also work with 16 GB.

  • flux1-dev-Q8_0.gguf

  • clip: ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF.safetensors - until last week I didn't even know you could use different CLIP models. This one made a big difference for me and works better than ViT-L-14-BEST-smooth. Thanks u/zer0int1

  • te: t5-v1_1-xxl-encoder-Q8_0.gguf - not sure if it makes any difference vs t5xxl_fp8_e4m3fn.safetensors

  • vae: ae.safetensors - don't remember where I got this one from

  • sampling: Forge Flux Realistic - best results of the few sampling methods I tested in Forge

  • scheduler: simple

  • sampling steps: 20

  • DCFG 2-2.5 - with PAG (below) enabled, it seems I can bump DCFG higher before skin starts to look unnatural

  • Perturbed Attention Guidance: 3 - this adds about 40% inference time, but I see a clear improvement in prompt adherence and overall consistency, so I always keep it on. When going above 5, images start looking unnatural.

  • Other optional settings in Forge did not give me any convincing improvements, so I don't use them.
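For reference, the whole stack above can be jotted down as a plain config mapping (the key names are mine and purely illustrative — they are not Forge API names; the values are as listed in the post):

```python
# Illustrative summary of the poster's Forge setup, not a Forge config file.
flux_quality_config = {
    "model": "flux1-dev-Q8_0.gguf",
    "clip": "ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF.safetensors",
    "text_encoder": "t5-v1_1-xxl-encoder-Q8_0.gguf",
    "vae": "ae.safetensors",
    "sampler": "Forge Flux Realistic",
    "scheduler": "simple",
    "steps": 20,
    "distilled_cfg": 2.5,               # 2-2.5; PAG lets it go higher
    "perturbed_attention_guidance": 3,  # ~40% slower, better adherence
}

print(len(flux_quality_config), "settings recorded")
```

A mapping like this is handy for keeping generation settings next to the images they produced.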

412

The original was posted on /r/stablediffusion by /u/simpleuserhere on 2024-09-19 15:30:21+00:00.

413

The original was posted on /r/stablediffusion by /u/SmaugPool on 2024-09-19 14:10:52+00:00.

414

The original was posted on /r/stablediffusion by /u/Patient-Librarian-33 on 2024-09-19 12:20:16+00:00.

415

The original was posted on /r/stablediffusion by /u/tintwotin on 2024-09-19 09:56:48+00:00.


Image-to-video for CogVideoX-5b, implemented in the diffusers library by zRdianjiao and Aryan V S, has now been added to the free and open-source Blender VSE add-on Pallaidium.

416

The original was posted on /r/stablediffusion by /u/JBOOGZEE on 2024-09-19 08:56:14+00:00.

417

The original was posted on /r/stablediffusion by /u/an303042 on 2024-09-19 08:15:52+00:00.

418

The original was posted on /r/stablediffusion by /u/wonderflex on 2024-09-19 06:27:25+00:00.

419

The original was posted on /r/stablediffusion by /u/zazaoo19 on 2024-09-19 03:36:44+00:00.

420

The original was posted on /r/stablediffusion by /u/Pultti4 on 2024-09-19 02:33:07+00:00.

421

The original was posted on /r/stablediffusion by /u/EcoPeakPulse on 2024-09-19 02:27:02+00:00.

422

The original was posted on /r/stablediffusion by /u/Junior_Economics7502 on 2024-09-18 19:54:38+00:00.

423

The original was posted on /r/stablediffusion by /u/Angrypenguinpng on 2024-09-18 22:00:39+00:00.

424

The original was posted on /r/stablediffusion by /u/ScarletEnthusiast on 2024-09-18 17:21:57+00:00.

425

The original was posted on /r/stablediffusion by /u/Old_Reach4779 on 2024-09-18 16:18:06+00:00.


Hugging Face:

Hugging Face Space:

GitHub:

ComfyUI node: (kijai just added an i2v example workflow 😍)

License: Apache-2.0!
