StableDiffusion


/r/StableDiffusion is back open after the protest of Reddit killing open API access, which will bankrupt app developers, hamper moderation, and...

786
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/dr_lm on 2024-08-27 17:35:33+00:00.


TL;DR: this workflow produces a low-res grid of faces in one gen (for high consistency per seed), then crops each cell of the grid and upscales it into a high-res image to use in LoRA training.

Given we currently lack decent controlnets and IPAdapters for Flux, our only option for consistent characters between different seeds and prompts is to train a character LoRA.

Generating training images is a challenge. We want consistency across the training images, but changing the seed will change the face. Rendering multiple faces on one seed is a good solution, but then the resolution is too low for training.

This workflow is in three parts:


1. Generate a grid of faces, all consistently of the same person. Do this at low resolution.

Prompt:

Low-res pass:


2. Automatically crop out each face and put the crops into a list (a minimal cropping sketch follows this list).


3. Render a high-res pass of each grid cell (face) using img2img.
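
Outside ComfyUI, the cropping step can be illustrated with a few lines of Python. This is a minimal sketch only, assuming a 4x5 grid saved as "grid.png"; the actual workflow does this with ComfyUI crop nodes:

```python
# Minimal sketch of step 2: slice the generated grid image into cells so
# each face can be upscaled and used for LoRA training. The grid size and
# filename are assumptions; the real workflow uses ComfyUI crop nodes.
from PIL import Image

def crop_grid(path: str, cols: int = 4, rows: int = 5) -> list[Image.Image]:
    grid = Image.open(path)
    cell_w, cell_h = grid.width // cols, grid.height // rows
    cells = []
    for r in range(rows):
        for c in range(cols):
            box = (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
            cells.append(grid.crop(box))
    return cells

# Save each cell for the high-res img2img pass
for i, cell in enumerate(crop_grid("grid.png")):
    cell.save(f"face_{i:02d}.png")
```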


The consistency is not perfect, even in the first pass, but it's an improvement on what I can get by rendering faces with the same prompt over different seeds. A LoRA trained on these images will learn the average of them all, and I think the consistency is good enough for those purposes. The aim isn't to remove all differences between images, but to have a good degree of control over how the character looks in the final LoRA. The second-pass denoise is critical here.

Nothing fancy is happening here; instead, the focus is on automation. Once you have a seed that a) makes a nice tidy grid, and b) has a character look that you like, you can fix that seed and slightly alter the prompt (e.g. make their expression "angry" or "happy", change hairstyles or clothes) without losing much consistency.

Some notes/tips:

  • I've tried to keep the spaghetti within each stage; hopefully the connections and flow between stages are fairly clear. It should be modular if you want to swap particular parts in or out.
  • I'm trying to use the xlabs canny controlnet (v2) to guide the shape of the grid. It doesn't work very well, YMMV, consider turning it off.
  • Try different seeds until you get a workable grid layout. Flux will sometimes create nice uniform grids, and sometimes a more random layout that won't crop nicely. In my experience, it's better to just find a seed that works rather than prompt-engineer your way to success.
  • Depending on the grid you get, you can choose between two ways of cropping each cell: either a segm "person" model (whole body) or a bbox "face" model. Connect whichever version works best for a particular seed (there's a note in the workflow indicating where).
  • You can set the grid width and height, but there's no guarantee Flux will obey it! It does seem to give the model something to work with, even if you choose not to use controlnet (as the grid is used as a 100% denoised latent for the initial generation).
  • The upscale pass should in theory give better consistency if done as a batch rather than a list. However, you need the VRAM to handle this. On my 24GB 3090, I can do a 4x5 grid of 20 cells in a batch at 1024x1024 using a Q5 Flux quant. If you run into VRAM issues, just use a list instead of a batch (there's a note in the workflow showing where).
  • Flux is weird with img2img compared to SD. It needs very high denoise values (>70%) to achieve what SDXL does at ~30%. As with any upscale, the denoise in the upscale pass is critical (see the hedged img2img sketch after this list).
  • Because rendering 20 images in the upscale pass is slow, there's a node to select a subset of the images for testing.
  • I use higher steps for the first pass grid (32 seems to work well) than for the upscale (25). I think Flux needs more steps to converge on a tidy layout when making a complex grid. YMMV.
  • I use separate guidance (fake Flux cfg) for each pass. As with steps, higher is needed for the first pass.
  • There are separate text boxes for the grid layout itself, and for the character. The first pass concats both, the second pass only uses the character prompt. If you want a certain style (e.g. "photograph", "anime"), make sure it's at minimum in the character prompt, as otherwise Flux will decide on its own style in the second pass.
  • This workflow will work with SD1.5/SDXL models, but there we have far better options with controlnets for pose and grid layout, and IP adapter for consistency, so you can probably achieve more with a different approach. This is flux-specific so I can generate LoRA training images whilst I wait for all that stuff to become available.
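
To make the high-denoise img2img point concrete, here is a minimal sketch using the diffusers FluxImg2ImgPipeline rather than the ComfyUI nodes in the workflow; the model ID, prompt, resolution, and strength value are assumptions, and memory behaviour will vary:

```python
# Hedged sketch: img2img upscale of one cropped grid cell with Flux via
# diffusers. Note the high strength (denoise); Flux needs far more than SDXL.
import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps fit on 24GB cards

face = load_image("face_00.png").resize((1024, 1024))
result = pipe(
    prompt="photograph of a woman with thick black curly hair",  # character prompt only
    image=face,
    strength=0.75,           # ~70%+ denoise, per the note above
    guidance_scale=3.5,
    num_inference_steps=25,
).images[0]
result.save("face_00_hires.png")
```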
787
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/PhilosopherOne5453 on 2024-08-27 13:58:04+00:00.


I'm on 2x16GB at 3600MHz, which is 32GB... and I bought another 2x16GB 3600MHz kit of the same model, but I didn't know that four sticks of RAM can't run at 3600MHz, so I lowered the speed to 3200MHz with all four sticks installed to avoid bluescreens.

What would you guys suggest?

- Return the two extra sticks, so I keep two sticks (32GB) at 3600MHz, or

- keep all four sticks, so I have 64GB but at 3200MHz now?
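
For context on the tradeoff, a quick back-of-the-envelope comparison of peak theoretical bandwidth (assuming dual-channel DDR4, 8 bytes per channel per transfer) shows the speed gap is fairly small:

```python
# Rough peak-bandwidth comparison for the two options (dual-channel DDR4).
# Real-world differences are smaller still; capacity often matters more for
# large model workflows than a ~10% bandwidth gap.
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    return mt_per_s * channels * bytes_per_transfer / 1000  # GB/s

print(peak_bandwidth_gbs(3600))  # 57.6 GB/s with 2x16GB at 3600MHz
print(peak_bandwidth_gbs(3200))  # 51.2 GB/s with 4x16GB at 3200MHz
```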

788
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/AtreveteTeTe on 2024-08-27 19:52:20+00:00.

Original Title: Hard Fork podcast guys asked for someone to make video of Abe Lincoln endorsing them. This is FLUX images > Runway Gen-3 video > face animation transfer with LivePortrait (voice via speech to speech with ElevenLabs, After Effects for phone comp). More info in comments!

792
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Similar_Piano_963 on 2024-08-27 14:38:13+00:00.


This model looks pretty good in the demos. Sadly it's only t2v, though. In my experience, ALL current video gen models are quite slot-machine-y right now, so it would be great to be able to run it as i2v locally...

Would it be possible for someone to turn this into an image-to-video model?

I'm no ML researcher, but maybe one could train an IP-Adapter-type model to condition the beginning of the video? Maybe that's not feasible, I don't know. How cool would LoRAs be for this too!?

Download CogVideoX 5b weights:

This 5B model runs on a 3060, and the previous 2B model is now under an Apache 2.0 license.
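
For reference, a minimal text-to-video sketch with the diffusers CogVideoXPipeline is below; the prompt, step count, and offloading choices are assumptions, and VRAM use on a 3060 depends on which offload/tiling options are enabled:

```python
# Hedged sketch: running the CogVideoX-5b weights for t2v via diffusers.
# CPU offload and VAE tiling keep VRAM low enough for consumer cards.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_sequential_cpu_offload()  # trades speed for lower VRAM use
pipe.vae.enable_tiling()

video = pipe(
    prompt="a panda playing guitar in a bamboo forest",  # example prompt
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "cogvideox_sample.mp4", fps=8)
```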

Hugging Face Space:

Paper:

Sources:

Gradio

Vaibhav (VB) Srivastav on X:

Adina Yakup on X:

Tiezhen WANG:

ChatGLM:

800
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Ok-Application-2261 on 2024-08-26 20:50:04+00:00.


Consider this basic prompt: "a dystopian street in the heart of a grunge city with a woman in the far distance leaning with her back against the wall,"

Now let's say I want to change specific characteristics of the woman, so I change the prompt to: "a dystopian street in the heart of a grunge city with a woman in the far distance leaning with her back against the wall, the woman has dark thick black hair,"

Two things happen. She comes closer to the camera and the background gets the bokeh treatment. This effect becomes more pronounced the more we describe the woman.

a dystopian street in the heart of a grunge city with a woman in the far distance leaning with her back against the wall, the woman has dark thick black hair, her curly hair is tied in a ponytail,

And the concept of a dystopian street seems to melt away with the bokeh (no graffiti). Here's another example uninterrupted by text.

Same prompt, but in the last image I prompted for combat fatigues instead of curly hair.

The effect is the same on SDXL Juggernaut:

So what's the solution? Prompting individual U-Net blocks. We just don't have it for Flux (that I'm aware of). But here's a demonstration on SDXL:

And for the next one we send all the descriptors plus the original prompt to attention layer "output_1" and we get this:

The problem is pretty much solved. There is SOME bleeding of red into the walls, but it can be managed.

Anyway, overall I feel like Matteo's U-Net layer prompting node for ComfyUI was the single most significant advancement since ControlNets were introduced, and I wonder if it's possible for Flux. Here's the source of my information/workflow (Latent Vision):

This seems to work by keeping descriptors like "red dress" and "long black hair" away from input_8, which is a subject-related block that overpowers the output.
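
To make the idea concrete, here is a purely conceptual sketch of per-block prompt routing; it is not Matteo's node or any real ComfyUI/diffusers API, and the block names, prompt split, and helper functions are assumptions intended only to show how different text conditioning could be fed to different cross-attention layers:

```python
# Conceptual sketch only: route different prompts to different U-Net
# cross-attention blocks. Not Matteo's node or a real ComfyUI API; the block
# names and routing scheme below are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class BlockPrompts:
    default: str                          # base scene prompt for most blocks
    overrides: dict[str, str] = field(default_factory=dict)  # block -> prompt

def conditioning_for_block(block_name: str, prompts: BlockPrompts) -> str:
    """Pick which prompt text a given cross-attention block should see."""
    return prompts.overrides.get(block_name, prompts.default)

prompts = BlockPrompts(
    default="a dystopian street in the heart of a grunge city, woman in the far distance",
    overrides={
        # keep subject descriptors away from the subject-dominant block (input_8)
        # and send them to a later block (output_1) instead
        "output_1": "the woman has thick black curly hair in a ponytail, red dress",
    },
)

for block in ["input_4", "input_8", "middle_0", "output_0", "output_1"]:
    print(block, "->", conditioning_for_block(block, prompts))
```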
