StableDiffusion

/r/StableDiffusion is back open after the protest of Reddit killing open API access, which will bankrupt app developers, hamper moderation, and...

326
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/ZootAllures9111 on 2024-09-25 22:08:30+00:00.


Just released this. Trained at native 1024px, Kohya Dim 20. Lots more details in the description on the page. Hope people like it!

327
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Current_Wind_2667 on 2024-09-26 08:49:58+00:00.

328
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Halodri88 on 2024-09-26 07:47:40+00:00.

329
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/AlgorithmicKing on 2024-09-25 11:13:01+00:00.

330
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/an303042 on 2024-09-25 22:05:34+00:00.

331
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/erkana_ on 2024-09-25 20:35:22+00:00.

332
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/henryruhs on 2024-09-25 18:29:06+00:00.

333
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/newsock999 on 2024-09-25 16:45:57+00:00.

334
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/okaris on 2024-09-25 16:22:15+00:00.

335
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/_micah_h on 2024-09-25 15:13:47+00:00.

Original Title: 🫐 🫐 🫐

336
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Hybridx21 on 2024-09-25 11:41:09+00:00.


Project Page:

GitHub:

Arxiv:

337
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/haofanw on 2024-09-25 11:13:42+00:00.

338
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/jmellin on 2024-09-25 10:51:45+00:00.


I just completed my custom node for ComfyUI. It's a GLM-4 prompt enhancing and inference tool.

I was inspired by the prompt enhancer under THUDM CogVideoX-5b HF space.

The prompt enhancer is based on THUDM's convert_demo.py, but since that example only works through the OpenAI API, I felt there was a need for a local option.

The vision model glm-4v-9b has completely blown my mind, and the fact that it is runnable on consumer-grade GPUs is incredible.

Example workflows included in the repo.

Link to repo in comments.

Also available in ComfyUI-Manager.
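For readers who want to try the same idea outside ComfyUI, here is a minimal sketch (not the author's node code) of local prompt enhancement with a GLM-4 chat model via Hugging Face transformers. The model ID, system instruction, prompt, and generation settings are illustrative assumptions, loosely in the spirit of convert_demo.py:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choice: the text-only GLM-4 chat model; the author's node also
# supports the glm-4v-9b vision model, which is not shown here.
MODEL_ID = "THUDM/glm-4-9b-chat"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Hypothetical enhancement instruction.
system = ("Expand the user's short prompt into a rich, detailed prompt for a "
          "text-to-image or text-to-video model. Keep the original subject unchanged.")
user = "a lighthouse on a cliff at sunset"

inputs = tokenizer.apply_chat_template(
    [{"role": "system", "content": system}, {"role": "user", "content": user}],
    add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Print only the newly generated tokens (the enhanced prompt).
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```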

339
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/sahil1572 on 2024-09-25 07:40:27+00:00.

340
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/renderartist on 2024-09-25 03:53:35+00:00.

341
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Sea-Resort730 on 2024-09-25 02:17:34+00:00.

342
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/mnemic2 on 2024-09-24 22:28:53+00:00.


I wrote an article over at CivitAI about it.

Here's a copy of the article in Reddit format. It doesn't contain all the images, though.

Flux model training from just 1 image

They say that it's not the size of your dataset that matters. It's how you use it.

I have been doing some tests with single image (and few image) model trainings, and my conclusion is that this is a perfectly viable strategy depending on your needs.

A model trained on just one image may not be as strong as one trained on tens, hundreds or thousands, but perhaps it's all that you need.

What if you only have one good image of the model subject or style? This is another reason to train a model on just one image.

Single Image Datasets

The concept is simple. One image, one caption.

Since you only have one image, you may as well spend some time and effort to make the most out of what you have. So you should very carefully curate your caption.

What should this caption be? I still haven't cracked it, and I think Flux just gets whatever you throw at it. In the end I cannot tell you with absolute certainty what will work and what won't work.

Here are a few things you can consider when you are creating the caption:

Suggestions for a single image style dataset

  1. Do you need a trigger word? For a style, you may want to do it just to have something to let the model recall the training. You may also want to avoid the trigger word and just trust the model to get it. For my style test, I did not use a trigger word.
  2. Caption everything in the image.
  3. Don't describe the style. At least, it's not necessary.
  4. Consider using masked training (see Masked Training below).

Suggestions for a single image character dataset

  1. Do you need a trigger word? For a character, I would always use a trigger word. This lets you control the character better if there are multiple characters.

For my character test I did use a trigger word. I don't know how trainable different tokens are, but I went with "GoWRAtreus".

  2. Caption everything in the image. I think Flux handles it perfectly as it is. You don't need to "trick" the model into learning what you want, like how we used to caption things for SD1.5 or SDXL (by captioning the things we wanted to be able to change afterwards, and not mentioning what we wanted the model to memorize and never change, e.g. if a character was always supposed to wear glasses, or always have the same hair color or style). See the example layout after this list.
  3. Consider using masked training (see Masked Training below).
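To make "one image, one caption" concrete, here is a small hypothetical sketch of what such a dataset could look like on disk, assuming a kohya_ss-style layout where each image has a sidecar .txt caption. The paths, folder name, and caption text are made up for illustration and are not the author's actual dataset:

```python
from pathlib import Path
import shutil

# kohya_ss-style dataset folder: the "10_" prefix is the per-image repeat count.
dataset_dir = Path("train/10_GoWRAtreus")
dataset_dir.mkdir(parents=True, exist_ok=True)

# The single training image (hypothetical source path).
shutil.copy("source/character.png", dataset_dir / "character.png")

# The single caption: trigger word first, then a plain description of everything
# visible in the image. Placeholder text, not the author's real caption.
caption = (
    "GoWRAtreus, a young man standing outdoors, wearing a tunic and holding a bow"
)
(dataset_dir / "character.txt").write_text(caption, encoding="utf-8")
```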

Suggestions for a single image concept dataset

TBD. I'm not 100% sure that a concept could easily be taught with just one image; that's something to test.

There's certainly more experimentation to do here. Different ranks, blocks, captioning methods.

If I were to guess, I think most combinations of things are going to produce good and viable results. Flux tends to just be okay with most things. It may be up to the complexity of what you need.

Masked training

This essentially means training on an image that has either a transparent background or a separate black/white image that acts as your mask. When using an image mask, the white parts will be trained on, and the black parts will not.

Note: I don't know how masks with grays or semi-transparency (gradients) work. If somebody knows, please add a comment below and I will update this.

What is it good for?

The benefit of training this way is that we can focus on what we want to teach the model and keep it from learning things from the background, which we may not want.

If you were instead to cut out the subject of your training and put a white background behind it, the model would still learn from the white background, even if you caption it. And if you only have one image to train on, the model does so many repeats across this image that it will learn that a white background is really important. It's better that it never sees a white background in the first place.

If you keep a background behind your character, it will be trained on just as much as the character. It also means that you will see this background in all of your images. Even if you're training a style, this is not something you want. See images below.
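As a minimal sketch of that idea, assuming you already have the subject cut out as a transparent PNG (filenames are hypothetical), the alpha channel can be turned into the black/white mask described above. Note this thresholds to pure black/white, since gray/semi-transparent behavior is unknown:

```python
from PIL import Image

# Hypothetical input: the subject on a fully transparent background.
subject = Image.open("character_transparent.png").convert("RGBA")

# Use the alpha channel as the training mask: opaque pixels (the subject)
# become white (trained on), transparent background pixels become black (ignored).
alpha = subject.split()[-1]
mask = alpha.point(lambda a: 255 if a > 0 else 0).convert("L")
mask.save("character_mask.png")
```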

Example without masking

I trained a model using only this image in my dataset.

The results can be found in this version of the model.

As we can see from these images, the model has learned the style and character design from our single-image dataset amazingly well! It can even do a nice bird in the style. Very impressive.

We can also unfortunately see that it's including that background, and a ton of small doll-like characters in the background. This wasn't desirable, but it was in the dataset. I don't blame the model for this.

Once again, with masking!

I did the same training again, but this time using a masked image:

It's the same image, but I removed the background in Photoshop. I did other minor touch-ups to remove some undesired noise from the image while I was in there.

The results can be found in this version of the model.

Now the model has learned the style equally well, but it never overtrained on the background, and it can therefore generalize better and create new backgrounds based on the art style of the character, which is exactly what I wanted the model to learn.

The model shows signs of overfitting, but this is because I'm training for 2000 steps on a single image. That is bound to overfit.

How to create good masks

  • You can use something like Inspyrnet-Rembg.
  • You can also do it manually in Photoshop or Photopea. Just make sure to save it as a transparent PNG and use that.
  • Inspyrnet-Rembg is also available as a ComfyUI node (a minimal rembg sketch follows this list).
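Here is that rembg option as a minimal Python sketch. The paths are hypothetical, and this uses the generic rembg package rather than the ComfyUI node:

```python
from PIL import Image
from rembg import remove  # pip install rembg

# Hypothetical paths for illustration.
original = Image.open("dataset/character.png")

# rembg returns the subject on a transparent background; save as PNG so the
# alpha channel is preserved for masked / alpha-mask training.
cutout = remove(original)
cutout.save("dataset/character_nobg.png")
```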

Where can you do masked training?

I used ComfyUI to train my model. I think I used this workflow from CivitAI user Tenofas.

Note the "alpha_mask" setting on the TrainDatasetGeneralConfig.

There are also other trainers that utilize masked training. I know OneTrainer supports it, but I don't know if their Flux training is functional yet or if it supports alpha masking.

I believe it is coming in kohya_ss as well.

If you know of other training scripts that support it, please write below and I can update this information.

It would be great if the option would be added to the CivitAI onsite trainer as well. With this and some simple "rembg" integration, we could make it easier to create single/few-image models right here on CivitAI.

Example Datasets & Models from single image training

Kawaii Style - failed first attempt without masks

Unfortunately I didn't save the caption I trained the model on, but it was automatically generated and it used a trigger word.

I trained this version of the model on the Shakker onsite trainer. They had horrible default training settings, and even if you changed them, the model still trained on the defaults, so the model is huge (trained at rank 64).

As I mentioned earlier, the model learned the art style and character design reasonably well. It did, however, pick up the details from the background, which was highly undesirable. It was either that or a simple/no background, which is not great for an art style model.

Kawaii Style - Masked training

[An asian looking man with pointy ears and long gray hair standing. The man is holding his hands and palms together in front of him in a prayer like pose. The man has slightly wavy long gray hair, and a bun in the back. In his hair is a golden crown with two pieces sticking up above it. The man is wearing a large red ceremony robe with golden embroidery swirling patterns. Under the robe, the man is wearing a...


Content cut off. Read original on https://old.reddit.com/r/StableDiffusion/comments/1fop9gy/training_guide_flux_model_training_from_just_1/

343
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Historical-Action-13 on 2024-09-24 19:32:06+00:00.


I don't know why nobody is talking about this. You can use AI video editing software locally, without paying for failed attempts online, and basically turn your video into studio quality. It uses AI to guess the missing frames, ramping up from the 8 fps that CogVideo outputs by default.

You can also use ffmpeg to rip the frames and make the last frame of your refined output video the first frame of a new video, effectively giving you unlimited video length as well.

Edit: here's a more detailed guide,

  1. Either use Flux to generate an image in the same aspect ratio as 720x480, or paste a real photo into MS Paint and expand the image boundaries to that aspect ratio. Don't stretch the image; just leave white space. You can keep the resolution low at 720x480, but if you are using Flux I suggest making it higher to capture more detail during the initial generation. Cog will downscale it regardless, so there is no need to make your original small.
  2. Use the ComfyUI workflow for CogVideoX-5B image-to-video. Keep your prompt simple. For example, if your source photo is a paladin, don't make the prompt "a paladin unsheathes his sword and slashes a rock" because it will probably fuck that up.

Limit it to one major action per video. For example, just do "a paladin unsheathes his sword", and then you can take the last frame of the output video and make it the first frame of your input video for your next iteration.

  3. Use ffmpeg to rip the frames. Pick the last GOOD frame (before shit goes acid trip) and delete the frames after this. (See the sketch after this list for the ffmpeg commands.)
  4. Load your last good frame into Cog again, and now use the prompt "a paladin slashes a rock with his sword"; since the sword is already unsheathed in your starting frame, this will be simpler for Cog.
  5. Use ffmpeg again and add to your collection of frames. Make sure you are using good sequential naming with ffmpeg so they don't get out of order; ask chatgpt if you need the syntax.
  6. Repeat until your video is done, limiting one "action" per prompt/generation.
  7. When you have all your frames in order, combine them with ffmpeg to make your infinitely long video.
  8. Use Topaz Video (free demo with watermark) to upscale the video to 4k and use the interpolate feature to increase the fps from 8 to 30. I am a beginner with Topaz so I can't help you much at this step because I am still learning. However, YouTube has great videos with tips for it; it's used by pro video editors.
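As a rough sketch of the ffmpeg parts of steps 3-7 (paths, filenames, and the 8 fps framerate are illustrative assumptions; it expects ffmpeg on your PATH):

```python
import subprocess
from pathlib import Path

# Step 3: rip every frame of a CogVideoX segment with zero-padded sequential names.
frames_dir = Path("frames_segment01")
frames_dir.mkdir(exist_ok=True)
subprocess.run(
    ["ffmpeg", "-i", "cog_segment01.mp4", str(frames_dir / "frame_%05d.png")],
    check=True,
)

# Manually delete everything after the last good frame, then feed that frame
# back into the CogVideoX image-to-video workflow as the start of the next segment.

# Step 7: after collecting and renumbering all segments' frames into one folder,
# reassemble them into a single 8 fps video ready for upscaling/interpolation.
subprocess.run(
    ["ffmpeg", "-framerate", "8", "-i", "all_frames/frame_%05d.png",
     "-c:v", "libx264", "-pix_fmt", "yuv420p", "combined.mp4"],
    check=True,
)
```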

One word of caution: just like with Cog, it's best with Topaz to do one "action" at a time. For example, first upscale, then interpolate. I think it can do both at the same time as well, so you'll want to experiment with which models and methods are best.

344
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Angrypenguinpng on 2024-09-24 19:20:47+00:00.

345
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Nunki08 on 2024-09-24 14:27:59+00:00.

346
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/hipster_username on 2024-09-24 13:20:16+00:00.

347
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/OkSpot3819 on 2024-09-24 09:15:18+00:00.


  • Interesting find of the week: Sougwen Chung, a Chinese-Canadian artist pioneering human-machine collaboration in art. Her "Assembly Lines" project features robotic art assistants that sync with her brainwaves to create paintings together.
  • Flux updates:
    • GPU compatibility: Successful generation on AMD GPU (RX 6600 XT), overcoming compatibility issues using Zluda.
    • CFG improvements: Support for negative prompting and values >1 without image degradation, based on PuLID team's work.
    • Consistent character frames: Technique using Flux and ControlNet for generating multiple consistent frames.
    • LoRA and DoRA training: Insights on training models using OneTrainer with Flux.1 architecture, including detailed configuration settings.
    • ComfyUI Flux pipeline: Clean and organized workflow for Stable Diffusion image generation using Flux.
    • Seamless outpainting: New workflow for precise background and human feature outpainting using Flux models in ComfyUI.
  • Lionsgate x Runway: Lionsgate partners with AI firm Runway to develop exclusive AI models based on its film and TV library, focusing on integrating AI into pre- and post-production workflows.
  • EA x AI: Electronic Arts positions AI as core to its business strategy, with over 100 AI projects in development across efficiency, expansion, and transformation areas.
  • Put This On Your Radar:
    • Tripo 3D (Version 2.0): Text-to-3D model generation tool releases version 2.0 with significantly improved mesh quality.
    • CogStudio: Advanced web interface for AI video generation based on the CogVideo model.
    • OmniGen: New unified multimodal AI model combining text and image generation capabilities.
    • Differential diffusion technique for AnimateDiff: Technique for creating more stable backgrounds in AI-generated videos.
    • Pony and non-pony AI model merging technique: New method for merging specialized AI models to expand capabilities.
    • Image and sound generation workflow: Workflow for generating both images and corresponding sound effects from a single prompt using Stable Diffusion and Stable Audio.
    • CogVideoX-5B: Open-source image-to-video model weights released for generating short video clips from input images.
    • CogVideoX-Fun: Open-source text/image/video-to-video model by Alibaba PAI with enhanced video generation capabilities.
    • ComfyUI workflow for replacing video backgrounds with Flux model: Workflow demonstrating how to replace backgrounds in videos using the Flux model.
    • Multi-face swap workflow for ComfyUI: Workflow for swapping multiple faces in a single image with customizable options.
    • Audio reactive particle simulator in ComfyUI: Workflow demonstrating an audio-reactive particle simulation system for creating visually dynamic content.
    • KLING 1.5: Update to KLING with motion control and general improvements.
  • Flux LoRA showcase: New FLUX LoRA models including Miniature People, Omegle Webcam, Gesture Drawing, Jigsaw, and SameFace Fix.

📰 Full newsletter with relevant links, context, and visuals.

🔔 If you're having a hard time keeping up in this domain, consider subscribing. We send out our newsletter every Sunday.

348
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/songkey on 2024-09-24 03:39:20+00:00.

349
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Wooden_Yak_9661 on 2024-09-24 01:50:56+00:00.


350
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Wooden_Yak_9661 on 2024-09-24 01:34:07+00:00.

