StableDiffusion


/r/StableDiffusion is back open after the protest of Reddit killing open API access, which will bankrupt app developers, hamper moderation, and...

founded 1 year ago
626
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/piggledy on 2024-09-05 23:06:43+00:00.

627
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Much_Can_4610 on 2024-09-05 21:30:57+00:00.

628
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Secure-Message-8378 on 2024-09-05 20:58:44+00:00.

629
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/renderartist on 2024-09-05 20:26:45+00:00.

630
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Psi-Clone on 2024-09-05 19:51:13+00:00.

631
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/jackqack on 2024-09-05 16:53:59+00:00.

632
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/miaoshouai on 2024-09-05 16:14:37+00:00.


With the release of the FLUX model, the use of LLMs has become much more common, because the model can understand natural language through the combination of the T5 and CLIP_L text encoders. However, most LLMs require a lot of VRAM, and the results they return are not optimized for image prompting.

I recently trained PromptGen v1 and got a lot of great feedback from the community, and I have just released PromptGen v1.5, a major upgrade based on much of that feedback. Version 1.5 is trained specifically to solve the issues I mentioned above in the era of Flux. PromptGen is based on Microsoft's Florence-2 base model, so the model is only about 1 GB, generates captions very quickly, and uses much less VRAM.

PromptGen v1.5 can caption images in 5 different modes, all within a single model: Danbooru-style tags, one-line image description, structured caption, detailed caption, and mixed caption, each of which handles a specific prompting scenario. Below are some of the features of this model:

  • When using PromptGen, you won't get annoying boilerplate like "This image is about..."; I know many of you have tried hard in your LLM prompts to get rid of these words.

  • It captions the image in detail. The new version has greatly improved both its ability to capture details in the image and its accuracy.

  • With an LLM, it's hard to get the model to name the position of each subject in the image. The structured caption mode is built to report this positional information, e.g. it will tell you that a person is on the left or right side of the image. This mode also reads text from the image, which can be super useful if you want to recreate a scene.

  • It is memory-efficient compared to other models! As mentioned above, this is a really lightweight caption model, and its quality is really good. In a comparison of PromptGen vs. Joy Caption, PromptGen even captures the facial expression of a character looking down and the camera angle of a shot from the side.

  • V1.5 is designed to produce image captions for the Flux model for both the T5XXL and CLIP_L encoders. ComfyUI-Miaoshouai-Tagger is the ComfyUI custom node created to make this model easier to use. Miaoshou Tagger v1.1 adds a new node called "Flux CLIP Text Encode", which eliminates the need to run two separate tagger passes for caption creation in the "mixed" mode. You can populate both CLIP inputs in a single generation, significantly boosting speed when working with Flux models. This node also comes with an empty conditioning output, so there is no need to grab another empty text CLIP just for the negative prompt in the KSampler for FLUX.

So please give the new version a try. I'm looking forward to your feedback and to working more on the model.
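
For reference, here is a minimal sketch of calling the model outside ComfyUI with the standard Florence-2 transformers workflow; the repo id and task token below are assumptions, so check the Hugging Face page linked below for the exact names:

    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    repo_id = "MiaoshouAI/Florence-2-base-PromptGen-v1.5"  # assumed repo id
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Florence-2 checkpoints ship custom code, hence trust_remote_code=True
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True).to(device)
    processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)

    image = Image.open("example.png").convert("RGB")
    task = "<MORE_DETAILED_CAPTION>"  # assumed task token for the detailed-caption mode

    inputs = processor(text=task, images=image, return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=512,
        num_beams=3,
    )
    caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(caption)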

Huggingface Page:

Github Page for ComfyUI MiaoshouAI Tagger:

633
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/RunDiffusion on 2024-09-05 15:34:34+00:00.

634
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Chubby_Pig on 2024-09-05 15:09:53+00:00.

635
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/AGillySuit on 2024-09-05 14:46:09+00:00.

636
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/ZyloO_AI on 2024-09-05 14:02:21+00:00.

637
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Inner-Reflections on 2024-09-05 12:22:11+00:00.

638
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Sharlinator on 2024-09-05 07:41:14+00:00.

639
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Won3wan32 on 2024-09-05 05:31:22+00:00.


TL;DR: we propose an end-to-end, audio-only conditioned video diffusion model named Loopy. Specifically, we designed an inter- and intra-clip temporal module and an audio-to-latents module, enabling the model to leverage long-term motion information from the data to learn natural motion patterns and to improve audio-portrait movement correlation. This method removes the need for the manually specified spatial motion templates used in existing methods to constrain motion during inference, delivering more lifelike and high-quality results across various scenarios.

640
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/OkSpot3819 on 2024-09-05 08:43:45+00:00.


  • Stability AI’s text-to-image models (Stable Image Ultra, Stable Diffusion 3 Large and Stable Image Core) are now live in Amazon Bedrock (BLOG)
  • MiniMax: new Chinese text-to-video model (); they also offer free music generation (https://hailuoai.com/music)
  • LumaLabsAI released V 6.1 of Dream Machine which now features camera controls
  • RB-Modulation (IP-Adapter alternative by Google): training-free personalization of diffusion models using stochastic optimal control (HUGGING FACE DEMO)
  • New ChatGPT Voices: Fathom, Glimmer, Harp, Maple, Orbit, Rainbow (1, 2 and 3 - not working yet), Reef, Ridge and Vale (X Video Preview)
  • Text-Guided-Image-Colorization: influence the colorisation of objects in your images using text prompts (uses SDXL and CLIP) (GITHUB)
  • Meta's Sapiens segmentation model is now available on Hugging Face Spaces (HUGGING FACE DEMO)
  • FluxMusic: SOTA open-source text-to-music model (GITHUB | JUPYTER NOTEBOOK | PAPER)
  • SKYBOX AI: create 360° worlds with one image ()
  • P2P-Bridge: remove noise from 3D scans (GITHUB | PAPER)
  • HivisionIDPhoto: uses a set of models and workflows for portrait recognition, image cutout & ID photo generation (HUGGING FACE DEMO | GITHUB)
  • Anifusion.ai: create comic books via a web-app UI ()
  • ComfyUI-AdvancedLivePortrait Update (GITHUB)
  • ComfyUI v0.2.0: support for Flux controlnets from Xlab and InstantX; improvement to queue management; node library enhancement; quality of life updates (BLOG POST)
  • A song made by SUNO breaks 100k views on YouTube (LINK)

These will all be covered in the weekly newsletter, check out the most recent issue.

Here are the updates from the previous week:

  • Joy Caption Update: Improved tool for generating natural language captions for images, including NSFW content. Significant speed improvements and ComfyUI integration.
  • FLUX Training Insights: New article suggests FLUX can understand more complex concepts than previously thought. Minimal captions and abstract prompts can lead to better results.
  • Realism Techniques: Tips for generating more realistic images using FLUX, including deliberately lowering image quality in prompts and reducing guidance scale.
  • LoRA Training for Logos: Discussion on training LoRAs of company logos using FLUX, with insights on dataset size and training parameters.

⚓ Links, context, visuals for the section above ⚓

  • FluxForge v0.1: New tool for searching FLUX LoRA models across Civitai and Hugging Face repositories, updated every 2 hours.
  • Juggernaut XI: Enhanced SDXL model with improved prompt adherence and expanded dataset.
  • FLUX.1 ai-toolkit UI on Gradio: User interface for FLUX with drag-and-drop functionality and AI captioning.
  • Kolors Virtual Try-On App UI on Gradio: Demo for virtual clothing try-on application.
  • CogVideoX-5B: Open-weights text-to-video generation model capable of creating 6-second videos.
  • Melyn's 3D Render SDXL LoRA: LoRA model for Stable Diffusion XL trained on personal 3D renders.
  • sd-ppp Photoshop Extension: Brings regional prompt support for ComfyUI to Photoshop.
  • GenWarp: AI model that generates new viewpoints of a scene from a single input image.
  • Flux Latent Detailer Workflow: Experimental ComfyUI workflow for enhancing fine details in images using latent interpolation.

⚓ Links, context, visuals for the section above ⚓

Want updates emailed to you weekly? Subscribe.

641
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Beamher on 2024-09-05 08:09:06+00:00.

642
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/74185296op on 2024-09-05 07:21:09+00:00.

643
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Psi-Clone on 2024-09-05 05:34:34+00:00.

644
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/dipray55 on 2024-09-04 21:34:47+00:00.

645
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/MooseBoys on 2024-09-04 22:34:12+00:00.

646
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/rvitor on 2024-09-04 22:02:26+00:00.


I'm not very good at posting news, but nobody has shared this yet.

They are releasing 3 models on Amazon Bedrock; no news about weights yet.

The models are: Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core.

  1. Stable Image Ultra: Photorealistic, Large-Scale Output

    • Ideal For: Ultra-realistic imagery for luxury brands and high-end campaigns.
    • Use Case Example: A luxury brand uses Stable Image Ultra to create stunning visuals of its latest collection for magazine spreads, ensuring a premium feel that matches its high standards.
  2. Stable Diffusion 3 Large: High-Quality, High-Quantity Creative Assets

    • Ideal For: High-volume outputs like marketing campaigns and digital assets.
    • Use Case Example: A game development team uses SD3 Large to create detailed environmental textures and character concepts, accelerating their creative pipeline.
  3. Stable Image Core: Fast and Affordable

    • Ideal For: Rapid content generation at scale. Optimized for speedy image generation.
    • Use Case Example: An online retailer uses Stable Image Core to quickly generate product images for new arrivals, allowing it to list items faster and keep its catalog up-to-date.
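
For anyone wondering what calling these looks like, here is a rough boto3 sketch against the Bedrock runtime; the model id and the request/response shapes are assumptions, so check the Bedrock model catalog before relying on them:

    import base64
    import json

    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-west-2")

    response = client.invoke_model(
        modelId="stability.stable-image-ultra-v1:0",  # assumed model id
        body=json.dumps({"prompt": "product photo of a leather handbag, studio lighting"}),
    )
    payload = json.loads(response["body"].read())
    # Assumed response shape: a list of base64-encoded images
    image_bytes = base64.b64decode(payload["images"][0])
    with open("output.png", "wb") as f:
        f.write(image_bytes)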

More on:

Hoping to see these model weights at some point.

I'm enjoying Flux a lot these days, and I really don't know what to expect from these models. I'm a little frustrated with the latest news I've heard about Stability, but I still give them credit for being one of those who started the movement.

647
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Historical-Action-13 on 2024-09-04 21:40:08+00:00.


Early on, model checkpoints (ckpt files) were discussed as a possible attack vector via "pickling".

This was quickly mitigated by moving to a safetensors format, and everyone now understands and abides by this best practice.

There's another elephant in the room: the fact that we are downloading all kinds of new repos from unknown sources, and they frequently download dependencies both during install and at runtime. Some require internet access every time they are run, either for some cloud service the author implemented with little thought, or to check for updates or download dependencies.

As an example, Auto1111 doesn't require internet after install, but it does every time you try to use a new ControlNet or upscaler you haven't used before.

This isn't suspicious and I trust Auto, but it becomes a security risk when you potentially have untrusted add-ons running alongside it in the same environment.

It should be standard practice for all repos to have an easy option to download all possible dependencies during the initial install so that they can then be walled off.

If developers want to implement cloud-based tools with their repo, that's fine, but they should obtain user consent before doing so, and it shouldn't cause a panic and break the software if there's no connection.

Furthermore, if for some reason a dependency wasn't downloaded and is now needed, the software should not panic and break because it couldn't be downloaded. Rather, it should give the user clear instructions on what model or dependency is needed and what folder to place the file in so that it can be accessed locally.
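
As a sketch of that pattern (paths and the model file name are made up for illustration), the check-then-instruct behaviour could look something like this:

    import os
    import sys

    MODELS_DIR = os.path.join(os.path.dirname(__file__), "models", "upscalers")
    MODEL_FILE = "4x_example_upscaler.safetensors"  # hypothetical dependency

    model_path = os.path.join(MODELS_DIR, MODEL_FILE)
    if not os.path.exists(model_path):
        # Fail with clear instructions instead of silently downloading
        sys.exit(
            f"Missing model '{MODEL_FILE}'.\n"
            f"Download it manually and place it in: {MODELS_DIR}\n"
            "No network access is attempted by this tool."
        )
    # ...load model_path locally and continue...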

Basically, just like we all expect safetensors and use pickles with caution, we should all expect offline installers to be available for all repos, and use ones that are not with caution.

Is this a big deal right now? Not really, and I personally am not that concerned about it yet.

I assume all the big name devs are honest and have no bad intentions, I'm more concerned about nodes, extensions, and standalone new repos.

As for motive, there are artists who hate AI and might maliciously bundle malware with new repos. There is also the potential for credential-stealing malware, or any of the other threats present in downloading new experimental code or software.

Open source isn't a protection if nobody has had time to proofread the code yet.

We should be able to run these repos with a complete offline installer and be able to block their network access after without breaking them.

I want to emphasize I'm not currently accusing any devs of embedding malware, I just want us to get ahead of the potential threat the same way we did pickle files.

648
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/tom83_be on 2024-09-04 20:32:49+00:00.


So you got no answer from the OneTrainer team on documentation? You don't want to join any Discord channels in the hope that someone answers a basic setup question? You don't want to get a HF key, and want to download the model files for OneTrainer Flux training locally? Look no further, here is the answer:

  • Go to
  • download everything from there, including all subfolders; rename the files so they exactly match the names on Hugging Face (some file names are changed when downloaded) and keep them in the exact same folders
    • Note: I think you can omit all files in the main directory, especially the big flux1-dev.safetensors; the only file I think is necessary from the main directory is model_index.json, as it points to all the subdirs (which you need)
  • install and start up the most recent version of OneTrainer =>
  • choose "FluxDev" and "LoRA" in the dropdowns to the upper right
  • go to the "model"-tab and to "base model"
  • point to the directory where all the files and subdirectories you downloaded are located; example:
    • I downloaded everything to ...whateveryouPathIs.../FLUX.1-dev/
    • so ...whateveryouPathIs.../FLUX.1-dev/ holds the model_index.json and the subdirs (scheduler, text_encoder, text_encoder_2, tokenizer, tokenizer_2, transformer, vae) including all files inside of them
    • hence I point to ..whateveryouPathIs.../FLUX.1-dev in the base model entry in the "model"-tab
  • use your other settings and start training

At least I got it to load the model this way. I chose weight data type nfloat4 and output data type bfloat16 for now, and Adafactor as the optimizer. It trains with about 9.5 GB of VRAM. I won't give a full tutorial for all OneTrainer settings here, since I have to check it first, see results, etc.

Just wanted to describe how to download the model and point to it, since this is described nowhere. The current info on Flux from OneTrainer is but at the time of writing it gives nearly no clue on how to even start training or loading the model...

PS: There is probably a way to use a HF key, or to just git clone the HF space. But I don't like pointing to remote spaces when training locally, nor do I want to get a HF key if I can download things without it. So there may be easier ways to do this, if you cave to that. I won't.
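
For those who do cave to the HF-token route, a sketch of what that might look like with huggingface_hub (the local path and ignore pattern are illustrative; the repo is gated, so a token is required):

    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="black-forest-labs/FLUX.1-dev",
        local_dir="/path/to/FLUX.1-dev",             # point OneTrainer's base model entry here
        ignore_patterns=["flux1-dev.safetensors"],   # skip the big top-level checkpoint
        token="hf_...",                              # required, the repo is gated
    )

This keeps the original file names and folder structure, which avoids the manual renaming step above.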

649
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/filouface12 on 2024-09-04 19:31:03+00:00.

650
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/StableLlama on 2024-09-04 16:45:15+00:00.


We all enjoy LoRAs; some we train ourselves, but many come from well-known sources. Usually people are just happy with them, with little substantive feedback that gives a real measurement of a LoRA's quality. But this quality is important for the user - and also for the creator, to see where improvement is needed. So I think we need to make the quality measurable.

For that, I created this little list that produces a 1-5 star rating.

It should do what it is advertised to do:

  • Does the output look like it should?
    • fail: +0
    • little resemblance: +1
    • identifiable: +2
    • good match: +3
    • perfect match: +4
  • How often does the output look like it should:
    • seldom (less than every 4th image): +0
    • sometimes (every 3rd or 4th image): +1
    • half of the time (every 2nd image): +2
    • most of the time (only every 3rd or 4th image is a fail): +3
    • nearly every time (at most every 4th image is a fail): +4

It should not do what it is not advertised to do (freedom from side effects):

Test setup: make up a prompt that works with the LoRA and fix the seed so it stays the same. Create image A with just the base model (i.e. without the LoRA) and without the trigger word as a baseline; then do exactly the same with the LoRA loaded (still without the trigger word!) as image B; and finally with the trigger word as image C. (A minimal script sketch follows the list below.)

  • strong side effect: image B looks like image C and not like image A: +0
  • side effect: image B looks like a mixture of image C and image A: +1
  • little side effect: image B looks mostly like image A with little deviations, image C looks very different: +3
  • no side effect: image A and image B are (nearly) identical, image C looks completely different: +4
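
A minimal sketch of this A/B/C test with diffusers and SDXL (the base model choice, LoRA file name and trigger word are placeholders; the point is reusing the same seed for every image):

    import torch
    from diffusers import StableDiffusionXLPipeline

    SEED = 12345
    PROMPT = "portrait photo of a woman reading in a cafe"  # a prompt that works with the LoRA
    TRIGGER = "ohwx woman"                                   # hypothetical trigger word

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    def gen(prompt):
        g = torch.Generator("cuda").manual_seed(SEED)  # same seed for A, B and C
        return pipe(prompt, generator=g).images[0]

    image_a = gen(PROMPT)                              # base model, no LoRA, no trigger

    pipe.load_lora_weights(".", weight_name="my_character_lora.safetensors")  # placeholder file
    image_b = gen(PROMPT)                              # LoRA loaded, still no trigger
    image_c = gen(f"{TRIGGER}, {PROMPT}")              # LoRA loaded, trigger added

    image_a.save("image_A.png")
    image_b.save("image_B.png")
    image_c.save("image_C.png")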

Note: This setup works for character and object LoRAs. For a style LoRA, the style itself is the "side effect" in the classical sense, so it often doesn't even come with a trigger word. Therefore the definition and test of freedom from side effects is slightly different for this type: first create an image of a person or object (either already in the base model or added by a good LoRA) as image D, then run the side-effect test by additionally loading the style LoRA to create image E.

When the character/object still looks like it should (but in the new style, of course) and anything that shouldn't be affected by the style is left unaffected in image E, there's no side effect.

When the character/object, or anything else that shouldn't be, is mutated far beyond a change of style, you have a side effect.

And it should not destroy what we have already:

  • minor anatomy issues (hands, finger, feet): -1
  • major anatomy issues (bad arms and legs): -3

It should be easy to use:

  • does it have description about how to use it? +1
  • does it have sample images with sample prompts that show its effect and do they contain the prompt used to create them? +1

Adding all together we could come to a star rating:

13 - 14: Very good, 5 stars

11 - 12: Good, 4 stars

8 - 10: acceptable, 3 stars

5 - 7: poor, 2 stars

4 or less: bad, 1 star
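
Turned into code so everyone computes the rating the same way (the thresholds simply mirror the list above):

    def lora_star_rating(resemblance, consistency, side_effects, anatomy_penalty,
                         has_usage_docs, has_sample_prompts):
        """resemblance: 0-4, consistency: 0-4, side_effects: 0-4,
        anatomy_penalty: 0, -1 or -3, docs flags: booleans."""
        score = (resemblance + consistency + side_effects + anatomy_penalty
                 + int(has_usage_docs) + int(has_sample_prompts))
        if score >= 13:
            return 5  # very good
        if score >= 11:
            return 4  # good
        if score >= 8:
            return 3  # acceptable
        if score >= 5:
            return 2  # poor
        return 1      # bad

    # Example: perfect match (4), works most of the time (3), little side effect (3),
    # minor anatomy issues (-1), has docs and sample prompts (+1, +1) -> 11 -> 4 stars
    print(lora_star_rating(4, 3, 3, -1, True, True))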

I'm happy to hear your feedback on this attempt to bring measurable quality to LoRAs. I might update the scoring based on feedback, but I will be transparent about that so there are no bad surprises.

And I'd also be very happy to see people using this scoring to rate LoRAs in the usual places like Civitai. And, of course, I'd be very happy if this helps LoRA trainers create good LoRAs.
