StableDiffusion

1
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/lofi21345 on 2024-10-19 15:20:25+00:00.


I had been struggling with this for a while, and I've seen others posting about it too:

Here's my example. Both images have had their levels adjusted by the same amount to exaggerate the effect present in the first image.

Pre-fix after 2x latent upscaling and noise injection

Same parameters and seeds, effect removed. The composition was slightly altered.

I fixed this issue by modifying the block weights of the LoRA I was using. tl;dr: reduce the influence of double block indices 1-3, and this eliminates the effect.

The "A" parameter will be loaded into double block indices 1-3. I've had success with values between 0 and 0.25; any more than that and I still had noticeable gridlines in some images.

If you're running into these gridline issues without using LoRAs, then unfortunately this fix won't work for you. I don't know if there's an equivalent extension in other UIs, but the block weight node for Comfy is in this repo:
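If your UI has no equivalent block-weight control, another option is to pre-scale the affected tensors in the LoRA file itself. Below is a minimal sketch of that idea, assuming kohya-style key names containing `double_blocks_1/2/3`; the file paths are placeholders, not from the original post.

```python
# Hedged sketch: pre-scale double blocks 1-3 in a Flux LoRA file on disk.
# Assumes kohya-style keys like "lora_unet_double_blocks_1_...lora_down.weight";
# adjust the regex if your LoRA uses a different naming convention.
import re
from safetensors.torch import load_file, save_file

SRC = "my_flux_lora.safetensors"        # placeholder input path
DST = "my_flux_lora_fixed.safetensors"  # placeholder output path
SCALE = 0.25                            # 0-0.25 worked in my case

block_pat = re.compile(r"double_blocks[._](1|2|3)[._]")

state = load_file(SRC)
for key in list(state.keys()):
    # Scale only the "down" factor so the block's contribution shrinks linearly
    # (scaling both up and down factors would square the reduction).
    if block_pat.search(key) and "lora_down" in key:
        state[key] = state[key] * SCALE

save_file(state, DST)
```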

If you are training your own LoRAs, you can completely eliminate the effect and always get clean upscales by not training the blocks in question. I use this setting in Kohya:

"train_double_block_indices": "0,4-18"

I hope this helps others who may be running into the same issues.

7
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/reddit22sd on 2024-10-19 07:55:40+00:00.


Just noticed that the Krita AI plugin was updated with custom workflows! You can have ComfyUI and Krita open at the same time, and changes in ComfyUI are reflected in Krita. This makes it even more powerful than it already was.

Plugin

Youtube demo

8
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Haghiri75 on 2024-10-19 07:52:28+00:00.


I'm glad to announce we've released a new version of our SDXL 1.0 based model, Mann-E Dreams. Version 0.0.5 is the newest release and includes several changes compared to the previous one. Here are some worth noting:

  • It can now generate both artistic and realistic images (although it may still struggle with some styles).
  • It is now more liberated and capable of doing more NSFW artwork (since we intended this version to be self-hosted, there were no obstacles to making it as free as possible, in every sense of the word).
  • It now supports a wider range of schedulers (Euler and Euler a are also supported, though honestly not as well as DPM++ SDE Karras).
  • We used a large set of Midjourney-generated images in the training process (and yes, it unfortunately inherits MJ's bad behavior as well).
  • You can now generate images up to 1280x1280 pixels, although I personally recommend these dimensions:
    • HD Square: 1024x1024
    • 16:9: 720x1280

That's what we've been working on, and now...

Download Links

  • HuggingFace:
  • CivitAI:

Notes

  • All LoRAs that work with SDXL 1.0 work fine with this model.
  • ControlNet and img2img pipelines work just fine.
  • Inpainting/outpainting hasn't been tested, but since img2img was fine, we assume those work as well. I'd be glad to know if there's a problem with them.
  • In version 0.0.4, a lot of people reported a "grey film" or "white artifact" effect on the pictures; in this version, those are gone!
  • The model has been tested in A1111, so if you test it in another UI, please report your feedback so we can work on it.
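For anyone who prefers diffusers over A1111, a minimal loading sketch might look like the following. The repo id, prompt, and step count are placeholders and assumptions, not official values; use the download links above for the real model.

```python
# Minimal sketch: loading an SDXL 1.0-based checkpoint such as Mann-E Dreams with diffusers.
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverSDEScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "mann-e/placeholder-repo",           # placeholder repo id; use the real HuggingFace link above
    torch_dtype=torch.float16,
)
# DPM++ SDE Karras, the scheduler the author recommends
pipe.scheduler = DPMSolverSDEScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
pipe.to("cuda")

image = pipe(
    "portrait of an astronaut in a sunflower field, cinematic lighting",
    width=1024, height=1024,             # the recommended "HD Square" size
    num_inference_steps=12,              # distilled model, so few steps; adjust to taste
).images[0]
image.save("mann_e_dreams_sample.png")
```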

Final thoughts

First, I made an announcement on X as well. I'd be happy if you consider following me there. Second, I know the new hot topic of every AI image generation platform is FLUX, but our intention was to make it easy to self-host with limited resources.

This model is basically our masterpiece in distillation: it makes SDXL 1.0 much faster than usual and lets it run on a single 3080 as well!

We're also open to all opinions about the model 🤗

11
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/hackerzcity on 2024-10-18 22:23:30+00:00.


IterComp helps AI models create images that are more accurate, detailed, and beautiful by learning from what went wrong and fixing it step by step!

How Does It Work?

  • Step 1: You tell the AI what you want to see (for example, “a dog playing in the park”).
  • Step 2: Different models (like FLUX or SDXL) will try to create this image. One might do a great job with colors, while another gets the dog’s position right.
  • Step 3: IterComp looks at the images from these models and figures out what worked and what didn’t.
  • Step 4: It combines the best parts from all the models and makes the picture better. It learns from each attempt (or iteration) and keeps refining the image until it’s just right!

You can enhance Flux-generated images with the help of the IterComp model and upscaling.
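If the released IterComp checkpoint loads as a standard SDXL-style pipeline (an assumption on my part; the post doesn't spell out the interface), a minimal diffusers sketch might look like this. The repo id is a placeholder; check the project page for the real one.

```python
# Hedged sketch: trying an IterComp checkpoint as a drop-in SDXL-style base model.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "placeholder/IterComp",              # placeholder repo id
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a dog playing in the park, golden hour, detailed fur",  # the example prompt from above
    num_inference_steps=25,
    guidance_scale=7.0,
).images[0]
image.save("itercomp_dog.png")
```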

Resources

Youtube Tutorial: 

12
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Corinstit on 2024-10-18 17:05:41+00:00.


Completely Free Resources:

1. Determine Your Thumbnail Style, Layout, and Display Text
  • Conceptualize the Thumbnail: Start by thinking about the text, style, and layout you want for your thumbnail. Decide how the text should be placed on the image and the style you want, whether it's professional or cartoonish.

image style comparison

  • Specify the Image Style: Be clear about the style—whether you want a professional, cartoonish, or realistic look. This will guide the AI in creating the right type of thumbnail.
  • Include People (Optional): If your thumbnail features people, describe their positions, expressions, and appearance to make the thumbnail more engaging.

2. How to Control Thumbnail Style Using a LoRA Model (Optional)

With the Flux LoRA model, you can control various aspects of the image, like style and tone, to fit your needs.

  • Selecting the LoRA: Choose from different pre-trained LoRA models to adjust the style of your thumbnail, ranging from professional to vibrant, cartoonish looks.
  • Adjusting the Weights: Fine-tune the intensity of the LoRA’s effect by adjusting the weight parameter. Typically, weights between 0.5 and 0.8 work best for subtle control.
  • Mix and Match LoRAs: Combine multiple LoRAs to create more complex styles, adjusting weights for different parts of the image (e.g., background style, character features); see the sketch below.
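Here is a hedged sketch of that weighting idea using the diffusers Flux pipeline. The LoRA repo ids are placeholders I made up for illustration; the 0.7/0.5 weights follow the 0.5-0.8 guidance above.

```python
# Hedged sketch: blending two style LoRAs on Flux with per-adapter weights via diffusers.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder LoRA repos: one for overall style, one for bold text rendering.
pipe.load_lora_weights("someuser/cartoon-style-lora", adapter_name="cartoon")
pipe.load_lora_weights("someuser/bold-text-lora", adapter_name="bold_text")
pipe.set_adapters(["cartoon", "bold_text"], adapter_weights=[0.7, 0.5])

image = pipe(
    'YouTube thumbnail, a woman pointing at bold text "TRY THIS ON FLUX", vibrant colors',
    width=1280, height=720,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("thumbnail.png")
```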

3. Using Reference Photos for Personalization (Optional)

For a personalized thumbnail, use the IP model to upload a reference photo. Make sure the reference expression matches the one you want in your thumbnail.

reference photo example

  • Enhance Creativity with Image Guides: You can set the “image guide” to 5 for more creativity, but this might result in less consistency in character features.

Important Tips: Write Effective Prompts for YouTube Thumbnail Generation

  • Keep the Prompt Length Moderate: Ensure your prompt is detailed but not overwhelming. A concise prompt is key to generating a thumbnail that captures your vision.
  • Place Key Elements at the Beginning: Mention the most important aspects, such as text, image style, or character details, early in the prompt.
  • Reference Photos: If you're using a reference photo, be sure to describe the person and their characteristics in the prompt for more accurate results.
  • Refer to Examples, But Experiment: Use example prompts as guidance, but don't hesitate to experiment. Creative prompts often yield unique thumbnails that stand out.

If you're unsure about how to write prompts, try this free Prompt Generator Tool. For instance, input something like: "A YouTube thumbnail with the text 'hello, try this with fluxailab' on the left, and a woman pointing at the text on the right."

Some Examples:

prompt1 result

prompt1: A vibrant, eye-catching YouTube thumbnail featuring a woman with an excited expression. Her eyes are wide with enthusiasm, her lips slightly parted in a smile, and her finger points directly to the left, drawing the viewer's attention towards a large, bold text that reads "TRY THIS ON FLUX" in a bright, contrasting color. The background is a gradient of energetic colors that complements the overall mood, and the text is overlaid with a subtle, textured shadow for visual depth. The composition is dynamic, with the woman positioned slightly off-center to create visual interest.

prompt2 result

prompt2: A young girl, with bright, curious eyes and a wide, excited smile, points her index finger to the right. Her expression is full of anticipation and wonder. The finger is pointing towards the right side of the frame, where a bold, eye-catching font proclaims "Try this on Flux." The font is rendered in a modern, playful style, inviting viewers to explore the capabilities of Flux. The overall composition is vibrant and dynamic, with strong contrast between the girl and the text, creating a visually striking thumbnail for a YouTube video.

If you'd like to see more use cases, check out this [flux thumbnail example] for additional examples.

25
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/sanobawitch on 2024-10-18 06:45:59+00:00.


Janus is based on DeepSeek-LLM-1.3b-base, which was trained on a corpus of approximately 500B text tokens. For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384x384 image input. For image generation, Janus uses the Llama tokenizer.

They have released the model weights under the DeepSeek Model License; the code is under the MIT license. Commercial usage is permitted.

Three-stage training procedure

Link to the generated images. More examples with longer prompts.

The image resolutions are 1024×1024, 512×512, and 384×384.

Inference code here; they transform the decoded image patches into the final RGB image in... numpy. Yay!
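As a rough illustration of that last numpy step (not DeepSeek's actual code), converting decoder output in [-1, 1] to a final uint8 RGB image looks roughly like this:

```python
# Rough illustration only: mapping decoded image values in [-1, 1] to uint8 RGB with numpy.
import numpy as np

def to_rgb(decoded: np.ndarray) -> np.ndarray:
    """decoded: (H, W, 3) float array in [-1, 1] from the image decoder."""
    img = (decoded + 1.0) / 2.0           # shift to [0, 1]
    img = np.clip(img * 255.0, 0, 255)    # scale to the 8-bit range
    return img.astype(np.uint8)

# Dummy 384x384 example
rgb = to_rgb(np.random.uniform(-1.0, 1.0, size=(384, 384, 3)))
print(rgb.shape, rgb.dtype)               # (384, 384, 3) uint8
```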

So far, we have Omni, Mei, Sana, and Janus.
