StableDiffusion


/r/StableDiffusion is back open after the protest of Reddit killing open API access, which will bankrupt app developers, hamper moderation, and...

751
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/tom83_be on 2024-08-29 13:23:40+00:00.

752
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/QuentinWach on 2024-08-29 13:07:29+00:00.


I made this image ranker, which has a pretty browser GUI, to rank images and fine-tune my own models.

Select images from a local directory, start ranking them through pairwise comparisons, and export the ranking data with the file names, Elo scores, and up/down votes.
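For reference, the pairwise update here is just standard Elo. A minimal sketch (the K-factor and the 1000-point starting rating are my assumptions, not values confirmed by the tool):

```python
def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return new (winner, loser) ratings after one pairwise comparison."""
    expected_win = 1.0 / (1.0 + 10.0 ** ((loser - winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return winner + delta, loser - delta

# Two images start at an assumed 1000 rating; img_001 wins the comparison.
ratings = {"img_001.png": 1000.0, "img_002.png": 1000.0}
ratings["img_001.png"], ratings["img_002.png"] = elo_update(
    ratings["img_001.png"], ratings["img_002.png"]
)
print(ratings)
```

Each comparison transfers rating points from loser to winner, with upsets moving more points, so the ranking converges after enough pairs.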

753
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/UnlimitedDuck on 2024-08-29 12:27:58+00:00.

754
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/BlastedRemnants on 2024-08-29 09:11:50+00:00.


Got the idea to try this while checking out this post about artist styles on Flux Pro yesterday, so thanks to the OP u/AWTom. In any case, I tried all the artists in their list with FluxDev and saved all the images to Google Drive if anyone would like to have a look. There are just over 900 images at 960x1280, each with the prompt "style of (artistname:1.5), catwoman standing with hands on hips".

All the images were made with the same prompt, seed, and settings, with the only change being the artist named in the prompt. The images should still have the ComfyUI workflow I used to make them, but you'll need to make the wildcard file yourself. In the original post linked above there is a Pastebin link from the OP; I saved that as a text file and then used Notepad++ to reorganize the lines alphabetically so I could read it more easily.
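If you'd rather script that sorting step than do it in Notepad++, a minimal sketch (the file names are placeholders for wherever you saved the Pastebin list and where your wildcard file lives):

```python
# Sort the saved artist list alphabetically and write it out as a wildcard file.
with open("artists_pastebin.txt", encoding="utf-8") as f:
    names = [line.strip() for line in f if line.strip()]

names.sort(key=str.lower)  # case-insensitive alphabetical order

with open("artists.txt", "w", encoding="utf-8") as f:  # the wildcard file
    f.write("\n".join(names) + "\n")
```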

I don't know how long it will all be able to stay in my Google Drive though, so feel free to suggest another host, or download the images yourself and do whatever you like with them. There are a bunch of interesting-looking styles in there, and a bunch that just give a sort of generic look, but some are very cool looking and entirely unique, so you'll be looking through them for a hot minute I'd expect.

Drive Link

Some examples of the variety

755
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/protector111 on 2024-08-29 07:14:42+00:00.


People often ask, so I posted my workflow on CivitAI (all images in this thread were created with it). It can produce images up to 4K resolution. It is minimalistic, with only the frequently used settings visible; the rest is hidden beyond the screen, and the noodles are hidden too. This creates a minimalistic, clean UI with 50% of the space occupied by the generation preview. Click "reset view" to recenter.

This is how I use it.

How to hide/unhide noodles.

Next are hi-res images generated with it (some are 5K):

3328 x 4864 (300% zoom)

3328 x 4864 (200% zoom)

756
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/EldrichArchive on 2024-08-29 06:45:59+00:00.

757
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/OkSpot3819 on 2024-08-29 08:05:10+00:00.


  • CogVideoX-5B: Open-source video generation model originating from QingYing (with the diffusers library, it fits in < 10 GB VRAM; see the sketch after this list) (HUGGING FACE | GITHUB | PAPER)
  • Meta Sapiens: AI vision models for human analysis at 1k resolution - 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction (GITHUB | HUGGING FACE)
  • LayerPano3D: a novel framework to generate full-view, explorable panoramic 3D scenes from a single text prompt (GITHUB)
  • Kolors Virtual Try-On (HUGGING FACE DEMO)
  • GenWarp: AI model that can generate new views of a scene from just a single input image (PAPER | HUGGING FACE DEMO | GITHUB)
  • Hyper-SD (Flux): Bytedance released Flux.1-Dev 8/16step LoRAs - generate images in just 8/16 steps (HUGGING FACE DEMO)
  • Imagen 3 is now available on Gemini. Source.
  • Background removal with WebGPU: in-browser background removal (GITHUB | HUGGING FACE DEMO)
  • Deforum Studio Updates: four new presets based on "audio events", which you can detect or manually place on the audio track. Also, smoothing is now available for classic presets. Link.
  • Freepik Mystic: New image generator. Source.
  • Fotographer.ai Fuzer v0.1: image editing tool that allows users to combine foreground elements with different backgrounds. It aims to preserve the shape and style of the foreground while integrating it into the new background (HUGGING FACE DEMO)
  • MagicMan: generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement (HUGGING FACE PAPER)
  • MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation (PROJECT PAGE)
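On the CogVideoX-5B VRAM point, here is a rough sketch following the publicly documented diffusers usage; CPU offloading and VAE tiling are what bring peak VRAM under 10 GB. The prompt is illustrative, and the exact method availability may vary with your diffusers version:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # swap submodules to CPU between steps
pipe.vae.enable_tiling()         # decode frames in tiles to cap VRAM use

video = pipe(
    prompt="a panda playing guitar in a bamboo forest",  # example prompt
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "cogvideox_output.mp4", fps=8)
```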

These will all be covered in the weekly newsletter; check out the most recent issue.

Here are the updates from the previous week:

  •  CCTV-style images: Flux dev capable of generating convincing surveillance-like footage.
  •  Amateur Photography LoRA v2: Enhanced Flux LoRA for realistic casual photographs.
  •  Personal likeness LoRA: Successful training with only 15 self-captioned images.
  •  Low VRAM training: Flux LoRA training achieved on RTX 3060 with 12GB VRAM.
  •  16GB VRAM guide: Method for training Flux LoRA using only 16GB of VRAM shared.
  •  FinetunersAI insights: Valuable recommendations on training LoRA models for Flux.
  •  XLabs ControlNet: New Canny, HED, and Depth models (Version 3) for Flux released.
  •  Union ControlNet: InstantX's union ControlNet implemented in ComfyUI for Flux.
  •  AI in politics: Trump's use of AI-generated images sparks debate on misinformation.
  •  Procreate's stance: Popular illustration app announces no integration of generative AI.
  •  Pony Diffusion V7: Significant update announced with various improvements.
  •  Black Forest Labs interview: Founders discuss journey from Stable Diffusion to new ventures.
  •  Ideogram 2.0: New AI image generation platform released with various features.
  •  Luma AI Dream Machine 1.5: Upgraded text-to-video generator with enhanced capabilities.
  •  Flux Deforum: XLabs-AI releases Flux implementation of Deforum framework.
  •  ComfyUI-Nexus: New extension enabling multiplayer collaboration in ComfyUI.
  •  Flux LoRA showcase: New LoRAs for custom typefaces and themed designs.

Compiled resource for all links can be found here.

758
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/iamnickwilde on 2024-08-29 07:42:05+00:00.

759
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/love1008 on 2024-08-29 04:47:20+00:00.

760
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/wonderflex on 2024-08-28 22:47:27+00:00.


I created this simple ComfyUI workflow to automatically caption images using Florence for training LoRAs. Most of the ones I found were overly complex, so I tried to make this one as easy to use as possible.

  1. Enter your trigger word
  2. Paste your images folder directory path
  3. Set your queue to match the number of images
  4. Run.

It will scan the images, caption each one, and then write a .txt file in the same folder with the same name as the image.

Instructions for each step are included above, with some options noted below. It should be pretty easy to modify to your needs.
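For anyone who wants the same loop outside ComfyUI, a rough transformers-based sketch following the Florence-2 model card (this is not the OP's workflow; the folder path and trigger word are placeholders):

```python
import os
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=dtype, trust_remote_code=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

trigger = "mytriggerword"    # placeholder trigger word
folder = "path/to/images"    # placeholder dataset folder
task = "<DETAILED_CAPTION>"  # Florence-2 task token

for name in os.listdir(folder):
    if not name.lower().endswith((".png", ".jpg", ".jpeg", ".webp")):
        continue
    image = Image.open(os.path.join(folder, name)).convert("RGB")
    inputs = processor(text=task, images=image, return_tensors="pt").to(device, dtype)
    ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
    )
    raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
    caption = processor.post_process_generation(
        raw, task=task, image_size=(image.width, image.height)
    )[task]
    # Write trigger word + caption to a .txt next to the image
    txt_path = os.path.splitext(os.path.join(folder, name))[0] + ".txt"
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(f"{trigger}, {caption}")
```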

761
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/UnlimitedDuck on 2024-08-28 22:33:25+00:00.

762
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/gurilagarden on 2024-08-28 21:11:11+00:00.


They're merged models. You didn't train the model. You trained LoRAs. Not. The. Same. Thing.

763
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/dr_lm on 2024-08-28 19:22:22+00:00.


SD was always a bit crap at generating at higher resolutions than its training data. Flux seems much better.

I have two monitors, a 27" 2560 x 1440 next to a 34" 3440 x 1440, for a combined 6000 x 1440 if you span across both.

I have gotten even SD1.5 to generate at this aspect ratio, but it repeated elements predictably, a bit like a clone tool. Flux, if prompted with words like "panoramic", makes a much better go of it.
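For reference, a minimal diffusers sketch of a wide-aspect Flux generation (not the OP's embedded ComfyUI workflow; the prompt is illustrative, and 1600x384 is an assumed VRAM-friendly size at the same 25:6 aspect ratio as 6000x1440, to be upscaled afterwards):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM

image = pipe(
    "panoramic view of a mountain lake at dawn, ultrawide landscape",
    width=1600, height=384,  # 25:6, same as 6000x1440; multiples of 16
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
image.save("panorama.png")
```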

Low-res imgur:

What it looks like in the room:

PNG with comfyui workflow embedded:

764
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Comedian_Then on 2024-08-28 16:12:32+00:00.

765
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/camenduru on 2024-08-28 18:51:41+00:00.

766
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/StrongClass86 on 2024-08-28 16:04:09+00:00.

Original Title: This is the most iconic image in r/stablediffusion history. All the SOTA LLMs fail to describe it correctly. Claude 3.5 Sonnet vs GPT4o vs Gemini 1.5 Pro 0827 (released yesterday) vs 1.5 Pro (Gemini advanced) vs GPT (Microsoft Copilot).

767
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/fab1an on 2024-08-28 15:33:29+00:00.

768
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/EddieGoldenX on 2024-08-28 12:50:41+00:00.

769
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Agreeable_Release549 on 2024-08-28 12:39:44+00:00.


I see a lot of workflows out there that could be easy to understand, but are presented in a super complex way.

We read from left to right. Plus, it seems obvious that we want to group things together (3 nodes linking to node X should be put together).

I see many workflows with 'input nodes' on the right side linking to nodes on the left side. Some nodes receive links from nodes placed all over the workflow. It takes 5x more time to understand this spaghetti.

Don't get me wrong - I'm grateful to all community members but don't understand some things :)

Are people making it on purpose? What is the reason for that? :D

770
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/airduster_9000 on 2024-08-28 10:38:34+00:00.

771
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Wilsown on 2024-08-28 08:28:44+00:00.

772
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/camenduru on 2024-08-28 08:23:43+00:00.

773
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/tintwotin on 2024-08-28 09:36:28+00:00.

774
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/eldelentes_mx on 2024-08-28 05:04:18+00:00.

775
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/chicco4life on 2024-08-28 04:05:58+00:00.


Stonelax again,

I made a quick Flux workflow for the long-awaited OpenPose and Tile ControlNet modules (Canny and Depth are also included). The backbone of this workflow is the newly launched ControlNet Union Pro by InstantX.

Workflow here:

I quickly tested it out and cleaned up a standard workflow (kinda sucks that one wasn't included on Hugging Face or in the loader's GitHub repo) so y'all can have a try for yourselves. Some quick impressions:

  1. ControlNet Union Pro seems to take more computing power than XLabs' ControlNet, so try to keep the image size small.
  2. OpenPose works, but it seems hard to change the style and subject of the prompt, even with the help of img2img. For example, I input a CR7 "siu" pose and prompted "a robot", but the output image remained a male soccer player. I had to lower the strength to ~0.2 to finally get a robot, but then the pose was slightly off.

Comparison below:

Top - strength ~0.2, pose is slightly off

Bottom - strength ~0.5, pose is accurate but no robot


  3. The strength of image composition control seems slightly better than XLabs', but to be honest XLabs' Canny and Depth are quite usable already.

Anyway, having openpose and tile support is a win regardless! I will try to see if speed and style transfer can be optimized tomorrow.
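For those running it outside ComfyUI, here's a rough diffusers-based sketch of the same idea. Assumptions: I'm using the non-Pro Union repo id, since the Pro variant's exact repo id may differ; the pose control_mode index is taken from the union model card and worth verifying; the pose map path is a placeholder for a preprocessed OpenPose image:

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Union", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

control_image = load_image("pose_map.png")  # placeholder: preprocessed OpenPose map
image = pipe(
    "a robot standing with hands on hips",
    control_image=control_image,
    control_mode=4,                     # pose, per the union model card (verify)
    controlnet_conditioning_scale=0.2,  # the low strength the OP settled on
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("robot_pose.png")
```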

Please let me know if any of you make progress on speeding it up & style transfer too!

Cheers
