StableDiffusion


/r/StableDiffusion is back open after the protest of Reddit killing open API access, which will bankrupt app developers, hamper moderation, and...

751
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/tom83_be on 2024-08-29 13:23:40+00:00.

752
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/QuentinWach on 2024-08-29 13:07:29+00:00.


I made this image ranker, which has a pretty browser GUI, to rank images and fine-tune my own models.

Select images from a local directory, start ranking them through pairwise comparisons, and export the ranking data with the file names, Elo scores, and up/down votes.
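For reference, the pairwise update here is just standard Elo. A minimal sketch (the K-factor and the 1000-point starting rating are my assumptions, not values confirmed by the tool):

```python
def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return new (winner, loser) ratings after one pairwise comparison."""
    expected_win = 1.0 / (1.0 + 10.0 ** ((loser - winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return winner + delta, loser - delta

# Two images start at an assumed 1000 rating; img_001 wins the comparison.
ratings = {"img_001.png": 1000.0, "img_002.png": 1000.0}
ratings["img_001.png"], ratings["img_002.png"] = elo_update(
    ratings["img_001.png"], ratings["img_002.png"]
)
print(ratings)
```

Each comparison transfers rating points from loser to winner, with upsets moving more points, so the ranking converges after enough pairs.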

753
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/UnlimitedDuck on 2024-08-29 12:27:58+00:00.

754
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/BlastedRemnants on 2024-08-29 09:11:50+00:00.


Got the idea to try this while checking out this post about artist styles on Flux Pro yesterday, so thanks to the OP u/AWTom. In any case, I tried all the artists in their list with FluxDev and saved all the images to Google Drive if anyone would like to have a look. There are just over 900 images at 960x1280, each with the prompt "style of (artistname:1.5), catwoman standing with hands on hips".

All the images were made with the same prompt, seed, and settings, with the only change being the artist named in the prompt. The images should still have the ComfyUI workflow I used to make them, but you'll need to make the wildcard file yourself. In the original post linked above there is a Pastebin link from the OP; I saved that as a text file and then used Notepad++ to reorganize the lines alphabetically so I could read it more easily.
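If you'd rather script that sorting step than do it in Notepad++, a minimal sketch (the file names are placeholders for wherever you saved the Pastebin list and where your wildcard file lives):

```python
# Sort the saved artist list alphabetically and write it out as a wildcard file.
with open("artists_pastebin.txt", encoding="utf-8") as f:
    names = [line.strip() for line in f if line.strip()]

names.sort(key=str.lower)  # case-insensitive alphabetical order

with open("artists.txt", "w", encoding="utf-8") as f:  # the wildcard file
    f.write("\n".join(names) + "\n")
```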

I don't know how long it will all be able to stay in my Google Drive though, so feel free to suggest another host, or download the images yourself and do whatever you like with them. There are a bunch of interesting-looking styles in there, and a bunch that just give a sort of generic look, but some are very cool looking and entirely unique, so you'll be looking through them for a hot minute I'd expect.

Drive Link

Some examples of the variety

755
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/protector111 on 2024-08-29 07:14:42+00:00.


People often ask, so I posted my workflow on CivitAI (all images in this thread were created with it). It can produce images up to 4K resolution. It is minimalistic, with only the frequently used settings visible; the rest is hidden beyond the screen, and the noodles are hidden too. This creates a minimalistic, clean UI with 50% of the space occupied by the generation preview. Click "reset view" to recenter.

This is how I use it.

How to hide/unhide noodles.

Next are hi-res images generated with it (some are 5K):

3328 x 4864 (300% zoom)

3328 x 4864 (200% zoom)

756
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/EldrichArchive on 2024-08-29 06:45:59+00:00.

757
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/OkSpot3819 on 2024-08-29 08:05:10+00:00.


  • CogVideoX-5B: Open-source video generation model originating from QingYing (with the diffusers library, it fits in < 10 GB VRAM; see the sketch after this list) (HUGGING FACE | GITHUB | PAPER)
  • Meta Sapiens: AI vision models for human analysis at 1k resolution - 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction (GITHUB | HUGGING FACE)
  • LayerPano3D: a novel framework to generate full-view, explorable panoramic 3D scenes from a single text prompt (GITHUB)
  • Kolors Virtual Try-On (HUGGING FACE DEMO)
  • GenWarp: AI model that can generate new views of a scene from just a single input image (PAPER | HUGGING FACE DEMO | GITHUB)
  • Hyper-SD (Flux): Bytedance released Flux.1-Dev 8/16step LoRAs - generate images in just 8/16 steps (HUGGING FACE DEMO)
  • Imagen 3 is now available on Gemini. Source.
  • Background removal with WebGPU: in-browser background removal (GITHUB | HUGGING FACE DEMO)
  • Deforum Studio Updates: four new presets based on "audio events", which you can detect or manually place on the audio track. Also, smoothing is now available for classic presets. Link.
  • Freepik Mystic: New image generator. Source.
  • Fotographer.ai Fuzer v0.1: image editing tool that allows users to combine foreground elements with different backgrounds. It aims to preserve the shape and style of the foreground while integrating it into the new background (HUGGING FACE DEMO)
  • MagicMan: generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement (HUGGING FACE PAPER)
  • MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation (PROJECT PAGE)
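On the CogVideoX-5B VRAM point, here is a rough sketch following the publicly documented diffusers usage; CPU offloading and VAE tiling are what bring peak VRAM under 10 GB. The prompt is illustrative, and the exact method availability may vary with your diffusers version:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # swap submodules to CPU between steps
pipe.vae.enable_tiling()         # decode frames in tiles to cap VRAM use

video = pipe(
    prompt="a panda playing guitar in a bamboo forest",  # example prompt
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "cogvideox_output.mp4", fps=8)
```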

These will all be covered in the weekly newsletter; check out the most recent issue.

Here are the updates from the previous week:

  •  CCTV-style images: Flux dev capable of generating convincing surveillance-like footage.
  •  Amateur Photography LoRA v2: Enhanced Flux LoRA for realistic casual photographs.
  •  Personal likeness LoRA: Successful training with only 15 self-captioned images.
  •  Low VRAM training: Flux LoRA training achieved on RTX 3060 with 12GB VRAM.
  •  16GB VRAM guide: Method for training Flux LoRA using only 16GB of VRAM shared.
  •  FinetunersAI insights: Valuable recommendations on training LoRA models for Flux.
  •  XLabs ControlNet: New Canny, HED, and Depth models (Version 3) for Flux released.
  •  Union ControlNet: InstantX's union ControlNet implemented in ComfyUI for Flux.
  •  AI in politics: Trump's use of AI-generated images sparks debate on misinformation.
  •  Procreate's stance: Popular illustration app announces no integration of generative AI.
  •  Pony Diffusion V7: Significant update announced with various improvements.
  •  Black Forest Labs interview: Founders discuss journey from Stable Diffusion to new ventures.
  •  Ideogram 2.0: New AI image generation platform released with various features.
  •  Luma AI Dream Machine 1.5: Upgraded text-to-video generator with enhanced capabilities.
  •  Flux Deforum: XLabs-AI releases Flux implementation of Deforum framework.
  •  ComfyUI-Nexus: New extension enabling multiplayer collaboration in ComfyUI.
  •  Flux LoRA showcase: New LoRAs for custom typefaces and themed designs.

Compiled resource for all links can be found here.

758
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/iamnickwilde on 2024-08-29 07:42:05+00:00.

759
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/love1008 on 2024-08-29 04:47:20+00:00.

760
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/wonderflex on 2024-08-28 22:47:27+00:00.


I created this simple ComfyUI workflow to automatically caption images using Florence for training LoRAs. Most of the ones I found were overly complex, so I tried to make this one as easy to use as possible.

  1. Enter your trigger word
  2. Paste your images folder directory path
  3. Set your queue to match the number of images
  4. Run.

It will scan the images, caption each one, and then write a .txt file in the same folder with the same name as the image.

Instructions for each step are included above, with some options noted below. It should be pretty easy to modify to your needs.
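For anyone who wants the same loop outside ComfyUI, a rough transformers-based sketch following the Florence-2 model card (this is not the OP's workflow; the folder path and trigger word are placeholders):

```python
import os
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=dtype, trust_remote_code=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

trigger = "mytriggerword"    # placeholder trigger word
folder = "path/to/images"    # placeholder dataset folder
task = "<DETAILED_CAPTION>"  # Florence-2 task token

for name in os.listdir(folder):
    if not name.lower().endswith((".png", ".jpg", ".jpeg", ".webp")):
        continue
    image = Image.open(os.path.join(folder, name)).convert("RGB")
    inputs = processor(text=task, images=image, return_tensors="pt").to(device, dtype)
    ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
    )
    raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
    caption = processor.post_process_generation(
        raw, task=task, image_size=(image.width, image.height)
    )[task]
    # Write trigger word + caption to a .txt next to the image
    txt_path = os.path.splitext(os.path.join(folder, name))[0] + ".txt"
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(f"{trigger}, {caption}")
```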

761
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/UnlimitedDuck on 2024-08-28 22:33:25+00:00.

762
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/gurilagarden on 2024-08-28 21:11:11+00:00.


They're merged models. You didn't train the model. You trained LoRAs. Not. The. Same. Thing.

763
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/dr_lm on 2024-08-28 19:22:22+00:00.


SD was always a bit crap at generating at higher resolutions than its training data. Flux seems much better.

I have two monitors, a 27" 2560 x 1440 next to a 34" 3440 x 1440, for a combined 6000 x 1440 if you span across both.

I have gotten even SD1.5 to generate at this aspect ratio, but it repeated elements predictably, a bit like a clone tool. Flux, if prompted with words like "panoramic", makes a much better go of it.
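For reference, a minimal diffusers sketch of a wide-aspect Flux generation (not the OP's embedded ComfyUI workflow; the prompt is illustrative, and 1600x384 is an assumed VRAM-friendly size at the same 25:6 aspect ratio as 6000x1440, to be upscaled afterwards):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM

image = pipe(
    "panoramic view of a mountain lake at dawn, ultrawide landscape",
    width=1600, height=384,  # 25:6, same as 6000x1440; multiples of 16
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
image.save("panorama.png")
```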

Low-res imgur:

What it looks like in the room:

PNG with comfyui workflow embedded:

764
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Comedian_Then on 2024-08-28 16:12:32+00:00.

765
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/camenduru on 2024-08-28 18:51:41+00:00.

766
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/StrongClass86 on 2024-08-28 16:04:09+00:00.

Original Title: This is the most iconic image in r/stablediffusion history. All the SOTA LLMs fail to describe it correctly. Claude 3.5 Sonnet vs GPT4o vs Gemini 1.5 Pro 0827 (released yesterday) vs 1.5 Pro (Gemini advanced) vs GPT (Microsoft Copilot).

767
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/fab1an on 2024-08-28 15:33:29+00:00.

768
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/EddieGoldenX on 2024-08-28 12:50:41+00:00.

769
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Agreeable_Release549 on 2024-08-28 12:39:44+00:00.


I see a lot of workflows out there that could be easy to understand, but are presented in a super complex way.

We read from left to right. Plus, it seems obvious that we want to group things together (3 nodes linking to node X should be put together).

I see many workflows with 'input nodes' on the right side linking to nodes on the left side. Some nodes receive links from nodes placed all over the workflow. It takes 5x more time to understand this spaghetti.

Don't get me wrong - I'm grateful to all community members but don't understand some things :)

Are people making it on purpose? What is the reason for that? :D

770
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/airduster_9000 on 2024-08-28 10:38:34+00:00.

771
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Wilsown on 2024-08-28 08:28:44+00:00.

772
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/camenduru on 2024-08-28 08:23:43+00:00.

773
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/tintwotin on 2024-08-28 09:36:28+00:00.

774
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/eldelentes_mx on 2024-08-28 05:04:18+00:00.

775
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/chicco4life on 2024-08-28 04:05:58+00:00.


Stonelax again,

I made a quick Flux workflow for the long-awaited OpenPose and Tile ControlNet modules (Canny and Depth are also included). The backbone of this workflow is the newly launched ControlNet Union Pro by InstantX.

Workflow here:

I quickly tested it out and cleaned up a standard workflow (kinda sucks that one wasn't included on Hugging Face or in the loader's GitHub repo) so y'all can have a try for yourselves. Some quick impressions:

  1. ControlNet Union Pro seems to take more computing power than XLabs' ControlNet, so try to keep the image size small.
  2. OpenPose works, but it seems hard to change the style and subject of the prompt, even with the help of img2img. For example, I input a CR7 "siu" pose and prompted "a robot", but the output image remained a male soccer player. I had to lower the strength to ~0.2 to finally get a robot, but then the pose was slightly off.

Comparison below:

Top - strength ~0.2, pose is slightly off

Bottom - strength ~0.5, pose is accurate but no robot


  3. The strength of image composition control seems slightly better than XLabs', but to be honest XLabs' Canny and Depth are quite usable already.

Anyway, having openpose and tile support is a win regardless! I will try to see if speed and style transfer can be optimized tomorrow.
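For those running it outside ComfyUI, here's a rough diffusers-based sketch of the same idea. Assumptions: I'm using the non-Pro Union repo id, since the Pro variant's exact repo id may differ; the pose control_mode index is taken from the union model card and worth verifying; the pose map path is a placeholder for a preprocessed OpenPose image:

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Union", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

control_image = load_image("pose_map.png")  # placeholder: preprocessed OpenPose map
image = pipe(
    "a robot standing with hands on hips",
    control_image=control_image,
    control_mode=4,                     # pose, per the union model card (verify)
    controlnet_conditioning_scale=0.2,  # the low strength the OP settled on
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("robot_pose.png")
```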

Please let me know if any of you make progress on speeding it up & style transfer too!

Cheers
