StableDiffusion

251
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/OkSpot3819 on 2024-10-01 09:22:59+00:00.


  • Interesting find of the week: Kat, an engineer, built a tool for visualizing time-based media with gestures.
  • Flux updates:
    • Outpainting: ControlNet Outpainting using FLUX.1 Dev in ComfyUI demonstrated, with workflows provided for implementation.
    • Fine-tuning: Flux fine-tuning can now be performed with 10GB of VRAM, making it more accessible to users with mid-range GPUs.
    • Quantized model: Flux-Dev-Q5_1.gguf quantized model significantly improves performance on GPUs with 12GB VRAM, such as the NVIDIA RTX 3060.
    • New Controlnet models: New depth, upscaler, and surface normals models released for image enhancement in Flux.
    • CLIP and Long-CLIP models: Fine-tuned versions of the CLIP-L and Long-CLIP models are now fully integrated with the Hugging Face Diffusers pipeline (see the sketch after this list).
  • James Cameron joins Stability AI: Renowned filmmaker James Cameron has joined Stability AI's Board of Directors, bringing his expertise in merging cutting-edge technology with storytelling to the AI company.
  • Put This On Your Radar:
    • MIMO: Controllable character video synthesis model for creating realistic character videos with controllable attributes.
    • Google's Zero-Shot Voice Cloning: New technique that can clone voices using just a few seconds of audio sample.
    • Leonardo AI's Image Upscaling Tool: New high-definition image enlargement feature rivaling existing tools like Magnific.
    • PortraitGen: AI portrait video editing tool enabling multi-modal portrait editing, including text-based and image-based effects.
    • FaceFusion 3.0.0: Advanced face swapping and editing tool with new features like "Pixel Boost" and face editor.
    • CogVideoX-I2V Workflow Update: Improved image-to-video generation in ComfyUI with better output quality and efficiency.
    • Ctrl-X: New tool for image generation with structure and appearance control, without requiring additional training or guidance.
    • Invoke AI 5.0: Major update to open-source image generation tool with new features like Control Canvas and Flux model support.
    • JoyCaption: Free and open uncensored vision-language model (Alpha One release) for captioning images used to train diffusion models.
    • ComfyUI-Roboflow: Custom node for image analysis in ComfyUI, integrating Roboflow's capabilities.
    • Tiled Diffusion with ControlNet Upscaling: Workflow for generating high-resolution images with fine control over details in ComfyUI.
    • 2VEdit: Video editing tool that transforms entire videos by editing just the first frame.
    • Flux LoRA showcase: New FLUX LoRA models including Simple Vector Flux, How2Draw, Coloring Book, Amateur Photography v5, Retro Comic Book, and RealFlux 1.0b.
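
For the CLIP and Long-CLIP item above, here is a minimal sketch of what loading a fine-tuned CLIP-L text encoder into a Flux pipeline looks like with the Diffusers component-override pattern. The fine-tuned checkpoint id below is a placeholder; the newsletter's links point to the actual repo.

```python
# Minimal sketch: swapping a fine-tuned CLIP-L text encoder into FLUX.1 Dev
# via Diffusers. "your-org/fine-tuned-clip-l" is a placeholder repo id.
import torch
from transformers import CLIPTextModel
from diffusers import FluxPipeline

clip = CLIPTextModel.from_pretrained(
    "your-org/fine-tuned-clip-l", torch_dtype=torch.bfloat16
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=clip,  # the CLIP-L slot; text_encoder_2 is the T5 encoder
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a watercolor fox in a misty forest",
             num_inference_steps=28).images[0]
image.save("fox.png")
```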

📰 Full newsletter with relevant links, context, and visuals available in the original document.

🔔 If you're having a hard time keeping up in this domain, consider subscribing. We send out our newsletter every Sunday.

252
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Opening-Ad5541 on 2024-10-01 07:51:06+00:00.

253
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/formalsystem on 2024-10-01 03:43:29+00:00.

254
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/EldrichArchive on 2024-10-01 01:54:45+00:00.

255
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/MikirahMuse on 2024-09-30 18:33:51+00:00.

256
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/urgettingtallpip on 2024-09-30 22:33:03+00:00.


Has anybody heard of new Flux ControlNets being trained or coming out soon? The current ones released by XLabs and InstantX feel mediocre at best.

257
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/jenza1 on 2024-09-30 15:07:47+00:00.

258
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Angrypenguinpng on 2024-09-30 21:58:03+00:00.

259
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Striking-Long-2960 on 2024-09-30 19:51:34+00:00.


New versions of CogVideoX-Fun 5B and 2B have been released, including a new model that I believe is intended for animating humans.

  • [2024.09.29] Retrained the i2v model and added noise to increase the motion amplitude of the video. Uploaded the control-model training code and the control model.

5B

2B

The ComfyUI custom node CogVideoXWrapper has initial support for these new models.
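
For readers who prefer plain Diffusers over ComfyUI, here is a rough sketch of the image-to-video flow using the base CogVideoX-5b-I2V pipeline. One assumption to flag: CogVideoX-Fun ships its own pipeline code, so this shows the general shape of the i2v call, not the Fun variant's API.

```python
# Sketch of CogVideoX image-to-video with Diffusers, using the base
# THUDM/CogVideoX-5b-I2V checkpoint (CogVideoX-Fun's own pipeline differs).
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trades speed for much lower VRAM use

image = load_image("first_frame.png")
frames = pipe(
    prompt="a person waving at the camera, smooth natural motion",
    image=image,
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=8)
```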

260
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/idunno63 on 2024-09-30 15:47:13+00:00.

261
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/theroom_ai on 2024-09-30 12:28:23+00:00.

262
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/woadwarrior on 2024-09-30 13:39:26+00:00.

263
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/gpahul on 2024-09-30 12:39:16+00:00.

264
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Halodri88 on 2024-09-30 11:23:12+00:00.

265
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/hackerzcity on 2024-09-30 04:39:03+00:00.


This model was trained on large numbers of artificially damaged images (noise, blur, compression artifacts, and the like). By learning to undo those degradations, it can turn blurry pictures into clearer ones.
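
As a concrete illustration of the training-data side, here is a small sketch of how such (damaged, clean) pairs are typically synthesized: random blur, noise, and a JPEG round-trip applied to clean images. The exact degradation recipe of this particular model may differ.

```python
# Illustrative degradation pipeline for building (damaged, clean) training
# pairs for an image-restoration model. Parameters are arbitrary examples.
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image) -> Image.Image:
    # Random Gaussian blur
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 3.0)))
    # Additive Gaussian noise
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, random.uniform(2.0, 15.0), arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    # JPEG round-trip at a random low quality setting
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(20, 70))
    return Image.open(io.BytesIO(buf.getvalue())).convert("RGB")

clean = Image.open("clean.png").convert("RGB")
damaged = degrade(clean)  # network input; `clean` is the training target
```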

266
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/an303042 on 2024-09-30 08:31:46+00:00.

267
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Jonfreakr on 2024-09-30 08:17:15+00:00.

268
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/ninjasaid13 on 2024-09-30 05:21:42+00:00.


Paper: (pdf link is broken for some reason)

Project Page:

Code:

Model: (Apache License for all models) and the vision tokenizer

Disclaimer: I am not the author.

Overview

While next-token prediction is considered a promising path towards AGI, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches (e.g., CLIP combined with LLMs). In this work, we introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. By tokenizing images, text, and videos into a discrete space, we train a single transformer from scratch on a mixture of multimodal sequences.
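
To make that concrete, here is a toy sketch of the core recipe: text tokens and vision-tokenizer codes share one flat vocabulary, get concatenated into a single sequence, and a causal transformer is trained with ordinary next-token cross-entropy. Sizes and architecture are illustrative, not the paper's.

```python
# Toy next-token prediction over a mixed text+image token sequence.
# Vocabulary sizes, model width, and depth are made up for illustration;
# positional encodings are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

TEXT_VOCAB, VISION_VOCAB = 32_000, 16_384
VOCAB = TEXT_VOCAB + VISION_VOCAB  # one shared discrete vocabulary

class TinyCausalLM(nn.Module):
    def __init__(self, d=512):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, tokens):
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.shape[1])
        return self.head(self.blocks(self.embed(tokens), mask=causal))

# A multimodal sequence: text tokens, then image codes offset by TEXT_VOCAB
# so the two vocabularies never collide.
text = torch.randint(0, TEXT_VOCAB, (1, 16))
image = torch.randint(0, VISION_VOCAB, (1, 64)) + TEXT_VOCAB
seq = torch.cat([text, image], dim=1)

logits = TinyCausalLM()(seq)
loss = F.cross_entropy(logits[:, :-1].reshape(-1, VOCAB),
                       seq[:, 1:].reshape(-1))
```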

Examples

They introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. By tokenizing images, text, and videos into a discrete space, they train a single transformer from scratch on a mixture of multimodal sequences.

Emu3 excels in both generation and perception

Emu3 outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship open models such as SDXL, LLaVA-1.6 and OpenSora-1.2, while eliminating the need for diffusion or compositional architectures.

Video Generation

Emu3 is capable of generating videos. Unlike Sora, which employs a video diffusion model to generate video from noise, Emu3 simply generates a video causally by predicting the next token in a video sequence.

Video Prediction

With a video in context, Emu3 can naturally extend the video and predict what will happen next. The model can simulate some aspects of the environment, people and animals in the physical world.

Vision-Language Understanding

Emu3 demonstrates strong perception capabilities to understand the physical world and provides coherent text responses. Notably, this capability is achieved without depending on a CLIP and a pretrained LLM.

269
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Pretend_Potential on 2024-09-30 02:52:22+00:00.


Just going to post the link to the news article rather than quote the entire article.

270
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/stupidxthrowaway on 2024-09-30 01:05:21+00:00.

271
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/myf4pacc0unt on 2024-09-29 21:06:37+00:00.

272
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/nootropicMan on 2024-09-29 21:47:40+00:00.

273
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Devajyoti1231 on 2024-09-29 08:47:42+00:00.

274
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/os75 on 2024-09-29 02:53:10+00:00.


Hey guys, so I've been working on this thing I'm calling lorakit. It's just a little toolkit I threw together for training SDXL LoRA models. It's heavily based on DreamBooth from AutoTrain, but with a configuration style similar to ai-toolkit. Nothing fancy, but it's been pretty handy for quick experiments and prototyping. Thought some of you might wanna check it out:
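
lorakit's own configuration isn't shown in this archive, but for context, here is roughly the Diffusers/PEFT plumbing that SDXL LoRA trainers like this typically build on. Rank, learning rate, and target modules below are illustrative defaults, not lorakit's actual settings.

```python
# Sketch of attaching trainable LoRA adapters to an SDXL UNet with
# Diffusers + PEFT; lorakit's real API and defaults may differ.
import torch
from diffusers import StableDiffusionXLPipeline
from peft import LoraConfig

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

pipe.unet.requires_grad_(False)  # freeze the base weights
pipe.unet.add_adapter(LoraConfig(
    r=8,
    lora_alpha=8,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],  # attention projections
))

# Only the injected LoRA parameters remain trainable.
params = [p for p in pipe.unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(params, lr=1e-4)
```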

275
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/smusamashah on 2024-09-29 13:06:03+00:00.
