StableDiffusion

51
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/sayoonarachu on 2024-10-16 02:02:49+00:00.

52
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Norby123 on 2024-10-15 22:50:13+00:00.


Soooo

I recently discovered Shakker; it seems interesting: pretty multicultural, with a lot of great resources (that I couldn't find on Civitai). But beyond that one site, I'd rather not just google around and download from unknown websites.

Do you guys know any websites (I'm guessing Chinese, since they are pushing AI really hard) that have LoRAs, checkpoints, etc., basically a "different Civitai" that's popular in, say, the Asian region? Or Russia, or wherever. I don't mind if it's not in English; I'm willing to run everything through a translator just for a good Greg Rutkowski LoRA.

Also, preferably free. I've seen paid-to-download SDXL stuff; I don't know if that was a scam, but I'd prefer to avoid that.

thank you <3

53
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Robo420- on 2024-10-15 20:04:26+00:00.

54
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Cheap_Fan_7827 on 2024-10-15 17:12:17+00:00.


(from model page)

About Sana

We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on a laptop GPU. Core designs include:

  • Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens.
  • Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality.
  • Decoder-only text encoder: we replaced T5 with a modern decoder-only small LLM as the text encoder and designed complex human instructions with in-context learning to enhance image-text alignment.
  • Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence.

As a result, Sana-0.6B is very competitive with modern giant diffusion models (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024 × 1024 resolution image. Sana enables content creation at low cost.
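To make the efficiency claims concrete: at 4096 × 4096, a traditional 8× autoencoder leaves a 512 × 512 latent grid (~262k tokens), while a 32× AE leaves 128 × 128 (~16k tokens), and linear attention then scales linearly rather than quadratically in however many tokens remain. A minimal sketch of the linear-attention idea (illustrative only, not Sana's actual code; the ReLU feature map is an assumption):

```python
import torch

def vanilla_attention(q, k, v):
    # O(n^2) in sequence length: materializes the full n x n attention matrix.
    scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return scores @ v

def linear_attention(q, k, v, eps=1e-6):
    # O(n) in sequence length: a non-negative feature map replaces softmax,
    # so K^T V can be accumulated first as a small d x d matrix.
    q, k = torch.relu(q), torch.relu(k)
    kv = k.transpose(-2, -1) @ v                                  # (d, d), independent of n
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps   # per-row normalizer
    return (q @ kv) / z

n, d = 16384, 64          # e.g. the 128 x 128 latent grid of a 32x AE at 4096 px
q, k, v = (torch.randn(n, d) for _ in range(3))
print(linear_attention(q, k, v).shape)  # torch.Size([16384, 64])
```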

(This is a reprint post and is unofficial, again)

55
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/vmandic on 2024-10-15 17:01:13+00:00.

56
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Dersemonia on 2024-10-15 15:55:23+00:00.

57
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Creative-Listen-6847 on 2024-10-15 14:26:26+00:00.

58
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/mehul_gupta1997 on 2024-10-15 03:34:51+00:00.


The CogVideoX team released CogView3, an open-source text-to-image model that can produce images at resolutions of up to 2048. Check out the demo here:

59
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeFurkan on 2024-10-15 11:42:17+00:00.


Releases here :

Discussion here :

Main repo here :

Test code here :

I created a Python 3.10 venv, installed torch 2.4.1, and the test code now works directly with the released wheel install.

You need to have C++ build tools and SDKs, CUDA 12.4, Python, and cuDNN installed.
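A quick way to sanity-check that environment from inside the venv before running the test code (this only assumes torch is installed; expected values are the versions listed above):

```python
import sys
import torch

# Verify the prerequisites mentioned above.
print(sys.version)                     # expect 3.10.x
print(torch.__version__)               # expect 2.4.1
print(torch.version.cuda)              # expect 12.4
print(torch.backends.cudnn.version())  # the cuDNN build torch links against
print(torch.cuda.is_available())       # True if the GPU and driver are usable
```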

My tutorial for how to install these is fully valid (fully open access, not paywalled; reminder to mods: you had verified this video):

Test code results are below.

60
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/SemaiSemai on 2024-10-15 10:23:34+00:00.

61
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/BigRub7079 on 2024-10-15 08:28:15+00:00.

62
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/GeekyBit on 2024-10-15 04:30:07+00:00.


Here are the stats.

I am using Forge UI with:

  • Model: Flux1-dev-fp8.safetensors
  • Text encoders: T5xxl_fp8_e4m3fn.safetensors and t5xxl_fp16.safetensors
  • VAE: ae.safetensors
  • CLIP: ViT-L-14-TEXT-Detail-improved-hiT-GmP-TE-only-HF.safetensors

Lastly, the prompt is "anime cat adventurers in a small fantasy town with swords walking".

This is a simple silly test prompt I have been using...

Now, the thing is, I have tested several other prompts, and since upgrading to the 4060 Ti 16GB with no other changes, a lot of the time the people or characters have been facing away or sideways instead of looking at the camera/forward...

EDIT: 20 steps for both sets.

EDIT 2: IT IS FIXED... Now they face the correct way unless prompted not to face the camera/viewport. Here is how I fixed it: a clean install.

Please note: my issue isn't the new style, but the fact that subjects were being generated facing away from the viewport/camera with no prompt telling them to do that.

EDIT 3: Thanks to everyone giving great suggestions about seeds and how to set them up. I didn't know any of that before, and it is great to know; however, it was very much not related to my issue... somehow the prompt was just adding extra stuff that made the bodies of the people look the other way instead of at the camera/viewport.
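For readers following the seed suggestions above: fixing the seed makes generations repeatable, which is the cleanest way to compare hardware or settings on identical runs. A minimal sketch using diffusers rather than Forge (Forge exposes the same idea through its seed field; the model ID and file names here are illustrative):

```python
import torch
from diffusers import FluxPipeline

# Load a Flux pipeline (illustrative model ID, not the poster's exact files).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps fit within a 16GB card

generator = torch.Generator("cpu").manual_seed(42)  # same seed -> same image
image = pipe(
    "anime cat adventurers in a small fantasy town with swords walking",
    num_inference_steps=20,  # matching the 20 steps used for both test sets
    generator=generator,
).images[0]
image.save("seed42.png")
```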

63
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/ninjasaid13 on 2024-10-15 05:42:07+00:00.


Disclaimer: I am not the author.

Paper:

Code:

Weights:

Abstract

Recently, large-scale diffusion models have made impressive progress in text-to-image (T2I) generation. To further equip these T2I models with fine-grained spatial control, approaches like ControlNet introduce an extra network that learns to follow a condition image. However, for every single condition type, ControlNet requires independent training on millions of data pairs with hundreds of GPU hours, which is quite expensive and makes it challenging for ordinary users to explore and develop new types of conditions. To address this problem, we propose the CtrLoRA framework, which trains a Base ControlNet to learn the common knowledge of image-to-image generation from multiple base conditions, along with condition-specific LoRAs to capture distinct characteristics of each condition. Utilizing our pretrained Base ControlNet, users can easily adapt it to new conditions, requiring as few as 1,000 data pairs and less than one hour of single-GPU training to obtain satisfactory results in most scenarios. Moreover, our CtrLoRA reduces the learnable parameters by 90% compared to ControlNet, significantly lowering the threshold to distribute and deploy the model weights. Extensive experiments on various types of conditions demonstrate the efficiency and effectiveness of our method. Codes and model weights will be released at this https URL.

Figure 1: Their results of single-conditional generation, multi-conditional generation, style transfer.

Figure 2: Overview of the CtrLoRA framework. “CN” denotes Base ControlNet, “L” denotes LoRA. (a) They first train a shared Base ControlNet in conjunction with condition-specific LoRAs on a large-scale dataset that contains multiple base conditions. (b) The trained Base ControlNet can be easily adapted to novel conditions with significantly less data, fewer devices, and shorter time.

Comparison of model size, dataset size, and training time cost: for N conditions, the total number of parameters is 361M × N for ControlNet and 360M + 37M × N for CtrLoRA.

Figure 3: Training and inference of the CtrLoRA framework. “SD” denotes Stable Diffusion, “CN” denotes Base ControlNet, and “L”s in different colors denote LoRAs for different conditions.

This paper presents CtrLoRA, a framework for creating controllable image generation models with minimal data and resources. It trains a Base ControlNet together with condition-specific LoRAs, then adapts it to new conditions with additional LoRAs, reducing data needs and speeding up training. The models can be easily integrated with community models for multi-condition generation without extra training. A known issue with CtrLoRA is that color-related tasks, like Palette and Lineart, converge more slowly than spatial tasks, likely due to network architecture limitations. Future improvements may come from using more advanced transformer diffusion backbones like Stable Diffusion V3 and Flux.1.
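The parameter accounting above follows directly from the standard LoRA construction: the 360M-parameter Base ControlNet is shared and frozen, and each condition adds only two thin matrices per adapted layer, so ten conditions cost roughly 360M + 10 × 37M ≈ 730M parameters versus 3.61B for ten separate ControlNets. A minimal sketch of that construction (illustrative; the layer size, rank, and names are assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a low-rank per-condition update:
    W_eff = W + (alpha / r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the shared base stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(320, 320), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 5120 LoRA parameters vs 102720 in the frozen base layer
```

Swapping the (A, B) pair swaps the condition, which is why distributing a new condition costs 37M parameters instead of a full 361M ControlNet.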

64
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/an303042 on 2024-10-14 19:57:11+00:00.

65
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/dasjomsyeet on 2024-10-14 19:30:59+00:00.

66
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Much_Can_4610 on 2024-10-15 02:08:50+00:00.

67
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Brancaleo on 2024-10-14 19:27:03+00:00.

68
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/mrfofr on 2024-10-14 18:21:51+00:00.

69
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Eveia_ on 2024-10-14 17:39:32+00:00.


Feels like inswapper_128 is reaching its limit with Flux and other new and upcoming models trained on higher-res data. Since the model only produces a 128 × 128 face crop, face swapping with Flux models already looks rather shitty when generating 1448 × 1448 images with upscaling: the swapped crop has to be blown up to cover a face region several times that size. There are other models at netrunner-exe/Insight-Swap-models-onnx on Hugging Face, so I wonder why we have no alternatives yet.
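For context, the typical inswapper_128 pipeline through insightface looks roughly like this (a sketch; the file names are placeholders), and nothing in it changes the fixed 128 × 128 crop the swapper outputs and pastes back:

```python
import cv2
import insightface
from insightface.app import FaceAnalysis

# Detect and align faces, then swap with the 128x128 ONNX model.
app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))
swapper = insightface.model_zoo.get_model("inswapper_128.onnx")

target = cv2.imread("generated_1448.png")   # placeholder: high-res generation
source = cv2.imread("source_face.png")      # placeholder: identity to swap in
target_face = app.get(target)[0]
source_face = app.get(source)[0]

# The swapped face is produced at 128x128 and resized into the target region,
# which is why quality degrades as the face area in the image grows.
out = swapper.get(target, target_face, source_face, paste_back=True)
cv2.imwrite("swapped.png", out)
```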

70
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/CeFurkan on 2024-10-14 14:40:14+00:00.

Original Title: Huge FLUX LoRA vs Fine Tuning / DreamBooth Experiments Completed, Moreover Batch Size 1 vs 7 Fully Tested as Well, Not Only for Realism But Also for Stylization - 15 vs 256 images having datasets compared as well (expressions / emotions tested too)

71
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Icy-Corgi4757 on 2024-10-14 11:34:52+00:00.

72
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/camenduru on 2024-10-14 09:48:16+00:00.

73
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/OkSpot3819 on 2024-10-14 09:43:59+00:00.


Stories:

REMspace: California neurotechnology startup achieves two-way communication with people during dreams, potentially revolutionizing mental health treatments and skills training methods.

AI.Lonso Launch: ElevenLabs and DeepReel partner with Aston Martin Aramco Formula One Team to create Ai.lonso, an AI-powered tool enhancing fan engagement through multilingual content translation.

Put This On Your Radar:

  • AI Inverse Painting: New method for recreating masterpieces step-by-step using diffusion-based technology.
  • DressRecon: 3D human model generator from videos, capturing complex clothing and held objects.
  • Podcastfy: Open-source tool for converting text to audio podcasts with multilingual capabilities.
  • PMRF: Advanced image restoration algorithm balancing distortion reduction and perceptual quality.
  • WonderWorld AI: Real-time 3D scene generation from a single image in just 10 seconds.
  • Hailuo AI: New image-to-video generation feature with precise object manipulation and style options.
  • Free 3D Object Texturing Tool: Using Forge and ControlNet for game developers and 3D artists.
  • Gradio: Background removal tool for videos.
  • Image to Pixel Style Converter: ComfyUI workflow for transforming regular images into pixel art style.
  • FacePoke: Interactive face expression editor with drag-and-drop interface.
  • Dreamina AI V2.0: All-in-one AI generator developed by ByteDance, currently in beta testing.
  • Pyramid Flow SD3: New open-source video generation tool based on Stable Diffusion 3.
  • EdgeRunner: NVIDIA's high-quality 3D mesh generator from images and point-clouds.
  • ViBiDSampler: Tool for generating high-quality frames between two keyframes.

📰 Full newsletter with relevant links, context, and visuals available in the original document.

🔔 If you're having a hard time keeping up in this domain, consider subscribing. We send out our newsletter every Sunday.

74
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Previous-Street8087 on 2024-10-14 08:43:46+00:00.

75
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/woadwarrior on 2024-10-14 08:02:39+00:00.
