This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/terminusresearchorg on 2024-10-01 21:02:31+00:00.


Performance

  • Improved launch speed for large datasets (>1M samples)
  • Improved speed for quantising on CPU
  • Optional support for quantising directly on the GPU, which is near-instant (--quantize_via; see the sketch after this list)
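
A minimal launch sketch for GPU-side quantisation; both the train.py entry point and the accelerator value shown here are assumptions, so check OPTIONS.md and your launcher for the exact form:

```bash
# Sketch: quantise the base model directly on the GPU rather than the CPU.
# "accelerator" is an assumed value; --quantize_via=cpu would keep the old path.
python train.py --quantize_via=accelerator
```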

Compatibility

  • SDXL, SD1.5 and SD2.x compatibility with LyCORIS training
  • Updated documentation to make multi-GPU configuration a bit more obvious.
  • Improved support for torch.compile(), including automatically disabling it when e.g. fp8-quanto is enabled
    • Enable via accelerate config or config/config.env via TRAINER_DYNAMO_BACKEND=inductor (see the config sketch after this list)
  • TorchAO as an alternative to Optimum Quanto for int8 weight-only quantisation (int8-torchao)
  • fp8uz-quanto, a compatibility level for AMD ROCm users to experiment with FP8 training dynamics
  • Support for multi-GPU PEFT LoRA training with Quanto enabled (not fp8-quanto)
    • Previously, only LyCORIS would reliably work with quantised multi-GPU training sessions.
  • Ability to quantise models when full fine-tuning, without warning or error. Previously, this configuration was blocked. Your mileage may vary; it's an experimental configuration.
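
For the torch.compile() support above, a minimal config/config.env sketch; only the TRAINER_DYNAMO_BACKEND line is taken from these notes, and the rest of your config.env stays as it is:

```bash
# config/config.env (sketch)
# Route the trainer through torch.compile() with the inductor backend.
# It is disabled automatically when e.g. fp8-quanto is enabled.
TRAINER_DYNAMO_BACKEND=inductor
```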

Integrations

  • Images now get logged to tensorboard (thanks u/anhi)
  • FastAPI endpoints for integrations (undocumented)
  • "raw" webhook type that sends a large number of HTTP requests containing events, useful for push notification type service

Optims

  • SOAP optimiser support
    • uses fp32 gradients; nice and accurate, but uses more memory than other optimisers, and by default every 10th step is slower while it updates its preconditioner
  • New 8-bit and 4-bit optimiser options from TorchAO (ao-adamw8bit, ao-adamw4bit, etc.; see the sketch after this list)
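
A minimal sketch of selecting one of these optimisers; the --optimizer flag name is an assumption, so confirm it against OPTIONS.md:

```bash
# Sketch: pick the SOAP optimiser (fp32 gradients, higher memory use), or
# swap in ao-adamw8bit / ao-adamw4bit for the TorchAO low-bit variants.
python train.py --optimizer=soap
```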

Schnell

Recently we discovered that a LyCORIS LoKr trained on Flux.1 Dev works perfectly fine on Flux.1 Schnell at just 4 steps, and that the problems with transferring it over are specific to LoRA.

No special training is needed; just train on Dev instead of Schnell.
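
A minimal inference sketch for this, assuming diffusers for the Schnell pipeline and the lycoris library's create_lycoris_from_weights helper to merge a Dev-trained LoKr; the checkpoint path and the exact loading call are assumptions, so follow the quickstart for the supported route:

```python
# Sketch: run a LoKr trained on Flux.1 Dev against Flux.1 Schnell at 4 steps.
import torch
from diffusers import FluxPipeline
from lycoris import create_lycoris_from_weights  # assumption: lycoris-lora package

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# Merge the Dev-trained LyCORIS LoKr into Schnell's transformer.
# The file path below is hypothetical.
wrapper, _ = create_lycoris_from_weights(
    1.0, "my-dev-lokr/pytorch_lora_weights.safetensors", pipe.transformer
)
wrapper.merge_to()

image = pipe(
    "a photo of a corgi wearing a tiny wizard hat",
    num_inference_steps=4,   # Schnell is distilled for ~4 steps
    guidance_scale=0.0,      # Schnell does not use CFG
).images[0]
image.save("schnell_lokr_test.png")
```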

The release:

The quickstart:

Some docs have been updated for v1.1, mostly OPTIONS.md and the FLUX quickstart.
