This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/terminusresearchorg on 2024-10-01 21:02:31+00:00.


Performance

  • Improved launch speed for large datasets (>1M samples)
  • Improved speed for quantising on CPU
  • Optional support for quantising directly on the GPU, which is near-instant (--quantize_via; see the sketch after this list)
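
A minimal launch sketch for GPU-side quantisation; both the train.py entry point and the accelerator value shown here are assumptions, so check OPTIONS.md and your launcher for the exact form:

```bash
# Sketch: quantise the base model directly on the GPU rather than the CPU.
# "accelerator" is an assumed value; --quantize_via=cpu would keep the old path.
python train.py --quantize_via=accelerator
```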

Compatibility

  • SDXL, SD1.5 and SD2.x compatibility with LyCORIS training
  • Updated documentation to make multi-GPU configuration a bit more obvious.
  • Improved support for torch.compile(), including automatically disabling it when e.g. fp8-quanto is enabled
    • Enable via accelerate config or config/config.env via TRAINER_DYNAMO_BACKEND=inductor (see the config sketch after this list)
  • TorchAO as an alternative to Optimum Quanto for int8 weight-only quantisation (int8-torchao)
  • fp8uz-quanto, a compatibility level for AMD ROCm users to experiment with FP8 training dynamics
  • Support for multi-GPU PEFT LoRA training with Quanto enabled (not fp8-quanto)
    • Previously, only LyCORIS would reliably work with quantised multi-GPU training sessions.
  • Ability to quantise models when full fine-tuning, without warning or error. Previously, this configuration was blocked. Your mileage may vary; it's an experimental configuration.
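
For the torch.compile() support above, a minimal config/config.env sketch; only the TRAINER_DYNAMO_BACKEND line is taken from these notes, and the rest of your config.env stays as it is:

```bash
# config/config.env (sketch)
# Route the trainer through torch.compile() with the inductor backend.
# It is disabled automatically when e.g. fp8-quanto is enabled.
TRAINER_DYNAMO_BACKEND=inductor
```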

Integrations

  • Images now get logged to tensorboard (thanks u/anhi)
  • FastAPI endpoints for integrations (undocumented)
  • "raw" webhook type that sends a large number of HTTP requests containing events, useful for push notification type service

Optims

  • SOAP optimiser support
    • uses fp32 gradients; nice and accurate, but uses more memory than other optimisers, and by default every 10th step is slower while it updates its preconditioner
  • New 8-bit and 4-bit optimiser options from TorchAO (ao-adamw8bit, ao-adamw4bit, etc.; see the sketch after this list)
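
A minimal sketch of selecting one of these optimisers; the --optimizer flag name is an assumption, so confirm it against OPTIONS.md:

```bash
# Sketch: pick the SOAP optimiser (fp32 gradients, higher memory use), or
# swap in ao-adamw8bit / ao-adamw4bit for the TorchAO low-bit variants.
python train.py --optimizer=soap
```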

Schnell

Recently we discovered that a LyCORIS LoKr trained on Flux.1 Dev works perfectly fine on Flux.1 Schnell at just 4 steps, and that the problems with transferring it over are specific to LoRA.

No special training is needed; just train on Dev instead of Schnell.
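
A minimal inference sketch for this, assuming diffusers for the Schnell pipeline and the lycoris library's create_lycoris_from_weights helper to merge a Dev-trained LoKr; the checkpoint path and the exact loading call are assumptions, so follow the quickstart for the supported route:

```python
# Sketch: run a LoKr trained on Flux.1 Dev against Flux.1 Schnell at 4 steps.
import torch
from diffusers import FluxPipeline
from lycoris import create_lycoris_from_weights  # assumption: lycoris-lora package

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# Merge the Dev-trained LyCORIS LoKr into Schnell's transformer.
# The file path below is hypothetical.
wrapper, _ = create_lycoris_from_weights(
    1.0, "my-dev-lokr/pytorch_lora_weights.safetensors", pipe.transformer
)
wrapper.merge_to()

image = pipe(
    "a photo of a corgi wearing a tiny wizard hat",
    num_inference_steps=4,   # Schnell is distilled for ~4 steps
    guidance_scale=0.0,      # Schnell does not use CFG
).images[0]
image.save("schnell_lokr_test.png")
```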

The release:

The quickstart:

Some docs have been updated for v1.1, mostly OPTIONS.md and the FLUX quickstart.
