
The original was posted on /r/stablediffusion by /u/terminusresearchorg on 2024-09-02 21:30:21+00:00.


release:

Left: Base Flux.1 Dev model, 20 steps

Right: LoKr with configure.py default network settings and --flux_attention_masked_training

This is a chunky release: the trainer was majorly refactored.

But for the most part, it should feel like nothing has changed, and you can most likely continue without modifying your setup.

You know those projects you always want to get around to but you never do because it seems like you don't even know where to begin? I refactored and deprecated a lot to get the beginnings of a Trainer SDK started.

  • the config.env files are now deprecated in favour of config.json or config.toml (a migration sketch follows this list)
    • the env files still work, and MOST of it is backwards-compatible
    • any kind of shell scripting you had in config.env will no longer work, e.g. the $(date) call inside TRACKER_RUN_NAME will no longer 'resolve' to the date-time
    • please open a ticket on GitHub if something you desperately needed no longer works; e.g. for datetimes, we can add a special string like {timestamp} that will be replaced at startup
  • the default settings that were previously overridden in a hidden manner by train.sh have been, as best I could, integrated correctly into the defaults for train.py
    • in other words, some settings / defaults may have changed, but now there is just one source of truth for the defaults: train.py --help
  • for developers, there's now a Trainer class to use (a usage sketch follows this list)
    • additionally, for aspiring developers or anyone who'd like a more interactive environment to mess with SimpleTuner, there is now a Jupyter Notebook that lets you peek deeper into the process of using this Trainer class through a functional training environment
    • it's still new, and I've not had much time to extend it with a public API, so these internal methods are likely to change; it's not recommended to fully rely on it just yet if that concerns you
      • but future changes should be easy enough for seasoned developers to integrate into their applications
    • I'm sure it could be useful to someone who wishes to build a GUI for SimpleTuner, but remember that it currently relies on WSL2 for Windows users
  • bug: multi-GPU step tracking in the learning rate scheduler was broken, but now works. resuming will correctly start from where the LR last was, and its trajectory is properly deterministic (see the resume sketch after this list)
  • bug: the attention masking we published in the last releases had an input-swapping bug, where the images were being masked instead of the text
    • upside: the fine details and text-following of a properly masked model are unparalleled, and really make Dev feel more like Pro with nearly zero effort
    • upside: it's faster! the new code places the masked positions at the end of the sequence, which seems to suit PyTorch's kernels better; my guess is that it can simply "chop off" the tail of the sequence and stop processing it, rather than "hopping over" masked positions at the front, as happened when the mask landed on the image embeds (see the sketch below)
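
Since config.env is on its way out, here's a minimal sketch of generating a config.json from Python. The option names below are illustrative assumptions rather than the definitive schema; train.py --help is the source of truth for the real keys.

```python
import json

# Hypothetical option names, for illustration only; consult
# `train.py --help` for the actual keys in your SimpleTuner version.
config = {
    "model_type": "lora",
    "lora_type": "lokr",
    "flux_attention_masked_training": True,
    "learning_rate": 1e-4,
    "max_train_steps": 5000,
    # shell substitutions like $(date) no longer resolve, so bake the
    # run name in (or use a {timestamp}-style token if we add one)
    "tracker_run_name": "flux-lokr-test",
}

with open("config.json", "w") as handle:
    json.dump(config, handle, indent=2)
```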
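
For the new Trainer class, a minimal sketch of what driving it looks like; treat the import path, constructor signature, and method names as provisional, since these internals are still likely to change:

```python
# Provisional import path and method names; see the bundled Jupyter
# Notebook for a working, up-to-date walkthrough.
from helpers.training.trainer import Trainer  # path may change

trainer = Trainer(config="config.json")  # provisional constructor signature
trainer.train()                          # runs the full training loop
```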
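
On the LR scheduler fix: here's the gist of deterministic resumption, shown with a plain PyTorch scheduler as a toy stand-in for the trainer's internals. The scheduler steps once per global optimizer step, and its state dict carries that counter, so restoring it puts the LR exactly where it left off:

```python
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=5000)

# Step the scheduler once per *global* step, so the count is the same
# no matter how many GPUs contributed to it.
for _ in range(1234):
    opt.step()
    sched.step()

# The scheduler's state dict carries the step counter; checkpoint it.
state = sched.state_dict()

# On resume, restoring the state lands the LR exactly where it was:
# the trajectory is a pure, deterministic function of the global step.
opt2 = torch.optim.AdamW(model.parameters(), lr=1e-4)
sched2 = torch.optim.lr_scheduler.CosineAnnealingLR(opt2, T_max=5000)
sched2.load_state_dict(state)
assert sched2.get_last_lr() == sched.get_last_lr()
```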
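
And the attention-masking fix in sketch form; the sizes and the [image | text] layout below are illustrative assumptions, not Flux's actual internals. The point is that only text padding should be masked, and placing those masked positions at the tail of the joint sequence leaves a contiguous block the kernel can simply stop at:

```python
import torch

batch, img_len, txt_len, real_txt = 1, 4096, 512, 77  # illustrative sizes

# True = attend, False = masked. Only the real text tokens stay unmasked;
# the old bug applied this mask to the image tokens instead of the text.
txt_mask = torch.zeros(batch, txt_len, dtype=torch.bool)
txt_mask[:, :real_txt] = True
img_mask = torch.ones(batch, img_len, dtype=torch.bool)

# Assumed [image | text] layout: the masked text padding now forms a
# contiguous tail that attention kernels can effectively chop off,
# instead of hopping over masked positions at the front of the sequence.
joint_mask = torch.cat([img_mask, txt_mask], dim=1)  # (batch, img_len + txt_len)
```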

The first example image at the top used attention masking, but here's another demonstration:

Steampunk inventor in a workshop, intricate gadgets, Victorian attire, mechanical arm, goggles

5000 steps on the new masking code, without much care for the resulting model quality, led to a major boost in the outputs. It didn't require the full 5000 steps, but I think a higher learning rate is needed to train a subject in with this configuration.

The training data is just 22 images of Cheech and Chong, and they're not even that good. They're just my latest test dataset.

Alien marketplace, bizarre creatures, exotic goods, vibrant colors, otherworldly atmosphere

a hand is holding a comic book with a cover that reads 'The Adventures of Superhero'

a cybernetic anne of green gables with neural implant and bio mech augmentations

Oh, okay, so, I guess cheech & chong make everything better. Who would have thought?

I didn't have any text / typography in the data:

A report on the training data and test run here, from a previous go at it (without attention masking):

Quick start guide to get training with Flux:
