this post was submitted on 31 Aug 2024
1 points (100.0% liked)

StableDiffusion

98 readers
1 users here now

/r/StableDiffusion is back open after the protest of Reddit killing open API access, which will bankrupt app developers, hamper moderation, and...

founded 1 year ago
MODERATORS
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/tom83_be on 2024-08-31 14:33:20+00:00.


Intro

There are a lot of requests on how to do LoRA training with Flux.1 dev. Since not everyone has 24 VRAM, interest in low VRAM configurations is high. Hence, I searched for an easy and convenient but also completely free and local variant. The setup and usage of "ComfyUI Flux Trainer" seemed matching and allows to train with 12 GB VRAM (I think even 10 GB and possibly even below). I am not the creator of these tools nor am I related to them in any way (see credits at the end of the post). Just thought a guide could be helpful.

Prerequisites

git and python (for me 3.11) is installed and available on your console

Steps (for those who know what they are doing)

  • install ComfyUI
  • install ComfyUI manager
  • install "ComfyUI Flux Trainer" via ComfyUI Manager
  • install protobuf via pip (not sure why, probably was forgotten in the requirements.txt)
  • load the "flux_lora_train_example_01.json" workflow
  • install all missing dependencies via ComfyUI Manager
  • download and copy Flux.1 model files including CLIP, T5 and VAE to ComfyUI; use the fp8 versions for Flux.1-dev and the T5 encoder
  • use the nodes to train using:
    • 512x512
    • Adafactor
    • split_mode needs to be set to true (it basically splits the layers of the model, training a lower and upper part per step and offloading the other part to CPU RAM)
    • I got good results with network_dim = 64 and network_alpha = 64
    • fp8 base needs to stay true as well as gradient_dtype and save_dtype at bf16 (at least I never changed that; although I used different settings for SDXL in the past)
  • I had to remove the Flux Train Validate"-nodes and "Preview Image"-nodes since they ran into an error (annyoingly late during the process when sample images were created) "!!! Exception during processing !!! torch.cat(): expected a non-empty list of Tensors"-error" and I was unable to find a fix
  • If you like you can use the configuration provided at the very end of this post
  • you can also use/train using captions; just place the txt-files with the same name as the image in the input-folder

Observations

  • Speed on a 3060 is about 9,5 seconds/iteration, hence 3.000 steps as proposed as the default here (which is ok for small datasets with about 10-20 pictures) is about 8 hours
  • you can get good results with 1.500 - 2.500 steps
  • VRAM stays well below 10GB
  • RAM consumption is/was quite high; 32 GB are barely enough if you have some other applications running; I limited usage to 28GB, and it worked; hence, if you have 28 GB free, it should run; it looks like there have been some recent updates that are optimized better, but I have not tested that yet in detail
  • I was unable to run 1024x1024 or even 768x768 due to RAM contraints (will have to check with recent updates); the same goes for ranks higher than 128. My guess is, that it will work on a 3060 / with 12 GB VRAM, but it will be slower
  • using split_mode reduces VRAM usage as described above at a loss of speed; since I have only PCIe 3.0 and PCIe 4.0 is double the speed, you will probaly see better speeds if you have fast RAM and PCIe 4.0 using the same card; if you have more VRAM, try to set split_mode to false and see if it works; should be a lot faster

Detailed steps (for Linux)

  • mkdir ComfyUI_training
  • cd ComfyUI_training/
  • mkdir training
  • mkdir training/input
  • mkdir training/output
  • git clone
  • cd ComfyUI/
  • python3.11 -m venv venv (depending on your installation it may also be python or python3 instead of python3.11)
  • source venv/bin/activate
  • pip install -r requirements.txt
  • pip install protobuf
  • cd custom_nodes/
  • cd ..
  • systemd-run --scope -p MemoryMax=28000M --user nice -n 19 python3 main.py --lowvram (you can also just run "python3 main.py", but using this command you limit memory usage and prio on CPU)
  • open your browser and go to
  • Click on "Manager" in the menu
  • go to "Custom Nodes Manager"
  • search for "ComfyUI Flux Trainer" (white spaces!) and install the package from Author "kijai" by clicking on "install"
  • click on the "restart" button and agree on rebooting so ComfyUI restarts
  • reload the browser page
  • click on "Load" in the menu
  • navigate to ../ComfyUI_training/ComfyUI/custom_nodes/ComfyUI-FluxTrainer/examples and select/open the file "flux_lora_train_example_01.json"

you can also use the "workflow_adafactor_splitmode_dimalpha64_3000steps_low10GBVRAM.json" configuration I provided here)

  • you will get a Message saying "Warning: Missing Node Types"
  • go to Manager and click "Install Missing Custom Nodes"
  • install the missing packages just like you did for "ComfyUI Flux Trainer" by clicking on the respective "install"-buttons; at the time of writing this it was two packages ("rgthree's ComfyUI Nodes" by "rgthree" and "KJNodes for ComfyUI" by "kijai"
  • click on the "restart" button and agree on rebooting so ComfyUI restarts
  • reload the browser page
  • download "flux1-dev-fp8.safetensors" from and put it into ".../ComfyUI_training/ComfyUI/models/unet/
  • download "t5xxl_fp8_e4m3fn.safetensors" from and put it into ".../ComfyUI_training/ComfyUI/models/clip/"
  • download "clip_l.safetensors" from and put it into ".../ComfyUI_training/ComfyUI/models/clip/"
  • download "ae.safetensors" from and put it into ".../ComfyUI_training/ComfyUI/models/vae/"
  • reload the browser page (ComfyUI)

if you used the "workflow_adafactor_splitmode_dimalpha64_3000steps_low10GBVRAM.json" I provided you can proceed till the end / "Queue Prompt" step here after you put your images into the correct folder; here we use the "../ComfyUI_training/training/input/" created above

  • find the "FluxTrain ModelSelect"-node and select:

=> flux1-dev-fp8.safetensors for "transformer"

=> ae.safetensors for vae

=> clip_l.safetensors for clip_c

=> t5xxl_fp8_e4m3fn.safetensors for t5

  • find the "Init Flux LoRA Training"-node and select:

=> true for split_mode (this is the crucial setting for low VRAM / 12 GB VRAM)

=> 64 for network_dim

=> 64 for network_alpha

=> define a output-path for your LoRA by putting it into outputDir; here we use "../training/output/"

=> define a prompt for sample images in the text box for sample prompts (by default it says something like "cute anime girl blonde..."; this will only be relevant if that works for you; see below)

  • find the "Optimizer Config Adafactor"-node and connect the "optimizer_settings" output with the "optimizer_settings" of the "Init Flux LoRA Training"-node
  • find the three "TrainDataSetAdd"-nodes and remove the two ones with 768 and 1024 for width/height by clicking on their title and pressing the remove/DEL key on your keyboard
  • add the path to your dataset (a folder with the images you want to train on) in the remaining "TrainDataSetAdd"-node (by default it says "../datasets/akihiko_yoshida_no_caps"; if you specify an empty folder you will get an error!); here we use "../training/input/"
  • define a triggerword for your LoRA in the "TrainDataSetAdd"-node; for example "loratrigger" (by default it says "akihikoyoshida")
  • remove all "Flux Train Validate"-nodes and "Preview Image"-nodes (if present I get an error later in training)
  • click on "Queue Prompt"
  • once training finishes, your output is in ../ComfyUI_training/training/output/ (4 files for 4 stages with different steps)

All credits go to the creators of

===== save as workflow_adafactor_splitmode_dimalpha64_3000steps_low10GBVRAM.json =====

https://pastebin.com/CjDyMBHh

no comments (yet)
sorted by: hot top controversial new old
there doesn't seem to be anything here