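Using LoRA with Stable Diffusion XL

This post considers the case of creating a LoRA for a single character.

Preparing the LoRA training material

Collecting good source images is the most important part of a successful fine-tune.

Here are a few tips.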

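Make sure no other characters appear in the images
Prepare a variety of angles
Either make every background blank, or make the backgrounds vary widely
- Simpler backgrounds are better
Image size: 1024×1024
- Upscaling 512×512 images with an upscaler is also an option
- Full-body images: 756×1244
Network rank: 8 to 16
Leave features that are part of the character you want the LoRA to learn out of the captions
With high-resolution source images, limiting the set to about 20 images gives better results
The basic rule for training is "low learning rate, many steps"
- alpha = 1
- learning rate 1e-4
Use the Adafactor optimizer
- Details to be looked into later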
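
To see how the rank and alpha tips interact: in the LoRA module of kohya-ss sd-scripts, the low-rank update is multiplied by alpha / rank, so alpha = 1 with a rank of 8 to 16 scales the update down considerably. The sketch below only does that arithmetic; the function name `lora_scale` is hypothetical and is not part of sd-scripts.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Return the multiplier alpha / rank applied to the low-rank update."""
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    return alpha / rank


if __name__ == "__main__":
    # Ranks in the range suggested above, with alpha fixed at 1.
    for rank in (8, 12, 16):
        print(f"rank={rank:2d}, alpha=1 -> scale={lora_scale(1.0, rank):.4f}")
```

A smaller scale means each optimizer step changes the model less, which is consistent with the "low learning rate, many steps" rule.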

Captions

Camera angles
- https://kindanai.com/prompt-compare-angle/
Gaze direction
- https://romptn.com/article/7410
Facial expressions
- https://hikari-aiart.com/stable-diffusion-facial-expression-pronpt/
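
The tip about leaving the character's own features out of the captions can be applied mechanically. The following is a minimal sketch, assuming captions are comma-separated tags; the function name and the example tags are hypothetical.

```python
def strip_character_tags(caption: str, character_tags: set[str]) -> str:
    """Drop tags that describe the character's fixed traits.

    character_tags is expected to contain lowercase tags.
    """
    tags = [t.strip() for t in caption.split(",")]
    kept = [t for t in tags if t and t.lower() not in character_tags]
    return ", ".join(kept)


if __name__ == "__main__":
    # Hypothetical example: hair and eye colour are treated as part of the character,
    # while expression and camera angle stay in the caption.
    traits = {"black hair", "brown eyes"}
    print(strip_character_tags("1boy, black hair, brown eyes, smile, from side", traits))
```

Tags for angle, gaze, and expression remain in the caption, so they stay controllable from the prompt after training.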

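Upscaling the images

Stable Diffusion Upscaler and Multi Diffusion can only be used to upscale images that were generated with Stable Diffusion.

What I need here is to upscale the source images themselves, so I used GIMP to resize them from 512 to 1024.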
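
The 512 to 1024 resize done in GIMP could also be scripted. Below is a minimal sketch, assuming the third-party Pillow library is installed; the helper names are hypothetical. This is plain Lanczos interpolation, so it enlarges the image without adding detail.

```python
from pathlib import Path


def target_size(width: int, height: int, long_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals long_side, keeping the aspect ratio."""
    scale = long_side / max(width, height)
    return round(width * scale), round(height * scale)


def upscale_image(src: Path, dst: Path, long_side: int = 1024) -> None:
    """Resize one image with Lanczos resampling and save it to dst."""
    from PIL import Image  # third-party dependency: pip install Pillow

    with Image.open(src) as img:
        new_size = target_size(img.width, img.height, long_side)
        img.resize(new_size, Image.LANCZOS).save(dst)


if __name__ == "__main__":
    print(target_size(512, 512))
    print(target_size(512, 768))
```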

Training on the images

Environment setup

sudo apt install nvidia-cuda-toolkit
nvcc -V

This prints output like the following; check the CUDA version on the "release" line.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
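
If the version check needs to be scripted, the release number can be pulled out of the `nvcc -V` output with a regular expression. This is a minimal sketch; the function names are hypothetical, and `installed_cuda_release` assumes `nvcc` is on the PATH.

```python
import re
import subprocess
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Extract the version after 'release' (for example '11.5') from nvcc -V output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run nvcc -V and return the release version it reports."""
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=True)
    return parse_cuda_release(result.stdout)
```

For the output shown above, `parse_cuda_release` returns `"11.5"`.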

Enter the following commands in a terminal (these are for CUDA 11 and install the cu118 builds).

This setup uses sd-scripts.

git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts

python3 -m venv venv
source venv/bin/activate

pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
pip install --upgrade -r requirements.txt
pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118

accelerate config

After running accelerate config, answer the questions as follows:

- This machine
- No distributed training
- NO
- NO
- NO
- all
- fp16

Training

Preparation

Prepare about 20 images
Create a caption file for each image
- Write the caption in a text file that has the same file name as the image
Prepare a config file

[general]
enable_bucket = true                        # whether to use Aspect Ratio Bucketing

[[datasets]]
resolution = 1024                           # training resolution
batch_size = 2                              # batch size

  [[datasets.subsets]]
  image_dir = '/home/kazuki/Documents/Shimakosaku/training/variational_face/Data'
  caption_extension = '.txt'
  num_repeats = 10

Line 9 (image_dir) is the directory where you placed your source images.
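
Before training, it can help to confirm that every image in that directory has a caption file with the same name. The following is a minimal sketch using only the standard library; the function name and the set of image extensions are assumptions, not something sd-scripts provides.

```python
from pathlib import Path

# Assumed set of image file types; adjust to match your material.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def images_missing_captions(image_dir: Path, caption_extension: str = ".txt") -> list[Path]:
    """Return the images in image_dir that have no same-named caption file."""
    missing = []
    for path in sorted(image_dir.iterdir()):
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip caption files and anything else that is not an image
        if not path.with_suffix(caption_extension).exists():
            missing.append(path)
    return missing


# Usage (the path is a placeholder):
#   for path in images_missing_captions(Path("path/to/Data")):
#       print(f"no caption for {path.name}")
```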

Command

After activating the virtual environment with

source venv/bin/activate

enter the following command.

accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py \
  --pretrained_model_name_or_path=/home/kazuki/Documents/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors \
  --dataset_config=/home/kazuki/Documents/Shimakosaku/training/variational_face/setting.toml \
  --output_dir=/home/kazuki/Documents/Shimakosaku/training/variational_face/result \
  --output_name=variational_face \
  --save_model_as=safetensors \
  --prior_loss_weight=1.0 \
  --max_train_steps=400 \
  --learning_rate=1e-4 \
  --optimizer_type="AdamW8bit" \
  --xformers \
  --mixed_precision="fp16" \
  --cache_latents \
  --gradient_checkpointing \
  --save_every_n_epochs=1 \
  --network_module=networks.lora \
  --no_half_vae \
  --cache_text_encoder_outputs \
  --network_train_unet_only \
  --max_train_epochs 20 \
  --logging_dir=logs \
  --network_dim=12

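pretrained_model_name_or_path is the path to the Stable Diffusion model you have already downloaded.

dataset_config is the config file you prepared just above.

output_dir is the directory the results are saved to (any directory you like).

output_name is the name of the saved file (any name you like).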
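
To get a rough idea of how long this run is, the step count can be estimated from the dataset settings. This is a simplified sketch with hypothetical function names: it assumes one step per batch and ignores gradient accumulation and the per-bucket batching of Aspect Ratio Bucketing, so the number sd-scripts reports may differ slightly.

```python
import math


def steps_per_epoch(num_images: int, num_repeats: int, batch_size: int) -> int:
    """Batches per epoch when every image is seen num_repeats times."""
    return math.ceil(num_images * num_repeats / batch_size)


def total_steps(num_images: int, num_repeats: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps over the given number of epochs."""
    return steps_per_epoch(num_images, num_repeats, batch_size) * epochs


if __name__ == "__main__":
    # Values from this post: about 20 images, num_repeats = 10, batch_size = 2, 20 epochs.
    print(steps_per_epoch(20, 10, 2))
    print(total_steps(20, 10, 2, 20))
```

With those values the estimate is 100 steps per epoch and 2000 steps in total, well above the 400 given to --max_train_steps. sd-scripts appears to let --max_train_epochs take precedence when both are set, so it is worth confirming the actual step count in the training log.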
