NoobAI-RectifiedFlow-Experimental

Model Details

This is an experimental conversion of NoobAI v-pred to a Rectified Flow target, using EQ-VAE.

(Image: RF vs v-pred comparison)

Model Description

The model is a continuation of NoobAI training on the same dataset, with a new diffusion target and a few improvements to the existing tag approach*. Given the scope of this undertaking, this is only an experimental version, trained on only a subset of the full original data.

The current state of the model is acceptable for general and research purposes, such as image generation, finetuning, LoRA training, and others. We provide example settings for a common style-training approach below.

Generally, the model is fairly stable, but it can suffer certain drawbacks from the limited training, such as a malformed understanding of certain tags and colors; we saw these in our tests, but they are rarely, if ever, observed with normal prompts in practice.

*Removed the massive keep-token prefix (in some cases over 6 tags) and introduced "protected tags", which allow indiscriminate shuffling while keeping those tokens undroppable.
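A minimal sketch of how such a protected-tag scheme can work (the function, dropout value, and protected set here are hypothetical, for illustration only):

import random

PROTECTED = {"1girl", "masterpiece"}  # hypothetical protected set

def build_caption(tags, dropout=0.1):
    # Ordinary tags are dropped with probability `dropout`;
    # protected tags are never dropped.
    kept = [t for t in tags if t in PROTECTED or random.random() >= dropout]
    # Everything is shuffled indiscriminately, protected tags included.
    random.shuffle(kept)
    return ", ".join(kept)

print(build_caption(["1girl", "smile", "outdoors", "masterpiece"]))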

Bias and Limitations

Due to the low budget (~$150 total), we have not been able to fully stabilize the model, so you can and will encounter some issues that we either could not find in our tests or could not address. That is not too different from the performance of other base models, but your mileage may vary.

Most biases of the official dataset will apply (Blue Archive, etc.).

Some color biases were not reduced, or became more apparent, due to quirks in the convergence of rectified flow from NoobAI v-pred. We did our best to mitigate this by training a bit further, but you will still encounter them with certain strong color prompts. Some colors are in an unstable state and hard to achieve due to the unfortunate state of their convergence at the current step (black and dark in particular; for example, "dark" will not generate a dark image, you need to prompt "dark theme" for that).

Model Output Examples

(Example output gallery: see the model page.)

Recommendations

Inference

Comfy

(Image: ComfyUI workflow screenshot. The workflow is available alongside the model in the repo.)

Same as your normal inference, but with the addition of an SD3 sampling node and an optional conv-padding node, which is required for correct edges (the VAE and model have been trained with padded convs in the VAE to allow for easier learning of edge content).
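For intuition, the SD3 sampling node remaps the sampling schedule with a time shift. A minimal sketch of the standard SD3-style mapping (an assumption that this is the form applied here at Shift 2.5):

def shift_sigma(sigma: float, shift: float = 2.5) -> float:
    # SD3-style time shift: biases sampling toward higher-noise steps.
    return shift * sigma / (1 + (shift - 1) * sigma)

print(shift_sigma(0.5))  # the schedule midpoint moves from 0.5 to ~0.71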

Recommended Parameters:
Sampler: Euler, Euler A, DPM++ SDE, etc.
Steps: 20-28
CFG: 5-7
Schedule: Normal/Simple
Positive Quality Tags: masterpiece, best quality
Negative Tags: worst quality, normal quality, bad anatomy

A1111 WebUI

Recommended WebUI: ReForge - has native support for both RF and conv padding.
Possible WebUIs: ErsatzForge - has native support for RF, but it is implemented with hardcoded model-name checks, so it will not work out of the box. I was also not able to verify whether its approach is correct, but it worked after adding the model name to the checked list.

How to use in ReForge:

(Image: ReForge settings. Ignore the Sigma max field at the top; it is not used in RF.)

Support for RF in ReForge is being implemented through a built-in extension:

(Image: built-in extension settings)

Set the parameters as shown, and you're good to go.

How to turn on padding:

(Image: conv padding setting)

Turn this on, save, and FULLY RELOAD the UI by closing the console and launching it again. This is required: the setting does not take effect until the UI is fully reloaded.

Recommended Parameters:
Sampler: Euler A Comfy RF, Euler, DPM++ SDE Comfy, etc. ALL VARIANTS MUST BE RF OR COMFY, IF AVAILABLE. In ComfyUI the routing is automatic, but not in the WebUI.
Steps: 20-28
CFG: 5-7
Schedule: Normal/Simple
Positive Quality Tags: masterpiece, best quality
Negative Tags: worst quality, normal quality, bad anatomy

ADETAILER FIX FOR RF: By default, ADetailer discards the Advanced Model Sampling extension, which breaks RF. You need to add AMS to this part of the settings:

(Image: ADetailer settings section)

Add advanced_model_sampling_script,advanced_model_sampling_script_backported there.

If that does not work, go into the ADetailer extension folder, find args.py, open it, and edit _builtin_script like this:

(Image: edited _builtin_script in args.py)

Here is the snippet for easy copy-pasting:

_builtin_script = (
    "advanced_model_sampling_script",             # added: AMS script
    "advanced_model_sampling_script_backported",  # added: backported AMS variant
    "hypertile_script",                           # stock entry
    "soft_inpainting",                            # stock entry
)

Training

Model Composition

(Relative to base it's trained from)

UNet: Same
CLIP L: Same, Frozen
CLIP G: Same, Frozen
VAE: Changed, new VAE - EQB7 w/ conv padding

Training Details

(Base / quality-tuned)

Samples seen (unbatched steps): ~2M / ~400k
Learning Rate: 2e-5 / 2e-5
Effective Batch size: 1280 (40 real * 4 accum * 8 devices) / 1280 (40 * 4 * 8)
Precision: Full BF16
Optimizer: AdamW8bit with Kahan Summation
Weight Decay: 0.01
Schedule: Constant with warmup
Timestep Sampling Strategy: Logit-Normal (sometimes referred to as Lognorm), Shift 2.5 (sketched after this list)
Text Encoders: Frozen
Keep Token: False (Used "Protected Tags" instead), all tags are shuffled.
Tag Dropout: 10%
Uncond Dropout: 10%
Optimal Transport: True
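
A minimal sketch of the Logit-Normal timestep sampling with shift, together with a common form of minibatch optimal-transport noise pairing (both are illustrative assumptions, not the actual training code):

import torch
from scipy.optimize import linear_sum_assignment

def sample_timesteps(n, shift=2.5, mean=0.0, std=1.0):
    # Logit-normal: squash a Gaussian through a sigmoid, then apply the
    # SD3-style shift toward higher-noise timesteps.
    t = torch.sigmoid(torch.randn(n) * std + mean)
    return shift * t / (1 + (shift - 1) * t)

def ot_pair(noise, latents):
    # Minibatch optimal transport: re-pair noise and data samples so the
    # total squared distance of the pairing is minimal (straighter flows).
    cost = torch.cdist(noise.flatten(1), latents.flatten(1)).pow(2)
    rows, cols = linear_sum_assignment(cost.cpu().numpy())
    return noise[rows], latents[cols]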

VAE Conv Padding: True
VAE Shift: 0.1726
VAE Scale: 0.1280

(Computed against ~80k anime images prior to training. The Scale is roughly the same as in the base SDXL VAE (negligible difference), but the Shift is drastically different: 0.1726 vs ~1.60.)
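
Under the usual diffusers-style convention (an assumption about how these stats are meant to be applied; check your pipeline's code), the latents would be normalized as:

VAE_SHIFT, VAE_SCALE = 0.1726, 0.1280

def normalize_latents(z):
    # after encoding: center with the shift, then scale
    return (z - VAE_SHIFT) * VAE_SCALE

def denormalize_latents(z):
    # before decoding: invert the normalization
    return z / VAE_SCALE + VAE_SHIFT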

Training Data

"Original" Noobai data subset of ~2 million samples, then WAF* subset of ~20 thousand for quality tuning of this intermediate checkpoint. Tags were not changed, data was taken "as-is", as per the wishes of community.

*WAF - Weighted Aesthetic Filter, our recent solution for filtering data based on the input of multiple scoring models at the same time (at varied weights, adapted to their specific prediction classes/ranges), including specialized models for specific content. A high general threshold was used, resulting in the top ~5% of data being selected for quality tuning.
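
A minimal sketch of the idea (the scorer names, ranges, and weights here are hypothetical; the actual WAF models and weights are not published in this card):

import numpy as np

# hypothetical scorers: each has its own raw output range
SCORER_RANGES = {"aesthetic_a": (0.0, 10.0), "aesthetic_b": (-1.0, 1.0)}
WEIGHTS = {"aesthetic_a": 0.7, "aesthetic_b": 0.3}

def waf_score(raw_scores):
    # Normalize each model's scores to [0, 1] over its prediction range,
    # then combine them at the configured weights.
    total = np.zeros_like(next(iter(raw_scores.values())), dtype=np.float64)
    for name, scores in raw_scores.items():
        lo, hi = SCORER_RANGES[name]
        total += WEIGHTS[name] * (scores - lo) / (hi - lo)
    return total

def select_top(scores, fraction=0.05):
    # Indices of the top ~5% of samples by combined score.
    k = max(1, int(len(scores) * fraction))
    return np.argsort(scores)[-k:]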

LoRA Training

The current base is highly trainable. We are mostly style trainers and finetuners, so we will give our current recommendations for that, from which you can derive settings you find reasonable based on your experience with other model types.

My current style training settings (Anzhc):

Learning Rate: tested up to 7.5e-4; the LoRA is still stable at that, somehow. Prolonged training (300+ images for 50 epochs) at that LR did not result in degradation; it can likely be pushed even further, up to 1e-3, at least at the batch size I'm using.
Batch Size: 144 (6 real * 24 accum), using SGA (Stochastic Gradient Accumulation); without SGA I would probably lower accum to 4-8.
Optimizer: Adamw8bit with Kahan summation
Schedule: ReREX (use REX for simplicity; see the sketch after this list)
Precision: Full BF16
Weight Decay: 0.02
Timestep Sampling Strategy: Logit-Normal, Shift 2.5 (closest to what I use, result-wise)

Dim/Alpha/Conv/Alpha: 24/24/24/24 (Lycoris/Locon)

Text Encoders: Frozen

Optimal Transport: True

Expected Dataset Size: 100 images (Can be even 10, but balance with repeats to roughly this target.)
Epochs: 50 (Yes, even with 10 repeats. 500 effective epochs works just fine and doesn't break in my tests.)
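
For reference, a minimal sketch of the plain REX learning-rate curve (as in Chen et al., "REX: Revisiting Budgeted Training with an Improved Schedule"; the ReREX variant is not specified in this card, so only the base form is shown, as an assumption):

def rex_lr(step, total_steps, base_lr=7.5e-4):
    # REX: holds near base_lr early, then decays increasingly sharply to 0.
    z = step / total_steps
    return base_lr * (1 - z) / (1 - z / 2)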

Hardware

The model was trained on a cloud 8xA100 node.

Software

Custom fork of SD-Scripts (maintained by Bluvoll)

Acknowledgements

Special Thanks

To the supporting individuals of the community who donated funds to kickstart this training.

  • Itterative
  • Sab
  • Puzll
  • Kyonisus

It wouldn't have happened at this scale without you.
