# Complexity-Diffusion VAE

Variational Autoencoder for the Complexity-Diffusion image-generation pipeline.
## Architecture

89M parameters | 256x256 images | 4-channel latent space

### Encoder

Compresses 256x256x3 images into 32x32x4 latents (8x spatial compression).

### Decoder

Reconstructs 256x256x3 images from 32x32x4 latents, mirroring the encoder.
## Loss Function

$$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta\, D_{KL} + \lambda\, \mathcal{L}_{\text{perceptual}}$$

Where:

- $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction)
- $D_{KL}$ regularizes the latent distribution toward $\mathcal{N}(0, I)$, weighted by $\beta$
- $\mathcal{L}_{\text{perceptual}}$ compares VGG feature maps of $x$ and $\hat{x}$, weighted by $\lambda$
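A minimal sketch of this objective in PyTorch. The helper name `vae_loss`, the weights `beta`/`lam`, and the `feats` callable (a stand-in for any frozen VGG feature extractor) are illustrative assumptions, not the published training values:

```python
import torch

def vae_loss(x, x_hat, mu, logvar, feats, beta=1e-6, lam=0.1):
    """Sketch of the combined objective; weights are illustrative."""
    # L1 reconstruction term
    recon = (x - x_hat).abs().mean()
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Perceptual term: L1 distance in feature space (e.g. frozen VGG activations)
    perceptual = (feats(x) - feats(x_hat)).abs().mean()
    return recon + beta * kl + lam * perceptual
```

In practice `feats` would be a frozen, pretrained network; any differentiable feature extractor fits the same slot.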
## Config
| Parameter | Value |
|---|---|
| Image size | 256x256 |
| Latent dim | 4 |
| Base channels | 128 |
| Channel mult | [1, 2, 4, 4] |
| Res blocks | 2 |
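The 8x spatial compression follows directly from the channel multiplier list: each level after the first adds one 2x downsample. A quick check of the arithmetic:

```python
# Derive the latent resolution from the config above.
image_size = 256
channel_mult = [1, 2, 4, 4]              # four resolution levels
num_downsamples = len(channel_mult) - 1  # one 2x downsample between levels
latent_size = image_size // 2 ** num_downsamples
print(latent_size)  # → 32
```

This matches the 32x32x4 latent shape quoted in the Architecture section.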
## Usage

```python
from safetensors.torch import load_file

from complexity_diffusion.vae import ComplexityVAE

# Load weights
state_dict = load_file("model.safetensors")
vae = ComplexityVAE(image_size=256, base_channels=128, latent_dim=4)
vae.load_state_dict(state_dict)

# Encode images to latents
latents = vae.encode(images)  # [B, 4, 32, 32]

# Decode latents back to images
reconstructed = vae.decode(latents)  # [B, 3, 256, 256]
```
## Training

Trained on WikiArt (81K images) for 15K steps with:

- Batch size: 16
- Learning rate: 1e-4
- Mixed precision: bf16
## Part of Complexity Deep Ecosystem

This VAE is designed to work with the Complexity-Diffusion pipeline, leveraging:

- INL Dynamics for stable latent-space training
- Token-Routed architecture for efficient processing
## License

CC BY-NC 4.0 (Attribution-NonCommercial). Commercial use requires explicit permission from the author.
