# Complexity-Diffusion VAE

Variational Autoencoder for the Complexity-Diffusion image-generation pipeline.
## Architecture

89M parameters | 256x256 images | 4-channel latent space

### Encoder

Compresses 256x256x3 images into 32x32x4 latents (8x spatial compression).

### Decoder

Reconstructs 256x256x3 images from 32x32x4 latents, mirroring the encoder.
## Loss Function

$$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta\, D_{KL} + \lambda\, \mathcal{L}_{\text{perceptual}}$$

Where:

- $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction)
- $D_{KL}$ regularizes the latent distribution toward $\mathcal{N}(0, I)$, weighted by $\beta$
- $\mathcal{L}_{\text{perceptual}}$ compares VGG feature maps of $x$ and $\hat{x}$, weighted by $\lambda$
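A minimal sketch of this objective in PyTorch. The helper name `vae_loss`, the weights `beta`/`lam`, and the `feats` callable (a stand-in for any frozen VGG feature extractor) are illustrative assumptions, not the published training values:

```python
import torch

def vae_loss(x, x_hat, mu, logvar, feats, beta=1e-6, lam=0.1):
    """Sketch of the combined objective; weights are illustrative."""
    # L1 reconstruction term
    recon = (x - x_hat).abs().mean()
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Perceptual term: L1 distance in feature space (e.g. frozen VGG activations)
    perceptual = (feats(x) - feats(x_hat)).abs().mean()
    return recon + beta * kl + lam * perceptual
```

In practice `feats` would be a frozen, pretrained network; any differentiable feature extractor fits the same slot.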
## Config
| Parameter | Value |
|---|---|
| Image size | 256x256 |
| Latent dim | 4 |
| Base channels | 128 |
| Channel mult | [1, 2, 4, 4] |
| Res blocks | 2 |
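The 8x spatial compression follows directly from the channel multiplier list: each level after the first adds one 2x downsample. A quick check of the arithmetic:

```python
# Derive the latent resolution from the config above.
image_size = 256
channel_mult = [1, 2, 4, 4]              # four resolution levels
num_downsamples = len(channel_mult) - 1  # one 2x downsample between levels
latent_size = image_size // 2 ** num_downsamples
print(latent_size)  # → 32
```

This matches the 32x32x4 latent shape quoted in the Architecture section.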
## Usage

```python
from safetensors.torch import load_file

from complexity_diffusion.vae import ComplexityVAE

# Load weights
state_dict = load_file("model.safetensors")
vae = ComplexityVAE(image_size=256, base_channels=128, latent_dim=4)
vae.load_state_dict(state_dict)

# Encode images to latents
latents = vae.encode(images)  # [B, 4, 32, 32]

# Decode latents back to images
reconstructed = vae.decode(latents)  # [B, 3, 256, 256]
```
## Training

Trained on WikiArt (81K images) for 15K steps with:

- Batch size: 16
- Learning rate: 1e-4
- Mixed precision: bf16
## Part of Complexity Deep Ecosystem

This VAE is designed to work with the Complexity-Diffusion pipeline, leveraging:

- INL Dynamics for stable latent-space training
- Token-Routed architecture for efficient processing
## License

CC BY-NC 4.0 (Attribution-NonCommercial). Commercial use requires explicit permission from the author.
