Grid Geometric Classifier: Sliding Window VAE Analysis
A 638K-parameter classifier trained on 38 synthetic geometric primitives that reads the intrinsic manifold structure of diffusion model VAE latent spaces. The tool extracts and classifies local geometric patterns at multiple scales, enabling geometric fingerprinting of any VAE.
Key Finding
Diffusion model VAEs learn consistent geometric structure, not noise.
| VAE | Dominant Geometry | Confidence |
|---|---|---|
| SD 1.5 | Saddle (57%) + Pentachoron (35%) | 0.880 |
| SDXL | Saddle (53%) + Pentachoron (30%) | 0.874 |
| Flux.1 | Pentachoron (31%) + Plane (29%) + Saddle (15%) | 0.878 |
| Flux.2 | Saddle (70%) + Pentachoron (21%) | 0.875 |
Flux.1 is the geometric outlier: it learned a richer, more diverse latent geometry, while SD 1.5, SDXL, and Flux.2 converged to saddle-dominated hyperbolic manifolds.
Architecture
```
Input: (B, 8, 16, 16) binary voxel grid
        ↓
Patch Decomposition: 2×4×4 patches → 64 patches per volume
        ↓
Shared Patch Encoder (MLP + handcrafted features)
        ↓
3× Cross-Attention Blocks (patches attend to each other)
        ↓
Global Pool + Classification Heads
        ↓
Output: 38 classes + dimension (0-3D) + curvature type
```
- Parameters: 638,387
- Patch Grid: 4×4×4 macro grid of 2×4×4 local patches
- Attention: 8 heads, 128 embed dim, 3 layers
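The patch decomposition above can be sketched with plain tensor ops. This is illustrative only: the actual `PatchCrossAttentionClassifier` may order or flatten patches differently, and `decompose_into_patches` is a hypothetical helper, not part of `cell2_model`.

```python
import torch

def decompose_into_patches(grid: torch.Tensor) -> torch.Tensor:
    """Split a (B, 8, 16, 16) volume into 64 patches of shape 2x4x4.

    Sketch of the decomposition described in the README; the real model's
    internal layout may differ.
    """
    B = grid.shape[0]
    # Unfold each axis into non-overlapping blocks:
    # depth 8 -> 4 blocks of 2, height/width 16 -> 4 blocks of 4.
    patches = grid.unfold(1, 2, 2).unfold(2, 4, 4).unfold(3, 4, 4)
    # (B, 4, 4, 4, 2, 4, 4): a 4x4x4 macro grid of 2x4x4 local patches.
    # Flatten to (B, 64, 32) tokens for the patch encoder.
    return patches.reshape(B, 64, 2 * 4 * 4)

x = torch.zeros(2, 8, 16, 16)
print(decompose_into_patches(x).shape)  # torch.Size([2, 64, 32])
```

The 4×4×4 macro grid (64 tokens) matches the 64 patches per volume stated in the architecture diagram.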
38 Geometric Classes
| Dimension | Flat | Curved |
|---|---|---|
| 0D | point | – |
| 1D | line_x, line_y, line_z, line_diag, cross, l_shape, collinear | arc, helix |
| 2D | triangle_xy, triangle_xz, triangle_3d, square_xy, square_xz, rectangle, coplanar, plane | circle, ellipse, disc |
| 3D | tetrahedron, pyramid, pentachoron, cube, cuboid, triangular_prism, octahedron | sphere, hemisphere, cylinder, cone, capsule, torus, shell, tube, bowl, saddle |
Curvature Types: none, convex, concave, cylindrical, conical, toroidal, hyperbolic, helical
Quick Start
```python
import torch
from cell2_model import PatchCrossAttentionClassifier, CLASS_NAMES, CURVATURE_NAMES

# Load classifier
model = PatchCrossAttentionClassifier(n_classes=38)
model.load_state_dict(torch.load('best_vae_ca_classifier.pt', map_location='cpu'))
model.eval()

# Classify a binary voxel grid
grid = torch.zeros(1, 8, 16, 16)  # Your binarized patch
with torch.no_grad():
    out = model(grid)

pred_class = CLASS_NAMES[out['class_logits'].argmax().item()]
pred_dim = out['dim_logits'].argmax().item()
is_curved = bool(out['is_curved_pred'].squeeze() > 0)
pred_curv = CURVATURE_NAMES[out['curv_type_logits'].argmax().item()]
print(f"Shape: {pred_class}, Dimension: {pred_dim}D, Curved: {is_curved}, Curvature: {pred_curv}")
```
Full VAE Analysis Pipeline
```python
# Cell 1: Shape generator (training data)
from cell1_shape_generator import ShapeGenerator, CLASS_NAMES, NUM_CLASSES

# Cell 2: Model architecture
from cell2_model import PatchCrossAttentionClassifier

# Cell 3: Training (if retraining)
# python cell3_trainer.py

# Cell 4: Multi-scale extraction from VAE latents
from cell4_vae_pipeline import MultiScaleExtractor, ExtractionConfig

# Cell 5: Single VAE analysis
# python cell5_quad_vae_geometric_analysis.py

# Cell 6: Multi-VAE comparison
# python cell6_quad_vae_analysis_mega_liminal.py
```
Extraction Pipeline
The pipeline extracts geometric structure from VAE latents at multiple scales:
```python
config = ExtractionConfig(
    scales=[(16, 64, 64), (8, 32, 32), (8, 16, 16), (4, 8, 8)],
    canonical_shape=(8, 16, 16),
    confidence_threshold=0.6,
    overlap=0.5,
)
extractor = MultiScaleExtractor(classifier, config)
result = extractor.extract_from_latent(vae_latent, channel_groups)
# Returns: raw_annotations, deviance_annotations
# Each annotation contains: class, confidence, scale, dimension, curvature, location
```
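The `overlap=0.5` setting implies that along each axis, consecutive windows advance by half the window size. A minimal sketch of how window start positions could be derived; `window_starts` is a hypothetical helper, since the actual stride logic lives inside `MultiScaleExtractor`:

```python
def window_starts(length: int, window: int, overlap: float) -> list[int]:
    """Start indices for 1-D sliding windows with fractional overlap.

    Assumption: stride = window * (1 - overlap), with a final window
    snapped to the end of the axis so nothing is left uncovered.
    """
    stride = max(1, int(window * (1 - overlap)))
    starts = list(range(0, max(length - window, 0) + 1, stride))
    # Make sure the final window touches the end of the axis.
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts

# A 64-wide axis with a 16-wide window at 50% overlap -> stride 8.
print(window_starts(64, 16, 0.5))  # [0, 8, 16, 24, 32, 40, 48]
```

The same computation applies independently per axis, so a (16, 64, 64) latent scanned with the (8, 16, 16) canonical window yields a 3-D grid of overlapping windows.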
Two extraction modes:
- Raw: Treat channels as depth dimension directly
- Deviance: Compute inter-channel differences, classify the relational geometry
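The deviance mode can be sketched as follows, under loose assumptions: the real pipeline's channel pairing and binarization are internal to `MultiScaleExtractor`, so `deviance_volume`, the consecutive-channel pairing, and the median threshold here are all illustrative choices, not the released implementation.

```python
import torch

def deviance_volume(latent: torch.Tensor, pairs=None) -> torch.Tensor:
    """Build a 'deviance' volume from inter-channel differences.

    latent: (C, H, W) single-sample VAE latent.
    Assumption: difference consecutive channels, then binarize around the
    median so the result looks like the binary voxel grids the classifier
    was trained on.
    """
    C = latent.shape[0]
    if pairs is None:
        pairs = [(i, i + 1) for i in range(C - 1)]
    diffs = torch.stack([latent[a] - latent[b] for a, b in pairs])  # (len(pairs), H, W)
    return (diffs > diffs.median()).float()

lat = torch.randn(4, 32, 32)  # e.g. an SD 1.5 latent has 4 channels
print(deviance_volume(lat).shape)  # torch.Size([3, 32, 32])
```

Raw mode skips the differencing step and binarizes the channels directly as the depth dimension.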
Results: Why Saddles?
Saddle points dominate because they're optimal for generative models:
- Steering capacity: Small noise changes push trajectories toward different modes
- Mode separation: Unstable directions at saddles = decision boundaries between outputs
- Exponential coverage: Hyperbolic geometry packs more representations per dimension
The VAE didn't learn saddles by accident; it's the natural geometry for a diffusion decoder's latent manifold.
Flux.1's difference: more planar cross-sections (29%) and a balanced mix of primitives suggest a different optimization path. The batch norm statistics in Flux.2 (`bn.running_var`, `bn.running_mean`) may be collapsing this richer structure back toward hyperbolic geometry.
Per-Scale Findings
| Scale | Dominant Class | Interpretation |
|---|---|---|
| L0 (16Γ64Γ64) | Pentachoron 73% | Macro-level 5-simplex structure |
| L1 (8Γ32Γ32) | Pentachoron 60% | Transitional |
| L2 (8Γ16Γ16) | Plane 40% | Mid-level planar cross-sections |
| L3 (4Γ8Γ8) | Saddle 59% | Local hyperbolic curvature |
The hierarchy: pentachorons organize the global structure, saddles dominate locally.
Files
| File | Description |
|---|---|
| `best_vae_ca_classifier.pt` | Trained classifier weights (2.58 MB) |
| `cell1_shape_generator.py` | 38-class synthetic shape generator |
| `cell2_model.py` | PatchCrossAttentionClassifier architecture |
| `cell3_trainer.py` | Training pipeline with augmentation |
| `cell4_vae_pipeline.py` | Multi-scale batched extraction |
| `cell5_quad_vae_geometric_analysis.py` | Single VAE analysis script |
| `cell6_quad_vae_analysis_mega_liminal.py` | Multi-VAE comparison script |
| `liminal.zip` | Test image dataset (957 images) |
| `mega_liminal_captioned.zip` | Extended dataset (2074 images) |
| `multi_vae_comparison_*.json` | Raw comparison results |
Training
The classifier was trained on 76,000 synthetic shapes (2,000 per class × 38 classes) generated procedurally:
```python
gen = ShapeGenerator(seed=42)
train_data = gen.generate_dataset(n_per_class=2000, seed=42)
```
Training config:
- 60 epochs, batch size 1024
- AdamW, lr=3e-3, cosine annealing
- Multi-task loss: classification + dimension + curved + curvature type
- Augmentation: voxel dropout, boundary addition, small translation
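The multi-task loss above could be combined as a weighted sum over the four heads. A minimal sketch: `multi_task_loss` and the head weights are illustrative assumptions, not the values used for the released checkpoint.

```python
import torch
import torch.nn.functional as F

def multi_task_loss(out: dict, target: dict,
                    weights=(1.0, 0.3, 0.3, 0.3)) -> torch.Tensor:
    """Weighted sum of the four heads' losses (weights are illustrative)."""
    w_cls, w_dim, w_curved, w_curv = weights
    loss = w_cls * F.cross_entropy(out['class_logits'], target['class'])
    loss = loss + w_dim * F.cross_entropy(out['dim_logits'], target['dim'])
    # Binary curved/flat head: logits against a 0/1 target.
    loss = loss + w_curved * F.binary_cross_entropy_with_logits(
        out['is_curved_pred'].squeeze(-1), target['is_curved'].float())
    loss = loss + w_curv * F.cross_entropy(out['curv_type_logits'], target['curv_type'])
    return loss

# Dummy head outputs for a batch of 8 (38 classes, 4 dims, 8 curvature types).
out = {'class_logits': torch.randn(8, 38), 'dim_logits': torch.randn(8, 4),
       'is_curved_pred': torch.randn(8, 1), 'curv_type_logits': torch.randn(8, 8)}
tgt = {'class': torch.randint(0, 38, (8,)), 'dim': torch.randint(0, 4, (8,)),
       'is_curved': torch.randint(0, 2, (8,)), 'curv_type': torch.randint(0, 8, (8,))}
print(multi_task_loss(out, tgt).item() > 0)  # True
```

The auxiliary heads (dimension, curved, curvature type) act as regularizers that push the shared patch encoder toward geometry-aware features.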
Citation
```bibtex
@misc{abstractphil2025geometric,
  author    = {AbstractPhil},
  title     = {Grid Geometric Classifier: Reading VAE Latent Manifold Structure},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/AbstractPhil/grid-geometric-classifier-sliding-proto}
}
```
Related Work
This classifier is part of a broader research program on geometric deep learning with pentachoron structures, replacing learned embeddings with navigable k-simplex lattices. Key results include:
- 85% MNIST with ~750 parameters (geometry encodes structure, learning only navigates)
- 72KB ImageNet classification head (parameter efficiency through geometric priors)
- Crystalline vocabulary systems representing tokens as 5-vertex structures
License
MIT