Grid Geometric Classifier: Sliding Window VAE Analysis
A 638K-parameter classifier trained on 38 synthetic geometric primitives that reads the intrinsic manifold structure of diffusion model VAE latent spaces. The tool extracts and classifies local geometric patterns at multiple scales, enabling geometric fingerprinting of any VAE.
Key Finding
Diffusion model VAEs learn consistent geometric structure, not noise.
| VAE | Dominant Geometry | Confidence |
|---|---|---|
| SD 1.5 | Saddle (57%) + Pentachoron (35%) | 0.880 |
| SDXL | Saddle (53%) + Pentachoron (30%) | 0.874 |
| Flux.1 | Pentachoron (31%) + Plane (29%) + Saddle (15%) | 0.878 |
| Flux.2 | Saddle (70%) + Pentachoron (21%) | 0.875 |
Flux.1 is the geometric outlier: it learned a richer, more diverse latent geometry, while SD 1.5, SDXL, and Flux.2 converged to saddle-dominated hyperbolic manifolds.
Architecture
```
Input: (B, 8, 16, 16) binary voxel grid
        ↓
Patch Decomposition: 2×4×4 patches → 64 patches per volume
        ↓
Shared Patch Encoder (MLP + handcrafted features)
        ↓
3× Cross-Attention Blocks (patches attend to each other)
        ↓
Global Pool + Classification Heads
        ↓
Output: 38 classes + dimension (0-3D) + curvature type
```
- Parameters: 638,387
- Patch Grid: 4×4×4 macro grid of 2×4×4 local patches
- Attention: 8 heads, 128 embed dim, 3 layers
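The patch decomposition above can be sketched with plain tensor ops. This is illustrative only: the actual `PatchCrossAttentionClassifier` may order or flatten patches differently, and `decompose_into_patches` is a hypothetical helper, not part of `cell2_model`.

```python
import torch

def decompose_into_patches(grid: torch.Tensor) -> torch.Tensor:
    """Split a (B, 8, 16, 16) volume into 64 patches of shape 2x4x4.

    Sketch of the decomposition described in the README; the real model's
    internal layout may differ.
    """
    B = grid.shape[0]
    # Unfold each axis into non-overlapping blocks:
    # depth 8 -> 4 blocks of 2, height/width 16 -> 4 blocks of 4.
    patches = grid.unfold(1, 2, 2).unfold(2, 4, 4).unfold(3, 4, 4)
    # (B, 4, 4, 4, 2, 4, 4): a 4x4x4 macro grid of 2x4x4 local patches.
    # Flatten to (B, 64, 32) tokens for the patch encoder.
    return patches.reshape(B, 64, 2 * 4 * 4)

x = torch.zeros(2, 8, 16, 16)
print(decompose_into_patches(x).shape)  # torch.Size([2, 64, 32])
```

The 4×4×4 macro grid (64 tokens) matches the 64 patches per volume stated in the architecture diagram.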
38 Geometric Classes
| Dimension | Flat | Curved |
|---|---|---|
| 0D | point | – |
| 1D | line_x, line_y, line_z, line_diag, cross, l_shape, collinear | arc, helix |
| 2D | triangle_xy, triangle_xz, triangle_3d, square_xy, square_xz, rectangle, coplanar, plane | circle, ellipse, disc |
| 3D | tetrahedron, pyramid, pentachoron, cube, cuboid, triangular_prism, octahedron | sphere, hemisphere, cylinder, cone, capsule, torus, shell, tube, bowl, saddle |
Curvature Types: none, convex, concave, cylindrical, conical, toroidal, hyperbolic, helical
Quick Start
```python
import torch
from cell2_model import PatchCrossAttentionClassifier, CLASS_NAMES, CURVATURE_NAMES

# Load classifier
model = PatchCrossAttentionClassifier(n_classes=38)
model.load_state_dict(torch.load('best_vae_ca_classifier.pt', map_location='cpu'))
model.eval()

# Classify a binary voxel grid
grid = torch.zeros(1, 8, 16, 16)  # Your binarized patch
with torch.no_grad():
    out = model(grid)

pred_class = CLASS_NAMES[out['class_logits'].argmax().item()]
pred_dim = out['dim_logits'].argmax().item()
is_curved = bool(out['is_curved_pred'].squeeze() > 0)
pred_curv = CURVATURE_NAMES[out['curv_type_logits'].argmax().item()]
print(f"Shape: {pred_class}, Dimension: {pred_dim}D, Curved: {is_curved}, Curvature: {pred_curv}")
```
Full VAE Analysis Pipeline
```python
# Cell 1: Shape generator (training data)
from cell1_shape_generator import ShapeGenerator, CLASS_NAMES, NUM_CLASSES

# Cell 2: Model architecture
from cell2_model import PatchCrossAttentionClassifier

# Cell 3: Training (if retraining)
# python cell3_trainer.py

# Cell 4: Multi-scale extraction from VAE latents
from cell4_vae_pipeline import MultiScaleExtractor, ExtractionConfig

# Cell 5: Single VAE analysis
# python cell5_quad_vae_geometric_analysis.py

# Cell 6: Multi-VAE comparison
# python cell6_quad_vae_analysis_mega_liminal.py
```
Extraction Pipeline
The pipeline extracts geometric structure from VAE latents at multiple scales:
```python
config = ExtractionConfig(
    scales=[(16, 64, 64), (8, 32, 32), (8, 16, 16), (4, 8, 8)],
    canonical_shape=(8, 16, 16),
    confidence_threshold=0.6,
    overlap=0.5,
)
extractor = MultiScaleExtractor(classifier, config)
result = extractor.extract_from_latent(vae_latent, channel_groups)
# Returns: raw_annotations, deviance_annotations
# Each annotation contains: class, confidence, scale, dimension, curvature, location
```
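The `overlap=0.5` setting implies that along each axis, consecutive windows advance by half the window size. A minimal sketch of how window start positions could be derived; `window_starts` is a hypothetical helper, since the actual stride logic lives inside `MultiScaleExtractor`:

```python
def window_starts(length: int, window: int, overlap: float) -> list[int]:
    """Start indices for 1-D sliding windows with fractional overlap.

    Assumption: stride = window * (1 - overlap), with a final window
    snapped to the end of the axis so nothing is left uncovered.
    """
    stride = max(1, int(window * (1 - overlap)))
    starts = list(range(0, max(length - window, 0) + 1, stride))
    # Make sure the final window touches the end of the axis.
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts

# A 64-wide axis with a 16-wide window at 50% overlap -> stride 8.
print(window_starts(64, 16, 0.5))  # [0, 8, 16, 24, 32, 40, 48]
```

The same computation applies independently per axis, so a (16, 64, 64) latent scanned with the (8, 16, 16) canonical window yields a 3-D grid of overlapping windows.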
Two extraction modes:
- Raw: Treat channels as depth dimension directly
- Deviance: Compute inter-channel differences, classify the relational geometry
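The deviance mode can be sketched as follows, under loose assumptions: the real pipeline's channel pairing and binarization are internal to `MultiScaleExtractor`, so `deviance_volume`, the consecutive-channel pairing, and the median threshold here are all illustrative choices, not the released implementation.

```python
import torch

def deviance_volume(latent: torch.Tensor, pairs=None) -> torch.Tensor:
    """Build a 'deviance' volume from inter-channel differences.

    latent: (C, H, W) single-sample VAE latent.
    Assumption: difference consecutive channels, then binarize around the
    median so the result looks like the binary voxel grids the classifier
    was trained on.
    """
    C = latent.shape[0]
    if pairs is None:
        pairs = [(i, i + 1) for i in range(C - 1)]
    diffs = torch.stack([latent[a] - latent[b] for a, b in pairs])  # (len(pairs), H, W)
    return (diffs > diffs.median()).float()

lat = torch.randn(4, 32, 32)  # e.g. an SD 1.5 latent has 4 channels
print(deviance_volume(lat).shape)  # torch.Size([3, 32, 32])
```

Raw mode skips the differencing step and binarizes the channels directly as the depth dimension.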
Results: Why Saddles?
Saddle points dominate because they're optimal for generative models:
- Steering capacity: Small noise changes push trajectories toward different modes
- Mode separation: Unstable directions at saddles = decision boundaries between outputs
- Exponential coverage: Hyperbolic geometry packs more representations per dimension
The VAE didn't learn saddles by accident; it's the natural geometry for a diffusion decoder's latent manifold.
Flux.1's difference: more planar cross-sections (29%) and a balanced mix of primitives suggest a different optimization path. The batch norm statistics in Flux.2 (`bn.running_var`, `bn.running_mean`) may be collapsing this richer structure back toward hyperbolic geometry.
Per-Scale Findings
| Scale | Dominant Class | Interpretation |
|---|---|---|
| L0 (16Γ64Γ64) | Pentachoron 73% | Macro-level 5-simplex structure |
| L1 (8Γ32Γ32) | Pentachoron 60% | Transitional |
| L2 (8Γ16Γ16) | Plane 40% | Mid-level planar cross-sections |
| L3 (4Γ8Γ8) | Saddle 59% | Local hyperbolic curvature |
The hierarchy: pentachorons organize the global structure, saddles dominate locally.
Files
| File | Description |
|---|---|
| `best_vae_ca_classifier.pt` | Trained classifier weights (2.58 MB) |
| `cell1_shape_generator.py` | 38-class synthetic shape generator |
| `cell2_model.py` | PatchCrossAttentionClassifier architecture |
| `cell3_trainer.py` | Training pipeline with augmentation |
| `cell4_vae_pipeline.py` | Multi-scale batched extraction |
| `cell5_quad_vae_geometric_analysis.py` | Single VAE analysis script |
| `cell6_quad_vae_analysis_mega_liminal.py` | Multi-VAE comparison script |
| `liminal.zip` | Test image dataset (957 images) |
| `mega_liminal_captioned.zip` | Extended dataset (2074 images) |
| `multi_vae_comparison_*.json` | Raw comparison results |
Training
The classifier was trained on 76,000 synthetic shapes (2,000 per class × 38 classes) generated procedurally:
```python
gen = ShapeGenerator(seed=42)
train_data = gen.generate_dataset(n_per_class=2000, seed=42)
```
Training config:
- 60 epochs, batch size 1024
- AdamW, lr=3e-3, cosine annealing
- Multi-task loss: classification + dimension + curved + curvature type
- Augmentation: voxel dropout, boundary addition, small translation
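The multi-task loss above could be combined as a weighted sum over the four heads. A minimal sketch: `multi_task_loss` and the head weights are illustrative assumptions, not the values used for the released checkpoint.

```python
import torch
import torch.nn.functional as F

def multi_task_loss(out: dict, target: dict,
                    weights=(1.0, 0.3, 0.3, 0.3)) -> torch.Tensor:
    """Weighted sum of the four heads' losses (weights are illustrative)."""
    w_cls, w_dim, w_curved, w_curv = weights
    loss = w_cls * F.cross_entropy(out['class_logits'], target['class'])
    loss = loss + w_dim * F.cross_entropy(out['dim_logits'], target['dim'])
    # Binary curved/flat head: logits against a 0/1 target.
    loss = loss + w_curved * F.binary_cross_entropy_with_logits(
        out['is_curved_pred'].squeeze(-1), target['is_curved'].float())
    loss = loss + w_curv * F.cross_entropy(out['curv_type_logits'], target['curv_type'])
    return loss

# Dummy head outputs for a batch of 8 (38 classes, 4 dims, 8 curvature types).
out = {'class_logits': torch.randn(8, 38), 'dim_logits': torch.randn(8, 4),
       'is_curved_pred': torch.randn(8, 1), 'curv_type_logits': torch.randn(8, 8)}
tgt = {'class': torch.randint(0, 38, (8,)), 'dim': torch.randint(0, 4, (8,)),
       'is_curved': torch.randint(0, 2, (8,)), 'curv_type': torch.randint(0, 8, (8,))}
print(multi_task_loss(out, tgt).item() > 0)  # True
```

The auxiliary heads (dimension, curved, curvature type) act as regularizers that push the shared patch encoder toward geometry-aware features.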
Citation
```bibtex
@misc{abstractphil2025geometric,
  author    = {AbstractPhil},
  title     = {Grid Geometric Classifier: Reading VAE Latent Manifold Structure},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/AbstractPhil/grid-geometric-classifier-sliding-proto}
}
```
Related Work
This classifier is part of a broader research program on geometric deep learning with pentachoron structures, replacing learned embeddings with navigable k-simplex lattices. Key results include:
- 85% MNIST with ~750 parameters (geometry encodes structure, learning only navigates)
- 72KB ImageNet classification head (parameter efficiency through geometric priors)
- Crystalline vocabulary systems representing tokens as 5-vertex structures
License
MIT