# GeoVocab Patch Maker
A geometric vocabulary extractor that reads structural properties from latent patches, and the analyzer used in experiments showing that text carries the same geometric structure as images.
This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.
## What It Does
Takes an (8, 16, 16) tensor (originally voxel grids, but shown to work on adapted FLUX VAE latents and text-derived latent patches) and produces per-patch geometric descriptors:

```python
from geometric_model import load_from_hub, extract_features

model = load_from_hub()
gate_vectors, patch_features = extract_features(model, patches)
# gate_vectors:   (N, 64, 17)  - interpretable geometric properties
# patch_features: (N, 64, 256) - learned representations
```
## Gate Vector Anatomy (17 dimensions)
| Dims | Property | Type | Meaning |
|---|---|---|---|
| 0–3 | dimensionality | softmax(4) | 0D point, 1D line, 2D surface, 3D volume |
| 4–6 | curvature | softmax(3) | rigid, curved, combined |
| 7 | boundary | sigmoid(1) | partial fill (surface patch) |
| 8–10 | axis_active | sigmoid(3) | which axes have spatial extent |
| 11–12 | topology | softmax(2) | open vs closed (neighbor-based) |
| 13 | neighbor_density | sigmoid(1) | normalized neighbor count |
| 14–16 | surface_role | softmax(3) | isolated, boundary, interior |
Dimensions 0–10 are local (intrinsic to each patch, no cross-patch info). Dimensions 11–16 are structural (relational, computed after attention sees neighborhood context).
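The slice layout above can be read off directly. The following is a minimal sketch of a decoder for one gate vector; `decode_gate_vector` is a hypothetical helper (not part of the released API), and the 0.5 threshold for the sigmoid slices is an assumption:

```python
import torch

# Label sets taken from the table above
DIM_LABELS = ["point", "line", "surface", "volume"]
CURV_LABELS = ["rigid", "curved", "combined"]
TOPO_LABELS = ["open", "closed"]
ROLE_LABELS = ["isolated", "boundary", "interior"]

def decode_gate_vector(g: torch.Tensor) -> dict:
    """Interpret one (17,) gate vector: argmax over softmax slices,
    0.5 threshold (an assumption) on sigmoid slices."""
    return {
        "dimensionality": DIM_LABELS[g[0:4].argmax().item()],
        "curvature": CURV_LABELS[g[4:7].argmax().item()],
        "is_boundary": bool(g[7] > 0.5),          # partial fill
        "axes_active": (g[8:11] > 0.5).tolist(),  # spatial extent per axis
        "topology": TOPO_LABELS[g[11:13].argmax().item()],
        "neighbor_density": g[13].item(),
        "surface_role": ROLE_LABELS[g[14:17].argmax().item()],
    }

# Example: a gate vector describing a closed, curved, interior surface patch
g = torch.zeros(17)
g[2] = 1.0   # surface
g[5] = 1.0   # curved
g[12] = 1.0  # closed
g[16] = 1.0  # interior
props = decode_gate_vector(g)
```

In practice you would apply this per patch across the `(N, 64, 17)` gate tensor returned by `extract_features`.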
## Architecture

```
(8, 16, 16) input
        ↓
PatchEmbedding3D → (B, 64, 64)        # 64 patches of 32 voxels each
        ↓
Stage 0: Local Encoder + Gate Heads   # dims, curvature, boundary, axes
        ↓
proj([embedding, local_gates]) → (B, 64, 128)
        ↓
Stage 1: Bootstrap Transformer ×2     # standard attention with local context
        ↓
Stage 1.5: Structural Gate Heads      # topology, neighbors, surface role
        ↓
Stage 2: Geometric Transformer ×2     # gated attention modulated by all 17 gates
        ↓
Stage 3: Classification Heads         # 27-class shape recognition
```
The geometric transformer blocks use gate-modulated attention: Q and K are projected from [hidden, all_gates], V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.
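The gate-modulated attention described above can be sketched as follows. This is an illustrative reconstruction, not the released implementation: the class name, layer shapes, sigmoid gating on V, and the additive per-head gate-compatibility bias are all assumptions consistent with the description:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionSketch(nn.Module):
    """Sketch of gate-modulated attention: Q/K are projected from
    [hidden, gates], V is multiplicatively gated, and a per-head
    gate-compatibility term biases the attention logits."""
    def __init__(self, d_model=128, n_gates=17, n_heads=4):
        super().__init__()
        self.h, self.dk = n_heads, d_model // n_heads
        self.q = nn.Linear(d_model + n_gates, d_model)
        self.k = nn.Linear(d_model + n_gates, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.v_gate = nn.Linear(n_gates, d_model)   # multiplicative gate on V
        self.compat = nn.Linear(n_gates, n_heads)   # per-head gate bias
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, gates):
        B, N, _ = x.shape
        xg = torch.cat([x, gates], dim=-1)          # Q/K see hidden + gates
        q = self.q(xg).view(B, N, self.h, self.dk).transpose(1, 2)
        k = self.k(xg).view(B, N, self.h, self.dk).transpose(1, 2)
        v = self.v(x) * torch.sigmoid(self.v_gate(gates))
        v = v.view(B, N, self.h, self.dk).transpose(1, 2)
        # gate-interaction bias: one scalar per head per key position
        bias = self.compat(gates).permute(0, 2, 1).unsqueeze(2)  # (B, h, 1, N)
        attn = (q @ k.transpose(-2, -1)) / self.dk ** 0.5 + bias
        out = F.softmax(attn, dim=-1) @ v
        return self.out(out.transpose(1, 2).reshape(B, N, -1))

blk = GatedAttentionSketch()
y = blk(torch.randn(2, 64, 128), torch.rand(2, 64, 17))
```

The key design point is that the gates influence attention three ways: through the Q/K projections, through the value gating, and through the explicit compatibility bias.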
## The Rosetta Stone Discovery
This model was the analyzer in the GeoVAE Proto experiments, which showed that text descriptions produce 2.5–3.5× stronger geometric differentiation than actual images when projected through a lightweight VAE into this model's patch space.
| Source | patch_feat discriminability |
|---|---|
| FLUX images (49k) | +0.020 |
| flan-t5-small text | +0.053 |
| bert-base-uncased text | +0.053 |
| bert-beatrix-2048 text | +0.050 |
Three architecturally different text encoders converge to within ±5% of each other: the geometric structure is in the language, not the encoder. This model reads it.
## Training
Trained on procedurally generated multi-shape superposition grids (2β4 overlapping geometric primitives per sample, 27 shape classes). Two-tier gate supervision with ground truth computed from voxel analysis:
- Local gates: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
- Structural gates: topology from 3D-convolution neighbor counting, surface role from neighbor-density thresholds
200 epochs, achieving 93.8% recall on shape classification with explicit geometric property prediction as auxiliary objectives.
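The 3D-convolution neighbor counting used for structural gate ground truth can be sketched like this. The 6-connected face-neighbor kernel is an assumption; the training code may use a different connectivity:

```python
import torch
import torch.nn.functional as F

def neighbor_counts(occ: torch.Tensor) -> torch.Tensor:
    """Count occupied 6-connected neighbors per occupied voxel via conv3d.
    occ: (D, H, W) binary occupancy grid."""
    # Kernel with 1s at the six face-adjacent positions, 0 at center
    kernel = torch.zeros(1, 1, 3, 3, 3)
    for dz, dy, dx in [(0,1,1), (2,1,1), (1,0,1), (1,2,1), (1,1,0), (1,1,2)]:
        kernel[0, 0, dz, dy, dx] = 1.0
    x = occ.float()[None, None]                    # (1, 1, D, H, W)
    counts = F.conv3d(x, kernel, padding=1)[0, 0]  # neighbor count per voxel
    return counts * occ                            # zero out empty voxels

occ = torch.zeros(8, 16, 16)
occ[4, 8, 8] = 1   # two face-adjacent voxels:
occ[4, 8, 9] = 1   # each should count exactly one neighbor
c = neighbor_counts(occ)
```

Thresholding such counts gives the open/closed topology and isolated/boundary/interior surface-role labels described above.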
## Files

| File | Description |
|---|---|
| `geometric_model.py` | Standalone model + `load_from_hub()` + `extract_features()` |
| `model.pt` | Pretrained weights (epoch 200) |
## Usage

```python
import torch
from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

# Load pretrained
model = load_from_hub()

# From any (8, 16, 16) source
patches = torch.randn(16, 8, 16, 16).cuda()
gate_vectors, patch_features = extract_features(model, patches)

# Or full output dict
out = model(patches)
out["local_dim_logits"]    # (B, 64, 4)  dimensionality
out["local_curv_logits"]   # (B, 64, 3)  curvature
out["struct_topo_logits"]  # (B, 64, 2)  topology
out["patch_features"]      # (B, 64, 128) learned features
out["patch_shape_logits"]  # (B, 64, 27) shape classification
```
## Related
- AbstractPhil/geovae-proto – The Rosetta Stone experiments (text→geometry VAEs)
- AbstractPhil/synthetic-characters – 49k FLUX-generated character dataset
- AbstractPhil/grid-geometric-multishape – Original training repo with checkpoints
## Citation
Geometric deep learning research by AbstractPhil. The model demonstrates that geometric structure is a shared representation bridging text and visual modalities: symbolic association through geometric language.