# GeoVocab Patch Maker
A geometric vocabulary extractor that reads structural properties from latent patches, and the analyzer used in experiments showing that text carries the same geometric structure as images.
This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.
## What It Does
Takes an (8, 16, 16) tensor (originally voxel grids, but shown to work on adapted FLUX VAE latents and text-derived latent patches) and produces per-patch geometric descriptors:

```python
from geometric_model import load_from_hub, extract_features

model = load_from_hub()
gate_vectors, patch_features = extract_features(model, patches)
# gate_vectors:   (N, 64, 17)  - interpretable geometric properties
# patch_features: (N, 64, 256) - learned representations
```
## Gate Vector Anatomy (17 dimensions)
| Dims | Property | Type | Meaning |
|---|---|---|---|
| 0–3 | dimensionality | softmax(4) | 0D point, 1D line, 2D surface, 3D volume |
| 4–6 | curvature | softmax(3) | rigid, curved, combined |
| 7 | boundary | sigmoid(1) | partial fill (surface patch) |
| 8–10 | axis_active | sigmoid(3) | which axes have spatial extent |
| 11–12 | topology | softmax(2) | open vs closed (neighbor-based) |
| 13 | neighbor_density | sigmoid(1) | normalized neighbor count |
| 14–16 | surface_role | softmax(3) | isolated, boundary, interior |
Dimensions 0–10 are local (intrinsic to each patch, no cross-patch info). Dimensions 11–16 are structural (relational, computed after attention sees neighborhood context).
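The slice layout above can be read off directly. The following is a minimal sketch of a decoder for one gate vector; `decode_gate_vector` is a hypothetical helper (not part of the released API), and the 0.5 threshold for the sigmoid slices is an assumption:

```python
import torch

# Label sets taken from the table above
DIM_LABELS = ["point", "line", "surface", "volume"]
CURV_LABELS = ["rigid", "curved", "combined"]
TOPO_LABELS = ["open", "closed"]
ROLE_LABELS = ["isolated", "boundary", "interior"]

def decode_gate_vector(g: torch.Tensor) -> dict:
    """Interpret one (17,) gate vector: argmax over softmax slices,
    0.5 threshold (an assumption) on sigmoid slices."""
    return {
        "dimensionality": DIM_LABELS[g[0:4].argmax().item()],
        "curvature": CURV_LABELS[g[4:7].argmax().item()],
        "is_boundary": bool(g[7] > 0.5),          # partial fill
        "axes_active": (g[8:11] > 0.5).tolist(),  # spatial extent per axis
        "topology": TOPO_LABELS[g[11:13].argmax().item()],
        "neighbor_density": g[13].item(),
        "surface_role": ROLE_LABELS[g[14:17].argmax().item()],
    }

# Example: a gate vector describing a closed, curved, interior surface patch
g = torch.zeros(17)
g[2] = 1.0   # surface
g[5] = 1.0   # curved
g[12] = 1.0  # closed
g[16] = 1.0  # interior
props = decode_gate_vector(g)
```

In practice you would apply this per patch across the `(N, 64, 17)` gate tensor returned by `extract_features`.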
## Architecture

```
(8, 16, 16) input
        ↓
PatchEmbedding3D → (B, 64, 64)        # 64 patches of 32 voxels each
        ↓
Stage 0: Local Encoder + Gate Heads   # dims, curvature, boundary, axes
        ↓
proj([embedding, local_gates]) → (B, 64, 128)
        ↓
Stage 1: Bootstrap Transformer ×2     # standard attention with local context
        ↓
Stage 1.5: Structural Gate Heads      # topology, neighbors, surface role
        ↓
Stage 2: Geometric Transformer ×2     # gated attention modulated by all 17 gates
        ↓
Stage 3: Classification Heads         # 27-class shape recognition
```
The geometric transformer blocks use gate-modulated attention: Q and K are projected from [hidden, all_gates], V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.
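The gate-modulated attention described above can be sketched as follows. This is an illustrative reconstruction, not the released implementation: the class name, layer shapes, sigmoid gating on V, and the additive per-head gate-compatibility bias are all assumptions consistent with the description:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionSketch(nn.Module):
    """Sketch of gate-modulated attention: Q/K are projected from
    [hidden, gates], V is multiplicatively gated, and a per-head
    gate-compatibility term biases the attention logits."""
    def __init__(self, d_model=128, n_gates=17, n_heads=4):
        super().__init__()
        self.h, self.dk = n_heads, d_model // n_heads
        self.q = nn.Linear(d_model + n_gates, d_model)
        self.k = nn.Linear(d_model + n_gates, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.v_gate = nn.Linear(n_gates, d_model)   # multiplicative gate on V
        self.compat = nn.Linear(n_gates, n_heads)   # per-head gate bias
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, gates):
        B, N, _ = x.shape
        xg = torch.cat([x, gates], dim=-1)          # Q/K see hidden + gates
        q = self.q(xg).view(B, N, self.h, self.dk).transpose(1, 2)
        k = self.k(xg).view(B, N, self.h, self.dk).transpose(1, 2)
        v = self.v(x) * torch.sigmoid(self.v_gate(gates))
        v = v.view(B, N, self.h, self.dk).transpose(1, 2)
        # gate-interaction bias: one scalar per head per key position
        bias = self.compat(gates).permute(0, 2, 1).unsqueeze(2)  # (B, h, 1, N)
        attn = (q @ k.transpose(-2, -1)) / self.dk ** 0.5 + bias
        out = F.softmax(attn, dim=-1) @ v
        return self.out(out.transpose(1, 2).reshape(B, N, -1))

blk = GatedAttentionSketch()
y = blk(torch.randn(2, 64, 128), torch.rand(2, 64, 17))
```

The key design point is that the gates influence attention three ways: through the Q/K projections, through the value gating, and through the explicit compatibility bias.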
## The Rosetta Stone Discovery
This model was the analyzer in the GeoVAE Proto experiments, which showed that text descriptions produce 2.5–3.5× stronger geometric differentiation than actual images when projected through a lightweight VAE into this model's patch space.
| Source | patch_feat discriminability |
|---|---|
| FLUX images (49k) | +0.020 |
| flan-t5-small text | +0.053 |
| bert-base-uncased text | +0.053 |
| bert-beatrix-2048 text | +0.050 |
Three architecturally different text encoders converge to within ±5% of each other: the geometric structure is in the language, not the encoder. This model reads it.
## Training
Trained on procedurally generated multi-shape superposition grids (2β4 overlapping geometric primitives per sample, 27 shape classes). Two-tier gate supervision with ground truth computed from voxel analysis:
- Local gates: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
- Structural gates: topology from 3D-convolution neighbor counting, surface role from neighbor-density thresholds
200 epochs, achieving 93.8% recall on shape classification with explicit geometric property prediction as auxiliary objectives.
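The 3D-convolution neighbor counting used for structural gate ground truth can be sketched like this. The 6-connected face-neighbor kernel is an assumption; the training code may use a different connectivity:

```python
import torch
import torch.nn.functional as F

def neighbor_counts(occ: torch.Tensor) -> torch.Tensor:
    """Count occupied 6-connected neighbors per occupied voxel via conv3d.
    occ: (D, H, W) binary occupancy grid."""
    # Kernel with 1s at the six face-adjacent positions, 0 at center
    kernel = torch.zeros(1, 1, 3, 3, 3)
    for dz, dy, dx in [(0,1,1), (2,1,1), (1,0,1), (1,2,1), (1,1,0), (1,1,2)]:
        kernel[0, 0, dz, dy, dx] = 1.0
    x = occ.float()[None, None]                    # (1, 1, D, H, W)
    counts = F.conv3d(x, kernel, padding=1)[0, 0]  # neighbor count per voxel
    return counts * occ                            # zero out empty voxels

occ = torch.zeros(8, 16, 16)
occ[4, 8, 8] = 1   # two face-adjacent voxels:
occ[4, 8, 9] = 1   # each should count exactly one neighbor
c = neighbor_counts(occ)
```

Thresholding such counts gives the open/closed topology and isolated/boundary/interior surface-role labels described above.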
## Files

| File | Description |
|---|---|
| `geometric_model.py` | Standalone model + `load_from_hub()` + `extract_features()` |
| `model.pt` | Pretrained weights (epoch 200) |
## Usage

```python
import torch
from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

# Load pretrained
model = load_from_hub()

# From any (8, 16, 16) source
patches = torch.randn(16, 8, 16, 16).cuda()
gate_vectors, patch_features = extract_features(model, patches)

# Or full output dict
out = model(patches)
out["local_dim_logits"]    # (B, 64, 4)  dimensionality
out["local_curv_logits"]   # (B, 64, 3)  curvature
out["struct_topo_logits"]  # (B, 64, 2)  topology
out["patch_features"]      # (B, 64, 128) learned features
out["patch_shape_logits"]  # (B, 64, 27) shape classification
```
## Related
- AbstractPhil/geovae-proto – The Rosetta Stone experiments (text→geometry VAEs)
- AbstractPhil/synthetic-characters – 49k FLUX-generated character dataset
- AbstractPhil/grid-geometric-multishape – Original training repo with checkpoints
## Citation
Geometric deep learning research by AbstractPhil. The model demonstrates that geometric structure is a shared representation bridging text and visual modalities: symbolic association through geometric language.