GeoVocab Patch Maker

A geometric vocabulary extractor that reads structural properties from latent patches, and the analyzer that showed text carries the same geometric structure as images.

This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.

What It Does

Takes an (8, 16, 16) tensor (originally voxel grids, but shown to work on adapted FLUX VAE latents and text-derived latent patches) and produces per-patch geometric descriptors:

from geometric_model import load_from_hub, extract_features

model = load_from_hub()
gate_vectors, patch_features = extract_features(model, patches)
# gate_vectors:   (N, 64, 17)  interpretable geometric properties
# patch_features: (N, 64, 256) learned representations

Gate Vector Anatomy (17 dimensions)

Dims   Property          Type        Meaning
0–3    dimensionality    softmax(4)  0D point, 1D line, 2D surface, 3D volume
4–6    curvature         softmax(3)  rigid, curved, combined
7      boundary          sigmoid(1)  partial fill (surface patch)
8–10   axis_active       sigmoid(3)  which axes have spatial extent
11–12  topology          softmax(2)  open vs closed (neighbor-based)
13     neighbor_density  sigmoid(1)  normalized neighbor count
14–16  surface_role      softmax(3)  isolated, boundary, interior

Dimensions 0–10 are local (intrinsic to each patch, no cross-patch info). Dimensions 11–16 are structural (relational, computed after attention sees neighborhood context).
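The layout in the table above can be read off by slicing. A small helper (hypothetical, not part of geometric_model.py) that splits a gate tensor into named property groups:

```python
import torch

# Hypothetical helper: split a (N, 64, 17) gate tensor into the named
# property groups from the table above. Slice boundaries follow the
# documented dimension layout.
def split_gate_vector(gate_vectors: torch.Tensor) -> dict:
    return {
        "dimensionality":   gate_vectors[..., 0:4],    # softmax over 0D-3D
        "curvature":        gate_vectors[..., 4:7],    # rigid / curved / combined
        "boundary":         gate_vectors[..., 7:8],    # sigmoid, partial fill
        "axis_active":      gate_vectors[..., 8:11],   # sigmoid per axis
        "topology":         gate_vectors[..., 11:13],  # open vs closed
        "neighbor_density": gate_vectors[..., 13:14],  # sigmoid
        "surface_role":     gate_vectors[..., 14:17],  # isolated / boundary / interior
    }
```

The seven slices partition all 17 dimensions, so the groups can be concatenated back into the original vector.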

Architecture

(8, 16, 16) input
    ↓
PatchEmbedding3D → (B, 64, 64)          # 64 patches of 32 voxels each
    ↓
Stage 0: Local Encoder + Gate Heads     # dims, curvature, boundary, axes
    ↓
proj([embedding, local_gates]) → (B, 64, 128)
    ↓
Stage 1: Bootstrap Transformer ×2       # standard attention with local context
    ↓
Stage 1.5: Structural Gate Heads        # topology, neighbors, surface role
    ↓
Stage 2: Geometric Transformer ×2       # gated attention modulated by all 17 gates
    ↓
Stage 3: Classification Heads           # 27-class shape recognition

The geometric transformer blocks use gate-modulated attention: Q and K are projected from [hidden, all_gates], V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.
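A minimal sketch of that gated attention, with assumed dimensions (hidden 128, 17 gates, 4 heads); the per-head gate-compatibility scores mentioned above are omitted for brevity, and the real block's details may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch of gate-modulated attention: Q and K are projected
# from the concatenation [hidden, gates]; V is multiplicatively gated by
# a learned per-channel gate derived from the gate vector. Dimensions
# (128 hidden, 17 gates, 4 heads) are assumptions, not the model's spec.
class GatedAttentionSketch(nn.Module):
    def __init__(self, dim=128, n_gates=17, heads=4):
        super().__init__()
        self.heads, self.dh = heads, dim // heads
        self.q = nn.Linear(dim + n_gates, dim)
        self.k = nn.Linear(dim + n_gates, dim)
        self.v = nn.Linear(dim, dim)
        self.v_gate = nn.Linear(n_gates, dim)     # gate on V, per channel
        self.out = nn.Linear(dim, dim)

    def forward(self, h, gates):                  # h: (B, 64, 128), gates: (B, 64, 17)
        B, N, _ = h.shape
        hg = torch.cat([h, gates], dim=-1)        # [hidden, all_gates]
        q = self.q(hg).view(B, N, self.heads, self.dh).transpose(1, 2)
        k = self.k(hg).view(B, N, self.heads, self.dh).transpose(1, 2)
        v = self.v(h) * torch.sigmoid(self.v_gate(gates))   # multiplicative gating
        v = v.view(B, N, self.heads, self.dh).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.dh ** 0.5, dim=-1)
        return self.out((attn @ v).transpose(1, 2).reshape(B, N, -1))
```

Because Q and K see the gates, attention weights themselves depend on geometric properties, not just learned features.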

The Rosetta Stone Discovery

This model was used as the analyzer in the GeoVAE Proto experiments, which showed that text descriptions produce 2.5–3.5× stronger geometric differentiation than actual images when projected through a lightweight VAE into this model's patch space.

Source                  patch_feat discriminability
FLUX images (49k)       +0.020
flan-t5-small text      +0.053
bert-base-uncased text  +0.053
bert-beatrix-2048 text  +0.050

Three architecturally different text encoders converge to within ±5% of each other: the geometric structure is in the language, not the encoder. This model reads it.
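The exact discriminability metric is not defined here; one common proxy (an assumption, not necessarily the experiment's measure) is mean intra-class cosine similarity minus mean inter-class cosine similarity over patch features:

```python
import torch

# Assumed proxy for "discriminability": how much more similar
# same-class feature vectors are than different-class ones, using
# cosine similarity. Higher values mean more separable classes.
def discriminability(feats: torch.Tensor, labels: torch.Tensor) -> float:
    f = torch.nn.functional.normalize(feats, dim=-1)
    sim = f @ f.T                                   # pairwise cosine similarity
    same = labels[:, None] == labels[None, :]
    eye = torch.eye(len(labels), dtype=torch.bool)
    intra = sim[same & ~eye].mean()                 # same class, excluding self-pairs
    inter = sim[~same].mean()                       # different class
    return (intra - inter).item()
```

Under a score like this, the small positive values in the table read as modest but consistent class separation in patch-feature space.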

Training

Trained on procedurally generated multi-shape superposition grids (2–4 overlapping geometric primitives per sample, 27 shape classes). Two-tier gate supervision with ground truth computed from voxel analysis:

  • Local gates: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
  • Structural gates: topology from 3D convolution neighbor counting, surface role from neighbor density thresholds
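The local ground-truth rules above can be sketched directly from a voxel grid. Thresholds and names here are assumptions; the training code's exact cutoffs are not given:

```python
import torch

# Sketch of local gate ground truth from a voxel grid, following the
# rules above: dimensionality from axis extent, a curvature proxy from
# bounding-box fill ratio, boundary from partial occupancy. The 0.5
# occupancy threshold is an assumption.
def local_gate_targets(voxels: torch.Tensor):
    occ = voxels > 0.5                               # (8, 16, 16) occupancy
    idx = occ.nonzero()                              # coordinates of occupied voxels
    extent = (idx.max(0).values - idx.min(0).values + 1) if len(idx) else torch.zeros(3)
    n_dims = int((extent > 1).sum())                 # 0=point, 1=line, 2=surface, 3=volume
    bbox = int(extent.clamp(min=1).prod()) if len(idx) else 1
    fill_ratio = occ.sum().item() / bbox             # curvature proxy
    boundary = float(0.0 < fill_ratio < 1.0)         # partially filled box -> surface patch
    return n_dims, fill_ratio, boundary
```

A filled axis-aligned line scores 1D, a solid block 3D, and a hollowed block triggers the boundary flag; the structural gates (topology, surface role) additionally need neighbor counts, which this local sketch does not compute.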

Trained for 200 epochs, the model reaches 93.8% recall on shape classification, with explicit geometric property prediction as auxiliary objectives.

Files

File                Description
geometric_model.py  Standalone model + load_from_hub() + extract_features()
model.pt            Pretrained weights (epoch 200)

Usage

import torch
from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

# Load pretrained
model = load_from_hub()

# From any (8, 16, 16) source
patches = torch.randn(16, 8, 16, 16).cuda()
gate_vectors, patch_features = extract_features(model, patches)

# Or full output dict
out = model(patches)
out["local_dim_logits"]       # (B, 64, 4)  dimensionality
out["local_curv_logits"]      # (B, 64, 3)  curvature
out["struct_topo_logits"]     # (B, 64, 2)  topology
out["patch_features"]         # (B, 64, 128) learned features
out["patch_shape_logits"]     # (B, 64, 27) shape classification

Citation

Geometric deep learning research by AbstractPhil. The model demonstrates that geometric structure is a universal language bridging text and visual modalities: symbolic association through geometric language.
