---
license: apache-2.0
pipeline_tag: zero-shot-classification
tags:
- chemistry
- biology
- art
---
# Pentachora Adaptive Encoded (Multi-Channel) - NOTEBOOK 2 of 5
**A geometry-regularized classifier with a 5-frequency encoder and pentachoron constellation heads.**
*Author:* **AbstractPhil** · *Quartermaster:* **Mirel** (GPT-4o · GPT-5 · GPT-5 Fast · GPT-5 Thinking · GPT-5 Pro)
*Assistants:* Claude Opus 4.1 · Claude Sonnet 4 · Gemini 2.5
*License:* **Apache-2.0**
---
## 📌 TL;DR
This repository hosts training runs of a **frequency-aware encoder** (PentaFreq) paired with a **pentachoron constellation classifier** (dispatchers + specialists). The model blends classic cross-entropy with **two contrastive objectives** (dual InfoNCE and **ROSE-weighted** InfoNCE) and a **geometric regularizer** that keeps the learned vertex geometry sane.
It supports **1-channel and 3-channel** 28×28 inputs (e.g., TorchVision MNIST variants and MedMNIST 2D sets), is **seeded/deterministic**, and ships full artifacts (weights, plots, history, TensorBoard) for review.
---
## Author's Notes
- Yes, I am human, and this is an AI-generated model card, so it is probably a little inaccurate in places. It simply looks better than one I would write by hand.
- This is design 2 of 5 (the AI tends to forget, so this reminder goes up front, since I probably won't edit it later). It has some odd pieces that don't matter much, because this isn't the best of the designs.
- Cataloging this model is important nonetheless, as it is a stepping stone to the more powerful geometric crystallization collective.
- Citations for the adjacent papers behind the mathematics, model weights, inspirations, and test methodologies will be added at a later time.
- I appreciate every single contributor to this, direct or indirect, for the invaluable contributions to science that manifested in utilizable AI form.
- The training notebook is included as `train_notebook.ipynb`; it shows the deterministic setup, the weights, the loss methods, and a large number of helper functions that I let the AIs monkey-patch in, because that is faster than trying to teach an AI 15 classes across 15 files.
- The patterns in this version struggle where certain pentachora overlap, which is why it had to be rewritten again.
- The mix of deterministic and non-deterministic utilities produces unexpected quirks and behavior, which is why the deterministic version is required.
- Strict determinism can be enabled for a more robust and accurate recreation, but I may have missed some seeding points in this earlier notebook.
---
## 🧠 Model overview
### Architecture
- **PentaFreq Encoder (multi-channel)**
- 5 spectral branches (ultra-high, high, mid, low-mid, low) → per-branch encoders → cross-attention → MLP fusion → **normalized latent `z`**.
- Channel-aware: supports **C ∈ {1,3}**; inputs of shape `C×28×28` are flattened to vectors of length `C·28·28`.
- **Pentachoron Constellation Classifier**
- **Two stacks** (dispatchers & specialists) each containing **pentachora** (5-vertex simplices) with learnable vertices.
- **Coherence gate** modulates vertex logits; **group heads** (one per vertex) score class subsets; **pair aggregation** + fusion MLP produce final logits.
- Geometry terms encourage valid simplex structure and separation between the two stacks.
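The geometry terms above revolve around Cayley–Menger structure over each pentachoron's five vertices. Below is a minimal, hedged sketch of an eigenvalue-based non-degeneracy penalty in that spirit; the function name, the `floor` margin, and the exact penalty form are assumptions for illustration, not the notebook's code.

```python
import torch

def cm_degeneracy_penalty(v: torch.Tensor, floor: float = 1e-3) -> torch.Tensor:
    """Eigenvalue-based Cayley-Menger proxy (illustrative sketch).

    v: [P, 5, D] batch of pentachora (5 vertices each in D dims).
    Builds the bordered Cayley-Menger matrix from squared edge lengths and
    penalizes near-zero eigenvalues, which signal a collapsing simplex,
    without evaluating a raw determinant.
    """
    P = v.shape[0]
    d2 = torch.cdist(v, v).pow(2)                 # [P, 5, 5] squared distances
    M = torch.ones(P, 6, 6, device=v.device, dtype=v.dtype)
    M[:, 0, 0] = 0.0                              # bordered CM matrix: zero corner,
    M[:, 1:, 1:] = d2                             # ones border, distances inside
    eig = torch.linalg.eigvalsh(M)                # symmetric -> real eigenvalues
    smallest = eig.abs().min(dim=-1).values       # det(M) = prod(eig)
    return torch.relu(floor - smallest).mean()    # push eigenvalues away from zero
```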
### Objective
- **CE** – main cross-entropy on logits.
- **Dual InfoNCE (stable)** – encourages `z` to match the **correct vertex** across both stacks.
- **ROSE-weighted InfoNCE (stable)** – same idea, but reweights samples by an analytic **ROSE** similarity (triadic cosine + magnitude).
- **Geometry Regularization** – stable Cayley–Menger **proxy** (eigval-based), edge-variance, center separation, and a **soft radius control**; ramped in early epochs.
> All contrastive losses use `log_softmax` + `gather` to avoid `inf−inf` traps; all paths **nan-sanitize** defensively.
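A minimal sketch of that numerically safe pattern (the function name, shapes, and temperature `tau` are illustrative assumptions; the notebook's implementation differs in detail):

```python
import torch
import torch.nn.functional as F

def stable_info_nce(z: torch.Tensor, vertices: torch.Tensor,
                    target: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrastive loss via log_softmax + gather (illustrative sketch).

    z:        [B, D] normalized latents
    vertices: [K, D] candidate vertex embeddings
    target:   [B] index of the correct vertex per sample
    """
    logits = z @ F.normalize(vertices, dim=-1).T / tau  # [B, K] similarities
    logp = F.log_softmax(logits, dim=-1)                # no inf-inf subtraction
    nll = -logp.gather(1, target.unsqueeze(1)).squeeze(1)
    return torch.nan_to_num(nll, nan=0.0).mean()        # defensive nan-sanitize
```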
### Determinism
- Global seeding (Python/NumPy/Torch), deterministic DataLoader workers, generator-seeded samplers; cuDNN deterministic & TF32 off.
- Optional strict mode (`torch.use_deterministic_algorithms(True)`) and deterministic cuBLAS.
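A compact sketch of these switches (the helper name is illustrative; all flags shown are standard Python/NumPy/PyTorch APIs):

```python
import os
import random
import numpy as np
import torch

def seed_everything(seed: int = 42, strict: bool = False) -> None:
    """Seed Python, NumPy, and Torch; disable nondeterministic fast paths."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cuda.matmul.allow_tf32 = False   # TF32 off
    torch.backends.cudnn.allow_tf32 = False
    if strict:
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # deterministic cuBLAS
        torch.use_deterministic_algorithms(True)

# DataLoader side: pass a seeded generator so shuffling is reproducible, e.g.
# DataLoader(ds, shuffle=True, generator=torch.Generator().manual_seed(42))
```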
---
## 🗂️ Repository layout per run
Each training run uploads a complete bundle at:
```
<repo>/<root>/<DatasetName>/<Timestamp_or_best>/
weights/
encoder[_<Dataset>].safetensors
constellation[_<Dataset>].safetensors
diagnostic_head[_<Dataset>].safetensors
config.json # exact config used
manifest.json # env, params, dataset, best metrics
history.json / history.csv
tensorboard/ (+ zip)
plots/ # accuracy, loss components, lambda, confusion matrices
```
> We also optionally publish a **`best/`** alias inside each dataset folder pointing to the current champion.
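A hedged example of pulling one run's weights from the Hub; the subfolder path below is illustrative (substitute a real `<DatasetName>/<Timestamp_or_best>` from the layout above):

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Illustrative path; point this at the dataset/run folder you actually want.
path = hf_hub_download(
    repo_id="AbstractPhil/pentachora-multi-channel-frequency-encoded",
    filename="MNIST/best/weights/encoder_MNIST.safetensors",
)
state_dict = load_file(path)  # ready for encoder.load_state_dict(...)
```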
---
## 🧩 Intended use & use cases
**Intended use**: research-grade supervised classification and geometry-regularized representation learning on small images (28×28) across gray and color channels.
**Example use cases**
- **Benchmarking** on MNIST family / MedMNIST 2D sets with defensible, reproducible training and complete artifacts.
- **Geometry-aware representation learning**: analyze how simplex vertices move, how the gate allocates probability mass, and how geometry regularization affects generalization.
- **Class routing / specialization**: per-vertex group heads provide an interpretable split of classes; confusion-driven vertex reweighting helps diagnose hard groups.
- **Curriculum & loss ablations**: toggle ROSE, dual InfoNCE, or geometry terms to study their marginal value under a controlled seed.
- **OOD “pressure tests”** (research): ROSE magnitude and routing entropy can be used as quick signals of uncertainty (not calibrated).
- **Education & reproducibility**: the runs are fully seeded, include TensorBoard logs and plots, and use safe numerical formulations.
---
## 🚫 Out-of-scope / limitations
- **Not a medical device** – even if trained on MedMNIST subsets, this is not a diagnostic tool. Don’t use it for clinical decisions.
- **Input size** is 28×28; higher-resolution domains require retraining and likely architecture tweaks.
- **Dataset bias / shift** – performance depends on the underlying distribution. Evaluate before deployment.
- **Calibration** – logits are not guaranteed to be calibrated. For decision thresholds, use a validation set or post-hoc calibration (a minimal sketch follows this list).
- **Robustness** – robustness to adversarial perturbations is not a design goal here.
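For the calibration point above, here is a minimal post-hoc temperature-scaling sketch (a standard technique, not part of this repo; the helper name is an assumption, and `logits`/`labels` come from a held-out validation set):

```python
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fit a single temperature T minimizing validation NLL (sketch)."""
    T = torch.ones(1, requires_grad=True)
    opt = torch.optim.LBFGS([T], lr=0.1, max_iter=50)
    nll = torch.nn.CrossEntropyLoss()

    def closure():
        opt.zero_grad()
        loss = nll(logits / T.clamp_min(1e-3), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return float(T.detach())  # divide test logits by T before softmax
```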
---
## 📈 Example results (single-seed snapshots)
> Numbers below are indicative, drawn from our seeded runs with `img_size=28`, a size-aware LR schedule, and the regularization ramp; see `manifest.json` in each run for exact details.
| Dataset | C | Best Test Acc | Epoch | Notes |
|----------------|---|---------------:|------:|--------------------------------------|
| MNIST/Fashion* | 1 | 0.97–0.98 | 15–25 | stable losses + reg ramp |
| BloodMNIST | 3 | ~0.95–0.97+ | 20–30 | color preserved, 28×28 |
| EMNIST (bal) | 1 | 0.88–0.92 | 25–45 | many classes; pairs auto-scaled |
\* depending on which of the pair (MNIST / FashionMNIST) is selected.
Consult each dataset folder’s `history.csv` for the full learning curve and the **current best** accuracy.
---
## 🔧 How to use (PyTorch)
```python
import torch
from safetensors.torch import load_file as load_safetensors
# --- load weights (example path) ---
ENC = "weights/encoder_MNIST.safetensors"
CON = "weights/constellation_MNIST.safetensors"
DIA = "weights/diagnostic_head_MNIST.safetensors"
# Recreate model classes (identical definitions to the notebook)
encoder = PentaFreqEncoderV2(input_dim=28*28, input_ch=1, base_dim=56, num_heads=2, channels=12)
constellation = BatchedPentachoronConstellation(num_classes=10, dim=56, num_pairs=5, lambda_sep=0.391)
diag = RoseDiagnosticHead(56)
encoder.load_state_dict(load_safetensors(ENC))
constellation.load_state_dict(load_safetensors(CON))
diag.load_state_dict(load_safetensors(DIA))
encoder.eval(); constellation.eval()
# --- dummy inference ---
# x: [B, C, H, W] converted to float tensor in [0,1]; flatten to [B, C*H*W]
# use the same normalization as training if you want best performance
x = torch.rand(8, 1, 28, 28)
x_flat = x.view(x.size(0), -1)
with torch.no_grad():
    z = encoder(x_flat)                    # [B, D] normalized latent
    logits, diag_out = constellation(z)    # [B, num_classes]
pred = logits.argmax(dim=1)
print(pred)
```
> To reproduce training, see `config.json` and `history.csv`; all recipes are encoded in the flagship notebook used for these runs.
---
## 🔬 Training procedure (default)
- **Optimizer**: AdamW (β1=0.9, β2=0.999), size-aware LR (≈2e-2 by default)
- **Schedule**: 10% **warmup** → cosine to `lr_min=1e-6` (sketched after this list)
- **Batch size**: up to 2048 (fits on T4/A100 at 28×28)
- **Loss**: CE + Dual InfoNCE + ROSE InfoNCE + Geometry Reg (ramped) + Diag MSE
- **Determinism**: seeds for Python/NumPy/Torch (CPU/GPU), deterministic DataLoader workers and samplers, cuDNN deterministic, TF32 off
- **Numerical safety**: log-softmax contrastive, eigval CM proxy, `nan_to_num` guards, optional step rollback if non-finite
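A hedged sketch of the optimizer plus warmup→cosine schedule described above (the builder name and step accounting are illustrative):

```python
import math
import torch

def build_schedule(optimizer: torch.optim.Optimizer, total_steps: int,
                   lr_min: float = 1e-6):
    """10% linear warmup, then cosine decay to lr_min (sketch)."""
    warmup = max(1, int(0.10 * total_steps))
    base_lr = optimizer.param_groups[0]["lr"]

    def factor(step: int) -> float:
        if step < warmup:
            return (step + 1) / warmup                        # linear warmup
        t = (step - warmup) / max(1, total_steps - warmup)    # progress in [0, 1]
        cos = 0.5 * (1.0 + math.cos(math.pi * t))             # cosine decay
        return (lr_min + (base_lr - lr_min) * cos) / base_lr  # LambdaLR multiplier

    return torch.optim.lr_scheduler.LambdaLR(optimizer, factor)

# Example wiring, matching the defaults above:
# opt = torch.optim.AdamW(model.parameters(), lr=2e-2, betas=(0.9, 0.999))
# sched = build_schedule(opt, total_steps=epochs * steps_per_epoch)
```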
---
## 📈 Evaluation
- Main metric: **top-1 accuracy** on the held-out test split defined by each dataset.
- Diagnostics we log:
- **Routing entropy** and vertex probabilities (a sketch follows this list)
- **ROSE** magnitudes
- Confusion matrices (per epoch and “best”)
- λ (geometry ↔ attention gate) over epochs
- Full loss decomposition
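The routing-entropy diagnostic mentioned above reduces to Shannon entropy over per-sample vertex probabilities; a minimal sketch:

```python
import torch

def routing_entropy(vertex_probs: torch.Tensor) -> torch.Tensor:
    """Entropy of routing distributions (illustrative sketch).

    vertex_probs: [B, V], rows sum to 1 (e.g., softmaxed vertex logits).
    Low entropy = confident routing; high entropy = diffuse routing.
    """
    p = vertex_probs.clamp_min(1e-12)     # avoid log(0)
    return -(p * p.log()).sum(dim=-1)     # [B]
```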
---
## 🔭 Potential for growth
- **Hypercube Constellations** (classes already shipped in the notebook): scale from the 4-simplex to n-cube graphs; compare geometry families.
- **Multi-resolution** (56→128→256 latent; 28→64→128 images); add pyramid encoders.
- **Self-distillation / semi-supervised**: use ROSE as a confidence-weighted pseudo-labeling signal.
- **Better routing**: learned vertex priors per class, entropy regularization, temperature schedules.
- **Calibration & OOD**: temperature scaling / Dirichlet heads; exploit ROSE magnitude and gating entropy for improved uncertainty estimates.
- **Deployment adapters**: ONNX / TorchScript exports; small mobile variants of PentaFreq.
---
## ⚖️ Ethical considerations & implications
- **Clinical datasets** (MedMNIST) are simplified proxies; they don’t reflect clinical complexity or demographic coverage.
- **Downstream use** must include dataset-appropriate validation and calibration; this model is for **research** only.
- **Data bias** and **label noise** can be amplified by strong geometry priors—review confusion matrices and per-class accuracies before claiming improvements.
- **Positive implications**: the constellation design offers a **transparent, analyzable structure** (per-vertex heads, explicit geometry), easing **interpretability** and **ablation**.
---
## 🔁 Reproducibility
- `config.json` contains all hyperparameters used for each run.
- `manifest.json` logs the environment: Python/Torch/CUDA versions, GPU, RAM, and parameter counts.
- Seeds and determinism flags are printed in logs and set in code.
- `history.csv` + TensorBoard fully specify the learning trajectory.
---
## 🧾 License
**Apache License 2.0** – see `LICENSE`.
---
## 📣 Citation
If you use this work, please cite:
```
@software{abstractphil_pentachora_2025,
author = {AbstractPhil and Mirel},
title = {Pentachora Adaptive Encoded: Geometry-Regularized Classification with PentaFreq},
year = {2025},
license = {Apache-2.0},
url = {https://huggingface.co/AbstractPhil/pentachora-multi-channel-frequency-encoded}
}
```
---
## 🛠️ Changelog (excerpt)
- **2025-08**: Flagship notebook stabilized (stable losses, eigval CM proxy, NaN rollback, deterministic sweep).
- **2025-08**: Multi-channel PentaFreq; per-dataset HF folders with full artifacts; optional `best/` alias.
- **2025-08**: Hypercube constellation classes added for follow-up experiments.
---
## 💬 Contact
- **Author:** @AbstractPhil
- **Quartermaster:** Mirel (ChatGPT – GPT-5 Thinking)
- **Issues / questions:** open a Discussion on the HF repo or ping the author