---
license: apache-2.0
pipeline_tag: zero-shot-classification
tags:
- chemistry
- biology
- art
---
# Pentachora Adaptive Encoded (Multi-Channel) - NOTEBOOK 2 of 5
**A geometry-regularized classifier with a 5-frequency encoder and pentachoron constellation heads.**  
*Author:* **AbstractPhil** · *Quartermaster:* **Mirel** · GPT-4o / GPT-5 / GPT-5 Fast / GPT-5 Thinking / GPT-5 Pro  
*Assistants:* Claude Opus 4.1 · Claude Sonnet 4 · Gemini 2.5  


*License:* **Apache-2.0**

---

## 📌 TL;DR

This repository hosts training runs of a **frequency-aware encoder** (PentaFreq) paired with a **pentachoron constellation classifier** (dispatchers + specialists). The model blends classic cross-entropy with **two contrastive objectives** (dual InfoNCE and **ROSE-weighted** InfoNCE) and a **geometric regularizer** that keeps the learned vertex geometry sane.  
It supports **1-channel and 3-channel** 28×28 inputs (e.g., TorchVision MNIST variants and MedMNIST 2D sets), is **seeded/deterministic**, and ships full artifacts (weights, plots, history, TensorBoard) for review.

---

## Author's Notes
- Yes, I am human, and this is an AI-generated model card, so it's probably going to be a little inaccurate. It just looks better than mine would.
- This is design 2 of 5; the AI seems to always forget, so a reminder up front, because I probably won't edit this later. It has some odd stuff that doesn't matter, because this isn't the best one.
- Cataloging this model is important nonetheless, as it's a stepping stone to the more powerful geometric crystallization collective.
- I will include all citations to the adjacent papers used for the mathematics, model weights, inspirations, and test methodologies at a later time.
- I appreciate every single contributor to this, direct or indirect, through your invaluable contributions to science that manifested in utilizable AI form.
- I have included the training notebook as train_notebook.ipynb, which shows the deterministic setup, the weights, the loss methods, and an absolute ton of random functions that I let the AIs monkey-patch in, because that's faster than trying to teach an AI 15 classes in 15 files.
- The patterns in this design struggle where certain pentachora overlap, which is why it had to be rewritten again.
- The mix of deterministic and non-deterministic utilities manifests unexpected quirks and behaviors, which is why the deterministic version is required.
- Strict determinism can be enabled for a more robust and accurate recreation, but I may have missed some seeding points in this earlier notebook.

## 🧠 Model overview

### Architecture

- **PentaFreq Encoder (multi-channel)**  
  - 5 spectral branches (ultra-high, high, mid, low-mid, low) → per-branch encoders → cross-attention → MLP fusion → **normalized latent `z`**.  
  - Channel-aware: supports **C ∈ {1,3}**; input is flattened to `C×28×28`.

- **Pentachoron Constellation Classifier**  
  - **Two stacks** (dispatchers & specialists) each containing **pentachora** (5-vertex simplices) with learnable vertices.  
  - **Coherence gate** modulates vertex logits; **group heads** (one per vertex) score class subsets; **pair aggregation** + fusion MLP produce final logits.  
  - Geometry terms encourage valid simplex structure and separation between the two stacks.
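
For orientation, the sketch below shows the encoder's overall data flow (band branches → fusion → normalized latent `z`). It is a deliberately simplified stand-in, not the notebook's `PentaFreqEncoderV2` (which uses real spectral band-splitting and cross-attention fusion); the class name and sizes here are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PentaFreqSketch(nn.Module):
    """Illustrative 5-branch encoder skeleton; NOT the notebook's class."""
    def __init__(self, input_dim=28 * 28, latent_dim=56, num_bands=5):
        super().__init__()
        # One small encoder per spectral band (ultra-high ... low).
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, latent_dim), nn.GELU())
            for _ in range(num_bands)
        )
        # Fusion MLP standing in for the cross-attention + MLP fusion stage.
        self.fuse = nn.Sequential(
            nn.Linear(num_bands * latent_dim, latent_dim), nn.GELU(),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, x_flat):                     # x_flat: [B, C*28*28]
        # Stand-in band split: the real model filters into 5 spectral bands;
        # here every branch sees the same flattened input.
        feats = [branch(x_flat) for branch in self.branches]
        z = self.fuse(torch.cat(feats, dim=-1))    # [B, latent_dim]
        return F.normalize(z, dim=-1)              # normalized latent z
```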

### Objective

- **CE** – main cross-entropy on logits.  
- **Dual InfoNCE (stable)** – encourages `z` to match the **correct vertex** across both stacks.  
- **ROSE-weighted InfoNCE (stable)** – same idea, but reweights samples by an analytic **ROSE** similarity (triadic cosine + magnitude).  
- **Geometry Regularization** – stable Cayley–Menger **proxy** (eigval-based), edge-variance, center separation, and a **soft radius control**; ramped in early epochs.

> All contrastive losses use `log_softmax` + `gather` to avoid `inf−inf` traps; all paths **nan-sanitize** defensively.
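
As a concrete illustration of that pattern, here is a minimal sketch of a vertex-matching InfoNCE term with optional per-sample weights; the `rose_w` argument and shapes are assumptions for illustration, not the notebook's exact signature.

```python
import torch
import torch.nn.functional as F

def stable_info_nce(z, vertices, target_vertex, rose_w=None, tau=0.1):
    """z: [B, D] normalized latents; vertices: [K, D]; target_vertex: [B] (long);
    rose_w: optional [B] per-sample weights (e.g., ROSE similarities)."""
    logits = z @ vertices.t() / tau                               # [B, K]
    logp = F.log_softmax(logits, dim=-1)                          # no inf-inf
    nll = -logp.gather(1, target_vertex.unsqueeze(1)).squeeze(1)  # [B]
    if rose_w is not None:
        nll = nll * rose_w                                        # reweighting
    return torch.nan_to_num(nll).mean()                           # nan-sanitize
```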

### Determinism

- Global seeding (Python/NumPy/Torch), deterministic DataLoader workers, generator-seeded samplers; cuDNN deterministic & TF32 off.  
- Optional strict mode (`torch.use_deterministic_algorithms(True)`) and deterministic cuBLAS.
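
A minimal seeding helper in this spirit might look like the following (the notebook's actual setup covers more entry points, e.g. generator-seeded samplers):

```python
import os
import random
import numpy as np
import torch

def seed_everything(seed: int = 42, strict: bool = False):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                         # seeds CPU and all GPUs
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cuda.matmul.allow_tf32 = False   # TF32 off
    torch.backends.cudnn.allow_tf32 = False
    if strict:
        # Deterministic cuBLAS; must be set before the first CUDA call.
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
        torch.use_deterministic_algorithms(True)

def seed_worker(worker_id: int):
    """DataLoader worker_init_fn: derive per-worker seeds deterministically."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)
```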

---

## 🗂️ Repository layout per run

Each training run uploads a complete bundle at:

```
<repo>/<root>/<DatasetName>/<Timestamp_or_best>/
  weights/
    encoder[_<Dataset>].safetensors
    constellation[_<Dataset>].safetensors
    diagnostic_head[_<Dataset>].safetensors
  config.json               # exact config used
  manifest.json             # env, params, dataset, best metrics
  history.json / history.csv
  tensorboard/ (+ zip)
  plots/  # accuracy, loss components, lambda, confusion matrices
```

> We also optionally publish a **`best/`** alias inside each dataset folder pointing to the current champion.
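
To pull a single artifact from the Hub, something like this works (the `<root>/...` path is a placeholder; browse the repo tree for an actual run folder):

```python
from huggingface_hub import hf_hub_download

# Placeholder path: substitute a real <root>/<DatasetName>/<run> from the repo.
enc_path = hf_hub_download(
    repo_id="AbstractPhil/pentachora-multi-channel-frequency-encoded",
    filename="<root>/MNIST/best/weights/encoder_MNIST.safetensors",
)
```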

---

## 🧩 Intended use & use cases

**Intended use**: research-grade supervised classification and geometry-regularized representation learning on small images (28×28) across gray and color channels.

**Example use cases**

- **Benchmarking** on MNIST family / MedMNIST 2D sets with defensible, reproducible training and complete artifacts.  
- **Geometry-aware representation learning**: analyze how simplex vertices move, how the gate allocates probability mass, and how geometry regularization affects generalization.  
- **Class routing / specialization**: per-vertex group heads provide an interpretable split of classes; confusion-driven vertex reweighting helps diagnose hard groups.  
- **Curriculum & loss ablations**: toggle ROSE, dual InfoNCE, or geometry terms to study their marginal value under a controlled seed.  
- **OOD “pressure tests”** (research): ROSE magnitude and routing entropy can be used as quick signals of uncertainty (not calibrated).  
- **Education & reproducibility**: the runs are fully seeded, include TensorBoard logs and plots, and use safe numerical formulations.

---

## 🚫 Out-of-scope / limitations

- **Not a medical device** – even if trained on MedMNIST subsets, this is not a diagnostic tool. Don’t use it for clinical decisions.  
- **Input size** is 28×28; higher-resolution domains require retraining and likely architecture tweaks.  
- **Dataset bias / shift** – performance depends on the underlying distribution. Evaluate before deployment.  
- **Calibration** – logits are not guaranteed calibrated. For decision thresholds, use a validation set or post-hoc calibration.  
- **Robustness** – robustness to adversarial perturbations is not a design goal here.

---

## 📈 Example results (single-seed snapshots)

> Numbers below are indicative, from our seeded runs with `img_size=28`, a size-aware LR schedule, and the regularization ramp; see `manifest.json` in each run for exact details.

| Dataset        | C | Best Test Acc | Epoch | Notes                                |
|----------------|---|---------------:|------:|--------------------------------------|
| MNIST/Fashion* | 1 | 0.97–0.98      | 15–25 | stable losses + reg ramp             |
| BloodMNIST     | 3 | ~0.95–0.97+    | 20–30 | color preserved, 28×28                |
| EMNIST (bal)   | 1 | 0.88–0.92      | 25–45 | many classes; pairs auto-scaled      |

\* depending on which of the pair (MNIST / FashionMNIST) is selected.  
Consult each dataset folder’s `history.csv` for the full learning curve and the **current best** accuracy.

---

## 🔧 How to use (PyTorch)

```python
import torch
from safetensors.torch import load_file as load_safetensors

# --- load weights (example path) ---
ENC = "weights/encoder_MNIST.safetensors"
CON = "weights/constellation_MNIST.safetensors"
DIA = "weights/diagnostic_head_MNIST.safetensors"

# Recreate model classes (identical definitions to the notebook)
encoder = PentaFreqEncoderV2(input_dim=28*28, input_ch=1, base_dim=56, num_heads=2, channels=12)
constellation = BatchedPentachoronConstellation(num_classes=10, dim=56, num_pairs=5, lambda_sep=0.391)
diag = RoseDiagnosticHead(56)

encoder.load_state_dict(load_safetensors(ENC))
constellation.load_state_dict(load_safetensors(CON))
diag.load_state_dict(load_safetensors(DIA))

encoder.eval(); constellation.eval(); diag.eval()

# --- dummy inference ---
# x: [B, C, H, W] converted to float tensor in [0,1]; flatten to [B, C*H*W]
# use the same normalization as training if you want best performance
x = torch.rand(8, 1, 28, 28)
x_flat = x.view(x.size(0), -1)

with torch.no_grad():
    z = encoder(x_flat)                    # [B, D]
    logits, diag_out = constellation(z)    # [B, num_classes]
    pred = logits.argmax(dim=1)
print(pred)
```

> To reproduce training, see `config.json` and `history.csv`; all recipes are encoded in the flagship notebook used for these runs.

---

## 🔬 Training procedure (default)

- **Optimizer**: AdamW (β1=0.9, β2=0.999), size-aware LR (≈2e-2 by default)  
- **Schedule**: 10% **warmup** → cosine to `lr_min=1e-6`  
- **Batch size**: up to 2048 (fits on T4/A100 at 28×28)  
- **Loss**: CE + Dual InfoNCE + ROSE InfoNCE + Geometry Reg (ramped) + Diag MSE  
- **Determinism**: seeds for Python/NumPy/Torch (CPU/GPU), deterministic DataLoader workers and samplers, cuDNN deterministic, TF32 off  
- **Numerical safety**: log-softmax contrastive, eigval CM proxy, `nan_to_num` guards, optional step rollback if non-finite
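
The warmup → cosine schedule can be sketched as follows (a generic implementation stepping once per optimizer step; the notebook's size-aware variant differs in how it picks the base LR):

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine(optimizer, total_steps, warmup_frac=0.10,
                  base_lr=2e-2, lr_min=1e-6):
    """Assumes the optimizer was created with lr=base_lr."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    def lr_lambda(step):
        if step < warmup_steps:                    # linear warmup
            return (step + 1) / warmup_steps
        # cosine decay from base_lr down to lr_min
        t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        floor = lr_min / base_lr
        return floor + (1 - floor) * 0.5 * (1 + math.cos(math.pi * t))
    return LambdaLR(optimizer, lr_lambda)
```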

---

## 📈 Evaluation

- Main metric: **top-1 accuracy** on the held-out test split defined by each dataset.  
- Diagnostics we log:
  - **Routing entropy** and vertex probabilities
  - **ROSE** magnitudes
  - Confusion matrices (per epoch and “best”)
  - λ (geometry ↔ attention gate) over epochs
  - Full loss decomposition
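
For reference, the two headline diagnostics reduce to a few lines (the `vertex_probs` input stands for whatever per-vertex distribution the constellation exposes; the name is illustrative):

```python
import torch

@torch.no_grad()
def top1_accuracy(logits, targets):
    return (logits.argmax(dim=1) == targets).float().mean().item()

@torch.no_grad()
def routing_entropy(vertex_probs, eps=1e-12):
    # vertex_probs: [B, V], rows summing to 1; higher = more uniform routing
    p = vertex_probs.clamp_min(eps)
    return -(p * p.log()).sum(dim=-1).mean().item()
```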

---

## 🔭 Potential for growth

- **Hypercube Constellations** (shipped classes in the notebook): scale from 4-simplex to n-cube graphs; compare geometry families.  
- **Multi-resolution** (56→128→256 latent; 28→64→128 images); add pyramid encoders.  
- **Self-distillation / semi-supervised**: use ROSE as a confidence-weighted pseudo-labeling signal.  
- **Better routing**: learned vertex priors per class, entropy regularization, temperature schedules.  
- **Calibration & OOD**: temperature scaling / Dirichlet heads; exploit ROSE magnitude and gating entropy for improved uncertainty estimates.  
- **Deployment adapters**: ONNX / TorchScript exports; small mobile variants of PentaFreq.

---

## ⚖️ Ethical considerations & implications

- **Clinical datasets** (MedMNIST) are simplified proxies; they don’t reflect clinical complexity or demographic coverage.  
- **Downstream use** must include dataset-appropriate validation and calibration; this model is for **research** only.  
- **Data bias** and **label noise** can be amplified by strong geometry priors—review confusion matrices and per-class accuracies before claiming improvements.  
- **Positive implications**: the constellation design offers a **transparent, analyzable structure** (per-vertex heads, explicit geometry), easing **interpretability** and **ablation**.

---

## 🔁 Reproducibility

- `config.json` contains all hyperparameters used for each run.  
- `manifest.json` logs environment: Python, Torch, CUDA GPU, RAM, parameter counts.  
- Seeds and determinism flags are printed in logs and set in code.  
- `history.csv` + TensorBoard fully specify the learning trajectory.
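
A quick way to inspect a run bundle (standard `json`/`pandas`; check `history.csv` for the actual column names):

```python
import json
import pandas as pd

with open("config.json") as f:
    config = json.load(f)               # exact hyperparameters for the run
history = pd.read_csv("history.csv")    # per-epoch metrics
print(config.get("seed"), history.columns.tolist())
```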

---

## 🧾 License

**Apache License 2.0** – see `LICENSE`.

---

## 📣 Citation

If you use this work, please cite:

```
@software{abstractphil_pentachora_2025,
  author  = {AbstractPhil and Mirel},
  title   = {Pentachora Adaptive Encoded: Geometry-Regularized Classification with PentaFreq},
  year    = {2025},
  license = {Apache-2.0},
  url     = {https://huggingface.co/AbstractPhil/pentachora-multi-channel-frequency-encoded}
}
```

---

## 🛠️ Changelog (excerpt)

- **2025-08**: Flagship notebook stabilized (stable losses, eigval CM proxy, NaN rollback, deterministic sweep).  
- **2025-08**: Multi-channel PentaFreq; per-dataset HF folders with full artifacts; optional `best/` alias.  
- **2025-08**: Hypercube constellation classes added for follow-up experiments.

---

## 💬 Contact

- **Author:** @AbstractPhil  
- **Quartermaster:** Mirel (ChatGPT – GPT-5 Thinking)  
- **Issues / questions:** open a Discussion on the HF repo or ping the author