---
license: other
license_name: commercial
license_link: LICENSE
datasets:
- DBbun/EEG-250Hz_v1.0
language:
- en
pipeline_tag: feature-extraction
---
# DBbun EEG Encoder — Pretrained Encoder Evaluation and Demo
## Overview
This repository provides a pretrained **EEG encoder** and two demonstration scripts developed by **DBbun LLC**.
The model converts short segments of multi-channel EEG into **128-dimensional embeddings** that summarize the temporal and spectral structure of the signal.
It was trained self-supervised on DBbun’s synthetic multi-patient EEG corpus sampled at **250 Hz** using the **10–20 montage (38 channels)**.
All data are fully synthetic and privacy-safe.
---
## Key Features
- **2-second EEG encoder** trained at 250 Hz (38 channels).
- Produces **128-D embeddings** suitable for:
  - Seizure vs. non-seizure discrimination
  - EEG morphology clustering and visualization
  - Similarity search and retrieval
  - Anomaly and quality detection
  - Downstream feature extraction for ML models
- Includes demonstration scripts for embedding extraction and PCA-based visualization.
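
A 2-second window at 250 Hz corresponds to 500 samples per channel. The following is a minimal sketch of cutting a recording into non-overlapping windows. It assumes the recording is a NumPy array of shape `(38, n_samples)`; the function name `make_windows` and the array layout are illustrative assumptions, not part of the DBbun scripts, and the random data stands in for a real file.

```python
import numpy as np

SAMPLE_RATE_HZ = 250   # from the model card
WINDOW_SECONDS = 2     # from the model card
N_CHANNELS = 38        # from the model card


def make_windows(recording: np.ndarray) -> np.ndarray:
    """Split a (channels, samples) array into non-overlapping 2 s windows.

    Returns an array of shape (n_windows, channels, 500). Trailing samples
    that do not fill a whole window are dropped.
    """
    n_channels, n_samples = recording.shape
    window_len = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 500 samples
    n_windows = n_samples // window_len
    trimmed = recording[:, : n_windows * window_len]
    # (channels, n_windows, window_len) -> (n_windows, channels, window_len)
    return trimmed.reshape(n_channels, n_windows, window_len).transpose(1, 0, 2)


# Stand-in data: 11 seconds of random noise instead of a real recording.
rng = np.random.default_rng(0)
recording = rng.standard_normal((N_CHANNELS, 11 * SAMPLE_RATE_HZ))
windows = make_windows(recording)
print(windows.shape)  # (5, 38, 500)
```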
---
## Related Dataset
The encoder was trained and evaluated using **[DBbun/EEG-250Hz_v1.0](https://huggingface.co/datasets/DBbun/EEG-250Hz_v1.0)**.
Each file represents one synthetic patient with 38-channel EEG sampled at 250 Hz.
When available, `labels_sec` (0 = non-seizure, 1 = seizure) allows computing a **seizure fraction** or training evaluation probes.
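
As a rough illustration, the seizure fraction can be computed as the mean of the labels. This sketch assumes `labels_sec` is a 1-D array with one 0/1 label per second of recording, which the name suggests but the card does not state. The rule for turning per-second labels into per-window labels (a 2-second window counts as seizure if either second does) is also an assumption made for the example.

```python
import numpy as np

# Hypothetical per-second labels for a 10-second recording.
labels_sec = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0])

# Fraction of seconds labeled as seizure.
seizure_fraction = float(labels_sec.mean())

# Assumed rule: a 2 s window is "seizure" if either of its seconds is.
window_seconds = 2
n_windows = len(labels_sec) // window_seconds
window_labels = (
    labels_sec[: n_windows * window_seconds]
    .reshape(n_windows, window_seconds)
    .max(axis=1)
)

print(seizure_fraction)  # 0.3
print(window_labels)     # [0 1 1 0 0]
```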
---
## Repository Contents
| File | Description |
|------|--------------|
| `encoder_state.pt` | PyTorch weights (state dictionary). |
| `encoder_traced.pt` | TorchScript version for deployment. |
| `model_def.json` | Model configuration (architecture, channels, latent dimension, dropout, etc.). |
| **`DBbun_EEG_Encoder_Eval_Demo_v1.py`** | Baseline script: loads EEG files, runs the pretrained encoder, and exports embeddings. |
| **`DBbun_EEG_Encoder_Eval_Demo_v2.py`** | Extended demo: includes **PCA visualization** that colors seizure vs. non-seizure embeddings for interpretability. |
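
A minimal sketch of running an encoder on a batch of windows is shown below. The input layout `(batch, 38, 500)` and the output shape `(batch, 128)` are assumptions; `model_def.json` and the demo scripts are the authoritative source for the real interface. To keep the example runnable without the weight files, a placeholder module (`StandInEncoder`, hypothetical) is used. With the real files, the placeholder would likely be replaced by `torch.jit.load("encoder_traced.pt", map_location="cpu")`.

```python
import torch


class StandInEncoder(torch.nn.Module):
    """Placeholder with the assumed interface: (B, 38, 500) -> (B, 128)."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = torch.nn.Linear(38, 128)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x.mean(dim=-1))  # average over time, then project


def embed(encoder: torch.nn.Module, windows: torch.Tensor) -> torch.Tensor:
    """Run the encoder in inference mode and return the embeddings."""
    encoder.eval()  # disable dropout
    with torch.no_grad():
        return encoder(windows)


# With the real weights this would be (file name taken from the table above):
#   encoder = torch.jit.load("encoder_traced.pt", map_location="cpu")
encoder = StandInEncoder()
windows = torch.randn(4, 38, 500)  # stand-in for four 2 s EEG windows
embeddings = embed(encoder, windows)
print(embeddings.shape)  # torch.Size([4, 128])
```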
---
## Intended Use
This model and accompanying scripts are intended for **research, education, and development** purposes.
They support reproducible EEG feature learning, visualization, and benchmarking without access to real patient data.
They are **not intended for clinical diagnosis or medical use**.
---
## Suggested Applications
- Evaluate representation quality on labeled synthetic EEG.
- Visualize clustering patterns of seizure vs. non-seizure embeddings using PCA.
- Train simple classifiers (e.g., logistic regression, SVM) on 128-D features for benchmarking.
- Apply the encoder as a fixed feature extractor in other time-series tasks.
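
The PCA visualization mentioned above can be sketched with plain NumPy. This is not the code from `DBbun_EEG_Encoder_Eval_Demo_v2.py`; the function name `pca_2d` is hypothetical, and the two synthetic clusters stand in for real seizure and non-seizure embeddings. The resulting `points` can be passed to any scatter-plot routine and colored by `labels`.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n, d) embeddings onto their first two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Stand-in data: two synthetic clusters of 128-D vectors, not real embeddings.
rng = np.random.default_rng(0)
non_seizure = rng.standard_normal((50, 128))
seizure = rng.standard_normal((50, 128)) + 3.0
embeddings = np.vstack([non_seizure, seizure])
labels = np.array([0] * 50 + [1] * 50)

points = pca_2d(embeddings)
print(points.shape)  # (100, 2)
```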
---
## What Users Can Do with the Model
The **DBbun EEG Encoder (250 Hz)** acts as a **feature extractor**: it converts raw EEG windows into compact **128-dimensional embeddings** that summarize the shape, rhythm, and energy distribution of brain signals.
### ✅ Typical Use Cases
| Goal | What the user does |
|------|--------------------|
| **Feature extraction** | Feed 2-second EEG windows (38 channels × 500 samples at 250 Hz) into the encoder → obtain one 128-D embedding per window. |
| **Classification** | Use the embeddings to train a simple model (e.g., logistic regression, random forest, MLP) for tasks such as seizure vs. non-seizure or artifact vs. clean. |
| **Visualization** | Reduce embeddings to 2-D (PCA or UMAP) to explore clusters or signal structure. |
| **Similarity search** | Build a FAISS or Annoy index to find EEG segments that resemble each other in latent space. |
| **Anomaly detection** | Identify rare or abnormal patterns by computing distances to nearest neighbors. |
| **Patient-level summaries** | Average embeddings across all windows from one patient to form a stable EEG “signature.” |
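
The anomaly-detection and patient-summary rows can be illustrated with a brute-force NumPy sketch. It uses exact pairwise Euclidean distances in place of a FAISS or Annoy index, which is reasonable only for small sets. The function name `knn_distances`, the choice of `k`, and the random stand-in embeddings are assumptions made for the example.

```python
import numpy as np


def knn_distances(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean Euclidean distance from each row to its k nearest other rows."""
    sq_norms = (embeddings ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embeddings @ embeddings.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from rounding
    np.fill_diagonal(sq_dists, np.inf)       # exclude self-matches
    nearest = np.sort(sq_dists, axis=1)[:, :k]
    return np.sqrt(nearest).mean(axis=1)


# Stand-in data: 200 random 128-D vectors, not real embeddings.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((200, 128))
embeddings[0] += 10.0  # plant one obvious outlier

# Higher score = farther from its neighbors = more anomalous.
scores = knn_distances(embeddings, k=5)
print(int(scores.argmax()))  # 0

# Patient-level summary: average all window embeddings of one patient.
patient_signature = embeddings.mean(axis=0)
print(patient_signature.shape)  # (128,)
```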
---
### 💾 Use of Precomputed Embeddings
Precomputed embeddings are optional and depend on the user’s objective:
| Scenario | Use precomputed embeddings? | Reason |
|-----------|-----------------------------|---------|
| **Quick exploration of results** | ✅ Yes | The file `demo_embeddings.npy` already contains 128-D features ready for clustering, visualization, or linear probes. |
| **Custom EEG data (real or synthetic)** | ❌ No | The pretrained encoder can be applied directly to new EEG windows to generate embeddings. |
| **Cross-model or cross-dataset comparison** | Optional | Both the provided embeddings and newly generated ones can be used for benchmarking and evaluation. |
---
## License
Licensed for non-clinical research and educational use.
For commercial licensing inquiries, please contact **DBbun LLC**.