---
license: other
license_name: commercial
license_link: LICENSE
datasets:
- DBbun/EEG-250Hz_v1.0
language:
- en
pipeline_tag: feature-extraction
---

# DBbun EEG Encoder: Pretrained Encoder Evaluation and Demo

## Overview

This repository provides a pretrained **EEG encoder** and two demonstration scripts developed by **DBbun LLC**.

The model converts short segments of multi-channel EEG into **128-dimensional embeddings** that summarize the temporal and spectral structure of the signal.

It was trained with self-supervised learning on DBbun’s synthetic multi-patient EEG corpus, sampled at **250 Hz** on the **10–20 montage (38 channels)**.

All training data are fully synthetic and therefore privacy-safe.
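
At 250 Hz, a 2-second window spans 2 × 250 = 500 samples, so each encoder input is a 38 × 500 array. The sketch below cuts a continuous recording into non-overlapping windows of that size. It assumes the recording is a NumPy array shaped `(channels, samples)`; the helper `segment_windows` is hypothetical (not taken from the demo scripts), and the exact input layout the encoder expects should be checked against `model_def.json`.

```python
import numpy as np

FS_HZ = 250        # sampling rate stated in this card
WINDOW_SEC = 2     # window length stated in this card
N_CHANNELS = 38    # channel count stated in this card


def segment_windows(recording: np.ndarray, fs: int = FS_HZ,
                    window_sec: float = WINDOW_SEC) -> np.ndarray:
    """Split a (channels, samples) recording into non-overlapping windows.

    Hypothetical helper, not part of the DBbun scripts. Returns an array
    shaped (n_windows, channels, window_samples); trailing samples that do
    not fill a whole window are dropped.
    """
    if recording.ndim != 2:
        raise ValueError("expected a 2-D array shaped (channels, samples)")
    n_ch, n_samples = recording.shape
    win = int(round(fs * window_sec))          # 250 Hz * 2 s = 500 samples
    n_windows = n_samples // win
    trimmed = recording[:, : n_windows * win]  # drop the incomplete tail
    # (channels, n_windows, win) -> (n_windows, channels, win)
    return trimmed.reshape(n_ch, n_windows, win).transpose(1, 0, 2)


# Example: 10.5 s of placeholder data yields 5 full windows of 38 x 500.
rng = np.random.default_rng(0)
x = rng.standard_normal((N_CHANNELS, int(10.5 * FS_HZ)))
windows = segment_windows(x)
print(windows.shape)  # (5, 38, 500)
```

Non-overlapping windows keep the example simple; overlapping (strided) windows are a common alternative when denser embeddings are wanted.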

---

## Key Features

- **Encoder for 2-second EEG windows**, trained at 250 Hz (38 channels).
- Produces **128-D embeddings** suitable for:
  - Seizure vs. non-seizure discrimination
  - EEG morphology clustering and visualization
  - Similarity search and retrieval
  - Anomaly and quality detection
  - Downstream feature extraction for ML models
- Includes demonstration scripts for embedding extraction and PCA-based visualization.

---

## Related Dataset

The encoder was trained and evaluated on **[DBbun/EEG-250Hz_v1.0](https://huggingface.co/datasets/DBbun/EEG-250Hz_v1.0)**.

Each file represents one synthetic patient with 38-channel EEG sampled at 250 Hz.

When present, the `labels_sec` field (0 = non-seizure, 1 = seizure) can be used to compute a **seizure fraction** or to train evaluation probes.
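
A minimal sketch of both uses, assuming `labels_sec` is a 1-D array holding one 0/1 label per second of recording (the exact layout in the dataset files is not spelled out here). The seizure fraction is then simply the mean label, and each 2-second window can inherit a label from the seconds it covers. Labeling a window as seizure when *any* of its seconds is a seizure second is an illustrative choice; a majority rule is an equally reasonable alternative. Both function names are hypothetical.

```python
import numpy as np


def seizure_fraction(labels_sec: np.ndarray) -> float:
    """Share of labeled seconds marked as seizure (label 1)."""
    labels = np.asarray(labels_sec)
    if labels.size == 0:
        raise ValueError("labels_sec is empty")
    return float(np.mean(labels == 1))


def window_labels(labels_sec: np.ndarray, window_sec: int = 2) -> np.ndarray:
    """One label per non-overlapping window of `window_sec` seconds.

    Assumed rule: a window is labeled 1 if any second inside it is labeled 1.
    Trailing seconds that do not fill a whole window are dropped, matching
    non-overlapping windowing of the signal itself.
    """
    labels = np.asarray(labels_sec)
    n_windows = labels.size // window_sec
    blocks = labels[: n_windows * window_sec].reshape(n_windows, window_sec)
    return (blocks == 1).any(axis=1).astype(np.int64)


labels = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])  # 9 s of placeholder labels
print(seizure_fraction(labels))  # 0.3333333333333333
print(window_labels(labels))     # [0 1 1 0]
```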

---

## Repository Contents

| File | Description |
|------|-------------|
| `encoder_state.pt` | PyTorch weights (state dictionary). |
| `encoder_traced.pt` | TorchScript version for deployment. |
| `model_def.json` | Model configuration (architecture, channels, latent dimension, dropout, etc.). |
| **`DBbun_EEG_Encoder_Eval_Demo_v1.py`** | Baseline script: loads EEG files, runs the pretrained encoder, and exports embeddings. |
| **`DBbun_EEG_Encoder_Eval_Demo_v2.py`** | Extended demo: includes **PCA visualization** that colors seizure vs. non-seizure embeddings for interpretability. |
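
The core loop of an embedding-export script (run the encoder over many windows, then save the stacked embeddings) can be sketched as follows. This is not the code of `DBbun_EEG_Encoder_Eval_Demo_v1.py`. To stay framework-agnostic, `encode_fn` is any callable that maps a `(batch, 38, 500)` array to a `(batch, 128)` array. With PyTorch, one plausible choice is a small wrapper around `torch.jit.load("encoder_traced.pt")` that converts inputs with `torch.from_numpy` and runs under `torch.no_grad()`; the expected input layout and any normalization are assumptions to verify against `model_def.json`. The placeholder encoder below is a random projection, not the DBbun model.

```python
import numpy as np


def embed_in_batches(encode_fn, windows: np.ndarray,
                     batch_size: int = 64) -> np.ndarray:
    """Run `encode_fn` over `windows` in mini-batches and stack the outputs.

    `encode_fn` is an assumed interface: it takes a float32 array shaped
    (batch, channels, samples) and returns an array shaped (batch, latent_dim).
    Batching keeps peak memory bounded on long recordings.
    """
    if len(windows) == 0:
        raise ValueError("no windows to embed")
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    outputs = []
    for start in range(0, len(windows), batch_size):
        batch = windows[start:start + batch_size].astype(np.float32)
        outputs.append(np.asarray(encode_fn(batch)))
    return np.concatenate(outputs, axis=0)


# Placeholder encoder for illustration only (NOT the DBbun model): flattens
# each window and applies a fixed random linear map to 128 dimensions.
rng = np.random.default_rng(0)
proj = rng.standard_normal((38 * 500, 128)).astype(np.float32)


def dummy_encoder(batch: np.ndarray) -> np.ndarray:
    return batch.reshape(len(batch), -1) @ proj


windows = rng.standard_normal((150, 38, 500)).astype(np.float32)
embeddings = embed_in_batches(dummy_encoder, windows, batch_size=64)
print(embeddings.shape)  # (150, 128)
# Export, e.g.: np.save("embeddings.npy", embeddings)
```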

---

## Intended Use

This model and accompanying scripts are intended for **research, education, and development** purposes.

They support reproducible EEG feature learning, visualization, and benchmarking without access to real patient data.

They are **not intended for clinical diagnosis or medical use**.

---

## Suggested Applications

- Evaluate representation quality on labeled synthetic EEG.
- Visualize clustering patterns of seizure vs. non-seizure embeddings using PCA.
- Train simple classifiers (e.g., logistic regression, SVM) on the 128-D features for benchmarking.
- Apply the encoder as a fixed feature extractor in other time-series tasks.
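
A linear probe trains a simple classifier on frozen embeddings; high held-out accuracy suggests the labels are close to linearly separable in the 128-D space. The sketch below implements logistic regression in plain NumPy with full-batch gradient descent so that it has no extra dependencies (scikit-learn's `LogisticRegression` is a common drop-in alternative). The embeddings and labels are synthetic Gaussian placeholders, not encoder outputs, and the helper names and hyperparameters are illustrative assumptions.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Binary logistic regression trained by full-batch gradient descent.

    Features are standardized with the training mean/std, which are returned
    so the same transform can be applied to held-out data.
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-8
    Z = (X - mu) / sigma
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(label = 1)
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b, mu, sigma


def predict(X, w, b, mu, sigma):
    """Hard 0/1 predictions (logit > 0 is a 0.5 probability threshold)."""
    return ((((X - mu) / sigma) @ w + b) > 0).astype(int)


# Placeholder data: two Gaussian clusters in 128-D standing in for
# non-seizure (0) and seizure (1) embeddings.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 128)),
               rng.normal(0.5, 1.0, size=(200, 128))])
y = np.concatenate([np.zeros(200), np.ones(200)])

idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]
w, b, mu, sigma = fit_logistic(X[train], y[train])
acc = np.mean(predict(X[test], w, b, mu, sigma) == y[test])
print(f"held-out accuracy: {acc:.2f}")
```

For real evaluations, splitting by patient rather than by window avoids leaking one patient's windows into both the training and test sets.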

---

## What Users Can Do with the Model

The **DBbun EEG Encoder (250 Hz)** acts as a **feature extractor**: it converts raw EEG windows into compact **128-dimensional embeddings** that summarize the shape, rhythm, and energy distribution of brain signals.

### ✅ Typical Use Cases

| Goal | What the user does |
|------|--------------------|
| **Feature extraction** | Feed EEG windows (38 channels × 2 s at 250 Hz, i.e., 500 samples per channel) into the encoder to obtain a 128-D embedding for each window. |
| **Classification** | Use the embeddings to train a simple model (e.g., logistic regression, random forest, MLP) for tasks such as seizure vs. non-seizure or artifact vs. clean. |
| **Visualization** | Reduce the embeddings to 2-D (PCA or UMAP) to explore clusters or signal structure. |
| **Similarity search** | Build a FAISS or Annoy index to find EEG segments that resemble each other in latent space. |
| **Anomaly detection** | Identify rare or abnormal patterns by computing distances to nearest neighbors. |
| **Patient-level summaries** | Average the embeddings of all windows from one patient to form a stable EEG “signature.” |
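
The similarity-search and anomaly-detection rows both reduce to nearest-neighbor queries in embedding space. For modest collections, exact brute-force search with cosine similarity is sufficient and is sketched below in NumPy; an approximate index such as FAISS or Annoy becomes worthwhile as the number of windows grows. Scoring anomalies by the mean cosine distance to the k nearest neighbors is one common heuristic, not necessarily what DBbun uses. The embeddings are random placeholders with one deliberately unusual row, and the helper names are hypothetical.

```python
import numpy as np


def l2_normalize(E: np.ndarray) -> np.ndarray:
    """Scale rows to unit length so dot products become cosine similarity."""
    return E / (np.linalg.norm(E, axis=-1, keepdims=True) + 1e-12)


def top_k_similar(query: np.ndarray, bank: np.ndarray, k: int = 5):
    """Indices and cosine similarities of the k bank rows nearest `query`."""
    sims = l2_normalize(bank) @ l2_normalize(query)
    order = np.argsort(-sims)[:k]
    return order, sims[order]


def knn_anomaly_scores(bank: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean cosine distance from each row to its k nearest other rows."""
    U = l2_normalize(bank)
    dist = 1.0 - U @ U.T
    np.fill_diagonal(dist, np.inf)          # exclude self-matches
    nearest = np.sort(dist, axis=1)[:, :k]  # k smallest distances per row
    return nearest.mean(axis=1)


# Placeholder embeddings: 500 windows sharing a common direction, except
# row 42, which is replaced by a vector pointing somewhere else.
rng = np.random.default_rng(0)
bank = 3.0 + rng.normal(size=(500, 128))
bank[42] = 3.0 * rng.normal(size=128)

idx, sims = top_k_similar(bank[7], bank, k=3)  # the query ranks itself first
scores = knn_anomaly_scores(bank, k=5)
print(idx[0], int(np.argmax(scores)))          # 7 42
```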

---

### 💾 Use of Precomputed Embeddings

Whether to use the precomputed embeddings depends on the user’s objective:

| Scenario | Use precomputed embeddings? | Reason |
|----------|-----------------------------|--------|
| **Quick exploration of results** | ✅ Yes | The file `demo_embeddings.npy` already contains 128-D features ready for clustering, visualization, or linear probes. |
| **Custom EEG data (real or synthetic)** | ❌ No | The precomputed file covers only the demo data; apply the pretrained encoder directly to the new EEG windows to generate their embeddings. |
| **Cross-model or cross-dataset comparison** | Optional | Both the provided embeddings and newly generated ones can be used for benchmarking and evaluation. |
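
For quick exploration, the precomputed embeddings can be loaded with NumPy and projected to two dimensions with PCA, similar in spirit to the v2 demo. This sketch assumes `demo_embeddings.npy` holds a single array shaped `(n_windows, 128)`; the PCA is computed directly from an SVD of the mean-centered matrix rather than taken from the v2 script, and the helper names are hypothetical. Placeholder data stands in for the file so the block runs on its own; plotting (for example with matplotlib, colored by seizure label) is left out.

```python
import numpy as np


def load_embeddings(path: str, dim: int = 128) -> np.ndarray:
    """Load an (n, dim) embedding matrix saved with np.save."""
    E = np.load(path)
    if E.ndim != 2 or E.shape[1] != dim:
        raise ValueError(f"expected shape (n, {dim}), got {E.shape}")
    return E.astype(np.float64)


def pca_2d(E: np.ndarray):
    """Project rows of E onto their top two principal components.

    Returns the (n, 2) projection and the fraction of total variance
    explained by each of the two components.
    """
    centered = E - E.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ Vt[:2].T
    explained = (s[:2] ** 2) / np.sum(s ** 2)
    return coords, explained


# With the shipped file (assumed to be in the working directory):
#   coords, explained = pca_2d(load_embeddings("demo_embeddings.npy"))
# Placeholder data so the sketch runs on its own:
rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 128))
coords, explained = pca_2d(demo)
print(coords.shape)  # (200, 2)
```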

---

## License

Licensed for non-clinical research and educational use.

For commercial licensing inquiries, please contact **DBbun LLC**.