---
license: other
license_name: commercial
license_link: LICENSE
datasets:
- DBbun/EEG-250Hz_v1.0
language:
- en
pipeline_tag: feature-extraction
---
# DBbun EEG Encoder — Pretrained Encoder Evaluation and Demo
## Overview
This repository provides a pretrained **EEG encoder** and two demonstration scripts developed by **DBbun LLC**.
The model converts short segments of multi-channel EEG into **128-dimensional embeddings** that summarize the temporal and spectral structure of the signal.
It was trained with self-supervision on DBbun’s synthetic multi-patient EEG corpus, sampled at **250 Hz** with a **10–20 montage (38 channels)**.
All data are fully synthetic and privacy-safe.
---
## Key Features
- **2-second EEG encoder** trained at 250 Hz (38 channels).
- Produces **128-D embeddings** suitable for:
  - Seizure vs. non-seizure discrimination
  - EEG morphology clustering and visualization
  - Similarity search and retrieval
  - Anomaly and quality detection
  - Downstream feature extraction for ML models
- Includes demonstration scripts for embedding extraction and PCA-based visualization.
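
The window size implied above is 2 s × 250 Hz = 500 samples per channel. The following is a minimal sketch of cutting a recording into such windows. It assumes a `(n_samples, n_channels)` NumPy layout and non-overlapping windows; the helper name `split_into_windows` is hypothetical, and the layout the encoder actually expects should be checked against `model_def.json` and the demo scripts.

```python
import numpy as np

SFREQ = 250        # sampling rate in Hz (from this README)
WINDOW_SEC = 2     # window length in seconds (from this README)
N_CHANNELS = 38    # channel count (from this README)


def split_into_windows(recording: np.ndarray) -> np.ndarray:
    """Cut a (n_samples, n_channels) array into non-overlapping 2 s windows.

    Returns shape (n_windows, 500, n_channels). Trailing samples that do not
    fill a whole window are dropped. The array layout is an assumption.
    """
    if recording.ndim != 2 or recording.shape[1] != N_CHANNELS:
        raise ValueError(f"expected (n_samples, {N_CHANNELS}), got {recording.shape}")
    win = SFREQ * WINDOW_SEC                      # 500 samples per window
    n_windows = recording.shape[0] // win
    trimmed = recording[: n_windows * win]
    return trimmed.reshape(n_windows, win, recording.shape[1])


# Hypothetical 11-second recording filled with random noise.
rng = np.random.default_rng(0)
demo = rng.standard_normal((11 * SFREQ, N_CHANNELS))
windows = split_into_windows(demo)
print(windows.shape)  # (5, 500, 38)
```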
---
## Related Dataset
The encoder was trained and evaluated using **[DBbun/EEG-250Hz_v1.0](https://huggingface.co/datasets/DBbun/EEG-250Hz_v1.0)**.
Each file represents one synthetic patient with 38-channel EEG sampled at 250 Hz.
When available, `labels_sec` (0 = non-seizure, 1 = seizure) allows computing a **seizure fraction** or training evaluation probes.
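
As an illustration of the seizure fraction mentioned above, here is a small sketch. It assumes `labels_sec` is a 1-D array with one 0/1 entry per second of recording, which the name suggests but this README does not spell out. The rule used to label a 2-second window (seizure if any second in it is seizure) is likewise an assumption, and both function names are hypothetical.

```python
import numpy as np


def seizure_fraction(labels_sec) -> float:
    """Fraction of seconds labeled 1 (seizure)."""
    labels = np.asarray(labels_sec)
    if labels.size == 0:
        raise ValueError("labels_sec is empty")
    return float(np.mean(labels == 1))


def window_labels(labels_sec, window_sec: int = 2) -> np.ndarray:
    """Label each non-overlapping window 1 if any second inside it is 1.

    The 'any second' rule is an illustrative assumption, not the dataset's
    documented convention.
    """
    labels = np.asarray(labels_sec)
    n_windows = labels.size // window_sec
    trimmed = labels[: n_windows * window_sec].reshape(n_windows, window_sec)
    return (trimmed == 1).any(axis=1).astype(int)


# Hypothetical 10-second label track with a 3-second seizure.
labels_sec = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0])
print(seizure_fraction(labels_sec))  # 0.3
print(window_labels(labels_sec))     # [0 1 1 0 0]
```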
---
## Repository Contents
| File | Description |
|------|--------------|
| `encoder_state.pt` | PyTorch weights (state dictionary). |
| `encoder_traced.pt` | TorchScript version for deployment. |
| `model_def.json` | Model configuration (architecture, channels, latent dimension, dropout, etc.). |
| **`DBbun_EEG_Encoder_Eval_Demo_v1.py`** | Baseline script: loads EEG files, runs the pretrained encoder, and exports embeddings. |
| **`DBbun_EEG_Encoder_Eval_Demo_v2.py`** | Extended demo: includes **PCA visualization** that colors seizure vs. non-seizure embeddings for interpretability. |
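
For readers who want to see what a PCA projection of embeddings involves before opening the v2 script, the sketch below projects an `(n_windows, 128)` array onto its first two principal components using NumPy's SVD. It is not taken from `DBbun_EEG_Encoder_Eval_Demo_v2.py`; the function name `pca_project` and the random stand-in embeddings are hypothetical.

```python
import numpy as np


def pca_project(embeddings, n_components: int = 2):
    """Project rows onto the top principal components.

    Returns (coords, explained_ratio): coords has shape
    (n_rows, n_components); explained_ratio holds the variance share of
    each kept component.
    """
    X = np.asarray(embeddings, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(X_centered, full_matrices=False)
    coords = X_centered @ vt[:n_components].T
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    return coords, explained_ratio[:n_components]


# Random stand-in for real encoder output.
rng = np.random.default_rng(0)
fake_embeddings = rng.standard_normal((200, 128))
coords, ratio = pca_project(fake_embeddings)
print(coords.shape)
```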
---
## Intended Use
This model and accompanying scripts are intended for **research, education, and development** purposes.
They support reproducible EEG feature learning, visualization, and benchmarking without access to real patient data.
They are **not intended for clinical diagnosis or medical use**.
---
## Suggested Applications
- Evaluate representation quality on labeled synthetic EEG.
- Visualize clustering patterns of seizure vs. non-seizure embeddings using PCA.
- Train simple classifiers (e.g., logistic regression, SVM) on 128-D features for benchmarking.
- Apply the encoder as a fixed feature extractor in other time-series tasks.
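
A linear probe of the kind suggested above might look like the sketch below: a plain logistic regression trained by gradient descent in NumPy on fixed 128-D features. Everything here is illustrative. The data are random stand-ins with an artificial class shift, the function names and hyperparameters are hypothetical, and in practice a library implementation such as scikit-learn's would be the usual choice. With real recordings, splitting train and test sets by patient rather than by window is likely the safer evaluation.

```python
import numpy as np


def train_logistic_probe(X, y, lr=0.1, n_steps=500, l2=1e-3):
    """Fit a binary logistic regression on standardized features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8
    Xs = (X - mean) / std
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))   # predicted P(y = 1)
        w -= lr * (Xs.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return {"w": w, "b": b, "mean": mean, "std": std}


def predict(probe, X):
    """Return 0/1 predictions using the fitted probe."""
    Xs = (np.asarray(X, dtype=float) - probe["mean"]) / probe["std"]
    return (Xs @ probe["w"] + probe["b"] > 0).astype(int)


# Stand-in data: class 1 embeddings are shifted by 0.5 in every dimension.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
X = rng.standard_normal((400, 128)) + 0.5 * y[:, None]

probe = train_logistic_probe(X[:300], y[:300])
accuracy = float(np.mean(predict(probe, X[300:]) == y[300:]))
print(f"held-out accuracy: {accuracy:.2f}")
```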
---
## What Users Can Do with the Model
The **DBbun EEG Encoder (250 Hz)** acts as a **feature extractor** — it converts raw EEG windows into compact **128-dimensional embeddings** that summarize the shape, rhythm, and energy distribution of brain signals.
### ✅ Typical Use Cases
| Goal | What the user does |
|------|--------------------|
| **Feature extraction** | Feed EEG windows (2 s at 250 Hz = 500 samples, × 38 channels) into the encoder → obtain one 128-D embedding per window. |
| **Classification** | Use the embeddings to train a simple model (e.g., logistic regression, random forest, MLP) for tasks such as seizure vs. non-seizure or artifact vs. clean. |
| **Visualization** | Reduce embeddings to 2-D (PCA or UMAP) to explore clusters or signal structure. |
| **Similarity search** | Build a FAISS or Annoy index to find EEG segments that resemble each other in latent space. |
| **Anomaly detection** | Identify rare or abnormal patterns by computing distances to nearest neighbors. |
| **Patient-level summaries** | Average embeddings across all windows from one patient to form a stable EEG “signature.” |
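
Three rows of the table above (similarity search, anomaly detection, patient-level summaries) reduce to simple array operations on the embeddings. The sketch below uses brute-force NumPy in place of a FAISS or Annoy index, which is only reasonable for small collections. All function names are hypothetical, and the embeddings are random stand-ins with one planted outlier.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)


def top_k_similar(query, index_embeddings, k=5):
    """Indices and cosine similarities of the k rows closest to `query`."""
    q = l2_normalize(np.asarray(query)[None, :])[0]
    sims = l2_normalize(index_embeddings) @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]


def knn_anomaly_scores(embeddings, k=5):
    """Mean Euclidean distance from each row to its k nearest other rows."""
    X = np.asarray(embeddings, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)          # exclude self-matches
    nearest = np.sort(d2, axis=1)[:, :k]
    return np.sqrt(nearest).mean(axis=1)


def patient_signatures(embeddings, patient_ids):
    """Average the embeddings of each patient into one vector."""
    X = np.asarray(embeddings, dtype=float)
    ids = np.asarray(patient_ids)
    return {pid: X[ids == pid].mean(axis=0) for pid in np.unique(ids).tolist()}


rng = np.random.default_rng(0)
emb = rng.standard_normal((100, 128))
emb[7] += 10.0                            # planted outlier

idx, sims = top_k_similar(emb[3], emb, k=3)
scores = knn_anomaly_scores(emb)
sigs = patient_signatures(emb, np.repeat(np.arange(10), 10))
print(idx[0], int(np.argmax(scores)), sigs[0].shape)
```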
---
### 💾 Use of Precomputed Embeddings
Precomputed embeddings are optional and depend on the user’s objective:
| Scenario | Use precomputed embeddings? | Reason |
|-----------|-----------------------------|---------|
| **Quick exploration of results** | ✅ Yes | The file `demo_embeddings.npy` already contains 128-D features ready for clustering, visualization, or linear probes. |
| **Custom EEG data (real or synthetic)** | ❌ No | The pretrained encoder can be applied directly to new EEG windows to generate embeddings. |
| **Cross-model or cross-dataset comparison** | Optional | Both the provided embeddings and newly generated ones can be used for benchmarking and evaluation. |
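
Loading the precomputed file could be as simple as the sketch below. It assumes `demo_embeddings.npy` holds a 2-D array of shape `(n_windows, 128)`, which matches the description above but is not confirmed here; the helper `load_embeddings` is hypothetical and only adds a shape check on top of `np.load`.

```python
import numpy as np

EXPECTED_DIM = 128  # embedding size stated in this README


def load_embeddings(path):
    """Load a .npy file and check it looks like (n_windows, 128)."""
    emb = np.load(path)
    if emb.ndim != 2 or emb.shape[1] != EXPECTED_DIM:
        raise ValueError(
            f"expected shape (n_windows, {EXPECTED_DIM}), got {emb.shape}"
        )
    return emb


# Usage, assuming the file sits in the working directory:
# embeddings = load_embeddings("demo_embeddings.npy")
# print(embeddings.shape)
```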
---
## License
Licensed for non-clinical research and educational use.
For commercial licensing inquiries, please contact **DBbun LLC**.