---
license: other
license_name: commercial
license_link: LICENSE
datasets:
- DBbun/EEG-250Hz_v1.0
language:
- en
pipeline_tag: feature-extraction
---

# DBbun EEG Encoder — Pretrained Encoder Evaluation and Demo

## Overview

This repository provides a pretrained **EEG encoder** and two demonstration scripts developed by **DBbun LLC**.  

The model converts short segments of multi-channel EEG into **128-dimensional embeddings** that summarize the temporal and spectral structure of the signal.  

It was trained self-supervised on DBbun’s synthetic multi-patient EEG corpus sampled at **250 Hz** using the **10–20 montage (38 channels)**.  

All data are fully synthetic and privacy-safe.

---

## Key Features

- **2-second EEG encoder** trained at 250 Hz (38 channels).  
- Produces **128-D embeddings** suitable for:
  - Seizure vs. non-seizure discrimination  
  - EEG morphology clustering and visualization  
  - Similarity search and retrieval  
  - Anomaly and quality detection  
  - Downstream feature extraction for ML models  
- Includes demonstration scripts for embedding extraction and PCA-based visualization.
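
The window shape implied by these figures (2 s at 250 Hz across 38 channels, so 500 samples per channel) can be sketched in NumPy as follows. This is a minimal illustration, not the demo scripts' own code: the function name `split_into_windows`, the channels-first `(channels, samples)` layout, and the optional overlap via `hop_samples` are all assumptions made for the example.

```python
import numpy as np

FS_HZ = 250                            # sampling rate stated in this card
N_CHANNELS = 38                        # channel count stated in this card
WINDOW_SEC = 2                         # encoder window length stated in this card
WINDOW_SAMPLES = FS_HZ * WINDOW_SEC    # 500 samples per window


def split_into_windows(eeg, hop_samples=WINDOW_SAMPLES):
    """Cut a (channels, samples) array into (n_windows, channels, 500) windows.

    Assumption: channels-first layout. Trailing samples that do not fill a
    complete window are dropped. hop_samples < 500 yields overlapping windows.
    """
    eeg = np.asarray(eeg)
    if eeg.ndim != 2 or eeg.shape[0] != N_CHANNELS:
        raise ValueError(f"expected shape ({N_CHANNELS}, n_samples), got {eeg.shape}")
    if hop_samples <= 0:
        raise ValueError("hop_samples must be positive")
    n_samples = eeg.shape[1]
    if n_samples < WINDOW_SAMPLES:
        return np.empty((0, N_CHANNELS, WINDOW_SAMPLES), dtype=eeg.dtype)
    starts = range(0, n_samples - WINDOW_SAMPLES + 1, hop_samples)
    return np.stack([eeg[:, s:s + WINDOW_SAMPLES] for s in starts])


# Stand-in recording: 10.5 s of noise, used only to show the shapes.
recording = np.random.default_rng(0).standard_normal((N_CHANNELS, 2625))
windows = split_into_windows(recording)
print(windows.shape)  # (5, 38, 500)
```

Dropping the incomplete final window keeps every input the same length; padding it instead would be an equally reasonable choice.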

---

## Related Dataset

The encoder was trained and evaluated using **[DBbun/EEG-250Hz_v1.0](https://huggingface.co/datasets/DBbun/EEG-250Hz_v1.0)**.  

Each file represents one synthetic patient with 38-channel EEG sampled at 250 Hz.  

When a file includes `labels_sec` (0 = non-seizure, 1 = seizure), the labels can be used to compute a **seizure fraction** or to train evaluation probes.
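
As a rough sketch of how such labels might be used, the snippet below computes a seizure fraction and derives one label per 2-second window. It assumes `labels_sec` holds one 0/1 value per second of recording (the name suggests this, but the card does not state it), and the rule "a window counts as seizure if any of its seconds is labeled 1" is an illustrative choice, not the dataset's definition. Both function names are hypothetical.

```python
import numpy as np


def seizure_fraction(labels_sec):
    """Fraction of labeled seconds marked as seizure (label == 1)."""
    labels = np.asarray(labels_sec)
    if labels.size == 0:
        raise ValueError("labels_sec is empty")
    return float(np.mean(labels == 1))


def window_labels(labels_sec, window_sec=2):
    """One 0/1 label per non-overlapping window.

    Assumption: one label per second; a window is marked 1 if any second
    inside it is 1. Seconds that do not fill a whole window are dropped.
    """
    labels = np.asarray(labels_sec)
    n_windows = labels.size // window_sec
    grouped = labels[: n_windows * window_sec].reshape(n_windows, window_sec)
    return (grouped == 1).any(axis=1).astype(int)


# Hand-written example labels for ten seconds of recording.
example_labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
print(seizure_fraction(example_labels))  # 0.3
print(window_labels(example_labels))     # [0 1 1 0 0]
```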

---

## Repository Contents

| File | Description |
|------|--------------|
| `encoder_state.pt` | PyTorch weights (state dictionary). |
| `encoder_traced.pt` | TorchScript version for deployment. |
| `model_def.json` | Model configuration (architecture, channels, latent dimension, dropout, etc.). |
| **`DBbun_EEG_Encoder_Eval_Demo_v1.py`** | Baseline script: loads EEG files, runs the pretrained encoder, and exports embeddings. |
| **`DBbun_EEG_Encoder_Eval_Demo_v2.py`** | Extended demo: includes **PCA visualization** that colors seizure vs. non-seizure embeddings for interpretability. |
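
For orientation, here is one way the TorchScript file could be loaded and applied outside the demo scripts. It is a sketch under stated assumptions: the encoder is assumed to accept a float32 tensor of shape `(batch, 38, 500)` and return `(batch, 128)`, which should be checked against `model_def.json` and the demo scripts before relying on it. The helper names `load_traced_encoder` and `embed_in_batches` are hypothetical; `torch.jit.load` is the standard PyTorch call for TorchScript files.

```python
import numpy as np


def load_traced_encoder(path="encoder_traced.pt"):
    """Wrap the TorchScript encoder as a NumPy-in, NumPy-out function.

    Assumption: input (batch, 38, 500) float32, output (batch, 128).
    """
    import torch  # imported lazily so embed_in_batches works without PyTorch

    model = torch.jit.load(path, map_location="cpu")
    model.eval()  # disable dropout for inference

    def encode_fn(batch):
        tensor = torch.from_numpy(np.ascontiguousarray(batch, dtype=np.float32))
        with torch.no_grad():
            return model(tensor).cpu().numpy()

    return encode_fn


def embed_in_batches(encode_fn, windows, batch_size=64):
    """Apply encode_fn to (n_windows, channels, samples) data in chunks."""
    windows = np.asarray(windows)
    if windows.ndim != 3 or len(windows) == 0:
        raise ValueError("expected a non-empty (n_windows, channels, samples) array")
    chunks = [
        np.asarray(encode_fn(windows[i:i + batch_size]))
        for i in range(0, len(windows), batch_size)
    ]
    return np.concatenate(chunks, axis=0)


# Intended usage (requires PyTorch and the weights file from this repository):
#   encode_fn = load_traced_encoder("encoder_traced.pt")
#   embeddings = embed_in_batches(encode_fn, windows)   # expected (n_windows, 128)
```

Batching keeps memory use bounded on long recordings; the batch size of 64 is arbitrary.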

---

## Intended Use

This model and accompanying scripts are intended for **research, education, and development** purposes.  

They support reproducible EEG feature learning, visualization, and benchmarking without access to real patient data.  

They are **not intended for clinical diagnosis or medical use**.

---

## Suggested Applications

- Evaluate representation quality on labeled synthetic EEG.
- Visualize clustering patterns of seizure vs. non-seizure embeddings using PCA.
- Train simple classifiers (e.g., logistic regression, SVM) on 128-D features for benchmarking.
- Apply the encoder as a fixed feature extractor in other time-series tasks.
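
The PCA step mentioned above can be sketched with plain NumPy. This is not the code from `DBbun_EEG_Encoder_Eval_Demo_v2.py`; it is a minimal SVD-based projection, with the function name `pca_project` chosen for the example and random numbers standing in for real embeddings.

```python
import numpy as np


def pca_project(embeddings, n_components=2):
    """Project (n, d) embeddings onto their top principal components.

    Returns the (n, n_components) coordinates and the fraction of total
    variance explained by each retained component.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    if X.ndim != 2 or n_components > min(X.shape):
        raise ValueError("need a 2-D array with at least n_components rows and columns")
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    total = np.sum(singular_values ** 2)
    if total == 0:
        raise ValueError("embeddings have zero variance")
    coords = centered @ vt[:n_components].T
    explained = singular_values[:n_components] ** 2 / total
    return coords, explained


# Stand-in data: 200 random 128-D vectors, used only to show the call.
stand_in = np.random.default_rng(1).standard_normal((200, 128))
coords, explained = pca_project(stand_in)
print(coords.shape)  # (200, 2)
```

The two columns of `coords` can then be passed to any scatter-plot routine and colored by a per-window seizure label.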

---

## What Users Can Do with the Model

The **DBbun EEG Encoder (250 Hz)** acts as a **feature extractor**: it converts raw EEG windows into compact **128-dimensional embeddings** that summarize the shape, rhythm, and energy distribution of brain signals.

### ✅ Typical Use Cases

| Goal | What the user does |
|------|--------------------|
| **Feature extraction** | Feed 2-second EEG windows (38 channels × 500 samples at 250 Hz) into the encoder → obtain one 128-D embedding per window. |
| **Classification** | Use the embeddings to train a simple model (e.g., logistic regression, random forest, MLP) for tasks such as seizure vs. non-seizure or artifact vs. clean. |
| **Visualization** | Reduce embeddings to 2-D (PCA or UMAP) to explore clusters or signal structure. |
| **Similarity search** | Build a FAISS or Annoy index to find EEG segments that resemble each other in latent space. |
| **Anomaly detection** | Identify rare or abnormal patterns by computing distances to nearest neighbors. |
| **Patient-level summaries** | Average embeddings across all windows from one patient to form a stable EEG “signature.” |
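
The similarity-search, anomaly-detection, and patient-summary rows can be illustrated with brute-force NumPy. This sketch deliberately swaps FAISS or Annoy for exhaustive search, which is adequate for a few thousand windows; all function names are hypothetical, and the choice of cosine similarity for retrieval and mean k-nearest-neighbor Euclidean distance for anomaly scoring is an assumption rather than a recommendation from the model's authors.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (rows of zeros are left near zero)."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def top_k_similar(query, index_embeddings, k=5):
    """Indices and cosine similarities of the k rows most similar to query."""
    q = l2_normalize(np.asarray(query)[None, :])[0]
    sims = l2_normalize(index_embeddings) @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]


def knn_anomaly_scores(embeddings, k=5):
    """Mean Euclidean distance from each row to its k nearest other rows."""
    X = np.asarray(embeddings, dtype=np.float64)
    if not 0 < k < len(X):
        raise ValueError("k must be between 1 and n_rows - 1")
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances; clip tiny negatives caused by rounding.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude each row's distance to itself
    nearest = np.sort(d2, axis=1)[:, :k]
    return np.sqrt(nearest).mean(axis=1)


def patient_signature(window_embeddings):
    """Average all window embeddings of one patient into a single vector."""
    return np.asarray(window_embeddings, dtype=np.float64).mean(axis=0)
```

Higher anomaly scores mean a window sits farther from its neighbors in latent space; where to set a threshold depends on the data and is left open here.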

---

### 💾 Use of Precomputed Embeddings

Precomputed embeddings are optional and depend on the user’s objective:

| Scenario | Use precomputed embeddings? | Reason |
|-----------|-----------------------------|---------|
| **Quick exploration of results** | ✅ Yes | The file `demo_embeddings.npy` already contains 128-D features ready for clustering, visualization, or linear probes. |
| **Custom EEG data (real or synthetic)** | ❌ No | The pretrained encoder can be applied directly to new EEG windows to generate embeddings. |
| **Cross-model or cross-dataset comparison** | Optional | Both the provided embeddings and newly generated ones can be used for benchmarking and evaluation. |
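
A linear probe on precomputed embeddings might look like the sketch below. It is a plain NumPy logistic regression trained by gradient descent, written for illustration only; the function names, learning rate, step count, and regularization strength are arbitrary choices. The card does not name a label file to accompany `demo_embeddings.npy`, so the per-window label array is assumed to be supplied by the user.

```python
import numpy as np


def fit_linear_probe(X, y, lr=0.1, n_steps=500, l2=1e-3):
    """Fit a logistic-regression probe on (n, d) features and 0/1 labels."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8
    Z = (X - mean) / std  # standardize features for stable gradient steps
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_steps):
        logits = np.clip(Z @ w + b, -30.0, 30.0)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-logits))
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * float(np.mean(p - y))
    return {"w": w, "b": b, "mean": mean, "std": std}


def predict(probe, X):
    """Return 0/1 predictions using the training-set standardization."""
    Z = (np.asarray(X, dtype=np.float64) - probe["mean"]) / probe["std"]
    return (Z @ probe["w"] + probe["b"] > 0).astype(int)


def accuracy(probe, X, y):
    return float(np.mean(predict(probe, X) == np.asarray(y)))


# Intended usage (labels are an assumption; supply your own per-window array):
#   X = np.load("demo_embeddings.npy")        # expected shape (n_windows, 128)
#   probe = fit_linear_probe(X[train_idx], y[train_idx])
#   print(accuracy(probe, X[test_idx], y[test_idx]))
```

Evaluating on windows from patients not seen during probe training gives a more honest estimate than a random window-level split.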

---

## License

Licensed for non-clinical research and educational use.  

For commercial licensing inquiries, please contact **DBbun LLC**.