---
license: other
license_name: commercial
license_link: LICENSE
datasets:
- DBbun/EEG-250Hz_v1.0
language:
- en
pipeline_tag: feature-extraction
---

# DBbun EEG Encoder — Pretrained Encoder Evaluation and Demo

## Overview

This repository provides a pretrained **EEG encoder** and two demonstration scripts developed by **DBbun LLC**.  

The model converts short segments of multi-channel EEG into **128-dimensional embeddings** that summarize the temporal and spectral structure of the signal.  

It was trained with a self-supervised objective on DBbun’s synthetic multi-patient EEG corpus, sampled at **250 Hz** using the **10–20 montage (38 channels)**.  

All data are fully synthetic and privacy-safe.

---

## Key Features

- **Encoder for 2-second EEG windows** (38 channels at 250 Hz, i.e. 500 samples per channel).  
- Produces **128-D embeddings** suitable for:
  - Seizure vs. non-seizure discrimination  
  - EEG morphology clustering and visualization  
  - Similarity search and retrieval  
  - Anomaly and quality detection  
  - Downstream feature extraction for ML models  
- Includes demonstration scripts for embedding extraction and PCA-based visualization.
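
As a rough illustration of the input format described above, the following sketch cuts a continuous recording into 2-second windows of 500 samples each. The function name `segment_windows`, the `(channels, samples)` array layout, and the choice of non-overlapping windows are assumptions made for this example; the demo scripts may window the data differently.

```python
import numpy as np

FS = 250          # sampling rate in Hz (from this model card)
WIN_SEC = 2       # window length in seconds (from this model card)
N_CHANNELS = 38   # channel count (from this model card)

def segment_windows(eeg, fs=FS, win_sec=WIN_SEC):
    """Split a (channels, samples) array into non-overlapping windows.

    Returns an array of shape (n_windows, channels, fs * win_sec).
    Trailing samples that do not fill a whole window are dropped.
    """
    eeg = np.asarray(eeg, dtype=np.float32)
    n_channels, n_samples = eeg.shape
    win_len = int(fs * win_sec)
    n_windows = n_samples // win_len
    trimmed = eeg[:, : n_windows * win_len]
    # (channels, n_windows, win_len) -> (n_windows, channels, win_len)
    return trimmed.reshape(n_channels, n_windows, win_len).transpose(1, 0, 2)

# 11 seconds of placeholder noise standing in for one recording
recording = np.random.default_rng(0).standard_normal((N_CHANNELS, 11 * FS))
windows = segment_windows(recording)
print(windows.shape)  # (5, 38, 500)
```

The last second of the 11-second example is discarded because it does not fill a complete window.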

---

## Related Dataset

The encoder was trained and evaluated using **[DBbun/EEG-250Hz_v1.0](https://huggingface.co/datasets/DBbun/EEG-250Hz_v1.0)**.  

Each file represents one synthetic patient with 38-channel EEG sampled at 250 Hz.  

When available, `labels_sec` (0 = non-seizure, 1 = seizure) can be used to compute a **seizure fraction** or to train evaluation probes.
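
A minimal sketch of how such labels might be used, assuming `labels_sec` is a 1-D array with one 0/1 value per second of recording (the exact storage format is not specified here). The helper names and the rule that a 2-second window counts as seizure if either of its seconds is labeled 1 are illustrative choices, not part of the dataset definition.

```python
import numpy as np

def seizure_fraction(labels_sec):
    """Fraction of seconds labeled as seizure (1) in a recording."""
    labels = np.asarray(labels_sec)
    return float(labels.mean()) if labels.size else 0.0

def window_labels(labels_sec, win_sec=2):
    """Collapse per-second labels to one label per non-overlapping window.

    Assumption: a window counts as seizure if any second in it is seizure.
    """
    labels = np.asarray(labels_sec)
    n_windows = labels.size // win_sec
    trimmed = labels[: n_windows * win_sec]
    return trimmed.reshape(n_windows, win_sec).max(axis=1)

labels_sec = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0])  # made-up example
print(seizure_fraction(labels_sec))  # 0.3
print(window_labels(labels_sec))     # [0 1 1 0 0]
```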

---

## Repository Contents

| File | Description |
|------|--------------|
| `encoder_state.pt` | PyTorch weights (state dictionary). |
| `encoder_traced.pt` | TorchScript version for deployment. |
| `model_def.json` | Model configuration (architecture, channels, latent dimension, dropout, etc.). |
| **`DBbun_EEG_Encoder_Eval_Demo_v1.py`** | Baseline script: loads EEG files, runs the pretrained encoder, and exports embeddings. |
| **`DBbun_EEG_Encoder_Eval_Demo_v2.py`** | Extended demo: includes **PCA visualization** that colors seizure vs. non-seizure embeddings for interpretability. |
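
The sketch below shows one way to run an encoder over many windows in batches. To stay dependency-free it uses a hypothetical `fake_encoder` placeholder rather than the real weights; in practice the callable would wrap the module returned by something like `torch.jit.load("encoder_traced.pt")`. The `(batch, 38, 500)` input layout and the helper name `embed_in_batches` are assumptions for illustration and should be checked against `model_def.json` and the demo scripts.

```python
import numpy as np

def embed_in_batches(encode_fn, windows, batch_size=64):
    """Apply encode_fn to windows in batches and stack the results.

    encode_fn: callable mapping (batch, channels, samples) -> (batch, dim).
    """
    windows = np.asarray(windows, dtype=np.float32)
    chunks = []
    for start in range(0, len(windows), batch_size):
        batch = windows[start : start + batch_size]
        chunks.append(np.asarray(encode_fn(batch)))
    return np.concatenate(chunks, axis=0)

def fake_encoder(batch):
    """Placeholder for the real encoder: NOT the DBbun model.

    Projects each flattened window to 128 values with a fixed random matrix.
    """
    rng = np.random.default_rng(0)
    flat = batch.reshape(len(batch), -1)
    projection = rng.standard_normal((flat.shape[1], 128)).astype(np.float32)
    return flat @ projection

windows = np.zeros((150, 38, 500), dtype=np.float32)  # placeholder input
embeddings = embed_in_batches(fake_encoder, windows, batch_size=64)
print(embeddings.shape)  # (150, 128)
```

Batching keeps memory use bounded when a recording yields thousands of windows.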

---

## Intended Use

This model and accompanying scripts are intended for **research, education, and development** purposes.  

They support reproducible EEG feature learning, visualization, and benchmarking without access to real patient data.  

They are **not intended for clinical diagnosis or medical use**.

---

## Suggested Applications

Evaluate representation quality on labeled synthetic EEG.  

Visualize clustering patterns of seizure vs. non-seizure embeddings using PCA.  
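
A minimal PCA sketch using only NumPy's SVD, run on made-up embeddings rather than real encoder output. The function name `pca_2d` and the synthetic two-group data are assumptions for illustration; the v2 demo script may compute its projection differently.

```python
import numpy as np

def pca_2d(embeddings):
    """Project embeddings onto their first two principal components."""
    x = np.asarray(embeddings, dtype=np.float64)
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    explained = singular_values[:2] ** 2 / np.sum(singular_values ** 2)
    return coords, explained

rng = np.random.default_rng(0)
# Made-up embeddings: the second group is shifted by a constant offset.
emb = rng.standard_normal((200, 128))
labels = np.repeat([0, 1], 100)
emb[labels == 1] += 3.0
coords, explained = pca_2d(emb)
print(coords.shape)  # (200, 2)
```

The resulting `coords` can be passed to any plotting library and colored by `labels` to inspect how well the two groups separate.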

Train simple classifiers (e.g., logistic regression, SVM) on 128-D features for benchmarking.  
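
As a sketch of such a linear probe, the example below fits binary logistic regression with plain gradient descent in NumPy, in place of a library classifier. The data are synthetic placeholders, and the helper names and hyperparameters are illustrative assumptions rather than recommended settings.

```python
import numpy as np

def train_logistic_probe(x, y, lr=0.1, epochs=200, l2=1e-3):
    """Fit binary logistic regression by full-batch gradient descent."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mean, std = x.mean(axis=0), x.std(axis=0) + 1e-8
    xs = (x - mean) / std                      # standardize features
    w = np.zeros(xs.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(xs @ w + b)))   # predicted probabilities
        w -= lr * (xs.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return {"w": w, "b": b, "mean": mean, "std": std}

def predict(probe, x):
    """Return 0/1 predictions using the training-set standardization."""
    xs = (np.asarray(x, dtype=np.float64) - probe["mean"]) / probe["std"]
    return (xs @ probe["w"] + probe["b"] > 0).astype(int)

rng = np.random.default_rng(0)
x = rng.standard_normal((400, 128))            # made-up 128-D features
y = np.repeat([0, 1], 200)
x[y == 1, :8] += 2.0                           # class 1 shifted in 8 dims
idx = rng.permutation(400)
train, test = idx[:300], idx[300:]
probe = train_logistic_probe(x[train], y[train])
accuracy = np.mean(predict(probe, x[test]) == y[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

When working with several patients, splitting by patient rather than by window avoids leaking a patient's windows into both training and test sets.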

Apply the encoder as a fixed feature extractor in other time-series tasks.  

---

## What Users Can Do with the Model

The **DBbun EEG Encoder (250 Hz)** acts as a **feature extractor** — it converts raw EEG windows into compact **128-dimensional embeddings** that summarize the shape, rhythm, and energy distribution of brain signals.

### ✅ Typical Use Cases

| Goal | What the user does |
|------|--------------------|
| **Feature extraction** | Feed 2-second EEG windows (38 channels × 500 samples at 250 Hz) into the encoder → obtain one 128-D embedding per window. |
| **Classification** | Use the embeddings to train a simple model (e.g., logistic regression, random forest, MLP) for tasks such as seizure vs. non-seizure or artifact vs. clean. |
| **Visualization** | Reduce embeddings to 2-D (PCA or UMAP) to explore clusters or signal structure. |
| **Similarity search** | Build a FAISS or Annoy index to find EEG segments that resemble each other in latent space. |
| **Anomaly detection** | Identify rare or abnormal patterns by computing distances to nearest neighbors. |
| **Patient-level summaries** | Average embeddings across all windows from one patient to form a stable EEG “signature.” |
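
The last three rows of the table can be sketched with brute-force NumPy operations, which are a simple stand-in for a FAISS or Annoy index on small collections. All function names are hypothetical and the embeddings are random placeholders.

```python
import numpy as np

def top_k_similar(embeddings, query, k=5):
    """Indices of the k embeddings most cosine-similar to query."""
    emb = np.asarray(embeddings, dtype=np.float64)
    q = np.asarray(query, dtype=np.float64)
    emb_n = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    q_n = q / (np.linalg.norm(q) + 1e-12)
    return np.argsort(-(emb_n @ q_n))[:k]

def knn_anomaly_scores(embeddings, k=5):
    """Mean Euclidean distance from each embedding to its k nearest neighbors."""
    emb = np.asarray(embeddings, dtype=np.float64)
    sq = np.sum(emb ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T, 0.0)
    np.fill_diagonal(d2, np.inf)               # exclude self-matches
    nearest = np.sort(d2, axis=1)[:, :k]
    return np.sqrt(nearest).mean(axis=1)

def patient_signature(embeddings):
    """Average all window embeddings from one patient into a single vector."""
    return np.asarray(embeddings, dtype=np.float64).mean(axis=0)

rng = np.random.default_rng(0)
emb = rng.standard_normal((100, 128))          # made-up embeddings
emb[42] += 10.0                                # plant one obvious outlier
print(top_k_similar(emb, emb[7], k=3)[0])      # 7
print(int(np.argmax(knn_anomaly_scores(emb)))) # 42
print(patient_signature(emb).shape)            # (128,)
```

The pairwise distance matrix grows quadratically with the number of windows, so an approximate index is the better choice for large collections.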

---

### 💾 Use of Precomputed Embeddings

Precomputed embeddings are optional and depend on the user’s objective:

| Scenario | Use precomputed embeddings? | Reason |
|-----------|-----------------------------|---------|
| **Quick exploration of results** | ✅ Yes | The file `demo_embeddings.npy` already contains 128-D features ready for clustering, visualization, or linear probes. |
| **Custom EEG data (real or synthetic)** | ❌ No | The pretrained encoder can be applied directly to new EEG windows to generate embeddings. |
| **Cross-model or cross-dataset comparison** | Optional | Both the provided embeddings and newly generated ones can be used for benchmarking and evaluation. |
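
A short sketch of loading a precomputed embedding file and checking its dimensionality. It assumes the file holds a single 2-D array of shape `(n_windows, 128)`, which is not confirmed here; the helper `load_embeddings` is hypothetical, and a stand-in file is written first so the example runs without the real `demo_embeddings.npy`.

```python
import os
import tempfile
import numpy as np

def load_embeddings(path, expected_dim=128):
    """Load a .npy embedding matrix and check its second dimension."""
    emb = np.load(path)
    if emb.ndim != 2 or emb.shape[1] != expected_dim:
        raise ValueError(f"expected (n_windows, {expected_dim}), got {emb.shape}")
    return emb

# Stand-in file so the sketch runs without the real demo_embeddings.npy.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_embeddings.npy")
np.save(demo_path, np.zeros((10, 128), dtype=np.float32))

emb = load_embeddings(demo_path)
print(emb.shape)  # (10, 128)
```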

---

## License

Licensed for non-clinical research and educational use.  

For commercial licensing inquiries, please contact **DBbun LLC**.