	---
	license: other
	license_name: commercial
	license_link: LICENSE
	datasets:
	- DBbun/EEG-250Hz_v1.0
	language:
	- en
	pipeline_tag: feature-extraction
	---

	# DBbun EEG Encoder — Pretrained Encoder Evaluation and Demo

	## Overview

	This repository provides a pretrained EEG encoder and two demonstration scripts developed by DBbun LLC.

	The model converts short segments of multi-channel EEG into 128-dimensional embeddings that summarize the temporal and spectral structure of the signal.

	It was trained self-supervised on DBbun’s synthetic multi-patient EEG corpus sampled at 250 Hz using the 10–20 montage (38 channels).

	All data are fully synthetic and privacy-safe.

	---

	## Key Features

	- 2-second EEG encoder trained at 250 Hz (38 channels).
	- Produces 128-D embeddings suitable for:
	  - Seizure vs. non-seizure discrimination
	  - EEG morphology clustering and visualization
	  - Similarity search and retrieval
	  - Anomaly and quality detection
	  - Downstream feature extraction for ML models
	- Includes demonstration scripts for embedding extraction and PCA-based visualization.
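
The encoder works on 2-second windows at 250 Hz, which is 500 samples per channel. A minimal sketch of cutting a recording into such windows is shown here. It assumes the recording is already in memory as a `(channels, samples)` NumPy array and that windows do not overlap; neither the on-disk format nor the windowing stride is stated in this card, and `make_windows` is a hypothetical helper, not part of the demo scripts.

```python
import numpy as np

FS = 250          # sampling rate in Hz (from this model card)
WIN_SEC = 2       # window length in seconds (from this model card)
N_CHANNELS = 38   # channel count (from this model card)


def make_windows(signal, fs=FS, win_sec=WIN_SEC):
    """Split a (channels, samples) array into non-overlapping windows.

    Returns an array of shape (n_windows, channels, fs * win_sec).
    Trailing samples that do not fill a whole window are dropped.
    """
    signal = np.asarray(signal, dtype=np.float32)
    n_channels, n_samples = signal.shape
    win_len = fs * win_sec
    n_windows = n_samples // win_len
    trimmed = signal[:, : n_windows * win_len]
    # (channels, n_windows, win_len) -> (n_windows, channels, win_len)
    return trimmed.reshape(n_channels, n_windows, win_len).transpose(1, 0, 2)


# Placeholder recording: 11 s of random noise standing in for EEG.
recording = np.random.default_rng(0).standard_normal((N_CHANNELS, 11 * FS))
windows = make_windows(recording)
print(windows.shape)  # (5, 38, 500)
```

The last second of the 11-second placeholder is dropped because it does not fill a full window. If overlapping windows are preferred, the same idea applies with a stride smaller than the window length.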

	---

	## Related Dataset

	The encoder was trained and evaluated using [DBbun/EEG-250Hz_v1.0](https://huggingface.co/datasets/DBbun/EEG-250Hz_v1.0).

	Each file represents one synthetic patient with 38-channel EEG sampled at 250 Hz.

	When available, `labels_sec` (0 = non-seizure, 1 = seizure) allows computing a seizure fraction or training evaluation probes.
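
As a rough illustration, the seizure fraction and per-window labels could be derived as sketched here. This assumes `labels_sec` holds one 0/1 label per second of recording (the name suggests this, but the card does not spell out the layout), and it assumes a 2-second window counts as seizure if either of its seconds is labeled 1. Both helper functions are hypothetical.

```python
import numpy as np


def seizure_fraction(labels_sec):
    """Fraction of seconds labeled as seizure (0.0 for an empty array)."""
    labels = np.asarray(labels_sec)
    return float(labels.mean()) if labels.size else 0.0


def window_labels(labels_sec, win_sec=2):
    """Collapse per-second labels into one label per window.

    A window is marked 1 if any second inside it is marked 1.
    Trailing seconds that do not fill a whole window are dropped.
    """
    labels = np.asarray(labels_sec)
    n_windows = labels.size // win_sec
    per_window = labels[: n_windows * win_sec].reshape(n_windows, win_sec)
    return per_window.max(axis=1)


# Toy labels for a 10-second recording (not taken from the dataset).
labels_sec = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0])
print(seizure_fraction(labels_sec))  # 0.3
print(window_labels(labels_sec))     # [0 1 1 0 0]
```

A stricter rule, such as requiring every second in the window to be labeled seizure, would use `min` instead of `max`.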

	---

	## Repository Contents

	| File | Description |
	|------|-------------|
	| `encoder_state.pt` | PyTorch weights (state dictionary). |
	| `encoder_traced.pt` | TorchScript version for deployment. |
	| `model_def.json` | Model configuration (architecture, channels, latent dimension, dropout, etc.). |
	| `DBbun_EEG_Encoder_Eval_Demo_v1.py` | Baseline script: loads EEG files, runs the pretrained encoder, and exports embeddings. |
	| `DBbun_EEG_Encoder_Eval_Demo_v2.py` | Extended demo: includes PCA visualization that colors seizure vs. non-seizure embeddings for interpretability. |
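
For orientation, here is a hedged sketch of how the TorchScript file might be loaded and applied outside the demo scripts. Several things are assumptions rather than documented facts: that the traced model accepts a float32 tensor shaped `(batch, 38, 500)`, that it returns a single tensor of embeddings, and the helper names `load_traced_encoder` and `embed_in_batches`. Check `model_def.json` and the demo scripts for the actual input layout before relying on this.

```python
from pathlib import Path

import numpy as np


def load_traced_encoder(path="encoder_traced.pt"):
    """Load the TorchScript encoder and wrap it as a NumPy-in, NumPy-out callable."""
    import torch  # imported here so the batching helper works without PyTorch

    model = torch.jit.load(path, map_location="cpu")
    model.eval()

    def encode(batch):
        # Assumption: the model takes (batch, channels, samples) float32 input
        # and returns one tensor of embeddings.
        with torch.no_grad():
            out = model(torch.from_numpy(batch))
        return out.cpu().numpy()

    return encode


def embed_in_batches(encode, windows, batch_size=64):
    """Run `encode` over `windows` in fixed-size batches and stack the results."""
    if len(windows) == 0:
        raise ValueError("no windows to embed")
    chunks = []
    for start in range(0, len(windows), batch_size):
        batch = np.ascontiguousarray(windows[start:start + batch_size], dtype=np.float32)
        chunks.append(np.asarray(encode(batch)))
    return np.concatenate(chunks, axis=0)


# Runs only when the weights file is present in the working directory.
if Path("encoder_traced.pt").exists():
    encode = load_traced_encoder("encoder_traced.pt")
    placeholder = np.zeros((8, 38, 500), dtype=np.float32)  # stand-in input
    print(embed_in_batches(encode, placeholder).shape)
```

Keeping the batching helper independent of PyTorch makes it easy to test with any callable and to swap in the state-dict version of the model later.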

	---

	## Intended Use

	This model and accompanying scripts are intended for research, education, and development purposes.

	They support reproducible EEG feature learning, visualization, and benchmarking without access to real patient data.

	They are not intended for clinical diagnosis or medical use.

	---

	## Suggested Applications

	- Evaluate representation quality on labeled synthetic EEG.
	- Visualize clustering patterns of seizure vs. non-seizure embeddings using PCA.
	- Train simple classifiers (e.g., logistic regression, SVM) on 128-D features for benchmarking.
	- Apply the encoder as a fixed feature extractor in other time-series tasks.
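
A linear probe on frozen embeddings can be sketched as follows. To stay dependency-free, this uses a small gradient-descent logistic regression written in NumPy as a stand-in for a library classifier such as scikit-learn's `LogisticRegression`. The data are random Gaussian blobs used purely as placeholders, so the printed accuracy says nothing about the encoder itself; the function names and hyperparameters are illustrative assumptions.

```python
import numpy as np


def train_logistic_probe(X, y, lr=0.1, n_steps=300, l2=1e-3):
    """Fit a binary logistic-regression probe with batch gradient descent."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Standardize features using training statistics only.
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8
    Xs = (X - mean) / std
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))      # predicted probabilities
        grad_w = Xs.T @ (p - y) / len(y) + l2 * w    # log-loss gradient plus L2 term
        grad_b = float(np.mean(p - y))
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b, "mean": mean, "std": std}


def predict(probe, X):
    """Return 0/1 predictions for rows of X."""
    Xs = (np.asarray(X, dtype=np.float64) - probe["mean"]) / probe["std"]
    return (Xs @ probe["w"] + probe["b"] > 0).astype(int)


# Placeholder data: two Gaussian blobs standing in for 128-D embeddings.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-0.5, 1.0, (200, 128)), rng.normal(0.5, 1.0, (200, 128))])
y = np.repeat([0, 1], 200)

# Simple shuffled hold-out split.
idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]

probe = train_logistic_probe(X[train], y[train])
accuracy = float((predict(probe, X[test]) == y[test]).mean())
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

With real embeddings, splitting by patient rather than by window avoids leaking windows from the same recording into both the training and test sets.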

	---

	## What Users Can Do with the Model

	The DBbun EEG Encoder (250 Hz) acts as a feature extractor: it converts raw EEG windows into compact 128-dimensional embeddings that summarize the shape, rhythm, and energy distribution of brain signals.

	### ✅ Typical Use Cases

	| Goal | What the user does |
	|------|--------------------|
	| Feature extraction | Feed 2-second EEG windows (38 channels × 500 samples at 250 Hz) into the encoder to obtain one 128-D embedding per window. |
	| Classification | Use the embeddings to train a simple model (e.g., logistic regression, random forest, MLP) for tasks such as seizure vs. non-seizure or artifact vs. clean. |
	| Visualization | Reduce embeddings to 2-D (PCA or UMAP) to explore clusters or signal structure. |
	| Similarity search | Build a FAISS or Annoy index to find EEG segments that resemble each other in latent space. |
	| Anomaly detection | Identify rare or abnormal patterns by computing distances to nearest neighbors. |
	| Patient-level summaries | Average embeddings across all windows from one patient to form a stable EEG “signature.” |
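
The similarity-search, anomaly-detection, and patient-summary rows can be illustrated with brute-force NumPy operations. This is a sketch under stated assumptions: it uses exact nearest neighbors instead of a FAISS or Annoy index, cosine similarity for retrieval, mean distance to the k nearest neighbors as the anomaly score, and random placeholder embeddings. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(E):
    """Scale each row to unit length so dot products equal cosine similarity."""
    E = np.asarray(E, dtype=np.float32)
    return E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)


def top_k_similar(query, E, k=5):
    """Return indices and cosine similarities of the k rows of E closest to query."""
    sims = l2_normalize(E) @ l2_normalize(np.asarray(query)[None, :])[0]
    order = np.argsort(-sims)[:k]
    return order, sims[order]


def knn_anomaly_score(E, k=5):
    """Mean Euclidean distance from each row to its k nearest other rows."""
    E = np.asarray(E, dtype=np.float64)
    sq = (E ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (E @ E.T)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                      # ignore self-matches
    d = np.sqrt(np.maximum(d2, 0.0))                  # clip tiny negative rounding errors
    return np.sort(d, axis=1)[:, :k].mean(axis=1)


def patient_signature(E):
    """Average all window embeddings of one patient into a single vector."""
    return np.asarray(E).mean(axis=0)


# Placeholder embeddings: 50 random windows plus one deliberately distant row.
rng = np.random.default_rng(0)
E = np.vstack([rng.standard_normal((50, 128)), np.full((1, 128), 10.0)])

neighbors, sims = top_k_similar(E[3], E, k=3)
scores = knn_anomaly_score(E, k=5)
print(neighbors[0])              # 3 (a row is most similar to itself)
print(int(np.argmax(scores)))    # 50 (the distant row)
print(patient_signature(E).shape)  # (128,)
```

The pairwise-distance matrix grows quadratically with the number of windows, so an approximate index becomes the practical choice for large collections.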

	---

	### 💾 Use of Precomputed Embeddings

	Precomputed embeddings are optional and depend on the user’s objective:

	| Scenario | Use precomputed embeddings? | Reason |
	|----------|-----------------------------|--------|
	| Quick exploration of results | ✅ Yes | The file `demo_embeddings.npy` already contains 128-D features ready for clustering, visualization, or linear probes. |
	| Custom EEG data (real or synthetic) | ❌ No | The pretrained encoder can be applied directly to new EEG windows to generate embeddings. |
	| Cross-model or cross-dataset comparison | Optional | Both the provided embeddings and newly generated ones can be used for benchmarking and evaluation. |
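
A quick-exploration pass over precomputed embeddings might look like the sketch below, which projects them to 2-D with PCA computed through an SVD. It assumes `demo_embeddings.npy` is a 2-D array shaped `(n_windows, 128)`, which this card does not state explicitly, and it falls back to random placeholder data when the file is absent. `pca_2d` is a hypothetical helper and is not taken from the v2 demo script.

```python
from pathlib import Path

import numpy as np


def pca_2d(E):
    """Project rows of E onto their first two principal components.

    Returns (coords, explained), where coords has shape (n, 2) and
    explained holds the fraction of variance captured by each component.
    """
    E = np.asarray(E, dtype=np.float64)
    centered = E - E.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ Vt[:2].T
    explained = (S[:2] ** 2) / np.sum(S ** 2)
    return coords, explained


path = Path("demo_embeddings.npy")
if path.exists():
    embeddings = np.load(path)
else:
    # Placeholder so the sketch runs without the file; not real embeddings.
    embeddings = np.random.default_rng(0).standard_normal((200, 128))

coords, explained = pca_2d(embeddings)
print(coords.shape, explained)
```

The two columns of `coords` can be passed to any scatter-plot routine, colored by window label where labels are available.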

	---

	## License

	Licensed for non-clinical research and educational use.

	For commercial licensing inquiries, please contact DBbun LLC.