Update README.md
README.md
CHANGED
@@ -21,7 +21,7 @@ metrics:

 ## Model Description

-SODA-VEC embedding model trained with VICReg Exact loss function. This model implements the exact VICReg objective with invariance, variance, and covariance terms for biomedical text embeddings.

 This model is part of the **SODA-VEC** (Scientific Open Domain Adaptation for Vector Embeddings) project, which focuses on creating high-quality embedding models for biomedical and life sciences text.

@@ -42,7 +42,7 @@ This model is part of the **SODA-VEC** (Scientific Open Domain Adaptation for Ve

 ### Training Procedure

-**Loss Function**: VICReg Exact: exact VICReg objective with invariance (MSE), variance (std), and covariance losses

 **Coefficients**: sim=25.0, std=25.0, cov=1.0
 **Base Model**: `answerdotai/ModernBERT-base`

@@ -135,7 +135,7 @@ The model has been evaluated on comprehensive biomedical benchmarks including:
 - **Field-Specific Separability**: Distinguishing between different biological fields
 - **Semantic Search**: Retrieval quality on biomedical text corpora

-For detailed evaluation results, see the [SODA-VEC benchmark notebooks](https://github.com/

 ## Intended Use

@@ -143,9 +143,6 @@ This model is designed for:

 - **Biomedical Semantic Search**: Finding relevant papers, abstracts, or text passages
 - **Scientific Text Similarity**: Computing similarity between biomedical texts
-- **Information Retrieval**: Building search systems for scientific literature
-- **Downstream Tasks**: As a base for fine-tuning on specific biomedical tasks
-- **Research Applications**: Academic and research use in life sciences

 ## Limitations

@@ -163,13 +160,13 @@ If you use this model, please cite:
 title = {SODA-VEC: Scientific Open Domain Adaptation for Vector Embeddings},
 author = {EMBO},
 year = {2024},
-url = {https://github.com/
 }
 ```

 ## Model Card Contact

-For questions or issues, please open an issue on the [SODA-VEC GitHub repository](https://github.com/

 ---

@@ -21,7 +21,7 @@ metrics:

 ## Model Description

+SODA-VEC embedding model trained with [VICReg](https://arxiv.org/pdf/2105.04906) Exact loss function. This model implements the exact VICReg objective with invariance, variance, and covariance terms for biomedical text embeddings.

 This model is part of the **SODA-VEC** (Scientific Open Domain Adaptation for Vector Embeddings) project, which focuses on creating high-quality embedding models for biomedical and life sciences text.

@@ -42,7 +42,7 @@ This model is part of the **SODA-VEC** (Scientific Open Domain Adaptation for Ve

 ### Training Procedure

+**Loss Function**: VICReg Exact: exact [VICReg](https://arxiv.org/pdf/2105.04906) objective with invariance (MSE), variance (std), and covariance losses

 **Coefficients**: sim=25.0, std=25.0, cov=1.0
 **Base Model**: `answerdotai/ModernBERT-base`

@@ -135,7 +135,7 @@ The model has been evaluated on comprehensive biomedical benchmarks including:
 - **Field-Specific Separability**: Distinguishing between different biological fields
 - **Semantic Search**: Retrieval quality on biomedical text corpora

+For detailed evaluation results, see the [SODA-VEC benchmark notebooks](https://github.com/source-data/soda-vec).

 ## Intended Use

@@ -143,9 +143,6 @@ This model is designed for:

 - **Biomedical Semantic Search**: Finding relevant papers, abstracts, or text passages
 - **Scientific Text Similarity**: Computing similarity between biomedical texts

 ## Limitations

@@ -163,13 +160,13 @@ If you use this model, please cite:
 title = {SODA-VEC: Scientific Open Domain Adaptation for Vector Embeddings},
 author = {EMBO},
 year = {2024},
+url = {https://github.com/source-data/soda-vec}
 }
 ```

 ## Model Card Contact

+For questions or issues, please open an issue on the [SODA-VEC GitHub repository](https://github.com/source-data/soda-vec).

 ---
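
The Training Procedure hunk names three loss terms (invariance, variance, covariance) and their coefficients (sim=25.0, std=25.0, cov=1.0). As a rough illustration of how such an objective can be assembled, here is a dependency-free Python sketch. It is not the SODA-VEC training code: the function names, the hinge target `gamma=1.0`, the stabiliser `eps=1e-4`, the unbiased (n-1) estimators, and the choice to average the variance term over the two views while summing the covariance term are all assumptions made for this example. A real training loop would use a tensor library rather than nested lists.

```python
import math


def _center(z):
    """Subtract the per-dimension batch mean from every row."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in z]


def invariance_term(za, zb):
    """Mean squared error between the two views, averaged over all elements."""
    n, d = len(za), len(za[0])
    sq = sum((a - b) ** 2 for ra, rb in zip(za, zb) for a, b in zip(ra, rb))
    return sq / (n * d)


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge on each dimension's standard deviation (needs a batch of >= 2 rows)."""
    n, d = len(z), len(z[0])
    c = _center(z)
    total = 0.0
    for j in range(d):
        var_j = sum(row[j] ** 2 for row in c) / (n - 1)  # unbiased estimate (assumption)
        total += max(0.0, gamma - math.sqrt(var_j + eps))
    return total / d


def covariance_term(z):
    """Sum of squared off-diagonal covariance entries, divided by the dimension."""
    n, d = len(z), len(z[0])
    c = _center(z)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov_ij = sum(row[i] * row[j] for row in c) / (n - 1)
                total += cov_ij ** 2
    return total / d


def vicreg_loss(za, zb, sim=25.0, std=25.0, cov=1.0):
    """Weighted sum of the three terms; default weights are the README's coefficients."""
    inv = invariance_term(za, zb)
    var = 0.5 * (variance_term(za) + variance_term(zb))  # averaged over views (assumption)
    cv = covariance_term(za) + covariance_term(zb)       # summed over views (assumption)
    return sim * inv + std * var + cov * cv


# Toy batches: four 2-dimensional "embeddings" per view (made-up numbers).
spread = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
collapsed = [[0.5, 0.5]] * 4
print(vicreg_loss(spread, spread))
print(vicreg_loss(collapsed, collapsed))
```

In this toy setup the well-spread, decorrelated batch incurs no penalty, while the collapsed batch is penalised by the variance term alone, which is the behaviour the variance hinge is meant to provide.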
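
The Intended Use list that remains after this change covers semantic search and text similarity. Both usually reduce to comparing embedding vectors, so a small sketch of that comparison step may help. The vectors below are made-up three-dimensional stand-ins, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names. Producing real embeddings would require loading the model, whose repository ID does not appear in this diff, so that step is left out.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return (document index, score) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]


# Stand-in embeddings (assumption: real ones would come from the model).
query = [1.0, 0.0, 0.0]
documents = [
    [0.9, 0.1, 0.0],  # points almost the same way as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.7, 0.7, 0.0],  # partially aligned with the query
]

for idx, score in rank_by_similarity(query, documents):
    print(idx, round(score, 3))
```

For a large corpus, the linear scan would normally be replaced by a vector index, but the scoring rule stays the same.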