|
--- |
|
library_name: transformers.js |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- transformers.js |
|
- onnx |
|
- biblical-search |
|
- semantic-search |
|
- embeddinggemma |
|
- fine-tuned |
|
license: apache-2.0 |
|
datasets: |
|
- biblical-text-pairs |
|
metrics: |
|
- accuracy@1: 12.00% |
|
- accuracy@3: 15.00% |
|
- accuracy@10: 31.00% |
|
language: |
|
- en |
|
--- |
|
|
|
# EmbeddingGemma-300M Fine-tuned for Biblical Text Search (ONNX) |
|
|
|
This is the ONNX export of our fine-tuned EmbeddingGemma-300M model, specialized for biblical text search and retrieval and optimized for web deployment with transformers.js.
|
|
|
## Model Performance |
|
|
|
- **Accuracy@1**: 12.00% (13x improvement over base model) |
|
- **Accuracy@3**: 15.00% |
|
- **Accuracy@10**: 31.00% |
|
- **Training Steps**: 25 (optimal stopping point) |
|
- **Base Model Accuracy@1**: 0.91% |
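
Accuracy@k here is the fraction of evaluation queries whose reference passage appears among the top-k passages ranked by cosine similarity. A minimal sketch of that computation, assuming paired arrays of precomputed, normalized embeddings (the helper name and data layout are illustrative, not part of this repository):

```javascript
// queryEmbeddings[i] is expected to retrieve docEmbeddings[i].
function accuracyAtK(queryEmbeddings, docEmbeddings, k) {
  let hits = 0;
  for (let q = 0; q < queryEmbeddings.length; q++) {
    const ranked = docEmbeddings
      // Dot product of normalized vectors equals cosine similarity.
      .map((doc, idx) => ({
        idx,
        score: doc.reduce((sum, value, i) => sum + value * queryEmbeddings[q][i], 0),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
    // Count a hit if the matching passage made the top k.
    if (ranked.some((r) => r.idx === q)) hits++;
  }
  return hits / queryEmbeddings.length;
}
```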
|
|
|
## Usage with Transformers.js |
|
|
|
```javascript
import { pipeline } from '@huggingface/transformers';

// Load a feature-extraction pipeline backed by the ONNX model
const extractor = await pipeline(
  'feature-extraction',
  'dpshade22/embeddinggemma-scripture-v1-onnx'
);

// Encode queries (use the search_query: prefix)
const query = "search_query: What is love?";
const query_embedding = await extractor(query, { pooling: 'mean', normalize: true });

// Encode documents (use the search_document: prefix)
const passage = "search_document: Love is patient and kind";
const doc_embedding = await extractor(passage, { pooling: 'mean', normalize: true });
```
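
Because the embeddings above are normalized, cosine similarity reduces to a dot product. A minimal sketch of scoring a passage against a query, reusing `query_embedding` and `doc_embedding` from the example above:

```javascript
// Convert the pooled output tensors (shape [1, 768]) to plain arrays.
const q = query_embedding.tolist()[0];
const d = doc_embedding.tolist()[0];

// Dot product of normalized vectors equals cosine similarity.
const similarity = q.reduce((sum, value, i) => sum + value * d[i], 0);
console.log(`similarity: ${similarity.toFixed(4)}`);
```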
|
|
|
## Prefixes |
|
|
|
For optimal performance, use these prefixes: |
|
|
|
- **Queries**: `"search_query: your question here"` |
|
- **Documents**: `"search_document: scripture text here"` |
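
A small pair of helpers keeps these prefixes consistent across an application (the helper names below are illustrative, not part of any library):

```javascript
// Illustrative helpers for applying the expected prefixes.
const asQuery = (text) => `search_query: ${text}`;
const asDocument = (text) => `search_document: ${text}`;

// Reusing the extractor from the usage example above.
const embedding = await extractor(asQuery('Who created the heavens?'), {
  pooling: 'mean',
  normalize: true,
});
```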
|
|
|
## Model Details |
|
|
|
- **Base Model**: `google/embeddinggemma-300m`

- **Output Dimensions**: 768 (supports Matryoshka truncation to 384 and 128; see the sketch after this list)

- **ONNX Conversion**: converted with the nixiesearch/onnx-convert tool (see Conversion Details below)
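
Because the model supports Matryoshka representations, a 768-dimensional embedding can be truncated to its first 384 or 128 values and re-normalized when storage or latency matters. A minimal sketch, assuming a plain array such as `query_embedding.tolist()[0]` from the usage example (the helper name is illustrative):

```javascript
// Keep the first `dim` values of a Matryoshka embedding and re-normalize.
function truncateEmbedding(embedding, dim) {
  const head = embedding.slice(0, dim);
  const norm = Math.sqrt(head.reduce((sum, value) => sum + value * value, 0));
  return head.map((value) => value / norm);
}

const embedding384 = truncateEmbedding(query_embedding.tolist()[0], 384);
const embedding128 = truncateEmbedding(query_embedding.tolist()[0], 128);
```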
|
|
|
## Training Details |
|
|
|
- **Training Data**: 26,276 biblical text pairs

- **Training Steps**: 25 (early stopping to prevent overfitting)

- **Learning Rate**: 2.0e-04

- **Batch Size**: 8
|
|
|
## Intended Use |
|
|
|
This model is designed for: |
|
- Biblical text search and retrieval in web applications |
|
- Finding relevant scripture passages |
|
- Semantic similarity of religious texts |
|
- Question answering on biblical topics |
|
- Offline PWA applications using transformers.js |
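
For offline PWA deployments, transformers.js can load the model from files bundled with the app instead of fetching them from the Hugging Face Hub. A minimal sketch, where the local path and folder name are assumptions for illustration:

```javascript
import { env, pipeline } from '@huggingface/transformers';

// Serve the tokenizer and ONNX files with the app, then load them locally.
env.allowRemoteModels = false;   // never fetch from the Hub at runtime
env.localModelPath = '/models/'; // hypothetical path served by the PWA

const extractor = await pipeline(
  'feature-extraction',
  'embeddinggemma-scripture-v1-onnx' // folder name under env.localModelPath
);
```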
|
|
|
## Conversion Details |
|
|
|
- **Converted with**: the nixiesearch/onnx-convert tool
|
- **ONNX Opset**: 17 |
|
- **Optimization Level**: 1 |
|
- **Maximum output difference vs. the PyTorch model**: 1.9e-05 (within acceptable tolerance)
|
|
|
## Related Models |
|
|
|
- **Original PyTorch version**: `dpshade22/embeddinggemma-scripture-v1`

- **Base model**: `google/embeddinggemma-300m`

- **Reference ONNX**: `onnx-community/embeddinggemma-300m-ONNX`
|
|