---
library_name: transformers.js
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- transformers.js
- onnx
- biblical-search
- semantic-search
- embeddinggemma
- fine-tuned
license: apache-2.0
datasets:
- biblical-text-pairs
metrics:
- accuracy@1: 12.00%
- accuracy@3: 15.00%  
- accuracy@10: 31.00%
language:
- en
---

# EmbeddingGemma-300M Fine-tuned for Biblical Text Search (ONNX)

This is the ONNX version of our fine-tuned EmbeddingGemma-300M model specialized for biblical text search and retrieval. This version is optimized for web deployment using transformers.js.

## Model Performance

- **Accuracy@1**: 12.00% (13x improvement over base model)
- **Accuracy@3**: 15.00%
- **Accuracy@10**: 31.00%
- **Training Steps**: 25 (optimal stopping point)
- **Base Model Accuracy@1**: 0.91%

## Usage with Transformers.js

```javascript
import { pipeline } from '@huggingface/transformers';

// Load the model as a feature-extraction pipeline
const extractor = await pipeline(
  'feature-extraction',
  'dpshade22/embeddinggemma-scripture-v1-onnx',
);

// Encode queries (use the search_query: prefix)
const query = 'search_query: What is love?';

// Encode documents (use the search_document: prefix)
const document = 'search_document: Love is patient and kind';

// Produce mean-pooled, L2-normalized sentence embeddings
const embeddings = await extractor([query, document], {
  pooling: 'mean',
  normalize: true,
});
```

## Prefixes

For optimal performance, use these prefixes:

- **Queries**: `"search_query: your question here"`
- **Documents**: `"search_document: scripture text here"`
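Once query and document texts are embedded with these prefixes, retrieval is a nearest-neighbor search over the document embeddings. A minimal sketch, assuming the embeddings are already L2-normalized (so cosine similarity reduces to a dot product); the vectors below are tiny hypothetical placeholders, not real model output:

```javascript
// Dot product of two equal-length vectors; for unit vectors this
// equals their cosine similarity.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Placeholder unit vectors standing in for real 768D embeddings.
const queryEmbedding = [0.6, 0.8];   // "search_query: ..."
const docEmbeddings = [
  [0.8, 0.6],                        // "search_document: ..." #1
  [0.0, 1.0],                        // "search_document: ..." #2
];

// Rank documents by similarity to the query, highest first.
const ranked = docEmbeddings
  .map((e, i) => ({ index: i, score: dot(queryEmbedding, e) }))
  .sort((a, b) => b.score - a.score);
```

In a real application the same ranking runs over embeddings returned by the feature-extraction pipeline, typically precomputed and cached for the document side.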

## Model Details

- **Base Model**: `google/embeddinggemma-300m`
- **Output Dimensions**: 768D (supports Matryoshka truncation to 384D and 128D)

## Training Details

- **Training Data**: 26,276 biblical text pairs
- **Learning Rate**: 2.0e-04
- **Batch Size**: 8
- **Training Strategy**: Early stopping at 25 steps to prevent overfitting
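The Matryoshka property means a full 768D embedding can be truncated to its first 384 or 128 dimensions and re-normalized, trading a little accuracy for smaller storage and faster search. A minimal sketch with a short hypothetical placeholder vector in place of a real 768D embedding:

```javascript
// Keep the first k dimensions of an embedding, then re-normalize
// to unit length so dot products remain cosine similarities.
function truncateEmbedding(embedding, k) {
  const head = embedding.slice(0, k);
  const norm = Math.hypot(...head);
  return head.map((x) => x / norm);
}

// Placeholder vector standing in for a real 768D embedding.
const full = [3, 4, 12, 0];
const small = truncateEmbedding(full, 2); // → [0.6, 0.8]
```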

## Intended Use

This model is designed for:
- Biblical text search and retrieval in web applications
- Finding relevant scripture passages
- Semantic similarity of religious texts
- Question answering on biblical topics
- Offline PWA applications using transformers.js

## Conversion Details

- **Converted using**: nixiesearch/onnx-convert specialized tool
- **ONNX Opset**: 17
- **Optimization Level**: 1
- **Max difference from original**: 1.9e-05 (within acceptable tolerance)

## Related Models

- **Original PyTorch version**: dpshade22/embeddinggemma-scripture-v1
- **Base model**: google/embeddinggemma-300m
- **Reference ONNX**: onnx-community/embeddinggemma-300m-ONNX