---
license: mit
base_model:
- Xenova/distiluse-base-multilingual-cased-v2
pipeline_tag: feature-extraction
tags:
- feature-extraction
- sentence-embeddings
- sentence-transformers
- sentence-similarity
- semantic-search
- vector-search
- retrieval-augmented-generation
- multilingual
- cross-lingual
- low-resource
- merged-model
- combined-model
- tokenizer-embedded
- tokenizer-integrated
- standalone
- all-in-one
- quantized
- int8
- int8-quantization
- optimized
- efficient
- fast-inference
- low-latency
- lightweight
- small-model
- edge-ready
- arm64
- edge-device
- mobile-device
- on-device
- mobile-inference
- tablet
- smartphone
- embedded-ai
- onnx
- onnx-runtime
- onnx-model
- transformers
- paraphrase
- usecase-ready
- plug-and-play
- production-ready
- deployment-ready
- real-time
- distiluse

---

# 🧠 Unified Multilingual Distiluse Text Embedder (ONNX + Tokenizer Merged)

This is a highly optimized, quantized, and fully standalone model for **generating sentence embeddings** from **multilingual text**, including Ukrainian, English, Polish, and more.

Built upon `distiluse-base-multilingual-cased-v2`, the model has been:

- πŸ” **Merged with its tokenizer** into a single ONNX file
- βš™οΈ **Extended with a custom preprocessing layer**
- ⚑ **Quantized to INT8** and ARM64-ready
- πŸ§ͺ **Extensively tested across real-world NLP tasks**
- πŸ› οΈ **Bug-fixed** vs the original `sentence-transformers` quantized version that produced inaccurate cosine similarity

---

## πŸš€ Key Features

- 🧩 **Single-file architecture**: no need for external tokenizer, vocab, or `transformers` library.
- ⚑ **93% faster inference** on mobile compared to the original model.
- πŸ—£οΈ **Multilingual**: robust across many languages, including low-resource ones.
- 🧠 **Output = pure embeddings**: pass a string, get a 512-dim vector (the `distiluse-base-multilingual-cased-v2` output size). That's it.
- πŸ› οΈ **Ready for production**: small, fast, accurate, and easy to integrate.
- πŸ“± **Ideal for edge-AI, mobile, and offline scenarios.**
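Because the output is a plain embedding vector, it plugs directly into semantic search. A minimal sketch of top-k retrieval with NumPy, using toy vectors as hypothetical stand-ins for real model embeddings:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2):
    """Return indices of the k documents most similar to the query."""
    # L2-normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:k]

# Toy 3-dim vectors standing in for real model embeddings.
docs = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0])
print(top_k(query, docs))  # indices of the two docs closest to the query
```

In production, `doc_vecs` would be a matrix of embeddings produced by the model, built once and kept in memory or a vector store.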

---

## πŸ€– Author

@vlad-m-dev, built for offline edge AI on phones and tablets.
Telegram: https://t.me/dwight_schrute_engineer

---

## 🐍 Python Example
```python
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# The tokenizer is embedded as custom ONNX ops, so the
# onnxruntime-extensions library must be registered first.
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library(get_library_path())

session = ort.InferenceSession(
    'model.onnx',
    sess_options=sess_options,
    providers=['CPUExecutionProvider']
)

# Input is a plain string tensor; no manual tokenization needed.
input_feed = {"text": np.asarray(['something..'])}
outputs = session.run(None, input_feed)
embedding = outputs[0]
```
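To compare two sentences, embed each one and score the resulting vectors with cosine similarity. The helper below is plain NumPy; the commented `session.run` lines are illustrative, reusing the session from the example above:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# With the session from the example above (illustrative):
#   emb1 = session.run(None, {"text": np.asarray(['The cat sat.'])})[0]
#   emb2 = session.run(None, {"text": np.asarray(['A cat was sitting.'])})[0]
#   print(cosine_similarity(emb1, emb2))

# Sanity check with toy vectors:
v = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(v, v))   # identical vectors -> 1.0
print(cosine_similarity(v, -v))  # opposite vectors -> -1.0
```

Scores close to 1.0 indicate near-identical meaning; values near 0 indicate unrelated text.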

---

## 🟨 JavaScript Example
```javascript
// InferenceSession and Tensor come from the ONNX Runtime JS API,
// e.g. const { InferenceSession, Tensor } = require('onnxruntime-node');
const session = await InferenceSession.create(EMBEDDING_FULL_MODEL_PATH);

// A single-element string tensor; the embedded tokenizer handles the rest.
const inputTensor = new Tensor('string', ['something..'], [1]);
const feeds = { text: inputTensor };
const outputMap = await session.run(feeds);
const embedding = outputMap.text_embedding.data;
```