Feature Extraction
sentence-transformers
ONNX
Transformers
fastText
sentence-embeddings
sentence-similarity
semantic-search
vector-search
retrieval-augmented-generation
multilingual
cross-lingual
low-resource
merged-model
combined-model
tokenizer-embedded
tokenizer-integrated
standalone
all-in-one
quantized
int8
int8-quantization
optimized
efficient
fast-inference
low-latency
lightweight
small-model
edge-ready
arm64
edge-device
mobile-device
on-device
mobile-inference
tablet
smartphone
embedded-ai
onnx-runtime
onnx-model
MiniLM
MiniLM-L12-v2
paraphrase
usecase-ready
plug-and-play
production-ready
deployment-ready
real-time
distiluse
File size: 2,955 Bytes
8c75261 b8d6a72 5275b57 b54558e 5275b57 b54558e 5275b57 92276f2 52d0f03 92276f2 5275b57 f35d9ab 92276f2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 |
---
license: mit
base_model:
- Xenova/distiluse-base-multilingual-cased-v2
pipeline_tag: feature-extraction
tags:
- feature-extraction
- sentence-embeddings
- sentence-transformers
- sentence-similarity
- semantic-search
- vector-search
- retrieval-augmented-generation
- multilingual
- cross-lingual
- low-resource
- merged-model
- combined-model
- tokenizer-embedded
- tokenizer-integrated
- standalone
- all-in-one
- quantized
- int8
- int8-quantization
- optimized
- efficient
- fast-inference
- low-latency
- lightweight
- small-model
- edge-ready
- arm64
- edge-device
- mobile-device
- on-device
- mobile-inference
- tablet
- smartphone
- embedded-ai
- onnx
- onnx-runtime
- onnx-model
- transformers
- MiniLM
- MiniLM-L12-v2
- paraphrase
- usecase-ready
- plug-and-play
- production-ready
- deployment-ready
- real-time
- fasttext
- distiluse
---
# π§ Unified Multilingual Distiluse Text Embedder (ONNX + Tokenizer Merged)
This is a highly optimized, quantized, and fully standalone model for **generating sentence embeddings** from **multilingual text**, including Ukrainian, English, Polish, and more.
Built upon `distiluse-base-multilingual-cased-v2`, the model has been:
- π **Merged with its tokenizer** into a single ONNX file
- βοΈ **Extended with a custom preprocessing layer**
- β‘ **Quantized to INT8** and ARM64-ready
- π§ͺ **Extensively tested across real-world NLP tasks**
- π οΈ **Bug-fixed** vs the original `sentence-transformers` quantized version that produced inaccurate cosine similarity
---
## π Key Features
- π§© **Single-file architecture**: no need for external tokenizer, vocab, or `transformers` library.
- β‘ **93% faster inference** on mobile compared to the original model.
- π£οΈ **Multilingual**: robust across many languages, including low-resource ones.
- π§ **Output = pure embeddings**: pass a string, get a 768-dim vector. Thatβs it.
- π οΈ **Ready for production**: small, fast, accurate, and easy to integrate.
- π± **Ideal for edge-AI, mobile, and offline scenarios.**
---
π€ Author
@vlad-m-dev Built for edge-ai/phone/tablet offline
Telegram: https://t.me/dwight_schrute_engineer
---
## π Python Example
```python
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library(get_library_path())
session = ort.InferenceSession(
'model.onnx',
sess_options=sess_options,
providers=['CPUExecutionProvider']
)
input_feed = {"text": np.asarray(['something..'])}
outputs = session.run(None, input_feed)
embedding = outputs[0]
```
---
## π JS Example
```JavaScript
const session = await InferenceSession.create(EMBEDDING_FULL_MODEL_PATH);
const inputTensor = new Tensor('string', ['something..'], [1]);
const feeds = { text: inputTensor };
const outputMap = await session.run(feeds);
const embedding = outputMap.text_embedding.data; |