SPLADE distilbert-base-uncased trained on Python docstring-code pairs
This is a SPLADE Sparse Encoder model fine-tuned from distilbert/distilbert-base-uncased using the sentence-transformers library. It maps sentences and paragraphs to a 30522-dimensional sparse vector space (one dimension per entry in the base model's vocabulary) and can be used for semantic search and sparse retrieval.
Model Details
Model Description
- Model Type: SPLADE Sparse Encoder
- Base model: distilbert/distilbert-base-uncased
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 30522 dimensions
- Similarity Function: Dot Product
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Sparse Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sparse Encoders on Hugging Face
Full Model Architecture
SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'DistilBertForMaskedLM'})
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
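The two modules above can be read as follows: the masked-language-model head emits one logit per vocabulary entry for every input token, and the pooling layer collapses those rows into a single vector using a ReLU activation and a max over tokens. The sketch below illustrates this on a toy 3-token, 5-entry vocabulary in plain Python. It assumes the usual SPLADE log-saturation, log(1 + ReLU(x)), which the configuration printout does not spell out; the function name splade_max_pool and the toy logits are hypothetical and not part of the library.

```python
import math


def splade_max_pool(token_logits, attention_mask):
    """Collapse per-token vocabulary logits into one sparse vector.

    token_logits: list of rows, one per token, each of length vocab_size.
    attention_mask: list of 0/1 flags; tokens flagged 0 (padding) are skipped.
    """
    vocab_size = len(token_logits[0])
    pooled = [0.0] * vocab_size
    for row, keep in zip(token_logits, attention_mask):
        if not keep:
            continue
        for j, logit in enumerate(row):
            # ReLU zeroes negative logits; log1p dampens large ones
            # (assumed SPLADE-style saturation).
            weight = math.log1p(max(logit, 0.0))
            # Max pooling: keep the strongest activation seen for entry j.
            pooled[j] = max(pooled[j], weight)
    return pooled


toy_logits = [
    [2.0, -1.0, 0.0, 0.5, -3.0],
    [0.1, -0.2, 0.0, 4.0, -1.0],
    [9.0, 9.0, 9.0, 9.0, 9.0],  # padding token, masked out below
]
vector = splade_max_pool(toy_logits, [1, 1, 0])
active = [j for j, w in enumerate(vector) if w > 0]
print(active)
```

Only vocabulary entries that received a positive logit from at least one unmasked token end up non-zero, which is where the sparsity of the output comes from.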
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SparseEncoder
# Download from the 🤗 Hub
model = SparseEncoder("pulkitmehtawork/sparse-distilbert-base-uncased-python-code-lightening")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 30522]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[2148.8340, 1376.2744,  850.4404],
#         [1376.2744, 2056.9260,  898.0439],
#         [ 850.4404,  898.0439, 2509.7507]])
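Since the similarity function is a dot product, a score only receives contributions from vocabulary entries that are non-zero in both vectors. The following is a minimal sketch of that idea in plain Python, applied to ranking documents for a query. It stores each embedding as a hypothetical {token_id: weight} dict rather than the tensor returned by model.encode, and the helper names sparse_dot and rank, as well as the toy weights, are illustrative assumptions.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {token_id: weight} dicts."""
    # Iterate over the smaller dict so cost scales with the sparser vector.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[token] for token, weight in a.items() if token in b)


def rank(query, docs):
    """Return document indices ordered by descending dot-product score."""
    scores = [(sparse_dot(query, doc), index) for index, doc in enumerate(docs)]
    # Sort by score (high to low), breaking ties by original position.
    return [index for _, index in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Toy data: token ids and weights are made up for illustration.
query = {101: 1.5, 2054: 0.5}
docs = [
    {101: 2.0, 3000: 1.0},   # shares token 101 only
    {2054: 1.0},             # shares token 2054 only
    {101: 1.0, 2054: 4.0},   # shares both tokens
]
ranking = rank(query, docs)
print(ranking)
```

In practice model.similarity performs this computation on the encoded tensors; the dict form is only meant to make the overlap-driven scoring visible.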
Training Details
Framework Versions
- Python: 3.10.10
- Sentence Transformers: 5.0.0
- Transformers: 4.53.0
- PyTorch: 2.7.0+cu128
- Accelerate: 1.8.1
- Datasets: 3.6.0
- Tokenizers: 0.21.2