SPLADE distilbert-base-uncased trained on Python docstring–code pairs

This is a SPLADE Sparse Encoder model finetuned from distilbert/distilbert-base-uncased using the sentence-transformers library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

Model Details

Model Description

  • Model Type: SPLADE Sparse Encoder
  • Base model: distilbert/distilbert-base-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 30522 dimensions
  • Similarity Function: Dot Product
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'DistilBertForMaskedLM'})
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
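
Because SpladePooling max-pools the MLM logits over the vocabulary, each of the 30522 output dimensions corresponds to one token of the DistilBERT vocabulary. The snippet below is a minimal sketch for inspecting which vocabulary tokens an input activates; it assumes encode may return either a dense or a sparse torch tensor and that model.tokenizer exposes the underlying DistilBERT tokenizer (see Usage below for installation).

import torch
from sentence_transformers import SparseEncoder

model = SparseEncoder("pulkitmehtawork/sparse-distilbert-base-uncased-python-code-lightening")

# Encode a single (illustrative) code snippet
embeddings = model.encode(["def add(a, b): return a + b"])

# Assumption: the embeddings may come back as a sparse torch tensor; densify before inspecting
embeddings = torch.as_tensor(embeddings)
if embeddings.is_sparse:
    embeddings = embeddings.to_dense()

# Each of the 30522 dimensions maps to a DistilBERT vocabulary token
values, indices = torch.topk(embeddings[0], k=10)
tokens = model.tokenizer.convert_ids_to_tokens(indices.tolist())
for token, weight in zip(tokens, values.tolist()):
    print(f"{token}\t{weight:.2f}")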

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("pulkitmehtawork/sparse-distilbert-base-uncased-python-code-lightening")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[2148.8340, 1376.2744,  850.4404],
#         [1376.2744, 2056.9260,  898.0439],
#         [ 850.4404,  898.0439, 2509.7507]])
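
Since the similarity function is a dot product over these sparse vectors, a small retrieval loop only needs to encode one query and a handful of documents and rank them with model.similarity. The example below is an illustrative sketch; the query and document strings are made up for demonstration.

from sentence_transformers import SparseEncoder

model = SparseEncoder("pulkitmehtawork/sparse-distilbert-base-uncased-python-code-lightening")

query = "sort a list of numbers in descending order"
documents = [
    "def sort_desc(values): return sorted(values, reverse=True)",
    "def read_file(path): return open(path).read()",
    "def mean(values): return sum(values) / len(values)",
]

# Encode query and documents into the same 30522-dimensional sparse space
query_embedding = model.encode([query])
document_embeddings = model.encode(documents)

# Dot-product scores between the query and every document
scores = model.similarity(query_embedding, document_embeddings)[0]

# Rank documents by score, highest first
for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:8.2f}  {doc}")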

Training Details

Framework Versions

  • Python: 3.10.10
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.0
  • PyTorch: 2.7.0+cu128
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2
