SentenceTransformer based on intfloat/multilingual-e5-base

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
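
Because the final Normalize() module L2-normalizes the pooled output, the embeddings are unit-length, so the dot product of two embeddings already equals their cosine similarity. A minimal sketch to verify this (the example sentences are arbitrary placeholders, not from the card):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("codersan/newfa_e5base2")
emb = model.encode(["یک جمله آزمایشی", "another test sentence"])  # "a test sentence"
print(np.linalg.norm(emb, axis=1))  # each norm is ~1.0 because of the Normalize() module
print(emb @ emb.T)                  # dot products equal the cosine similarities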

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("codersan/newfa_e5base2")
# Run inference
sentences = [
    'مرزهای صفحه چیست؟برخی از انواع چیست؟',  # "What are plate boundaries? What are some types?"
    'مرزهای صفحه چیست؟',  # "What are plate boundaries?"
    'اتانول چند ایزومر دارد؟',  # "How many isomers does ethanol have?"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
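
The same embeddings can also be used for semantic search over a small corpus. A minimal sketch reusing the model loaded above; the query and corpus strings below are placeholders, not part of the training data:

from sentence_transformers import util

corpus = [
    'مرزهای صفحه چیست؟',  # "What are plate boundaries?"
    'اتانول چند ایزومر دارد؟',  # "How many isomers does ethanol have?"
]
query = 'انواع مرزهای صفحه کدامند؟'  # "What are the types of plate boundaries?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# util.semantic_search returns, per query, a ranked list of {"corpus_id", "score"} dicts
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))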

Training Details

Training Dataset

Unnamed Dataset

  • Size: 142,964 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string, min 6 / mean 16.39 / max 90 tokens
    • positive: string, min 6 / mean 15.68 / max 57 tokens
  • Samples (anchor | positive):
    • گاو یونجه می خورد | گاو در حال چریدن است ("The cow eats hay" | "The cow is grazing")
    • ماشینی به شکلی خطرناک از روی دختری می‌پرد. | دختر با بی‌احتیاطی روی ماشین می‌پرد. ("A car jumps dangerously over a girl." | "The girl carelessly jumps onto the car.")
    • چگونه می توانم کارتهای هدیه iTunes رایگان را در هند دریافت کنم؟ | چگونه می توانم کارتهای هدیه iTunes رایگان دریافت کنم؟ ("How can I get free iTunes gift cards in India?" | "How can I get free iTunes gift cards?")
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
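
The anchor/positive pairs above are trained with in-batch negatives. A minimal reproduction sketch using the Sentence Transformers training API (the in-memory dataset is a placeholder for the real 142,964-pair dataset, which is not published with the card):

from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("intfloat/multilingual-e5-base")

# Placeholder anchor/positive pairs; the actual training data is much larger.
train_dataset = Dataset.from_dict({
    "anchor": ["گاو یونجه می خورد"],        # "The cow eats hay"
    "positive": ["گاو در حال چریدن است"],   # "The cow is grazing"
})

# Matches the reported loss parameters: scale=20.0, cosine similarity.
loss = losses.MultipleNegativesRankingLoss(
    model=model, scale=20.0, similarity_fct=util.cos_sim
)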
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • batch_sampler: no_duplicates
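
These values map onto the Sentence Transformers 3.x training arguments roughly as follows (a hedged sketch, not the original training script; output_dir is an assumption, and model, train_dataset, and loss are carried over from the sketch above):

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",                       # assumed; not stated in the card
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()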

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0224 100 0.0821
0.0448 200 0.0455
0.0671 300 0.0408
0.0895 400 0.0461
0.1119 500 0.0418
0.1343 600 0.0449
0.1567 700 0.0314
0.1791 800 0.0252
0.2014 900 0.0254
0.2238 1000 0.0341
0.2462 1100 0.0239
0.2686 1200 0.0308
0.2910 1300 0.0415
0.3133 1400 0.0386
0.3357 1500 0.027
0.3581 1600 0.0369
0.3805 1700 0.0346
0.4029 1800 0.0301
0.4252 1900 0.03
0.4476 2000 0.0179
0.4700 2100 0.035
0.4924 2200 0.0327
0.5148 2300 0.033
0.5372 2400 0.0272
0.5595 2500 0.0318
0.5819 2600 0.025
0.6043 2700 0.023
0.6267 2800 0.0294
0.6491 2900 0.0337
0.6714 3000 0.0274
0.6938 3100 0.0223
0.7162 3200 0.0384
0.7386 3300 0.0217
0.7610 3400 0.032
0.7833 3500 0.0309
0.8057 3600 0.024
0.8281 3700 0.0273
0.8505 3800 0.0245
0.8729 3900 0.0268
0.8953 4000 0.0322
0.9176 4100 0.0271
0.9400 4200 0.0316
0.9624 4300 0.0179
0.9848 4400 0.0294
1.0072 4500 0.0283
1.0295 4600 0.0171
1.0519 4700 0.017
1.0743 4800 0.0197
1.0967 4900 0.0215
1.1191 5000 0.02
1.1415 5100 0.0144
1.1638 5200 0.015
1.1862 5300 0.0084
1.2086 5400 0.0115
1.2310 5500 0.0143
1.2534 5600 0.0129
1.2757 5700 0.0165
1.2981 5800 0.0168
1.3205 5900 0.0233
1.3429 6000 0.0156
1.3653 6100 0.0207
1.3876 6200 0.0149
1.4100 6300 0.0134
1.4324 6400 0.0108
1.4548 6500 0.0118
1.4772 6600 0.0173
1.4996 6700 0.0171
1.5219 6800 0.0168
1.5443 6900 0.0144
1.5667 7000 0.0111
1.5891 7100 0.0117
1.6115 7200 0.0122
1.6338 7300 0.0143
1.6562 7400 0.0151
1.6786 7500 0.0152
1.7010 7600 0.012
1.7234 7700 0.0177
1.7457 7800 0.0172
1.7681 7900 0.016
1.7905 8000 0.0141
1.8129 8100 0.0112
1.8353 8200 0.011
1.8577 8300 0.0132
1.8800 8400 0.0127
1.9024 8500 0.0188
1.9248 8600 0.0196
1.9472 8700 0.0106
1.9696 8800 0.0108
1.9919 8900 0.0172
2.0143 9000 0.0116
2.0367 9100 0.0089
2.0591 9200 0.0096
2.0815 9300 0.0142
2.1038 9400 0.0112
2.1262 9500 0.0103
2.1486 9600 0.0077
2.1710 9700 0.0082
2.1934 9800 0.0066
2.2158 9900 0.0106
2.2381 10000 0.0072
2.2605 10100 0.0085
2.2829 10200 0.0085
2.3053 10300 0.015
2.3277 10400 0.0113
2.3500 10500 0.0118
2.3724 10600 0.0123
2.3948 10700 0.0071
2.4172 10800 0.0087
2.4396 10900 0.0056
2.4620 11000 0.0091
2.4843 11100 0.0116
2.5067 11200 0.0123
2.5291 11300 0.0108
2.5515 11400 0.0078
2.5739 11500 0.0072
2.5962 11600 0.0084
2.6186 11700 0.0066
2.6410 11800 0.0115
2.6634 11900 0.0088
2.6858 12000 0.008
2.7081 12100 0.0095
2.7305 12200 0.0108
2.7529 12300 0.0113
2.7753 12400 0.0086
2.7977 12500 0.0096
2.8201 12600 0.0093
2.8424 12700 0.0076
2.8648 12800 0.006
2.8872 12900 0.0124
2.9096 13000 0.0131
2.9320 13100 0.0103
2.9543 13200 0.0063
2.9767 13300 0.0067
2.9991 13400 0.0117

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.0
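
To reproduce this environment, the listed versions can be pinned at install time (a suggestion for reproducibility, not a requirement stated in the card):

pip install "sentence-transformers==3.3.1" "transformers==4.47.0" "accelerate==1.2.1" "datasets==4.0.0" "tokenizers==0.21.0"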

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}