---
base_model: sentence-transformers/all-MiniLM-L6-v2
datasets:
  - youssefkhalil320/pairs_three_scores_v5
language:
  - en
library_name: sentence-transformers
license: apache-2.0
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:80000003
  - loss:CoSENTLoss
widget:
  - source_sentence: durable pvc swim ring
    sentences:
      - flaky croissant
      - urban shoes
      - warm drinks mug
  - source_sentence: iso mak retard capsules
    sentences:
      - savory baguette
      - shea butter body cream
      - softwheeled cruiser
  - source_sentence: love sandra potty
    sentences:
      - utensil holder
      - olive pants
      - headwear
  - source_sentence: dusky hair brush
    sentences:
      - back compartment laptop
      - rubber feet platter
      - honed blade knife
  - source_sentence: nkd skn
    sentences:
      - fruit fragrances nail polish remover
      - panini salmon
      - hand drawing bag
---

all-MiniLM-L6-v8-pair_score

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the pairs_three_scores_v5 dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: youssefkhalil320/pairs_three_scores_v5
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
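
For intuition, the three modules above can be reproduced by hand with the transformers library: run the BERT encoder, mean-pool the token embeddings over the non-padding tokens, then L2-normalize the result. The sketch below is only illustrative and loads the base all-MiniLM-L6-v2 weights rather than this finetuned checkpoint; for real use, prefer the SentenceTransformer API shown in the Usage section.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Illustrative only: base weights, not this finetuned checkpoint
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert = AutoModel.from_pretrained(model_name)

def encode(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = bert(**batch).last_hidden_state               # (batch, seq_len, 384)
    mask = batch["attention_mask"].unsqueeze(-1).float()                  # zero out padding positions
    mean_pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)  # Pooling: mean over tokens
    return F.normalize(mean_pooled, p=2, dim=1)                           # Normalize: unit-length vectors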

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'nkd skn',
    'hand drawing bag',
    'panini salmon',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
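
Because the final Normalize() module produces unit-length embeddings, model.similarity returns cosine similarity scores, and the largest off-diagonal entry in each row marks the most related sentence. A small, illustrative ranking sketch using strings from the widget examples above (assuming the model has been loaded as shown):

# Rank candidate phrases against a query by cosine similarity
query_embedding = model.encode(["durable pvc swim ring"])
candidate_embeddings = model.encode(["flaky croissant", "urban shoes", "warm drinks mug"])
scores = model.similarity(query_embedding, candidate_embeddings)  # shape [1, 3], cosine scores
best_index = scores.argmax(dim=1).item()                          # index of the closest candidate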

Training Details

Training Dataset

pairs_three_scores_v5

  • Dataset: pairs_three_scores_v5 at 3d8c457
  • Size: 80,000,003 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 3 tokens, mean: 6.06 tokens, max: 12 tokens
    • sentence2 (string): min: 3 tokens, mean: 5.71 tokens, max: 13 tokens
    • score (float): min: 0.0, mean: 0.11, max: 1.0
  • Samples (sentence1 | sentence2 | score):
    • vanilla hair cream | free of paraben hair mask | 0.5
    • nourishing shampoo | cumin lemon tea | 0.0
    • safe materials pacifier | facial serum | 0.5
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
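
For reference, CoSENTLoss turns the float scores into a ranking objective: for every two training pairs where one has a higher score than the other, it penalizes the lower-scored pair receiving the larger cosine similarity, using the scale factor above. A minimal sketch of that objective (variable names are illustrative, not the library's internals):

import torch

def cosent_loss(cos_sims, labels, scale=20.0):
    # cos_sims[i]: cosine similarity of the i-th (sentence1, sentence2) pair
    # labels[i]:   its gold score; higher-scored pairs should end up with higher cosine similarity
    diffs = scale * (cos_sims[None, :] - cos_sims[:, None])        # diffs[i, j] = scale * (cos_j - cos_i)
    mask = labels[:, None] > labels[None, :]                       # pairs (i, j) where i should outrank j
    zero = torch.zeros(1, device=cos_sims.device)
    return torch.logsumexp(torch.cat([zero, diffs[mask]]), dim=0)  # log(1 + sum(exp(diffs)))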
    

Evaluation Dataset

pairs_three_scores_v5

  • Dataset: pairs_three_scores_v5 at 3d8c457
  • Size: 20,000,001 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 3 tokens, mean: 6.21 tokens, max: 12 tokens
    • sentence2 (string): min: 3 tokens, mean: 5.75 tokens, max: 12 tokens
    • score (float): min: 0.0, mean: 0.11, max: 1.0
  • Samples (sentence1 | sentence2 | score):
    • teddy bear toy | long lasting cat food | 0.0
    • eva hair treatment | fresh pineapple | 0.0
    • soft wave hair conditioner | hybrid seat bike | 0.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
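
A hedged sketch of how these non-default hyperparameters could be wired into the sentence-transformers 3.x training API; the output directory and dataset split names are assumptions, not values reported in this card:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
dataset = load_dataset("youssefkhalil320/pairs_three_scores_v5")  # columns: sentence1, sentence2, score

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v8-pair_score",  # assumed output directory
    num_train_epochs=1,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],  # split names are assumptions
    eval_dataset=dataset["test"],
    loss=CoSENTLoss(model, scale=20.0),
)
trainer.train()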

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0002 100 10.8792
0.0003 200 10.9284
0.0005 300 10.6466
0.0006 400 10.841
0.0008 500 10.8094
0.0010 600 10.4323
0.0011 700 10.3032
0.0013 800 10.4006
0.0014 900 10.4743
0.0016 1000 10.2334
0.0018 1100 10.0135
0.0019 1200 9.7874
0.0021 1300 9.7419
0.0022 1400 9.7412
0.0024 1500 9.4585
0.0026 1600 9.5339
0.0027 1700 9.4345
0.0029 1800 9.1733
0.0030 1900 8.9952
0.0032 2000 8.9669
0.0034 2100 8.8152
0.0035 2200 8.7936
0.0037 2300 8.6771
0.0038 2400 8.4648
0.0040 2500 8.5764
0.0042 2600 8.4587
0.0043 2700 8.2966
0.0045 2800 8.2329
0.0046 2900 8.1415
0.0048 3000 8.0404
0.0050 3100 7.9698
0.0051 3200 7.9205
0.0053 3300 7.8314
0.0054 3400 7.8369
0.0056 3500 7.6403
0.0058 3600 7.5842
0.0059 3700 7.5812
0.0061 3800 7.4335
0.0062 3900 7.4917
0.0064 4000 7.3204
0.0066 4100 7.2971
0.0067 4200 7.2233
0.0069 4300 7.2081
0.0070 4400 7.1364
0.0072 4500 7.0663
0.0074 4600 6.9601
0.0075 4700 6.9546
0.0077 4800 6.9019
0.0078 4900 6.8801
0.0080 5000 6.7734
0.0082 5100 6.7648
0.0083 5200 6.7498
0.0085 5300 6.6872
0.0086 5400 6.6264
0.0088 5500 6.579
0.0090 5600 6.6001
0.0091 5700 6.5971
0.0093 5800 6.4694
0.0094 5900 6.3983
0.0096 6000 6.4477

Framework Versions

  • Python: 3.8.10
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu118
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}