---
base_model: sentence-transformers/all-MiniLM-L6-v2
datasets:
  - youssefkhalil320/pairs_three_scores_v5
language:
  - en
library_name: sentence-transformers
license: apache-2.0
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:80000003
  - loss:CoSENTLoss
widget:
  - source_sentence: durable pvc swim ring
    sentences:
      - flaky croissant
      - urban shoes
      - warm drinks mug
  - source_sentence: iso mak retard capsules
    sentences:
      - savory baguette
      - shea butter body cream
      - softwheeled cruiser
  - source_sentence: love sandra potty
    sentences:
      - utensil holder
      - olive pants
      - headwear
  - source_sentence: dusky hair brush
    sentences:
      - back compartment laptop
      - rubber feet platter
      - honed blade knife
  - source_sentence: nkd skn
    sentences:
      - fruit fragrances nail polish remover
      - panini salmon
      - hand drawing bag
---

# all-MiniLM-L6-v8-pair_score

This is a [sentence-transformers](https://www.sbert.net) model fine-tuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on the [pairs_three_scores_v5](https://huggingface.co/datasets/youssefkhalil320/pairs_three_scores_v5) dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: pairs_three_scores_v5
  • Language: en
  • License: apache-2.0

### Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
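
To make the pooling and normalization stages concrete, here is a minimal sketch of the equivalent pipeline written against `transformers` directly: tokenize, run the `BertModel`, mean-pool the token embeddings under the attention mask, then L2-normalize. The base model id is used for illustration; substitute this model's Hub id once published.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Base model shown for illustration only; replace with this model's Hub id
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["durable pvc swim ring", "urban shoes"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 384)

# (1) Pooling: mean over tokens, ignoring padding via the attention mask
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so dot product equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])
```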

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'nkd skn',
    'hand drawing bag',
    'panini salmon',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
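
The same embeddings support the semantic-search use case mentioned above. Below is a minimal sketch using `util.semantic_search`; the query and corpus strings are illustrative only, and `sentence_transformers_model_id` is the same placeholder as in the snippet above.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative corpus; these strings are made up for this sketch
model = SentenceTransformer("sentence_transformers_model_id")
corpus = ["shea butter body cream", "warm drinks mug", "honed blade knife"]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Rank the corpus against a query by cosine similarity
query_embedding = model.encode("moisturizing body lotion", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))
```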

## Training Details

### Training Dataset

#### pairs_three_scores_v5

  • Dataset: pairs_three_scores_v5 at 3d8c457
  • Size: 80,000,003 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1                                        | sentence2                                        | score                          |
    |:--------|:-------------------------------------------------|:-------------------------------------------------|:-------------------------------|
    | type    | string                                           | string                                           | float                          |
    | details | min: 3 tokens, mean: 6.06 tokens, max: 12 tokens | min: 3 tokens, mean: 5.71 tokens, max: 13 tokens | min: 0.0, mean: 0.11, max: 1.0 |
  • Samples:

    | sentence1               | sentence2                 | score |
    |:------------------------|:--------------------------|:------|
    | vanilla hair cream      | free of paraben hair mask | 0.5   |
    | nourishing shampoo      | cumin lemon tea           | 0.0   |
    | safe materials pacifier | facial serum              | 0.5   |
  • Loss: CoSENTLoss with these parameters (a construction sketch follows below):

    ```json
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    ```
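
For reference, a minimal sketch of constructing this loss, assuming the base model as the starting point; `pairwise_cos_sim` is CoSENTLoss's default `similarity_fct`, and `scale=20.0` matches the parameters above.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss

# CoSENTLoss consumes (sentence1, sentence2, score) triples like the samples
# above; scale=20.0 and pairwise cosine similarity match the listed parameters
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = CoSENTLoss(model, scale=20.0)
```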
    

### Evaluation Dataset

#### pairs_three_scores_v5

  • Dataset: pairs_three_scores_v5 at 3d8c457
  • Size: 20,000,001 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1                                        | sentence2                                        | score                          |
    |:--------|:-------------------------------------------------|:-------------------------------------------------|:-------------------------------|
    | type    | string                                           | string                                           | float                          |
    | details | min: 3 tokens, mean: 6.21 tokens, max: 12 tokens | min: 3 tokens, mean: 5.75 tokens, max: 12 tokens | min: 0.0, mean: 0.11, max: 1.0 |
  • Samples:

    | sentence1                  | sentence2             | score |
    |:---------------------------|:----------------------|:------|
    | teddy bear toy             | long lasting cat food | 0.0   |
    | eva hair treatment         | fresh pineapple       | 0.0   |
    | soft wave hair conditioner | hybrid seat bike      | 0.0   |
  • Loss: CoSENTLoss with these parameters:

    ```json
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    ```
    

### Training Hyperparameters

#### Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
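
As a hedged sketch, the non-default values above map onto the Sentence Transformers trainer API roughly as follows; `output_dir` and the dataset split names are assumptions, not taken from this card.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
dataset = load_dataset("youssefkhalil320/pairs_three_scores_v5")
loss = CoSENTLoss(model, scale=20.0)

# Each argument mirrors a non-default hyperparameter listed above
args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v8-pair_score",  # assumption
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],  # split name is an assumption
    loss=loss,
)
trainer.train()
```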

#### All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.0002 | 100  | 10.8792       |
| 0.0003 | 200  | 10.9284       |
| 0.0005 | 300  | 10.6466       |
| 0.0006 | 400  | 10.841        |
| 0.0008 | 500  | 10.8094       |
| 0.0010 | 600  | 10.4323       |
| 0.0011 | 700  | 10.3032       |
| 0.0013 | 800  | 10.4006       |
| 0.0014 | 900  | 10.4743       |
| 0.0016 | 1000 | 10.2334       |
| 0.0018 | 1100 | 10.0135       |
| 0.0019 | 1200 | 9.7874        |
| 0.0021 | 1300 | 9.7419        |
| 0.0022 | 1400 | 9.7412        |
| 0.0024 | 1500 | 9.4585        |
| 0.0026 | 1600 | 9.5339        |
| 0.0027 | 1700 | 9.4345        |
| 0.0029 | 1800 | 9.1733        |
| 0.0030 | 1900 | 8.9952        |
| 0.0032 | 2000 | 8.9669        |
| 0.0034 | 2100 | 8.8152        |
| 0.0035 | 2200 | 8.7936        |
| 0.0037 | 2300 | 8.6771        |
| 0.0038 | 2400 | 8.4648        |
| 0.0040 | 2500 | 8.5764        |
| 0.0042 | 2600 | 8.4587        |
| 0.0043 | 2700 | 8.2966        |
| 0.0045 | 2800 | 8.2329        |
| 0.0046 | 2900 | 8.1415        |
| 0.0048 | 3000 | 8.0404        |
| 0.0050 | 3100 | 7.9698        |
| 0.0051 | 3200 | 7.9205        |
| 0.0053 | 3300 | 7.8314        |
| 0.0054 | 3400 | 7.8369        |
| 0.0056 | 3500 | 7.6403        |

### Framework Versions

  • Python: 3.8.10
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu118
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CoSENTLoss

```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```