SentenceTransformer based on cambridgeltl/SapBERT-from-PubMedBERT-fulltext

This is a sentence-transformers model finetuned from cambridgeltl/SapBERT-from-PubMedBERT-fulltext. It maps sentences & paragraphs to a 1536-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1536 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
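
The 1536-dimensional output follows from the pooling configuration above: both CLS and mean pooling are enabled, and the Pooling module concatenates the outputs of all enabled modes, giving 768 + 768 = 1536 dimensions. The sketch below reproduces this by hand; it assumes the library's documented behavior of appending the CLS vector before the mean vector.

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("yyzheng00/sapbert_cls_lora_triplet")

text = "Rifampin in parenteral dosage form (medicinal product form)"
features = model.tokenize([text])
features = {k: v.to(model.device) for k, v in features.items()}

with torch.no_grad():
    out = model[0](features)                 # Transformer module
tokens = out["token_embeddings"]             # [1, seq_len, 768]
mask = out["attention_mask"].unsqueeze(-1).float()

cls_vec = tokens[:, 0]                                   # CLS pooling, [1, 768]
mean_vec = (tokens * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling, [1, 768]

manual = torch.cat([cls_vec, mean_vec], dim=1)           # assumed CLS-then-mean order
print(manual.shape)                                      # torch.Size([1, 1536])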

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("yyzheng00/sapbert_cls_lora_triplet")
# Run inference
sentences = [
    '|Product containing rifampicin (medicinal product)| + |Product manufactured as parenteral dose form (product)| : |Has manufactured dose form (attribute)| = |Parenteral dose form (dose form)|, { |Has active ingredient (attribute)| = |Rifampicin (substance)| }',
    'Rifampin in parenteral dosage form (medicinal product form)',
    'Product containing tocilizumab in parenteral dose form (medicinal product form)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1536]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
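
Continuing from the snippet above (reusing model), a hedged sketch of simple retrieval: rank candidate concept descriptions against a query expression by cosine similarity. The candidate strings are illustrative examples taken from this card, not a real terminology query.

# Continue from the snippet above: `model` is already loaded.
query = "Product containing rifampicin (medicinal product)"
candidates = [
    "Rifampin in parenteral dosage form (medicinal product form)",
    "Product containing tocilizumab in parenteral dose form (medicinal product form)",
    "Vassar-Culling stain method (procedure)",
]

query_emb = model.encode([query])
cand_embs = model.encode(candidates)

scores = model.similarity(query_emb, cand_embs)[0]  # cosine scores over the candidates
best = int(scores.argmax())
print(candidates[best], float(scores[best]))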

Evaluation

Metrics

Triplet

  • cosine_accuracy: 0.9728
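
cosine_accuracy is the fraction of evaluation triplets for which the anchor is closer (by cosine similarity) to the positive than to the negative. Below is a hedged sketch of reproducing such a score with the library's TripletEvaluator, using an illustrative triplet from the training samples rather than the actual evaluation split.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("yyzheng00/sapbert_cls_lora_triplet")

# Illustrative placeholder triplet; the card does not publish the evaluation split.
evaluator = TripletEvaluator(
    anchors=["Vassar-Culling stain method (procedure)"],
    positives=["Vassar-Culling stain (procedure)"],
    negatives=["Durazol red stain method (procedure)"],
    name="triplet-dev",
)
print(evaluator(model))  # -> {'triplet-dev_cosine_accuracy': ...}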

Training Details

Training Dataset

Unnamed Dataset

  • Size: 600,000 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string, min 7 / mean 46.82 / max 416 tokens
    • sentence_1: string, min 6 / mean 12.36 / max 34 tokens
    • sentence_2: string, min 7 / mean 21.46 / max 321 tokens
  • Samples (sentence_0 | sentence_1 | sentence_2; only the first row is complete in the source):
    • Vassar-Culling stain method (procedure) | Vassar-Culling stain (procedure) | Durazol red stain method (procedure)
    • Product containing sodium iodide (medicinal product) +
    • Product containing lorazepam (medicinal product) +
  • Loss: TripletLoss with these parameters (a configuration sketch follows this list):
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 0.2
    }
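
A minimal sketch of this loss configuration with the Sentence Transformers API. The single-row dataset stands in for the unnamed 600,000-triplet training set, and the base model is loaded directly; any LoRA/PEFT wrapping implied by the model name is not described in this card and is omitted here.

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer("cambridgeltl/SapBERT-from-PubMedBERT-fulltext")

# Placeholder for the unnamed (anchor, positive, negative) training set.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Vassar-Culling stain method (procedure)"],
    "sentence_1": ["Vassar-Culling stain (procedure)"],
    "sentence_2": ["Durazol red stain method (procedure)"],
})

# Loss configured as listed above: Euclidean distance, margin 0.2.
loss = TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=0.2,
)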
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • num_train_epochs: 1
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
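
A hedged training sketch that reuses the model, train_dataset, and loss objects from the sketch in the Training Dataset section and applies only the non-default hyperparameters listed above; everything else keeps its default value.

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="sapbert_cls_lora_triplet",       # illustrative output path
    num_train_epochs=1,
    fp16=True,                                   # requires a CUDA device
    eval_strategy="steps",
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,                  # placeholder; the card does not name the eval split
    loss=loss,
)
trainer.train()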

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss eval_cosine_accuracy
0.0067 500 0.2021 -
0.0133 1000 0.1845 -
0.02 1500 0.1834 -
0.0267 2000 0.1352 0.9591
0.0333 2500 0.1107 -
0.04 3000 0.1072 -
0.0467 3500 0.15 -
0.0533 4000 0.1009 0.9670
0.06 4500 0.1128 -
0.0667 5000 0.1066 -
0.0733 5500 0.0963 -
0.08 6000 0.0882 0.9682
0.0867 6500 0.0786 -
0.0933 7000 0.0803 -
0.1 7500 0.0708 -
0.1067 8000 0.0544 0.9682
0.1133 8500 0.0587 -
0.12 9000 0.0489 -
0.1267 9500 0.0418 -
0.1333 10000 0.0339 0.9669
0.14 10500 0.0358 -
0.1467 11000 0.0316 -
0.1533 11500 0.0305 -
0.16 12000 0.0233 0.9686
0.1667 12500 0.0308 -
0.1733 13000 0.0261 -
0.18 13500 0.0264 -
0.1867 14000 0.0311 0.9678
0.1933 14500 0.0241 -
0.2 15000 0.0261 -
0.2067 15500 0.0243 -
0.2133 16000 0.0252 0.9699
0.22 16500 0.0235 -
0.2267 17000 0.0226 -
0.2333 17500 0.0223 -
0.24 18000 0.0259 0.9706
0.2467 18500 0.022 -
0.2533 19000 0.0237 -
0.26 19500 0.0236 -
0.2667 20000 0.0241 0.9701
0.2733 20500 0.0235 -
0.28 21000 0.023 -
0.2867 21500 0.0235 -
0.2933 22000 0.0226 0.9717
0.3 22500 0.0225 -
0.3067 23000 0.0206 -
0.3133 23500 0.0206 -
0.32 24000 0.0208 0.9719
0.3267 24500 0.0192 -
0.3333 25000 0.0223 -
0.34 25500 0.0199 -
0.3467 26000 0.0201 0.9715
0.3533 26500 0.0166 -
0.36 27000 0.018 -
0.3667 27500 0.0201 -
0.3733 28000 0.0193 0.9722
0.38 28500 0.0228 -
0.3867 29000 0.0237 -
0.3933 29500 0.0218 -
0.4 30000 0.0205 0.9718
0.4067 30500 0.0202 -
0.4133 31000 0.0226 -
0.42 31500 0.0216 -
0.4267 32000 0.0181 0.9719
0.4333 32500 0.0214 -
0.44 33000 0.0175 -
0.4467 33500 0.0195 -
0.4533 34000 0.0189 0.9720
0.46 34500 0.0167 -
0.4667 35000 0.0175 -
0.4733 35500 0.0185 -
0.48 36000 0.0166 0.9723
0.4867 36500 0.0215 -
0.4933 37000 0.0168 -
0.5 37500 0.0154 -
0.5067 38000 0.0197 0.9700
0.5133 38500 0.0203 -
0.52 39000 0.02 -
0.5267 39500 0.0193 -
0.5333 40000 0.0156 0.9723
0.54 40500 0.0175 -
0.5467 41000 0.0168 -
0.5533 41500 0.0159 -
0.56 42000 0.02 0.9723
0.5667 42500 0.0147 -
0.5733 43000 0.0157 -
0.58 43500 0.0204 -
0.5867 44000 0.0193 0.9715
0.5933 44500 0.0167 -
0.6 45000 0.0147 -
0.6067 45500 0.0166 -
0.6133 46000 0.016 0.9721
0.62 46500 0.0166 -
0.6267 47000 0.0178 -
0.6333 47500 0.0151 -
0.64 48000 0.0172 0.9713
0.6467 48500 0.0147 -
0.6533 49000 0.0173 -
0.66 49500 0.0161 -
0.6667 50000 0.0189 0.9715
0.6733 50500 0.0186 -
0.68 51000 0.0166 -
0.6867 51500 0.0164 -
0.6933 52000 0.0188 0.9724
0.7 52500 0.0174 -
0.7067 53000 0.0166 -
0.7133 53500 0.0176 -
0.72 54000 0.0165 0.9727
0.7267 54500 0.0169 -
0.7333 55000 0.0172 -
0.74 55500 0.0166 -
0.7467 56000 0.0166 0.9729
0.7533 56500 0.016 -
0.76 57000 0.0182 -
0.7667 57500 0.0172 -
0.7733 58000 0.0173 0.9730
0.78 58500 0.0149 -
0.7867 59000 0.0159 -
0.7933 59500 0.0147 -
0.8 60000 0.0153 0.9725
0.8067 60500 0.0149 -
0.8133 61000 0.0162 -
0.82 61500 0.0154 -
0.8267 62000 0.0174 0.9725
0.8333 62500 0.0157 -
0.84 63000 0.0167 -
0.8467 63500 0.0172 -
0.8533 64000 0.0155 0.9720
0.86 64500 0.0171 -
0.8667 65000 0.0144 -
0.8733 65500 0.0144 -
0.88 66000 0.0189 0.9722
0.8867 66500 0.018 -
0.8933 67000 0.015 -
0.9 67500 0.0167 -
0.9067 68000 0.0145 0.9727
0.9133 68500 0.0165 -
0.92 69000 0.017 -
0.9267 69500 0.0145 -
0.9333 70000 0.0168 0.9726
0.94 70500 0.0133 -
0.9467 71000 0.0132 -
0.9533 71500 0.0157 -
0.96 72000 0.016 0.9729
0.9667 72500 0.0182 -
0.9733 73000 0.0159 -
0.98 73500 0.0162 -
0.9867 74000 0.0161 0.9730
0.9933 74500 0.0173 -
1.0 75000 0.0189 0.9728

Framework Versions

  • Python: 3.11.1
  • Sentence Transformers: 4.1.0
  • Transformers: 4.47.0
  • PyTorch: 2.1.1+cu121
  • Accelerate: 1.2.0
  • Datasets: 2.18.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}