This is a sentence-transformers model fine-tuned from google/embeddinggemma-300m. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
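The module list above reads as a pipeline: token vectors are mean-pooled, passed through two bias-free linear layers with identity activation (768 to 3072, then 3072 back to 768), and L2-normalized. The following is a minimal pure-Python sketch of that data flow, assuming toy dimensions (4 and 16) and random weights in place of the real ones. The helper names are hypothetical, and this is not the library's implementation (for instance, it ignores padding masks).

```python
import math
import random

def mean_pool(token_vectors):
    """Average token vectors into one sentence vector (mean-token pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[i] for tok in token_vectors) / n for i in range(dim)]

def linear(vector, weight):
    """Bias-free linear layer; weight is laid out as [out_features][in_features]."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]

def l2_normalize(vector):
    """Scale the vector to unit Euclidean length, as the Normalize module does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def embed(token_vectors, w_up, w_down):
    pooled = mean_pool(token_vectors)
    hidden = linear(pooled, w_up)        # stands in for Dense 768 -> 3072
    projected = linear(hidden, w_down)   # stands in for Dense 3072 -> 768
    return l2_normalize(projected)

# Toy sizes and random weights: assumptions for illustration only.
rng = random.Random(0)
DIM, HIDDEN = 4, 16
w_up = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(HIDDEN)]
w_down = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(DIM)]
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]

embedding = embed(tokens, w_up, w_down)
print(len(embedding), sum(x * x for x in embedding))
```

Because the last step normalizes to unit length, the cosine similarity of two such embeddings equals their dot product.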
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("yasserrmd/nephrology-gemma-300m-emb")
# Run inference
queries = [
    "How do some participants believe that reimbursement or compensation for living kidney donors can help minimize disadvantage?",
]
documents = [
    'Some participants believe that reimbursement or compensation can effectively help donors and recipients who are socioeconomically disadvantaged by removing financial barriers to donation. They advocate for government subsidies or special paid leave to support potential donors who may not be able to take leave or afford donation-related expenses. The goal is to ensure that financial constraints do not penalize individuals who are willing to donate.',
    'The time in therapeutic range (TTR) of INR (International Normalized Ratio) is an important factor in determining the risk of hemorrhagic and ischemic events in hemodialysis patients. If the INR is below 1.5, there is an increased risk of hemorrhagic events, while an INR above 5 increases the risk of ischemic events. Maintaining the INR within the therapeutic range is challenging but crucial in minimizing these risks.',
    'Urinary L-PGDS excretions have been found to be superior to other markers, including urinary excretions of type-IV collagen, beta-2 microglobulin, and NAG, as well as serum creatinine levels, in predicting renal injury in type-2 diabetes. Studies have shown that urinary L-PGDS excretions better predict ≥30 mg/gCr albuminuria in type-2 diabetes. The use of urinary L-PGDS excretions as a marker for renal injury in type-2 diabetes is supported by its ability to reflect a slight change in glomerular permeability and its positive correlation with albuminuria.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.6341, 0.0019, 0.0465]])
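The similarity scores printed above can be turned into a document ranking. The sketch below is a pure-Python illustration with hypothetical helper names (`cosine_similarity`, `rank_documents`). It reuses the three scores from the example output instead of calling the model, so it shows only the ranking step.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(scores):
    """Return document indices ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Scores copied from the example output above (one query, three documents).
scores = [0.6341, 0.0019, 0.0465]
print(rank_documents(scores))  # [0, 2, 1]
```

The first document, which answers the query about donor compensation, ranks well ahead of the two unrelated passages.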
Columns: sentence_0 and sentence_1
 | sentence_0 | sentence_1 |
---|---|---|
type | string | string |
details | | |
sentence_0 | sentence_1 |
---|---|
How do the CKD-EPI and Japanese equations compare to Ccr and CGF in estimating renal function in cancer patients? | The CKD-EPI and Japanese equations provide more accurate estimates of renal function compared to 24-hour Ccr and CGF in cancer patients before and after chemotherapy with cisplatin. These new equations have lower bias and higher precision values, indicating better estimation of glomerular filtration rate (GFR). The CKD-EPI and Japanese equations were developed as better estimates of GFR than Ccr and CGF, which were mostly developed in chronic kidney disease (CKD) patients without cancer. The accuracy of the CKD-EPI and Japanese equations in estimating GFR in cancer patients is consistent with previous studies. Therefore, it is recommended to replace Ccr and CGF with these new equations for the evaluation of renal function in cancer patients undergoing cisplatin-containing chemotherapy. |
What are the clinical phenotypes of Bartter-like syndrome? | Bartter-like syndrome can be divided into at least three different clinical phenotypes: classic Bartter syndrome, Gitelman syndrome, and antenatal (neonatal) Bartter syndrome. Classic Bartter syndrome and Gitelman syndrome have renal tubular hypokalemic alkalosis, while antenatal Bartter syndrome also has profound systemic manifestations such as polyhydramnios, premature delivery, severe water and salt wasting, hypokalemic metabolic alkalosis, severe hypercalciuria, and marked growth retardation. |
What is granulomatous interstitial nephritis (GIN), and how frequently does it occur in patients with sarcoidosis? | Granulomatous interstitial nephritis (GIN) is a form of renal inflammation characterized by the presence of granulomas in the interstitial tissue of the kidneys. In patients with sarcoidosis, GIN is reportedly present in approximately one-third of patients with clinical evidence of renal disease. Post-mortem series have shown that between 7 and 27% of all patients with sarcoidosis may have GIN. It is important to note that GIN can occur in sarcoidosis patients even in the absence of obvious clinical renal disease. |
Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
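With the parameters above, each sentence_0 is scored against every sentence_1 in the batch by cosine similarity, the scores are multiplied by the scale of 20.0, and a cross-entropy loss treats the paired sentence_1 as the correct class and the others as in-batch negatives. The following pure-Python sketch illustrates that computation on toy two-dimensional vectors. The function names are hypothetical and this is an illustration of the idea, not the library's code.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy embeddings (assumed for illustration): two orthogonal pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Correctly paired embeddings give a loss near zero, while swapping the pairs pushes the loss up to roughly the scale value, which is the signal that pulls matching question and answer embeddings together during training.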
Non-default hyperparameters:
per_device_train_batch_size: 6
per_device_eval_batch_size: 6
num_train_epochs: 1
multi_dataset_batch_sampler: round_robin
All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 6
per_device_eval_batch_size: 6
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
router_mapping: {}
learning_rate_mapping: {}
Epoch | Step | Training Loss |
---|---|---|
0.1500 | 500 | 0.0296 |
0.2999 | 1000 | 0.0138 |
0.4499 | 1500 | 0.0108 |
0.5999 | 2000 | 0.0107 |
0.7499 | 2500 | 0.0061 |
0.8998 | 3000 | 0.0052 |
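The hyperparameters list a learning rate of 5e-05, a linear scheduler, and zero warmup steps, so the learning rate would be expected to fall linearly to zero over the run. The sketch below estimates the rate at each logged step. The total of 3334 steps is an assumption inferred from the log (500 steps at epoch 0.1500), not a figure stated in the card, and `linear_lr` is a hypothetical helper, not a library function.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: about 3334 optimizer steps in the single epoch, estimated
# from the logged epoch fractions (for example 500 / 0.1500).
TOTAL_STEPS = 3334

for step in (500, 1000, 1500, 2000, 2500, 3000):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the rate at the last logged step (3000) is about a tenth of its starting value, which is one possible reason the loss improvements shrink toward the end of the epoch.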
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}