yasserrmd committed · Commit c165868 · verified · 1 Parent(s): 3006713

Initial commit: Fine-tuned embedding-gemma-300m on GeoGPT-QA dataset

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
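Review note: the pooling config above enables only `pooling_mode_mean_tokens`, so each sentence vector is the average of its token vectors. Below is a minimal pure-Python sketch of masked mean pooling. The function name `mean_pool` and the toy inputs are illustrative assumptions, not code from this repository or from sentence-transformers.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    # Guard against an all-padding input so we never divide by zero.
    count = max(count, 1)
    return [total / count for total in totals]


# Toy example: two real tokens and one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

Since `include_prompt` is `true`, prompt tokens would presumably be averaged together with the text tokens rather than masked out.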
2_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "in_features": 768,
+   "out_features": 3072,
+   "bias": false,
+   "activation_function": "torch.nn.modules.linear.Identity"
+ }
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:35a3f472babbb893d61a6b242edcb6dbe3fb7b582c6d194ac2638a61818b313c
+ size 9437272
3_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "in_features": 3072,
+   "out_features": 768,
+   "bias": false,
+   "activation_function": "torch.nn.modules.linear.Identity"
+ }
3_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17281bd8b6d9a7f2c14ebf4ecc332a739a6a8e09a43e551ff7168eb56d47d6ed
+ size 9437272
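Review note: the two Dense configs describe bias-free linear layers (768 → 3072 and 3072 → 768), and both LFS pointers report the same size. The arithmetic below checks that this size is consistent with float32 weights. The float32 assumption and the helper name `dense_weight_bytes` are mine; the interpretation of the leftover bytes as file-header overhead is a guess, not something verified against the files.

```python
FLOAT32_BYTES = 4  # assumption: the dense weights are stored as float32


def dense_weight_bytes(in_features, out_features, bias=False):
    """Raw tensor bytes for a linear layer with the given shape."""
    params = in_features * out_features
    if bias:
        params += out_features
    return params * FLOAT32_BYTES


expand = dense_weight_bytes(768, 3072)   # 2_Dense
project = dense_weight_bytes(3072, 768)  # 3_Dense
pointer_size = 9437272                   # "size" from both LFS pointers
overhead = pointer_size - expand         # bytes not explained by the weights

print(expand, project, overhead)
```

Because both layers use an identity activation and no bias, their composition is mathematically a single 768 × 768 linear map of rank at most 768.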
README.md ADDED
@@ -0,0 +1,453 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:20000
+ - loss:MultipleNegativesRankingLoss
+ base_model: google/embeddinggemma-300m
+ widget:
+ - source_sentence: 'What is the Dialysis Symptom Index (DSI) and why is it important
+ for assessing symptoms in hemodialysis patients?
+
+ '
+ sentences:
+ - Proteinuria in LCDD patients is measured from a 24-hour urine collection. Nephrotic
+ range proteinuria (NRP) is defined as 24-hour proteinuria equal to or greater
+ than 3 grams. This indicates significant protein loss in the urine, which is a
+ characteristic feature of LCDD.
+ - The Dialysis Symptom Index (DSI) is a self-reported index that assesses the presence
+ and severity of symptoms in patients with end-stage renal disease undergoing hemodialysis.
+ It contains 30 items targeting specific physical and emotional symptoms. The DSI
+ is important for accurately assessing symptoms in hemodialysis patients, both
+ for research and practice purposes, and to improve the care provided to these
+ patients.
+ - A meta-analysis of trials that utilized high-dose NAC found that it was associated
+ with a lower risk of CIAKI compared to controls. The analysis showed no significant
+ heterogeneity or publication bias. However, the effectiveness of NAC for the prevention
+ of CIAKI is still uncertain, and no definitive conclusions can be drawn at the
+ current time. If a beneficial effect exists, it may be related to the use of higher
+ doses of NAC. Large clinical trials are needed to better define the clinical utility
+ of this agent.
+ - source_sentence: 'What is the role of Doppler ultrasound in kidney transplant recipients?
+
+ '
+ sentences:
+ - The most common diagnoses for emergency department visits among patients receiving
+ maintenance in-center hemodialysis are heart failure, throat and chest pain, and
+ abdominal pain. These conditions account for a significant proportion of emergency
+ department visits in this patient population.
+ - 'Complications associated with leflunomide treatment in kidney transplant patients
+ include leukopenia, thrombocytopenia, hepatotoxicity, and anemia. In this particular
+ study, 11 out of 28 patients (39%) developed complications while receiving leflunomide. '
+ - Doppler ultrasound is a non-invasive imaging method commonly used in kidney transplant
+ recipients. It helps verify the patency of vascular anastomoses and exclude thrombotic
+ complications in the early period after transplantation. Doppler ultrasound also
+ measures the spectrum of blood flow within the kidney graft's segmental arteries,
+ providing parameters of vascular resistance such as pulsatility and resistance
+ indices. These indices can help detect complications like delayed graft function
+ (DGF) and predict the severity and duration of acute tubular necrosis (ATN) in
+ kidney transplant recipients.
+ - source_sentence: 'What is the natural history of analgesic-associated nephropathy
+ (AAN)?
+
+ '
+ sentences:
+ - In glomerulonephritis associated with the nephrotic syndrome, a progressive decline
+ in proteinuria to less than 2 g/day (or less) is associated with a favorable prognostic
+ outlook, whether the reduction occurs spontaneously or in response to treatment.
+ This means that a decrease in proteinuria to a lower level is indicative of a
+ better prognosis in patients with glomerulonephritis and the nephrotic syndrome.
+ - Infection is the second-leading cause of death in hemodialysis patients. Hemodialysis
+ patients have various risk factors that impair the immune system, putting them
+ at increased risk of infection and its related mortality. Malnutrition is a major
+ contributor to the development and fatality of infections in CKD patients. Patients
+ with malnutrition have weakened immune systems, making them more susceptible to
+ infections and increasing the risk of infection-related mortality. Severe dietary
+ protein restriction, which is often necessary to manage hyperphosphatemia in CKD
+ patients, can lead to protein-energy malnutrition. However, the use of phosphate
+ binders to manage hyperphosphatemia can improve phosphate management with less
+ risk of malnutrition compared to dietary protein restriction. Therefore, phosphate
+ binders are expected to allow patients to maintain a better nutritional state
+ while decreasing the chance of infection, thereby reducing the risk of infection-related
+ mortality.
+ - The natural history of AAN is poorly understood, especially since the withdrawal
+ of phenacetin. However, it is known that AAN can progress slowly and lead to end-stage
+ chronic renal failure (ESCRF). The incidence of AAN appears to be declining, but
+ it remains an important and preventable cause of ESRF in many areas.
+ - source_sentence: What are the treatment options for membranous glomerulonephritis?
+ sentences:
+ - Chronic kidney disease (CKD) can impact the management of gout by limiting the
+ dosage or hampering the use of urate-lowering drugs (ULD), colchicine, and nonsteroidal
+ anti-inflammatory agents. The frequent prescription of diuretics in CKD patients
+ can also affect outcomes. Diuretics can increase serum urate levels and interfere
+ with the effectiveness of allopurinol, a commonly used ULD. This knowledge is
+ important for managing gout in patients with CKD who are taking diuretics.
+ - Treatment for membranous glomerulonephritis may include the use of corticosteroids,
+ such as metacorten or prednisone, to reduce inflammation and proteinuria. However,
+ in some cases, treatment may not be effective, and the disease may progress to
+ advanced or chronic stages.
+ - Some factors that may benefit twice-weekly HD treatment include a longer HD session
+ time, a higher spKt/V (a measure of dialysis adequacy), the use of high flux dialyzers,
+ and the use of ultrapure dialysate. These factors can contribute to optimal solute
+ clearance and improve outcomes for patients undergoing twice-weekly HD.
+ - source_sentence: How do some participants believe that reimbursement or compensation
+ for living kidney donors can help minimize disadvantage?
+ sentences:
+ - Urinary L-PGDS excretions have been found to be superior to other markers, including
+ urinary excretions of type-IV collagen, beta-2 microglobulin, and NAG, as well
+ as serum creatinine levels, in predicting renal injury in type-2 diabetes. Studies
+ have shown that urinary L-PGDS excretions better predict ≥30 mg/gCr albuminuria
+ in type-2 diabetes. The use of urinary L-PGDS excretions as a marker for renal
+ injury in type-2 diabetes is supported by its ability to reflect a slight change
+ in glomerular permeability and its positive correlation with albuminuria.
+ - The time in therapeutic range (TTR) of INR (International Normalized Ratio) is
+ an important factor in determining the risk of hemorrhagic and ischemic events
+ in hemodialysis patients. If the INR is below 1.5, there is an increased risk
+ of hemorrhagic events, while an INR above 5 increases the risk of ischemic events.
+ Maintaining the INR within the therapeutic range is challenging but crucial in
+ minimizing these risks.
+ - Some participants believe that reimbursement or compensation can effectively help
+ donors and recipients who are socioeconomically disadvantaged by removing financial
+ barriers to donation. They advocate for government subsidies or special paid leave
+ to support potential donors who may not be able to take leave or afford donation-related
+ expenses. The goal is to ensure that financial constraints do not penalize individuals
+ who are willing to donate.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on google/embeddinggemma-300m
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) <!-- at revision c5cfa06e5e282a820e85d57f7fb053207494f41d -->
+ - **Maximum Sequence Length:** 2048 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+ (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+ (4): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("yasserrmd/nephrology-gemma-300m-emb")
+ # Run inference
+ queries = [
+ "How do some participants believe that reimbursement or compensation for living kidney donors can help minimize disadvantage?",
+ ]
+ documents = [
+ 'Some participants believe that reimbursement or compensation can effectively help donors and recipients who are socioeconomically disadvantaged by removing financial barriers to donation. They advocate for government subsidies or special paid leave to support potential donors who may not be able to take leave or afford donation-related expenses. The goal is to ensure that financial constraints do not penalize individuals who are willing to donate.',
+ 'The time in therapeutic range (TTR) of INR (International Normalized Ratio) is an important factor in determining the risk of hemorrhagic and ischemic events in hemodialysis patients. If the INR is below 1.5, there is an increased risk of hemorrhagic events, while an INR above 5 increases the risk of ischemic events. Maintaining the INR within the therapeutic range is challenging but crucial in minimizing these risks.',
+ 'Urinary L-PGDS excretions have been found to be superior to other markers, including urinary excretions of type-IV collagen, beta-2 microglobulin, and NAG, as well as serum creatinine levels, in predicting renal injury in type-2 diabetes. Studies have shown that urinary L-PGDS excretions better predict ≥30 mg/gCr albuminuria in type-2 diabetes. The use of urinary L-PGDS excretions as a marker for renal injury in type-2 diabetes is supported by its ability to reflect a slight change in glomerular permeability and its positive correlation with albuminuria.',
+ ]
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+ print(query_embeddings.shape, document_embeddings.shape)
+ # [1, 768] [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(query_embeddings, document_embeddings)
+ print(similarities)
+ # tensor([[0.6341, 0.0019, 0.0465]])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
228
+ ## Training Details
229
+
230
+ ### Training Dataset
231
+
232
+ #### Unnamed Dataset
233
+
234
+ * Size: 20,000 training samples
235
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
236
+ * Approximate statistics based on the first 1000 samples:
237
+ | | sentence_0 | sentence_1 |
238
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
239
+ | type | string | string |
240
+ | details | <ul><li>min: 10 tokens</li><li>mean: 22.05 tokens</li><li>max: 56 tokens</li></ul> | <ul><li>min: 20 tokens</li><li>mean: 91.9 tokens</li><li>max: 281 tokens</li></ul> |
241
+ * Samples:
242
+ | sentence_0 | sentence_1 |
243
+ |:-----------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
244
+ | <code>How do the CKD-EPI and Japanese equations compare to Ccr and CGF in estimating renal function in cancer patients?<br></code> | <code>The CKD-EPI and Japanese equations provide more accurate estimates of renal function compared to 24-hour Ccr and CGF in cancer patients before and after chemotherapy with cisplatin. These new equations have lower bias and higher precision values, indicating better estimation of glomerular filtration rate (GFR). The CKD-EPI and Japanese equations were developed as better estimates of GFR than Ccr and CGF, which were mostly developed in chronic kidney disease (CKD) patients without cancer. The accuracy of the CKD-EPI and Japanese equations in estimating GFR in cancer patients is consistent with previous studies. Therefore, it is recommended to replace Ccr and CGF with these new equations for the evaluation of renal function in cancer patients undergoing cisplatin-containing chemotherapy.</code> |
245
+ | <code>What are the clinical phenotypes of Bartter-like syndrome?<br></code> | <code>Bartter-like syndrome can be divided into at least three different clinical phenotypes: classic Bartter syndrome, Gitelman syndrome, and antenatal (neonatal) Bartter syndrome. Classic Bartter syndrome and Gitelman syndrome have renal tubular hypokalemic alkalosis, while antenatal Bartter syndrome also has profound systemic manifestations such as polyhydramnios, premature delivery, severe water and salt wasting, hypokalemic metabolic alkalosis, severe hypercalciuria, and marked growth retardation.</code> |
246
+ | <code>What is granulomatous interstitial nephritis (GIN), and how frequently does it occur in patients with sarcoidosis?</code> | <code>Granulomatous interstitial nephritis (GIN) is a form of renal inflammation characterized by the presence of granulomas in the interstitial tissue of the kidneys. In patients with sarcoidosis, GIN is reportedly present in approximately one-third of patients with clinical evidence of renal disease. Post-mortem series have shown that between 7 and 27% of all patients with sarcoidosis may have GIN. It is important to note that GIN can occur in sarcoidosis patients even in the absence of obvious clinical renal disease.</code> |
247
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
248
+ ```json
249
+ {
250
+ "scale": 20.0,
251
+ "similarity_fct": "cos_sim",
252
+ "gather_across_devices": false
253
+ }
254
+ ```
255
+
256
+ ### Training Hyperparameters
257
+ #### Non-Default Hyperparameters
258
+
259
+ - `per_device_train_batch_size`: 6
260
+ - `per_device_eval_batch_size`: 6
261
+ - `num_train_epochs`: 1
262
+ - `multi_dataset_batch_sampler`: round_robin
263
+
264
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 6
+ - `per_device_eval_batch_size`: 6
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss |
+ |:------:|:----:|:-------------:|
+ | 0.1500 | 500 | 0.0296 |
+ | 0.2999 | 1000 | 0.0138 |
+ | 0.4499 | 1500 | 0.0108 |
+ | 0.5999 | 2000 | 0.0107 |
+ | 0.7499 | 2500 | 0.0061 |
+ | 0.8998 | 3000 | 0.0052 |
+
+
+ ### Framework Versions
+ - Python: 3.12.11
+ - Sentence Transformers: 5.1.0
+ - Transformers: 4.56.1
+ - PyTorch: 2.8.0+cu128
+ - Accelerate: 1.10.1
+ - Datasets: 4.0.0
+ - Tokenizers: 0.22.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
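Review note: the README above reports `MultipleNegativesRankingLoss` with `scale` 20.0 and `cos_sim` as the similarity function. The sketch below shows the idea in plain Python: each anchor is scored against every positive in the batch, and cross-entropy pushes the matching pair above the in-batch negatives. It is an illustration under my own naming (`cos_sim`, `mnr_loss`, the toy vectors), not the sentence-transformers implementation.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the whole batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(v - top) for v in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy batch (hypothetical 2-d embeddings).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

With `per_device_train_batch_size` 6, each anchor would see 5 in-batch negatives. Assuming a single device, one epoch over 20,000 pairs is ceil(20000 / 6) = 3334 steps, which agrees with the logged epoch 0.1500 at step 500 (500 / 3334 ≈ 0.14997).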
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "<image_soft_token>": 262144
+ }
config.json ADDED
@@ -0,0 +1,60 @@
+ {
+   "_sliding_window_pattern": 6,
+   "architectures": [
+     "Gemma3TextModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "attn_logit_softcapping": null,
+   "bos_token_id": 2,
+   "dtype": "float32",
+   "eos_token_id": 1,
+   "final_logit_softcapping": null,
+   "head_dim": 256,
+   "hidden_activation": "gelu_pytorch_tanh",
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "layer_types": [
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 2048,
+   "model_type": "gemma3_text",
+   "num_attention_heads": 3,
+   "num_hidden_layers": 24,
+   "num_key_value_heads": 1,
+   "pad_token_id": 0,
+   "query_pre_attn_scalar": 256,
+   "rms_norm_eps": 1e-06,
+   "rope_local_base_freq": 10000.0,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": 512,
+   "transformers_version": "4.56.1",
+   "use_bidirectional_attention": true,
+   "use_cache": true,
+   "vocab_size": 262144
+ }
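Review note: the `layer_types` list in the config above follows a regular pattern, with every sixth layer using full attention, matching `_sliding_window_pattern` of 6 over 24 layers. The sketch below regenerates that list from the two numbers. Whether the transformers library derives the list this way is an assumption on my part; the function name `build_layer_types` is hypothetical.

```python
def build_layer_types(num_layers, pattern):
    """Every `pattern`-th layer is full attention; the rest are sliding."""
    return [
        "full_attention" if (index + 1) % pattern == 0 else "sliding_attention"
        for index in range(num_layers)
    ]


layer_types = build_layer_types(num_layers=24, pattern=6)
full_positions = [i for i, kind in enumerate(layer_types) if kind == "full_attention"]
print(full_positions)

# Related shape check taken from the same config:
# num_attention_heads (3) * head_dim (256) equals hidden_size (768).
heads_times_dim = 3 * 256
```

The zero-based positions of the full-attention layers correspond to the 6th, 12th, 18th, and 24th entries of the committed list.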
config_sentence_transformers.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.1.0",
+     "transformers": "4.56.1",
+     "pytorch": "2.8.0+cu128"
+   },
+   "prompts": {
+     "query": "task: search result | query: ",
+     "document": "title: none | text: ",
+     "BitextMining": "task: search result | query: ",
+     "Clustering": "task: clustering | query: ",
+     "Classification": "task: classification | query: ",
+     "InstructionRetrieval": "task: code retrieval | query: ",
+     "MultilabelClassification": "task: classification | query: ",
+     "PairClassification": "task: sentence similarity | query: ",
+     "Reranking": "task: search result | query: ",
+     "Retrieval": "task: search result | query: ",
+     "Retrieval-query": "task: search result | query: ",
+     "Retrieval-document": "title: none | text: ",
+     "STS": "task: sentence similarity | query: ",
+     "Summarization": "task: summarization | query: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
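Review note: the `prompts` mapping above holds task-specific prefixes. As far as I understand, a prompt is simply prepended to the input text before tokenization, and with `default_prompt_name` set to `null` no prefix is applied unless one is requested. The sketch below mimics that with plain string handling; `apply_prompt` is a hypothetical helper, and only two of the committed prompts are copied in.

```python
# Two entries copied from config_sentence_transformers.json in this commit.
PROMPTS = {
    "query": "task: search result | query: ",
    "document": "title: none | text: ",
}


def apply_prompt(text, prompt_name=None):
    """Prefix `text` with the named prompt, or return it unchanged."""
    if prompt_name is None:
        return text
    return PROMPTS[prompt_name] + text


prefixed_query = apply_prompt("What is the Dialysis Symptom Index?", "query")
prefixed_doc = apply_prompt("The DSI is a self-reported index.", "document")
print(prefixed_query)
print(prefixed_doc)
```

The README's `encode_query` and `encode_document` calls presumably select the `query` and `document` prompts respectively, which is why queries and documents should not be encoded with the same method.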
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd003923e6e618cdda5547e7493e52f6ceb7fc31b895ee6fd9b9fed0a30b7fa8
+ size 1211486072
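Review note: the three added lines above are a Git LFS pointer, not the weights themselves. A small parser makes the fields explicit and allows a rough parameter estimate from the size. The helper name `parse_lfs_pointer` is hypothetical, and the estimate assumes float32 storage (as `dtype` in `config.json` suggests) while ignoring the file header, so it is a slight upper bound.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer into version, hash algorithm, digest, size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:dd003923e6e618cdda5547e7493e52f6ceb7fc31b895ee6fd9b9fed0a30b7fa8
size 1211486072
"""

info = parse_lfs_pointer(pointer_text)
approx_params = info["size"] // 4  # 4 bytes per float32 value (assumption)
print(info["hash_algo"], info["size"], approx_params)
```

The estimate of roughly 303 million values is in line with the "300m" in the base model's name.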
modules.json ADDED
@@ -0,0 +1,32 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Dense",
+     "type": "sentence_transformers.models.Dense"
+   },
+   {
+     "idx": 3,
+     "name": "3",
+     "path": "3_Dense",
+     "type": "sentence_transformers.models.Dense"
+   },
+   {
+     "idx": 4,
+     "name": "4",
+     "path": "4_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
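Review note: the last module in `modules.json` above is `Normalize`, which scales each embedding to unit length. One consequence is that the dot product of two outputs equals their cosine similarity, which fits the `cosine` similarity setting elsewhere in this commit. The sketch below demonstrates this on toy vectors; the helper names are my own.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy 2-d "embeddings" (hypothetical values).
first = l2_normalize([3.0, 4.0])
second = l2_normalize([4.0, 3.0])
similarity = dot(first, second)  # cosine similarity, since both are unit length
print(similarity)
```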
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 2048,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "boi_token": "<start_of_image>",
+   "bos_token": {
+     "content": "<bos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eoi_token": "<end_of_image>",
+   "eos_token": {
+     "content": "<eos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "image_token": "<image_soft_token>",
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:216e2a79606fe879c9f17c529c71cd241338407fd5646b595ffd3c4b9ea1d503
+ size 33385262
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+ size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff