How to use labpt/SLawEmbed with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("labpt/SLawEmbed")
sentences = [
"Sodišče sme na prošnjo obsojenca odločiti, da se obsodba izbriše iz kazenske evidence in da obsojenec velja za neobsojenega, če je potekla polovica z zakonom določenega roka, po poteku katerega se obsodba izbriše, če obsojenec v tem času ni storil novega kaznivega dejanja. Pri odločanju o izbrisu upošteva sodišče vedenje obsojenca po prestani kazni, naravo kaznivega dejanja in druge okoliščine, pomembne za izbris obsodbe.",
"Če dan izročitve stvari kupcu ni določen, mora prodajalec izročiti stvar v roku 15 dni po sklenitvi pogodbe, glede na naravo stvari in na druge okoliščine.",
"Upravljalci, ki so subjekti javnega sektorja, za namene raziskovanja posredujejo osebne podatke po tarifi, določeni za raziskovalne storitve. ",
"Sodišče po uradni dolžnosti izbriše obsodbo iz kazenske evidence, če storilec že dalj časa ni izvršil kaznivega dejanja, pri tem pa prav tako upošteva vedenje obsojenca po prestani kazni, naravo kaznivega dejanja in druge okoliščine, pomembne za izbris obsodbe."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
This model has been specifically finetuned for contradiction retrieval! It is therefore not suitable for regular similarity-based retrieval!
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
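The Pooling module above enables only pooling_mode_mean_tokens, so the sentence embedding is the average of the token embeddings. A minimal, dependency-free sketch of that step, assuming padding positions are excluded through an attention mask; the function name `mean_pool` and the toy inputs are illustrative and not part of the library:

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


# Toy input: three real tokens and one padding token, 2 dimensions instead of 768.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

The padding vector is deliberately large to show that masked positions do not affect the result.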
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("labpt/SLawEmbed")
# Run inference
sentences = [
'(1) Posest je neposredna dejanska oblast nad stvarjo (neposredna posest).\n(2) Posest ima tudi tisti, ki izvršuje dejansko oblast nad stvarjo prek koga drugega, ki\nima neposredno posest iz kakršnegakoli pravnega naslova (posredna posest).',
'Posameznik lahko isto stvar istočasno poseduje neposredno in posredno.',
'Sodišče nikoli ne more odločati o poslu, ki presega redno upravljanje, brez soglasja vseh solastnikov.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5774, 0.4022],
# [0.5774, 1.0000, 0.2455],
# [0.4022, 0.2455, 1.0000]])
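A typical use of the similarity matrix is to rank candidate statements against a statute passage. A small sketch of that step, assuming the scores are cosine similarities (consistent with the 1.0 values on the diagonal above) and reusing the printed numbers; `cosine` and `rank_candidates` are hypothetical helper names:

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(scores: List[float]) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, highest score first."""
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Scores copied from the first row of the printed matrix:
# sentences[0] (the passage) against sentences[1] and sentences[2].
passage_vs_candidates = [0.5774, 0.4022]
ranking = rank_candidates(passage_vs_candidates)
print(ranking)

# Sanity check of the cosine helper on toy vectors.
identical = cosine([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
print(identical, orthogonal)
```

With these scores, the statement about possession ranks above the unrelated statement about co-owners.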
Dataset validation-dev, evaluated with EmbeddingSimilarityEvaluator:

| Metric | Value |
|---|---|
| pearson_cosine | 0.0719 |
| spearman_cosine | 0.0193 |
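Both metrics compare the cosine similarity of each pair with its gold label: Pearson measures linear correlation, and Spearman is Pearson applied to ranks. A dependency-free sketch of the two, using made-up scores and labels rather than the validation-dev data; the helper names are illustrative:

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only.
cosine_scores = [0.9, 0.2, 0.6, 0.4]
gold_labels = [1.0, 0.0, 1.0, 0.0]
print(round(pearson(cosine_scores, gold_labels), 4))
print(round(spearman(cosine_scores, gold_labels), 4))
```

Values close to 0, as in the table above, indicate almost no linear or rank relationship between the cosine scores and the labels on that split.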
Columns: anchor, positive, and label

| | anchor | positive | label |
|---|---|---|---|
| type | string | string | float |

Samples:

| anchor | positive | label |
|---|---|---|
| (1) Vsaka stranka v dvostranski pogodbi lahko prenese pogodbo nekomu tretjemu, | S prenosom pogodbe preide pogodbeno razmerje, pri čemer | 1.0 |
| (1) Vsaka stranka v dvostranski pogodbi lahko prenese pogodbo nekomu tretjemu, | S prenosom pogodbe preide pogodbeno razmerje, pri čemer | 1.0 |
| (1) Za škodo, ki jo povzroči delavec pri delu ali v zvezi z delom tretji osebi, odgovarja pravna ali fizična oseba, pri kateri je delavec delal takrat, ko je bila škoda povzročena, razen če dokaže, da je delavec v danih okoliščinah ravnal tako, kot je bilo treba. | Oškodovanec lahko vedno zahteva odškonino tako od delodajalca kot od delavca, njuna odgovornost je solidarna. | 1.0 |
MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768
],
"matryoshka_weights": [
1
],
"n_dims_per_step": -1
}
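With a single entry in matryoshka_dims (768, the full embedding size) and a weight of 1, this configuration amounts to applying MultipleNegativesRankingLoss once to the full embeddings. A dependency-free sketch of the idea, assuming cosine similarity scaled by 20 (to our knowledge the library default) and in-batch negatives; the function names and toy vectors are illustrative, and the real implementation works on batched tensors:

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors: List[List[float]], positives: List[List[float]], scale: float = 20.0) -> float:
    """Cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * _cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


def matryoshka_loss(
    anchors: List[List[float]],
    positives: List[List[float]],
    dims: List[int],
    weights: List[float],
) -> float:
    """Weighted sum of the inner loss over embeddings truncated to each dim."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        total += weight * mnr_loss([a[:dim] for a in anchors], [p[:dim] for p in positives])
    return total


# Toy batch: 2 pairs with 4-dimensional embeddings instead of 768.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
full = matryoshka_loss(anchors, positives, dims=[4], weights=[1.0])
inner = mnr_loss(anchors, positives)
print(full, inner)
```

Listing smaller values in `dims` (for example half the embedding size) would add further terms computed on truncated embeddings, which is what lets a Matryoshka-trained model be shortened at inference time. The configuration shown here does not do that.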
Columns: anchor, positive, and label

| | anchor | positive | label |
|---|---|---|---|
| type | string | string | float |

Samples:

| anchor | positive | label |
|---|---|---|
| (1) Posest je neposredna dejanska oblast nad stvarjo (neposredna posest). | Posameznik lahko isto stvar istočasno poseduje neposredno in posredno. | 1.0 |
| (1) Posest je neposredna dejanska oblast nad stvarjo (neposredna posest). | Posameznik lahko isto stvar istočasno poseduje neposredno in posredno. | 1.0 |
| Upravljavec videonadzornega sistema, ki izvaja videonadzor javnih površin, mora v primeru, ko videonadzorni sistem posname dogodek, ki ogroža zdravje ali življenje posameznika, o tem nemudoma obvestiti policijo ali drug pristojni subjekt. | Upravljavec videonadzora ni dolžan obvestiti policije, če posnetek pokaže nevaren dogodek. | 1.0 |
MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768
],
"matryoshka_weights": [
1
],
"n_dims_per_step": -1
}
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- warmup_ratio: 0.1
- bf16: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- use_cpu: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_for_metrics: []
- eval_do_concat_batches: True
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss | Validation Loss | validation-dev_spearman_cosine |
|---|---|---|---|---|
| 0.3378 | 25 | - | 0.3447 | 0.0390 |
| 0.6757 | 50 | - | 0.2570 | -0.0466 |
| 1.0135 | 75 | - | 0.2282 | -0.0269 |
| 1.3514 | 100 | 0.3073 | 0.1797 | 0.0677 |
| 1.6892 | 125 | - | 0.2085 | 0.0184 |
| 2.0270 | 150 | - | 0.1725 | 0.0479 |
| 2.3649 | 175 | - | 0.1636 | 0.0183 |
| 2.7027 | 200 | 0.0371 | 0.1707 | 0.0193 |
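The log rows imply about 74 optimizer steps per epoch (25 / 0.3378 ≈ 74), so 3 epochs come to roughly 222 steps. A minimal sketch of the linear schedule with warmup_ratio 0.1 and a peak learning_rate of 5e-05 under that reading; the step counts are inferred from the table rather than stated in the source, and rounding the warmup steps up is an assumption about the trainer's behavior:

```python
import math

PEAK_LR = 5e-05       # learning_rate from the hyperparameters
WARMUP_RATIO = 0.1    # warmup_ratio from the hyperparameters
EPOCHS = 3            # num_train_epochs from the hyperparameters

# Inferred from the log row (epoch 0.3378, step 25); not stated in the source.
steps_per_epoch = round(25 / 0.3378)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # assumption: rounded up


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, 100, 200, total_steps):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks within the first third of epoch 1 and has decayed most of the way to zero by step 200, the last row logged above.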
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}