SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on the parquet dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • parquet

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
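
Because the final Normalize() module produces unit-length embeddings, cosine similarity and dot product give identical rankings for this model. The snippet below is a minimal sanity check of the architecture listed above; it assumes the repository id vaios-stergio/all-mpnet-base-v2-dblp-aminer-60k-triplet and uses only the public Sentence Transformers API.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("vaios-stergio/all-mpnet-base-v2-dblp-aminer-60k-triplet")
print(model.max_seq_length)                      # 384, from the Transformer module
print(model.get_sentence_embedding_dimension())  # 768, from the Pooling module

embedding = model.encode(["A short test sentence."])
print(np.linalg.norm(embedding, axis=1))         # ~[1.0], because of Normalize()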

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("vaios-stergio/all-mpnet-base-v2-dblp-aminer-60k-triplet")
# Run inference
sentences = [
    'DCNN and LDARFRFE Based ShortTerm Electricity Load and Price Forecasting.In this paper Deep Convolutional Neural Network DCNN is proposed for short term electricity load and price forecasting. Extracting useful information from data and then using that information for prediction is a challenging task. This paper presents a model consisting of two stages feature engineering and prediction. Feature engineering comprises of Feature Extraction FE and Feature Selection FS. For FS this paper proposes a technique that is combination of Random Forest RF and Recursive Feature Elimination RFE. The proposed technique is used for feature redundancy removal and dimensionality reduction. After finding the useful features DCNN is used for electricity price and load forecasting. DCNN performance is compared with Convolutional Neural Network CNN and Support Vector Classifier SVC models. Using the forecasting models dayahead and the week ahead forecasting is done for electricity price and load. To evaluate the CNN SVC and DCNN models real electricity market data is used. Mean Absolute Error MAE and Root Mean Square Error RMSE are used to evaluate the performance of the models. DCNN outperforms compared models by yielding lesser errors.',
    'ShortTerm Electricity Load and Price Forecasting using Enhanced KNN.In this paper we introduced a new enhanced technique to resolve the issue of electricity price and load forecasting. In Smart Grids SGs Price and load forecasting is the major issue. Framework of enhanced technique comprises of classification and feature engineering. Feature engineering comprises of feature selection and feature extraction. Decision Tree Regression DTR is used for feature selection. Recursive Feature Elimination RFE is used for feature selection which eliminates the redundancy of features. The second step of feature engineering feature extraction is done using Singular Value Decomposition SVD which reduces the dimensionality of features. Last step is to predict the load and forecast. For forecasting electricity load and price two existing techniques KNearest Neighbors KNN and MultiLayer Perceptron MLP and a newly proposed technique known as Enhanced KNN EKNN is being used. The proposed technique outperforms than MLP and KNN in terms of accuracy. KNN is working on nonparametric method which is used for classification and regression.',
    'Death Ground.Death Ground is a competitive musical installationgame for two players. The work is designed to provide the framework for the playersparticipants in which to perform gamemediated musical gestures against eachother. The main mechanic involves destroying the other playeru0027s avatar by outmaneuvering and using audio weapons and improvised musical actions against it. These weapons are spawned in an enclosed area during the performance and can be used by whoever is collects them first. There is a multitude of such powerups all of which have different properties such as speed boost additional damage ground traps and so on. All of these weapons affect the sound and sonic textures that each of the avatars produce. Additionally the players can use elements of the environment such as platforms obstructions and elevation in order to gain competitive advantage or position themselves strategically to access first the spawned powerups.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7770, 0.0657],
#         [0.7770, 1.0000, 0.0281],
#         [0.0657, 0.0281, 1.0000]])
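
Because the embeddings are already normalized, the same model.similarity call can also rank a small corpus against a query for semantic search. A minimal sketch reusing the model loaded above; the corpus and query strings are illustrative placeholders:

# Hypothetical corpus and query, for illustration only
corpus = [
    "Deep learning methods for short-term electricity load forecasting.",
    "A competitive musical installation game for two players.",
    "Named entity recognition for low-resource languages.",
]
query = "Which papers study electricity price and load prediction?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine scores with shape (1, len(corpus))
scores = model.similarity(query_embedding, corpus_embeddings)[0]
best = int(scores.argmax())
print(corpus[best], float(scores[best]))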

Evaluation

Metrics

Triplet

  • Datasets: dblp-aminer-50k-dev and dblp-aminer-50k-test
  • Evaluated with TripletEvaluator
Metric dblp-aminer-50k-dev dblp-aminer-50k-test
cosine_accuracy 1.0 1.0
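
cosine_accuracy is the fraction of triplets in which the anchor embedding is closer to the positive than to the negative. The sketch below shows how such an evaluation can be run with TripletEvaluator, reusing the model loaded in the Usage section; the triplet lists here are tiny illustrative placeholders, whereas the reported scores were computed on the full dev and test splits.

from sentence_transformers.evaluation import TripletEvaluator

# Illustrative triplets; in practice these come from the dblp-aminer dev/test splits
anchors = ["Short-term electricity load forecasting with deep neural networks."]
positives = ["Electricity price and load prediction using an enhanced KNN model."]
negatives = ["A competitive musical installation game for two players."]

dev_evaluator = TripletEvaluator(
    anchors=anchors,
    positives=positives,
    negatives=negatives,
    name="dblp-aminer-50k-dev",
)
results = dev_evaluator(model)
print(results["dblp-aminer-50k-dev_cosine_accuracy"])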

Training Details

Training Dataset

parquet

  • Dataset: parquet
  • Size: 46,900 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 132, mean 232.79, max 384 tokens
    • positive (string): min 123, mean 247.67, max 384 tokens
    • negative (string): min 69, mean 218.48, max 384 tokens
  • Samples:
    anchor positive negative
    The longterm effect of media violence exposure on aggression of youngsters.Abstract The effect of media violence on aggression has always been a trending issue and a better understanding of the psychological mechanism of the impact of media violence on youth aggression is an extremely important research topic for preventing the negative impacts of media violence and juvenile delinquency. From the perspective of anger this study explored the longterm effect of different degrees of media violence exposure on the aggression of youngsters as well as the role of aggressive emotions. The studies found that individuals with a high degree of media violence exposure HMVE exhibited higher levels of proactive aggression in both irritation situations and higher levels of reactive aggression in lowirritation situations than did participants with a low degree of media violence exposure LMVE. After being provoked the anger of all participants was significantly increased and the anger and proactive ag... Cyberbullying perpetration and victimization among children and adolescents A systematic review of longitudinal studies.Abstract In this systematic review of exclusively longitudinal studies on cyberbullying perpetration and victimization among adolescents we identified 76 original longitudinal studies published between 2007 and 2017. The majority of them approached middle school students in two waves at 6 or 12 months apart. Prevalence rates for cyberbullying perpetration varied between 5.3 and 66.2 percent and for cyberbullying victimization between 1.9 and 84.0 percent. Personrelated factors e.g. traditional bullying internalizing problems were among the most studied concepts primarily examined as significant risk factors. Evidence on the causal relationships with mediarelated factors e.g. problematic Internet use and environmental factors e.g. parent and peer relations was scarce. This review identified gaps for future longitudinal research on cyberbullying perpetration and victimi... Any small multiplicative subgroup is not a sumset.Abstract We prove that for an arbitrary e u003e 0 and any multiplicative subgroup Γ F p 1 Γ p 2 3 e there are no sets B C F p with B C u003e 1 such that Γ B C . Also we obtain that for 1 Γ p 6 7 e and any ξ 0 there is no a set B such that ξ Γ 1 B B .
    The longterm effect of media violence exposure on aggression of youngsters.Abstract The effect of media violence on aggression has always been a trending issue and a better understanding of the psychological mechanism of the impact of media violence on youth aggression is an extremely important research topic for preventing the negative impacts of media violence and juvenile delinquency. From the perspective of anger this study explored the longterm effect of different degrees of media violence exposure on the aggression of youngsters as well as the role of aggressive emotions. The studies found that individuals with a high degree of media violence exposure HMVE exhibited higher levels of proactive aggression in both irritation situations and higher levels of reactive aggression in lowirritation situations than did participants with a low degree of media violence exposure LMVE. After being provoked the anger of all participants was significantly increased and the anger and proactive ag... Cyberbullying perpetration and victimization among children and adolescents A systematic review of longitudinal studies.Abstract In this systematic review of exclusively longitudinal studies on cyberbullying perpetration and victimization among adolescents we identified 76 original longitudinal studies published between 2007 and 2017. The majority of them approached middle school students in two waves at 6 or 12 months apart. Prevalence rates for cyberbullying perpetration varied between 5.3 and 66.2 percent and for cyberbullying victimization between 1.9 and 84.0 percent. Personrelated factors e.g. traditional bullying internalizing problems were among the most studied concepts primarily examined as significant risk factors. Evidence on the causal relationships with mediarelated factors e.g. problematic Internet use and environmental factors e.g. parent and peer relations was scarce. This review identified gaps for future longitudinal research on cyberbullying perpetration and victimi... Unmanned agricultural product sales system.The invention relates to the field of agricultural product sales provides an unmanned agricultural product sales system and aims to solve the problem of agricultural product waste caused by the factthat most farmers can only prepare goods according to guessing and experiences when selling agricultural products at present. The unmanned agricultural product sales system comprises an acquisition module for acquiring selection information of customers a storage module which prestores a vegetable preparation scheme a matching module which is used for matching a corresponding side dish schemefrom the storage module according to the selection information of the client a pushing module which is used for pushing the matched side dish scheme back to the client an acquisition module which isalso used for acquiring confirmation information of a client an order module which is used for generating order information according to the confirmation information ...
    The longterm effect of media violence exposure on aggression of youngsters.Abstract The effect of media violence on aggression has always been a trending issue and a better understanding of the psychological mechanism of the impact of media violence on youth aggression is an extremely important research topic for preventing the negative impacts of media violence and juvenile delinquency. From the perspective of anger this study explored the longterm effect of different degrees of media violence exposure on the aggression of youngsters as well as the role of aggressive emotions. The studies found that individuals with a high degree of media violence exposure HMVE exhibited higher levels of proactive aggression in both irritation situations and higher levels of reactive aggression in lowirritation situations than did participants with a low degree of media violence exposure LMVE. After being provoked the anger of all participants was significantly increased and the anger and proactive ag... Cyberbullying perpetration and victimization among children and adolescents A systematic review of longitudinal studies.Abstract In this systematic review of exclusively longitudinal studies on cyberbullying perpetration and victimization among adolescents we identified 76 original longitudinal studies published between 2007 and 2017. The majority of them approached middle school students in two waves at 6 or 12 months apart. Prevalence rates for cyberbullying perpetration varied between 5.3 and 66.2 percent and for cyberbullying victimization between 1.9 and 84.0 percent. Personrelated factors e.g. traditional bullying internalizing problems were among the most studied concepts primarily examined as significant risk factors. Evidence on the causal relationships with mediarelated factors e.g. problematic Internet use and environmental factors e.g. parent and peer relations was scarce. This review identified gaps for future longitudinal research on cyberbullying perpetration and victimi... Minimum number of additive tuples in groups of prime order.For a prime number p and a sequence of integers a0 . . . ak 01 . . . p lets a0 . . . ak be the minimum number of k 1tuples x0 . . . xk A0Akwithx0x1xk over subsets a0 . . . AkZp of sizes a0 . . . ak respectively. We observe that an elegant argument of Samotij and Sudakov can be extended to show that there exists an extremal configuration with all sets Ai being intervals of appropriate length. The same conclusion also holds for the related problem posed by Bajnok whena0akaandA0Ak provided k is not equal 1 modulop. Finally by applying basic Fourier analysis we show for Bajnoks problem that if pu003e13 and a 3 . . . p3are fixed whilek1 modp tends to infinity then the extremal configuration alternates between at least two affine nonequivalent sets.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
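
A loss with exactly these parameters can be constructed as sketched below, starting from the base model; cos_sim is the default similarity function, so only scale and mini_batch_size need to be passed.

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, mini_batch_size=16)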
    

Evaluation Dataset

parquet

  • Dataset: parquet
  • Size: 5,862 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 132, mean 225.49, max 384 tokens
    • positive (string): min 124, mean 240.03, max 384 tokens
    • negative (string): min 69, mean 221.83, max 384 tokens
  • Samples:
    anchor positive negative
    Nonlocal Recoloring Algorithm for Color Vision Deficiencies with Naturalness and Detail Preserving.People with Color Vision Deficiencies CVD may have difficulty in recognizing and communicating color information especially in the multimedia era. In this paper we proposed a recoloring algorithm to enhance visual perception of people with CVD. In the algorithm color modification for color blindness is conducted in HSV color space under three constraints detail naturalness and authenticity. A new nonlocal recoloring method is used for preserving details. Subjective experiments were conducted among normal vision subjects and color blind subjects. Experimental results show that our algorithm is robust detail preserving and maintains naturalness. Source codes are freely available to noncommercial users at the website httpsdoi.org10.6084m9.figshare.9742337.v2. Improving Color Discrimination for Color Vision Deficiency CVD with TemporalDomain Modulation.Color Vision Deficiency CVD is often characterized by the inability to distinguish color due to a defective or missing cone in the eye. Although it is possible to modify the observed color to make it easier for users to distinguish this can lead to color confusion with unaffected colors. To address this problem we investigate how flicker can assist distinguishing colors for CVD patients. In preliminary study we evaluated the efficiency of color and brightness modulation with 4 participants with normal vision. Our findings suggests that while brightness modulation was ineffective color modulation can help users distinguish between different colors. Pooled Mining is Driving Blockchains Toward Centralized Systems.The decentralization property of blockchains stems from the fact that each miner accepts or refuses transactions and blocks based on its own verification results. However pooled mining causes blockchains to evolve into centralized systems because pool participants delegate their decisionmaking rights to pool managers. In this paper we established and validated a model for ProofofWork mining introduced the concept of equivalent blocks and quantitatively derived that pooling effectively lowers the income variance of miners. We also analyzed Bitcoin and Ethereum data to prove that pooled mining has become prevalent in the real world. The percentage of poolmined blocks increased from 49.91 to 91.12 within four months in Bitcoin and from 76.9 to 92.2 within five months in Ethereum. In July 2018 Bitcoin and Ethereum mining were dominated by only six and five pools respectively.
    Nonlocal Recoloring Algorithm for Color Vision Deficiencies with Naturalness and Detail Preserving.People with Color Vision Deficiencies CVD may have difficulty in recognizing and communicating color information especially in the multimedia era. In this paper we proposed a recoloring algorithm to enhance visual perception of people with CVD. In the algorithm color modification for color blindness is conducted in HSV color space under three constraints detail naturalness and authenticity. A new nonlocal recoloring method is used for preserving details. Subjective experiments were conducted among normal vision subjects and color blind subjects. Experimental results show that our algorithm is robust detail preserving and maintains naturalness. Source codes are freely available to noncommercial users at the website httpsdoi.org10.6084m9.figshare.9742337.v2. Improving Color Discrimination for Color Vision Deficiency CVD with TemporalDomain Modulation.Color Vision Deficiency CVD is often characterized by the inability to distinguish color due to a defective or missing cone in the eye. Although it is possible to modify the observed color to make it easier for users to distinguish this can lead to color confusion with unaffected colors. To address this problem we investigate how flicker can assist distinguishing colors for CVD patients. In preliminary study we evaluated the efficiency of color and brightness modulation with 4 participants with normal vision. Our findings suggests that while brightness modulation was ineffective color modulation can help users distinguish between different colors. Effects of Brownfield Remediation on Total Gaseous Mercury Concentrations in an Urban Landscape.In order to obtain a better perspective of the impacts of brownfields on the landatmosphere exchange of mercury in urban areas total gaseous mercury TGM was measured at two heights 1.8 m and 42.7 m prior to 20112012 and after 20152016 for the remediation of a brownfield and installation of a parking lot adjacent to the Syracuse Center of Excellence in Syracuse NY USA. Prior to brownfield remediation the annual average TGM concentrations were 1.6 0.6 and 1.4 0.4 ng m 3 at the ground and upper heights respectively. After brownfield remediation the annual average TGM concentrations decreased by 32 and 22 at the ground and the upper height respectively. Mercury soil flux measurements during summer after remediation showed net TGM deposition of 1.7 ng m 2 day 1 suggesting that the site transitioned from a mercury source to a net mercury sink. Measurements from the Atmospheric Mercury Netw...
    Named Entity Recognition for Nepali Language.Named Entity Recognition NER has been studied for many languages like English German Spanish and others but virtually no studies have focused on the Nepali language. One key reason is the lack of an appropriate annotated dataset. In this paper we describe a Nepali NER dataset that we created. We discuss and compare the performance of various machine learning models on this dataset. We also propose a novel NER scheme for Nepali and show that this scheme based on graphemelevel representations outperforms characterlevel representations when combined with BiLSTM models. Our best models obtain an overall F1 score of 86.89 which is a significant improvement on previously reported performance in literature. Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features.Named entity recognition NER is a fundamental step for many natural language processing tasks and hence enhancing the performance of NER models is always appreciated. With limited resources being available NER for SouthEast Asian languages like Telugu is quite a challenging problem. This paper attempts to improve the NER performance for Telugu using gazetteerrelated features which are automatically generated using Wikipedia pages. We make use of these gazetteer features along with other wellknown features like contextual wordlevel and corpus features to build NER models. NER models are developed using three wellknown classifiersconditional random field CRF support vector machine SVM and margin infused relaxed algorithms MIRA. The gazetteer features are shown to improve the performance and theMIRAbased NER model fared better than its counterparts SVM and CRF. Using Inversionmode MOS Varactors and 3port Inductor in 018µm CMOS Voltage Controlled Oscillator.This paper presents a RF voltage controlled oscillator VCO using inversionmode MOS varactors and 3port inductors to achieve low power consumption low phase noise broad tuning range and minimized chip size. The proposed circuit architecture using bodybiased technique operates from 4.3 to 5 GHz with 20.8 tuning range. The measured phase noise is less than 125.34 dBc at a displacement frequency of 1 MHz. The power consumption of this VCO is 25 mW when biased at 1.8 V. This VCO was implemented in standard TSMC 0.18µm 1P6M process. The chip size is 0.476 mm2 including the pads which is only 63 comparing with an identical VCO using TSMC inductor model.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
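
Putting the non-default values above together, training can be launched with the Sentence Transformers trainer API as sketched below. Here model is the base model, loss is the CachedMultipleNegativesRankingLoss from the previous section, and train_dataset / eval_dataset are the triplet datasets described above; the output directory name is a placeholder.

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="all-mpnet-base-v2-dblp-aminer-triplet",  # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,                  # base model being fine-tuned
    args=args,
    train_dataset=train_dataset,  # columns: anchor, positive, negative
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()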

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss dblp-aminer-50k-dev_cosine_accuracy dblp-aminer-50k-test_cosine_accuracy
-1 -1 - - 1.0 -
0.2725 100 0.223 0.0166 1.0 -
0.5450 200 0.0699 0.0208 1.0 -
0.8174 300 0.0267 0.0196 1.0 -
-1 -1 - - - 1.0

Framework Versions

  • Python: 3.11.4
  • Sentence Transformers: 5.1.1
  • Transformers: 4.56.2
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1
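
To roughly reproduce this environment, the listed versions can be pinned at install time; PyTorch 2.8.0 should be installed separately with the CUDA build that matches your hardware (cu128 was used here).

pip install sentence-transformers==5.1.1 transformers==4.56.2 accelerate==1.10.1 datasets==4.1.1 tokenizers==0.22.1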

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}