---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:46900
- loss:CachedMultipleNegativesRankingLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
|
|
- source_sentence: Recurrent Neural Network Based MultiRobot Route Planning for SteepLand |
|
|
Harvesting Systems.Terrains with steep inclines cannot be utilized for crop production |
|
|
as it is hazardous to operate conventional farm equipment for various cropping |
|
|
tasks. This research which is a part of a larger effort to deploy a team of robots |
|
|
for seeding and other similar tasks proposes an approach for route planning of |
|
|
a team of ground mobile robots drones for seed delivery as well as an unmanned |
|
|
aerial vehicle UAV to continually replenish the drones with seeds. The overall |
|
|
cost is formulated in terms of linearly constrained integer quadratic programming |
|
|
LCICQ. Using a Lagrangian function a recurrent neural network with primal and |
|
|
dual sets of neurons is proposed to converge to optimal solutions. |
|
|
sentences: |
|
|
- Neural Path Planning Fixed Time NearOptimal Path Generation via Oracle Imitation.Fast |
|
|
and efficient path generation is critical for robots operating in complex environments. |
|
|
This motion planning problem is often performed in a robots actuation or configuration |
|
|
space where popular pathfinding methods such as A RRT get exponentially more computationally |
|
|
expensive to execute as the dimensionality increases or the spaces become more |
|
|
cluttered and complex. On the other hand if one were to save the entire set of |
|
|
paths connecting all pair of locations in the configuration space a priori one |
|
|
would run out of memory very quickly. In this work we introduce a novel way of |
|
|
producing fast and optimal motion plans for static environments by using a stepping |
|
|
neural network approach called OracleNet. OracleNet uses Recurrent Neural Networks |
|
|
to determine endtoend trajectories in an iterative manner that implicitly generates |
|
|
optimal motion plans with minimal loss in performance in a compact form. The algorithm |
|
|
is straightforward in implementation while consistently generating nearoptimal |
|
|
paths in a single iterative endtoend rollout. In practice OracleNet generally |
|
|
has fixedtime execution regardless of the configuration space complexity while |
|
|
outperforming popular pathfinding algorithms in complex environments and higher |
|
|
dimensions1. |
|
|
- A Critical Look at the 2019 College Admissions Scandal.Discusses the 2019 College |
|
|
admissions scandal. Let me begin with a disclaimer I am making no legal excuses |
|
|
for the participants in the current scandal. I am only offering contextual background |
|
|
that places it in the broader academic cultural and political perspective required |
|
|
for understanding. It is only the most recent installment of a wellworn narrative |
|
|
the controlling elite make their own rules and live by them if they can get away |
|
|
with it. Unfortunately some of the participants who are either serving or facing |
|
|
jail time didnt know to not go into a gunfight with a sharp stick. Money alone |
|
|
is not enough to avoid prosecution for fraud you need political clout. The best |
|
|
protection a defendant can have is a prosecutor who fears political reprisal. |
|
|
Compare how the Koch brothers escaped prosecution for stealing millions of oil |
|
|
dollars from Native American tribes12 with the fate of actresses Lori Loughlin |
|
|
and Felicity Huffman who at the time of this writing face jail time for paying |
|
|
bribes to get their children into good universities.34 In the former case the |
|
|
federal prosecutor who dared to empanel a grand jury to get at the truth was fired |
|
|
for cause which put a quick end to the prosecution. In the latter case the prosecutors |
|
|
pushed for jail terms and public admonishment with the zeal of Oliver Cromwell. |
|
|
There you have it stealing oil from Native Americans versus trying to bribe your |
|
|
kids into a great university. Where is the greater crime Admittedly these actresses |
|
|
and their |
|
|
- Sensitivity Enhanced Photoacoustic Imaging Using a HighFrequency PZT Transducer |
|
|
with an Integrated FrontEnd Amplifier.Photoacoustic PA imaging is a hybrid imaging |
|
|
technique that can provide both structural and functional information of biological |
|
|
tissues. Due to limited permissible laser energy deposited on tissues highly sensitive |
|
|
PA imaging is required. Here we developed a 20 MHz lead zirconium titanate PZT |
|
|
transducer 1.5 mm 3 mm with frontend amplifier circuits for local signal processing |
|
|
to achieve sensitivity enhanced PA imaging. The electrical and acoustic performance |
|
|
was characterized. Experiments on phantoms and chicken breast tissue were conducted |
|
|
to validate the imaging performance. The fabricated prototype shows a bandwidth |
|
|
of 63 and achieves a noise equivalent pressure NEP of 0.24 mPaHz and a receiving |
|
|
sensitivity of 62.1 μVPa at 20 MHz without degradation of the bandwidth. PA imaging |
|
|
of wire phantoms demonstrates that the prototype is capable of improving the detection |
|
|
sensitivity by 10 dB compared with the traditional transducer without integrated |
|
|
amplifier. In addition in vitro experiments on chicken breast tissue show that |
|
|
structures could be imaged with enhanced contrast using the prototype and the |
|
|
imaging depth range was improved by 1 mm. These results demonstrate that the transducer |
|
|
with an integrated frontend amplifier enables highly sensitive PA imaging with |
|
|
improved penetration depth. The proposed method holds the potential for visualization |
|
|
of deep tissue structures and enhanced detection of weak physiological changes. |
|
|
- source_sentence: A method for reconstructing label images from a few projections |
|
|
as motivated by electron microscopy.Our aim is to produce a tessellation of space |
|
|
into small voxels and based on only a few tomographic projections of an object |
|
|
assign to each voxel a label that indicates one of the components of interest |
|
|
constituting the object. Traditional methods are not reliable in applications |
|
|
such as electron microscopy in which due to the damage by radiation only a few |
|
|
projections are available. We postulate a low level prior knowledge regarding |
|
|
the underlying distribution of label images and then directly estimate the label |
|
|
image based on the prior and the projections. We use a relatively efficient approximation |
|
|
to a global search for the optimal estimate. Copyright Springer Science Business |
|
|
Media LLC 2006 |
|
|
sentences: |
|
|
- Airline Miles Redemption.The business of Airline firms has deviated from their |
|
|
main business of flying passengers over the past decade. Now they have diversified |
|
|
into other lines of business as well. The revenue model has therefore changed |
|
|
over the past decade. Airline miles is one of the main revenue generating venture |
|
|
for airlines currently. It has been mentioned that airline miles business has |
|
|
turned cash cow for these firms. Their normal way of business flying passengers |
|
|
is not an attractive method of running business for them. Selling airline miles |
|
|
allow them to generate a higher revenue. We look at how the redemption of airline |
|
|
miles affect the bottom line of the company using data from publicly available |
|
|
data sources |
|
|
- Extensive Examination of XOR Arbiter PUFs as Security Primitives for ResourceConstrained |
|
|
IoT Devices.Communication security is essential for the proper functioning of |
|
|
the Internet of Things. Traditional approaches that rely on cryptographic keys |
|
|
are vulnerable to sidechannel attacks. Physical Unclonable Functions PUFs leveraging |
|
|
unavoidable and irreproducible variations of integrated circuits to produce responses |
|
|
unique for individual PUF devices are emerging as promising candidates as security |
|
|
primitives to provide keyless solutions. Before a PUF can be adopted for real |
|
|
applications the PUF must be thoroughly examined to understand its various properties |
|
|
for its application feasibility. In this paper we study XOR PUFs for broad ranges |
|
|
of values for circuit architecture parameters. XOR PUFs have been extensively |
|
|
studied and have been shown to be unable to withstand machine learning attacks |
|
|
for 64bit XOR PUFs with less than ten component PUFs. Attack methods employed |
|
|
in existing studies need a large number of challengeresponse pairs CRPs which |
|
|
are obtainable only if the PUF has an open access interface. When PUFembedded |
|
|
devices equipped with mutual authentication or response obfuscating techniques |
|
|
it is difficult for attackers to accumulate large numbers of CRPs. With only a |
|
|
small number of accumulated CRPs available to attackers small size PUFs like XOR |
|
|
PUFs with a small number of component PUFs and stages may become resistant to |
|
|
machine learning attacks. Since smaller sizes mean less resourcedemanding it is |
|
|
worthwhile to examine such PUFs which have usually been considered unsafe against |
|
|
attacks. Such are thoughts that have been motivating us in this paper to explore |
|
|
the PUF performances for a wide range of values of the PUF architecture parameters. |
|
|
- LocalConvexity Reinforcement for Scene Reconstruction from Sparse Point Clouds.Several |
|
|
methods reconstruct surfaces from sparse point clouds that are estimated from |
|
|
images. Most of them build 3D Delaunay triangulation of the points and compute |
|
|
occupancy labeling of the tetrahedra thanks to visibility information and surface |
|
|
constraints. However their most notable errors are falselylabeled freespace tetrahedra. |
|
|
We present labeling corrections of these errors based on a new shape constraint |
|
|
localconvexity. In the simplest case this means that a freespace tetrahedron of |
|
|
the Delaunay is relabeled matter if its size is small enough and all its vertices |
|
|
are in matter tetrahedra. The allowed corrections are more important in the vertical |
|
|
direction than in the horizontal ones to take into account the anisotropy of usual |
|
|
scenes. In the experiments our corrections improve the results of previous surface |
|
|
reconstruction methods applied to videos taken by a consumer 360 camera. |
|
|
- source_sentence: Buzzer Detection to Maintain Information Neutrality in 2019 Indonesia |
|
|
Presidential Election.This paper proposed a method which detects a political buzzer |
|
|
in social media specifically Instagram. With Indonesia undergoing 2019 presidential |
|
|
election a detection of buzzers that causes much trouble in maintaining information |
|
|
neutrality is seen as a needed. One of the many reasons is because those buzzers |
|
|
spread false news making the information gained by the use of social media to |
|
|
be not neutral and deliberately offends or attack those that they are not in favor |
|
|
of. Those buzzers share a similar characteristic tendency or even possess the |
|
|
same pattern. Grouping classification and detection method are used to counter |
|
|
this problem. This research gives a slight overview of what is happening in social |
|
|
media and a theory of how to deal with those problems. The argument is expected |
|
|
to help to identify buzzer in real life thus helps in maintaining information |
|
|
neutrality along with the social media in Indonesia. |
|
|
sentences: |
|
|
- Fake News Detection on Social Media A Systematic Survey.These days there are instabilities |
|
|
in many societies in the world either because of political economic and other |
|
|
societal issues. The advance in mobile technology has enabled social media to |
|
|
play a vital role in organizing activities in favour or against certain parties |
|
|
or countries. Many researchers see the need to develop automated systems that |
|
|
are capable of detecting and tracking fake news on social media. In this paper |
|
|
we introduce a systematic survey on the process of fake news detection on social |
|
|
media. The types of data and the categories of features used in the detection |
|
|
model as well as benchmark datasets are discussed. |
|
|
- Automatic Guided Waves Data Transmission System Using an Oil Industry Multiwire |
|
|
Cable.Alternative wireless data communication systems are a necessity in industries |
|
|
that operate in harsh environments such as the oil and gas industry. Ultrasonic |
|
|
guided wave propagation through solid metallic structures such as metal barriers |
|
|
rods and multiwire cables have been proposed for data transmission purposes. In |
|
|
this context multiwire cables have been explored as a communication media for |
|
|
the transmission of encoded ultrasonic guided waves. This work presents the proprietary |
|
|
hardware design and implementation of an automatic data transmission system based |
|
|
on the propagation of ultrasonic guided waves using as communication channels |
|
|
a hightemperature and corrosionresistant oil industry multiwire cable. A dedicated |
|
|
communication protocol has been implemented at physical and data link layers which |
|
|
involved pulse position modulation PPM digital signal processing DSP and an integrity |
|
|
validation byte. The data transmission system was composed of an ultrasonic guided |
|
|
waves PPM encoded data transmitter a 1K22 MP35N multiwire cable a hardware preamplifier |
|
|
a data acquisition module a realtime RT DSP LabVIEW National Instruments Austin |
|
|
TX based demodulator and a humanmachine interface HMI running on a personal computer. |
|
|
To evaluate the communication system the transmitter generated 60 kHz PPM energy |
|
|
packets containing three different bytes and their corresponding integrity validation |
|
|
bytes. Experimental tests were conducted in the laboratory using 1 and 10 m length |
|
|
cables. Although a dispersive solid elastic media was used as a communication |
|
|
channel results showed that digital data transmission rates up to 470 bps were |
|
|
effectively validated. |
|
|
- Improved Optimization of Motion Primitives for Motion Planning in State Lattices.In |
|
|
this paper we propose a framework for generating motion primitives for latticebased |
|
|
motion planners automatically. Given a family of systems the user only needs to |
|
|
specify which principle types of motions which are here denoted maneuvers that |
|
|
are relevant for the considered system family. Based on the selected maneuver |
|
|
types and a selected system instance the algorithm not only automatically optimizes |
|
|
the motions connecting predefined boundary conditions but also simultaneously |
|
|
optimizes the endpoint boundary conditions as well. This significantly reduces |
|
|
the time consuming part of manually specifying all boundary value problems that |
|
|
should be solved and no exhaustive search to generate feasible motions is required. |
|
|
In addition to handling static a priori known system parameters the framework |
|
|
also allows for fast automatic reoptimization of motion primitives if the system |
|
|
parameters change while the system is in use e.g if the load significantly changes |
|
|
or a trailer with a new geometry is picked up by an autonomous truck. We also |
|
|
show in several numerical examples that the framework can enhance the performance |
|
|
of the motion planner in terms of total cost for the produced solution. |
|
|
- source_sentence: Exploring corporate governance research in accounting journals |
|
|
through latent semantic and topic analyses.The literature on corporate governance |
|
|
CG has been expanding at an unprecedented rate since major corporate scandals |
|
|
surfaced such as Enron WorldCom and HealthSouth. Corresponding with accountingu0027s |
|
|
important role in CG accounting scholars increasingly have investigated CG in |
|
|
recent years so the body of literature is growing. Although previous attempts |
|
|
have been made to summarize extant literature on CG via reviews none of these |
|
|
attempts has utilized recent developments in text analyses and natural language |
|
|
processing. This study uses latent semantic and topic analyses to address this |
|
|
research gap by analysing abstracts from 1399 articles in all accounting journals |
|
|
that the Australian Business Deans Council ABDC has rated A and A. The ABDC journal |
|
|
list is widely recognized as a journalquality indicator across many universities |
|
|
worldwide. The analyses revealed 10 distinct research topics on CG in the ABDCu0027s |
|
|
top accounting journals. The results presented include the five most representative |
|
|
articles for each topic as distinguished by topic scores. This study carries important |
|
|
practice and policy implications as it reveals major research streams and exhibits |
|
|
how researchers respond to various CG problems. |
|
|
sentences: |
|
|
- Performance Analysis of Small Cells Deployment under Imperfect Traffic Hotspot |
|
|
Localization.Heterogeneous Networks HetNets long been considered in operatorsu0027 |
|
|
roadmaps for macrocellsu0027 network improvements still continue to attract interest |
|
|
for 5G network deployments. Understanding the efficiency of small cell deployment |
|
|
in the presence of traffic hotspots can further draw operatorsu0027 attention |
|
|
to this feature. In this context we evaluate the impact of imperfect small cell |
|
|
positioning on the network performances. We show that the latter is mainly impacted |
|
|
by the position of the hotspot within the cell in case the hotspot is near the |
|
|
macrocell even a perfect positioning of the small cell will not yield improved |
|
|
performance due to the interference coming from the macrocell. In the case where |
|
|
the hotspot is located far enough from the macrocell even a large error in small |
|
|
cell positioning would still be beneficial in offloading traffic from the congested |
|
|
macrocell. |
|
|
- Corporate disclosure via social media a data science approach.The purpose of this |
|
|
paper is to investigate corporate financial disclosure via Twitter among the top |
|
|
listed 350 companies in the UK as well as identify the determinants of the extent |
|
|
of social media usage to disclose financial information.This study applies an |
|
|
unsupervised machine learning technique namely Latent Dirichlet Allocation topic |
|
|
modeling to identify financial disclosure tweets. Panel Logistic and Generalized |
|
|
Linear Model Regressions are also run to identify the determinants of financial |
|
|
disclosure on Twitter focusing mainly on board characteristics.Topic modeling |
|
|
results reveal that companies mainly tweet about 12 topics including financial |
|
|
disclosure which has a probability of occurrence of about 7 percent. Several board |
|
|
characteristics are found to be associated with the extent of Twitter usage as |
|
|
a financial disclosure platform among which are board independence gender diversity |
|
|
and board tenure.The extensive literature examines disclosure via traditional |
|
|
media and its determinants yet this paper extends the literature by investigating |
|
|
the relatively new disclosure channel of social media. This study is among the |
|
|
first to utilize machine learning instead of manual coding techniques to automatically |
|
|
unveil the tweets topics and reveal financial disclosure tweets. It is also among |
|
|
the first to investigate the relationships between several board characteristics |
|
|
and financial disclosure on Twitter providing a distinction between the roles |
|
|
of executive vs nonexecutive directors relating to disclosure decisions. |
|
|
- Feasibility of Replacing the Range Doppler Equation of Spaceborne Synthetic Aperture |
|
|
Radar Considering Atmospheric Propagation Delay with a Rational Polynomial Coefficient |
|
|
Model.Usually the rational polynomial coefficient RPC model of spaceborne synthetic |
|
|
aperture radar SAR is fitted by the original range Doppler RD model. However the |
|
|
radar signal is affected by twoway atmospheric delay which causes measurement |
|
|
error in the slant range term of the RD model. In this paper two atmospheric delay |
|
|
correction methods are proposed for use in terrainindependent RPC fitting singlescene |
|
|
SAR imaging with a unique atmospheric delay correction parameter plan 1 and singlescene |
|
|
SAR imaging with spatially varying atmospheric delay correction parameters plan |
|
|
2. The feasibility of the two methods was verified by conducting fitting experiments |
|
|
and geometric positioning accuracy verification of the RPC model. The experiments |
|
|
for the GF3 satellite were performed by using global meteorological data a global |
|
|
digital elevation model and ground control data from several regions in China. |
|
|
The experimental results show that it is feasible to use plan 1 or plan 2 to correct |
|
|
the atmospheric delay error no matter whether in plain mountainous or plateau |
|
|
areas. Moreover the geometric positioning accuracy of the RPC model after correcting |
|
|
the atmospheric delay was improved to better than 3 m. This is of great significance |
|
|
for the efficient and highprecision geometric processing of spaceborne SAR images. |
|
|
- source_sentence: DCNN and LDARFRFE Based ShortTerm Electricity Load and Price Forecasting.In |
|
|
this paper Deep Convolutional Neural Network DCNN is proposed for short term electricity |
|
|
load and price forecasting. Extracting useful information from data and then using |
|
|
that information for prediction is a challenging task. This paper presents a model |
|
|
consisting of two stages feature engineering and prediction. Feature engineering |
|
|
comprises of Feature Extraction FE and Feature Selection FS. For FS this paper |
|
|
proposes a technique that is combination of Random Forest RF and Recursive Feature |
|
|
Elimination RFE. The proposed technique is used for feature redundancy removal |
|
|
and dimensionality reduction. After finding the useful features DCNN is used for |
|
|
electricity price and load forecasting. DCNN performance is compared with Convolutional |
|
|
Neural Network CNN and Support Vector Classifier SVC models. Using the forecasting |
|
|
models dayahead and the week ahead forecasting is done for electricity price and |
|
|
load. To evaluate the CNN SVC and DCNN models real electricity market data is |
|
|
used. Mean Absolute Error MAE and Root Mean Square Error RMSE are used to evaluate |
|
|
the performance of the models. DCNN outperforms compared models by yielding lesser |
|
|
errors. |
|
|
sentences: |
|
|
- Stable twosided satisfied matching for ridesharing system based on preference |
|
|
orders.Ridesharing has emerged as an alternative transportation mode along road |
|
|
networks around the world. Rideshare matching problem is vital to improve the |
|
|
sustainable development of ridesharing systems. This paper aims to address the |
|
|
stable twosided satisfied matching problem considering the participants psychological |
|
|
perception. First of all we investigate the elements that influence passengers |
|
|
and drivers ridesharing experience by means of semistructured research interviews |
|
|
and questionnaire survey. Two ridesharing perception evaluation systems are originally |
|
|
established to get the preference orders of passengers and drivers separately. |
|
|
Then rideshare matchingrelated definitions are stated and rideshare matching linguistic |
|
|
information processing is also elaborated in detail based on preference utility |
|
|
function disappointment function as well as elation function. Furthermore we propose |
|
|
a stable twosided satisfied matching model on account of fuzzy linguistic information |
|
|
processing about ridesharing which is able to reflect participants psychological |
|
|
factors. To verify the validity of our model we present a twosided matching case |
|
|
based on hypothetical rideshare matching platform. The analytical results indicate |
|
|
that the use of stable twosided satisfied matching method based on fuzzy linguistic |
|
|
information enables to substantially satisfy both drivers and passengers expectation |
|
|
and improve the sustainability of ridesharing systems. |
|
|
- ShortTerm Electricity Load and Price Forecasting using Enhanced KNN.In this paper |
|
|
we introduced a new enhanced technique to resolve the issue of electricity price |
|
|
and load forecasting. In Smart Grids SGs Price and load forecasting is the major |
|
|
issue. Framework of enhanced technique comprises of classification and feature |
|
|
engineering. Feature engineering comprises of feature selection and feature extraction. |
|
|
Decision Tree Regression DTR is used for feature selection. Recursive Feature |
|
|
Elimination RFE is used for feature selection which eliminates the redundancy |
|
|
of features. The second step of feature engineering feature extraction is done |
|
|
using Singular Value Decomposition SVD which reduces the dimensionality of features. |
|
|
Last step is to predict the load and forecast. For forecasting electricity load |
|
|
and price two existing techniques KNearest Neighbors KNN and MultiLayer Perceptron |
|
|
MLP and a newly proposed technique known as Enhanced KNN EKNN is being used. The |
|
|
proposed technique outperforms than MLP and KNN in terms of accuracy. KNN is working |
|
|
on nonparametric method which is used for classification and regression. |
|
|
- Death Ground.Death Ground is a competitive musical installationgame for two players. |
|
|
The work is designed to provide the framework for the playersparticipants in which |
|
|
to perform gamemediated musical gestures against eachother. The main mechanic |
|
|
involves destroying the other playeru0027s avatar by outmaneuvering and using |
|
|
audio weapons and improvised musical actions against it. These weapons are spawned |
|
|
in an enclosed area during the performance and can be used by whoever is collects |
|
|
them first. There is a multitude of such powerups all of which have different |
|
|
properties such as speed boost additional damage ground traps and so on. All of |
|
|
these weapons affect the sound and sonic textures that each of the avatars produce. |
|
|
Additionally the players can use elements of the environment such as platforms |
|
|
obstructions and elevation in order to gain competitive advantage or position |
|
|
themselves strategically to access first the spawned powerups. |
|
|
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: dblp aminer 50k dev
      type: dblp-aminer-50k-dev
    metrics:
    - type: cosine_accuracy
      value: 1.0
      name: Cosine Accuracy
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: dblp aminer 50k test
      type: dblp-aminer-50k-test
    metrics:
    - type: cosine_accuracy
      value: 1.0
      name: Cosine Accuracy
---
|
|
|
|
|
# SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on the `parquet` dataset (46,900 training samples). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
|
|
|
|
|
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision e8c3b32edf5434bc2275fc9bab85f82640a19130 -->
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - parquet
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
|
|
|
|
|
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
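
The `Pooling` and `Normalize` modules above turn per-token vectors into a single unit-length sentence embedding. The sketch below illustrates that idea in plain Python, assuming mean pooling over non-padding tokens (`pooling_mode_mean_tokens: True`) followed by L2 normalization. The helper names `mean_pool` and `l2_normalize` and the 3-dimensional toy vectors are hypothetical; the actual model works on 768-dimensional tensors inside the library.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


def l2_normalize(vector, eps=1e-12):
    """Scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


# Toy input: two real tokens and one padding token (mask 0).
token_embeddings = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
attention_mask = [1, 1, 0]

pooled = mean_pool(token_embeddings, attention_mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity of two embeddings equals their dot product.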
|
|
|
|
|
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
|
|
|
|
|
Then you can load this model and run inference. |
|
|
```python |
|
|
from sentence_transformers import SentenceTransformer |
|
|
|
|
|
# Download from the 🤗 Hub |
|
|
model = SentenceTransformer("sentence_transformers_model_id") |
|
|
# Run inference |
|
|
sentences = [ |
|
|
    'DCNN and LDARFRFE Based ShortTerm Electricity Load and Price Forecasting.In this paper Deep Convolutional Neural Network DCNN is proposed for short term electricity load and price forecasting. Extracting useful information from data and then using that information for prediction is a challenging task. This paper presents a model consisting of two stages feature engineering and prediction. Feature engineering comprises of Feature Extraction FE and Feature Selection FS. For FS this paper proposes a technique that is combination of Random Forest RF and Recursive Feature Elimination RFE. The proposed technique is used for feature redundancy removal and dimensionality reduction. After finding the useful features DCNN is used for electricity price and load forecasting. DCNN performance is compared with Convolutional Neural Network CNN and Support Vector Classifier SVC models. Using the forecasting models dayahead and the week ahead forecasting is done for electricity price and load. To evaluate the CNN SVC and DCNN models real electricity market data is used. Mean Absolute Error MAE and Root Mean Square Error RMSE are used to evaluate the performance of the models. DCNN outperforms compared models by yielding lesser errors.',
    'ShortTerm Electricity Load and Price Forecasting using Enhanced KNN.In this paper we introduced a new enhanced technique to resolve the issue of electricity price and load forecasting. In Smart Grids SGs Price and load forecasting is the major issue. Framework of enhanced technique comprises of classification and feature engineering. Feature engineering comprises of feature selection and feature extraction. Decision Tree Regression DTR is used for feature selection. Recursive Feature Elimination RFE is used for feature selection which eliminates the redundancy of features. The second step of feature engineering feature extraction is done using Singular Value Decomposition SVD which reduces the dimensionality of features. Last step is to predict the load and forecast. For forecasting electricity load and price two existing techniques KNearest Neighbors KNN and MultiLayer Perceptron MLP and a newly proposed technique known as Enhanced KNN EKNN is being used. The proposed technique outperforms than MLP and KNN in terms of accuracy. KNN is working on nonparametric method which is used for classification and regression.',
    'Death Ground.Death Ground is a competitive musical installationgame for two players. The work is designed to provide the framework for the playersparticipants in which to perform gamemediated musical gestures against eachother. The main mechanic involves destroying the other playeru0027s avatar by outmaneuvering and using audio weapons and improvised musical actions against it. These weapons are spawned in an enclosed area during the performance and can be used by whoever is collects them first. There is a multitude of such powerups all of which have different properties such as speed boost additional damage ground traps and so on. All of these weapons affect the sound and sonic textures that each of the avatars produce. Additionally the players can use elements of the environment such as platforms obstructions and elevation in order to gain competitive advantage or position themselves strategically to access first the spawned powerups.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7770, 0.0657],
#         [0.7770, 1.0000, 0.0281],
#         [0.0657, 0.0281, 1.0000]])
```
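
The similarity matrix in the example output can be read row by row to find each sentence's closest match. The following is a minimal sketch in plain Python that uses the scores printed above as hard-coded values (they are not recomputed here); the helper name `nearest_neighbours` is illustrative and not part of the Sentence Transformers API.

```python
# Similarity scores copied from the example output above (not recomputed here).
similarities = [
    [1.0000, 0.7770, 0.0657],
    [0.7770, 1.0000, 0.0281],
    [0.0657, 0.0281, 1.0000],
]


def nearest_neighbours(matrix):
    """Return, for each row, the (index, score) of the most similar other row."""
    result = []
    for i, row in enumerate(matrix):
        # Skip the diagonal: every sentence has similarity 1.0 with itself.
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        best_score, best_index = max(candidates)
        result.append((best_index, best_score))
    return result


neighbours = nearest_neighbours(similarities)
for i, (j, score) in enumerate(neighbours):
    print(f"sentence {i} -> sentence {j} (cosine similarity {score:.4f})")
```

The two electricity-forecasting abstracts select each other, while the unrelated "Death Ground" abstract scores close to zero against both.
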
|
|
|
|
|
|
|
|
|
|
## Evaluation

### Metrics

#### Triplet

* Datasets: `dblp-aminer-50k-dev` and `dblp-aminer-50k-test`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | dblp-aminer-50k-dev | dblp-aminer-50k-test |
|:--------------------|:--------------------|:---------------------|
| **cosine_accuracy** | **1.0**             | **1.0**              |
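
`cosine_accuracy` is the fraction of triplets in which the anchor is more similar (by cosine similarity) to the positive than to the negative. The sketch below shows that computation in plain Python on made-up 2-dimensional vectors rather than real embeddings; the function names are illustrative and are not the evaluator's own code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_cosine_accuracy(triplets):
    """Share of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Toy triplets (hypothetical vectors, for illustration only).
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),    # positive is closer: correct
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),    # positive is closer: correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),   # negative is closer: incorrect
    ([1.0, 0.0], [0.8, 0.2], [-1.0, 0.0]),   # positive is closer: correct
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)
```

A value of 1.0, as reported in the table, means every triplet was ranked correctly, which also leaves this metric no headroom to separate one checkpoint from another.
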
|
|
|
|
|
|
|
|
|
|
## Training Details

### Training Dataset

#### parquet

* Dataset: parquet
* Size: 46,900 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:

|         | anchor                                                                                | positive                                                                              | negative                                                                             |
|:--------|:--------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
| type    | string                                                                                | string                                                                                | string                                                                               |
| details | <ul><li>min: 132 tokens</li><li>mean: 232.79 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 123 tokens</li><li>mean: 247.67 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 69 tokens</li><li>mean: 218.48 tokens</li><li>max: 384 tokens</li></ul> |
|
|
* Samples:

| anchor | positive | negative |
|:-------|:---------|:---------|
| <code>The longterm effect of media violence exposure on aggression of youngsters.Abstract The effect of media violence on aggression has always been a trending issue and a better understanding of the psychological mechanism of the impact of media violence on youth aggression is an extremely important research topic for preventing the negative impacts of media violence and juvenile delinquency. From the perspective of anger this study explored the longterm effect of different degrees of media violence exposure on the aggression of youngsters as well as the role of aggressive emotions. The studies found that individuals with a high degree of media violence exposure HMVE exhibited higher levels of proactive aggression in both irritation situations and higher levels of reactive aggression in lowirritation situations than did participants with a low degree of media violence exposure LMVE. After being provoked the anger of all participants was significantly increased and the anger and proactive ag...</code> | <code>Cyberbullying perpetration and victimization among children and adolescents A systematic review of longitudinal studies.Abstract In this systematic review of exclusively longitudinal studies on cyberbullying perpetration and victimization among adolescents we identified 76 original longitudinal studies published between 2007 and 2017. The majority of them approached middle school students in two waves at 6 or 12 months apart. Prevalence rates for cyberbullying perpetration varied between 5.3 and 66.2 percent and for cyberbullying victimization between 1.9 and 84.0 percent. Personrelated factors e.g. traditional bullying internalizing problems were among the most studied concepts primarily examined as significant risk factors. Evidence on the causal relationships with mediarelated factors e.g. problematic Internet use and environmental factors e.g. parent and peer relations was scarce. This review identified gaps for future longitudinal research on cyberbullying perpetration and victimi...</code> | <code>Any small multiplicative subgroup is not a sumset.Abstract We prove that for an arbitrary e u003e 0 and any multiplicative subgroup Γ F p 1 Γ p 2 3 e there are no sets B C F p with B C u003e 1 such that Γ B C . Also we obtain that for 1 Γ p 6 7 e and any ξ 0 there is no a set B such that ξ Γ 1 B B .</code> |
| <code>The longterm effect of media violence exposure on aggression of youngsters.Abstract The effect of media violence on aggression has always been a trending issue and a better understanding of the psychological mechanism of the impact of media violence on youth aggression is an extremely important research topic for preventing the negative impacts of media violence and juvenile delinquency. From the perspective of anger this study explored the longterm effect of different degrees of media violence exposure on the aggression of youngsters as well as the role of aggressive emotions. The studies found that individuals with a high degree of media violence exposure HMVE exhibited higher levels of proactive aggression in both irritation situations and higher levels of reactive aggression in lowirritation situations than did participants with a low degree of media violence exposure LMVE. After being provoked the anger of all participants was significantly increased and the anger and proactive ag...</code> | <code>Cyberbullying perpetration and victimization among children and adolescents A systematic review of longitudinal studies.Abstract In this systematic review of exclusively longitudinal studies on cyberbullying perpetration and victimization among adolescents we identified 76 original longitudinal studies published between 2007 and 2017. The majority of them approached middle school students in two waves at 6 or 12 months apart. Prevalence rates for cyberbullying perpetration varied between 5.3 and 66.2 percent and for cyberbullying victimization between 1.9 and 84.0 percent. Personrelated factors e.g. traditional bullying internalizing problems were among the most studied concepts primarily examined as significant risk factors. Evidence on the causal relationships with mediarelated factors e.g. problematic Internet use and environmental factors e.g. parent and peer relations was scarce. This review identified gaps for future longitudinal research on cyberbullying perpetration and victimi...</code> | <code>Unmanned agricultural product sales system.The invention relates to the field of agricultural product sales provides an unmanned agricultural product sales system and aims to solve the problem of agricultural product waste caused by the factthat most farmers can only prepare goods according to guessing and experiences when selling agricultural products at present. The unmanned agricultural product sales system comprises an acquisition module for acquiring selection information of customers a storage module which prestores a vegetable preparation scheme a matching module which is used for matching a corresponding side dish schemefrom the storage module according to the selection information of the client a pushing module which is used for pushing the matched side dish scheme back to the client an acquisition module which isalso used for acquiring confirmation information of a client an order module which is used for generating order information according to the confirmation information ...</code> |
| <code>The longterm effect of media violence exposure on aggression of youngsters.Abstract The effect of media violence on aggression has always been a trending issue and a better understanding of the psychological mechanism of the impact of media violence on youth aggression is an extremely important research topic for preventing the negative impacts of media violence and juvenile delinquency. From the perspective of anger this study explored the longterm effect of different degrees of media violence exposure on the aggression of youngsters as well as the role of aggressive emotions. The studies found that individuals with a high degree of media violence exposure HMVE exhibited higher levels of proactive aggression in both irritation situations and higher levels of reactive aggression in lowirritation situations than did participants with a low degree of media violence exposure LMVE. After being provoked the anger of all participants was significantly increased and the anger and proactive ag...</code> | <code>Cyberbullying perpetration and victimization among children and adolescents A systematic review of longitudinal studies.Abstract In this systematic review of exclusively longitudinal studies on cyberbullying perpetration and victimization among adolescents we identified 76 original longitudinal studies published between 2007 and 2017. The majority of them approached middle school students in two waves at 6 or 12 months apart. Prevalence rates for cyberbullying perpetration varied between 5.3 and 66.2 percent and for cyberbullying victimization between 1.9 and 84.0 percent. Personrelated factors e.g. traditional bullying internalizing problems were among the most studied concepts primarily examined as significant risk factors. Evidence on the causal relationships with mediarelated factors e.g. problematic Internet use and environmental factors e.g. parent and peer relations was scarce. This review identified gaps for future longitudinal research on cyberbullying perpetration and victimi...</code> | <code>Minimum number of additive tuples in groups of prime order.For a prime number p and a sequence of integers a0 . . . ak 01 . . . p lets a0 . . . ak be the minimum number of k 1tuples x0 . . . xk A0Akwithx0x1xk over subsets a0 . . . AkZp of sizes a0 . . . ak respectively. We observe that an elegant argument of Samotij and Sudakov can be extended to show that there exists an extremal configuration with all sets Ai being intervals of appropriate length. The same conclusion also holds for the related problem posed by Bajnok whena0akaandA0Ak provided k is not equal 1 modulop. Finally by applying basic Fourier analysis we show for Bajnoks problem that if pu003e13 and a 3 . . . p3are fixed whilek1 modp tends to infinity then the extremal configuration alternates between at least two affine nonequivalent sets.</code> |
|
|
* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 16,
    "gather_across_devices": false
}
```
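
These parameters describe an in-batch softmax ranking objective: each anchor's cosine similarities to all candidates in the batch are multiplied by `scale` and treated as logits, with the anchor's own positive as the correct class. The sketch below illustrates that computation in plain Python on toy 2-dimensional vectors. It is an illustrative reimplementation under those assumptions, not the library's code, and it leaves out the gradient caching governed by `mini_batch_size`, which is intended to reduce memory use rather than change the loss.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy of each anchor against all in-batch candidates."""
    # Candidates: every positive in the batch followed by every negative.
    candidates = positives + negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine_similarity(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        # The i-th candidate is the anchor's own positive (the target class).
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two triplets (hypothetical vectors, for illustration only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

good = ranking_loss(anchors, positives, negatives)
bad = ranking_loss(anchors, list(reversed(positives)), negatives)
print(f"aligned positives: {good:.6f}")
print(f"swapped positives: {bad:.6f}")
```

The loss is near zero when each anchor is closest to its own positive and grows sharply when the positives are swapped, because the other in-batch positives act as additional negatives.
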
|
|
|
|
|
### Evaluation Dataset

#### parquet

* Dataset: parquet
* Size: 5,862 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:

|         | anchor                                                                                | positive                                                                              | negative                                                                             |
|:--------|:--------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
| type    | string                                                                                | string                                                                                | string                                                                               |
| details | <ul><li>min: 132 tokens</li><li>mean: 225.49 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 124 tokens</li><li>mean: 240.03 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 69 tokens</li><li>mean: 221.83 tokens</li><li>max: 384 tokens</li></ul> |
|
|
* Samples:

| anchor | positive | negative |
|:-------|:---------|:---------|
| <code>Nonlocal Recoloring Algorithm for Color Vision Deficiencies with Naturalness and Detail Preserving.People with Color Vision Deficiencies CVD may have difficulty in recognizing and communicating color information especially in the multimedia era. In this paper we proposed a recoloring algorithm to enhance visual perception of people with CVD. In the algorithm color modification for color blindness is conducted in HSV color space under three constraints detail naturalness and authenticity. A new nonlocal recoloring method is used for preserving details. Subjective experiments were conducted among normal vision subjects and color blind subjects. Experimental results show that our algorithm is robust detail preserving and maintains naturalness. Source codes are freely available to noncommercial users at the website httpsdoi.org10.6084m9.figshare.9742337.v2.</code> | <code>Improving Color Discrimination for Color Vision Deficiency CVD with TemporalDomain Modulation.Color Vision Deficiency CVD is often characterized by the inability to distinguish color due to a defective or missing cone in the eye. Although it is possible to modify the observed color to make it easier for users to distinguish this can lead to color confusion with unaffected colors. To address this problem we investigate how flicker can assist distinguishing colors for CVD patients. In preliminary study we evaluated the efficiency of color and brightness modulation with 4 participants with normal vision. Our findings suggests that while brightness modulation was ineffective color modulation can help users distinguish between different colors.</code> | <code>Pooled Mining is Driving Blockchains Toward Centralized Systems.The decentralization property of blockchains stems from the fact that each miner accepts or refuses transactions and blocks based on its own verification results. However pooled mining causes blockchains to evolve into centralized systems because pool participants delegate their decisionmaking rights to pool managers. In this paper we established and validated a model for ProofofWork mining introduced the concept of equivalent blocks and quantitatively derived that pooling effectively lowers the income variance of miners. We also analyzed Bitcoin and Ethereum data to prove that pooled mining has become prevalent in the real world. The percentage of poolmined blocks increased from 49.91 to 91.12 within four months in Bitcoin and from 76.9 to 92.2 within five months in Ethereum. In July 2018 Bitcoin and Ethereum mining were dominated by only six and five pools respectively.</code> |
| <code>Nonlocal Recoloring Algorithm for Color Vision Deficiencies with Naturalness and Detail Preserving.People with Color Vision Deficiencies CVD may have difficulty in recognizing and communicating color information especially in the multimedia era. In this paper we proposed a recoloring algorithm to enhance visual perception of people with CVD. In the algorithm color modification for color blindness is conducted in HSV color space under three constraints detail naturalness and authenticity. A new nonlocal recoloring method is used for preserving details. Subjective experiments were conducted among normal vision subjects and color blind subjects. Experimental results show that our algorithm is robust detail preserving and maintains naturalness. Source codes are freely available to noncommercial users at the website httpsdoi.org10.6084m9.figshare.9742337.v2.</code> | <code>Improving Color Discrimination for Color Vision Deficiency CVD with TemporalDomain Modulation.Color Vision Deficiency CVD is often characterized by the inability to distinguish color due to a defective or missing cone in the eye. Although it is possible to modify the observed color to make it easier for users to distinguish this can lead to color confusion with unaffected colors. To address this problem we investigate how flicker can assist distinguishing colors for CVD patients. In preliminary study we evaluated the efficiency of color and brightness modulation with 4 participants with normal vision. Our findings suggests that while brightness modulation was ineffective color modulation can help users distinguish between different colors.</code> | <code>Effects of Brownfield Remediation on Total Gaseous Mercury Concentrations in an Urban Landscape.In order to obtain a better perspective of the impacts of brownfields on the landatmosphere exchange of mercury in urban areas total gaseous mercury TGM was measured at two heights 1.8 m and 42.7 m prior to 20112012 and after 20152016 for the remediation of a brownfield and installation of a parking lot adjacent to the Syracuse Center of Excellence in Syracuse NY USA. Prior to brownfield remediation the annual average TGM concentrations were 1.6 0.6 and 1.4 0.4 ng m 3 at the ground and upper heights respectively. After brownfield remediation the annual average TGM concentrations decreased by 32 and 22 at the ground and the upper height respectively. Mercury soil flux measurements during summer after remediation showed net TGM deposition of 1.7 ng m 2 day 1 suggesting that the site transitioned from a mercury source to a net mercury sink. Measurements from the Atmospheric Mercury Netw...</code> |
| <code>Named Entity Recognition for Nepali Language.Named Entity Recognition NER has been studied for many languages like English German Spanish and others but virtually no studies have focused on the Nepali language. One key reason is the lack of an appropriate annotated dataset. In this paper we describe a Nepali NER dataset that we created. We discuss and compare the performance of various machine learning models on this dataset. We also propose a novel NER scheme for Nepali and show that this scheme based on graphemelevel representations outperforms characterlevel representations when combined with BiLSTM models. Our best models obtain an overall F1 score of 86.89 which is a significant improvement on previously reported performance in literature.</code> | <code>Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features.Named entity recognition NER is a fundamental step for many natural language processing tasks and hence enhancing the performance of NER models is always appreciated. With limited resources being available NER for SouthEast Asian languages like Telugu is quite a challenging problem. This paper attempts to improve the NER performance for Telugu using gazetteerrelated features which are automatically generated using Wikipedia pages. We make use of these gazetteer features along with other wellknown features like contextual wordlevel and corpus features to build NER models. NER models are developed using three wellknown classifiersconditional random field CRF support vector machine SVM and margin infused relaxed algorithms MIRA. The gazetteer features are shown to improve the performance and theMIRAbased NER model fared better than its counterparts SVM and CRF.</code> | <code>Using Inversionmode MOS Varactors and 3port Inductor in 018µm CMOS Voltage Controlled Oscillator.This paper presents a RF voltage controlled oscillator VCO using inversionmode MOS varactors and 3port inductors to achieve low power consumption low phase noise broad tuning range and minimized chip size. The proposed circuit architecture using bodybiased technique operates from 4.3 to 5 GHz with 20.8 tuning range. The measured phase noise is less than 125.34 dBc at a displacement frequency of 1 MHz. The power consumption of this VCO is 25 mW when biased at 1.8 V. This VCO was implemented in standard TSMC 0.18µm 1P6M process. The chip size is 0.476 mm2 including the pads which is only 63 comparing with an identical VCO using TSMC inductor model.</code> |
|
|
* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 16,
    "gather_across_devices": false
}
```
|
|
|
|
|
### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates
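
With `warmup_ratio` set to 0.1 and the linear scheduler listed under All Hyperparameters, the learning rate ramps up over roughly the first tenth of the optimizer steps and then decays linearly to zero. The sketch below reproduces that shape in plain Python. It assumes a single device (one optimizer step per 128-sample batch), the 46,900 training samples and `learning_rate` of 5e-05 reported in this card, and warmup steps rounded up; the function name is illustrative and this is not the trainer's own scheduler code.

```python
import math

TRAIN_SAMPLES = 46_900   # size of the training dataset reported in this card
BATCH_SIZE = 128         # per_device_train_batch_size (single device assumed)
BASE_LR = 5e-05          # learning_rate from the full hyperparameter list
WARMUP_RATIO = 0.1       # warmup_ratio

# One epoch, no gradient accumulation: one optimizer step per batch.
total_steps = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
# Assumption: the warmup length is rounded up to a whole step.
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def linear_schedule_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return BASE_LR * max(0.0, remaining)


print(total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the peak learning rate is reached after 37 of 367 steps, so most of the single epoch runs on the decaying part of the schedule.
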
|
|
|
|
|
#### All Hyperparameters |
|
|
<details><summary>Click to expand</summary> |
|
|
|
|
|
- `overwrite_output_dir`: False |
|
|
- `do_predict`: False |
|
|
- `eval_strategy`: steps |
|
|
- `prediction_loss_only`: True |
|
|
- `per_device_train_batch_size`: 128 |
|
|
- `per_device_eval_batch_size`: 16 |
|
|
- `per_gpu_train_batch_size`: None |
|
|
- `per_gpu_eval_batch_size`: None |
|
|
- `gradient_accumulation_steps`: 1 |
|
|
- `eval_accumulation_steps`: None |
|
|
- `torch_empty_cache_steps`: None |
|
|
- `learning_rate`: 5e-05 |
|
|
- `weight_decay`: 0.0 |
|
|
- `adam_beta1`: 0.9 |
|
|
- `adam_beta2`: 0.999 |
|
|
- `adam_epsilon`: 1e-08 |
|
|
- `max_grad_norm`: 1.0 |
|
|
- `num_train_epochs`: 1 |
|
|
- `max_steps`: -1 |
|
|
- `lr_scheduler_type`: linear |
|
|
- `lr_scheduler_kwargs`: {} |
|
|
- `warmup_ratio`: 0.1 |
|
|
- `warmup_steps`: 0 |
|
|
- `log_level`: passive |
|
|
- `log_level_replica`: warning |
|
|
- `log_on_each_node`: True |
|
|
- `logging_nan_inf_filter`: True |
|
|
- `save_safetensors`: True |
|
|
- `save_on_each_node`: False |
|
|
- `save_only_model`: False |
|
|
- `restore_callback_states_from_checkpoint`: False |
|
|
- `no_cuda`: False |
|
|
- `use_cpu`: False |
|
|
- `use_mps_device`: False |
|
|
- `seed`: 42 |
|
|
- `data_seed`: None |
|
|
- `jit_mode_eval`: False |
|
|
- `use_ipex`: False |
|
|
- `bf16`: False |
|
|
- `fp16`: True |
|
|
- `fp16_opt_level`: O1 |
|
|
- `half_precision_backend`: auto |
|
|
- `bf16_full_eval`: False |
|
|
- `fp16_full_eval`: False |
|
|
- `tf32`: None |
|
|
- `local_rank`: 0 |
|
|
- `ddp_backend`: None |
|
|
- `tpu_num_cores`: None |
|
|
- `tpu_metrics_debug`: False |
|
|
- `debug`: [] |
|
|
- `dataloader_drop_last`: False |
|
|
- `dataloader_num_workers`: 0 |
|
|
- `dataloader_prefetch_factor`: None |
|
|
- `past_index`: -1 |
|
|
- `disable_tqdm`: False |
|
|
- `remove_unused_columns`: True |
|
|
- `label_names`: None |
|
|
- `load_best_model_at_end`: False |
|
|
- `ignore_data_skip`: False |
|
|
- `fsdp`: [] |
|
|
- `fsdp_min_num_params`: 0 |
|
|
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

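
A few of the values listed above could be passed to `SentenceTransformerTrainingArguments` roughly as follows. This is a minimal sketch and not the training script used for this model: `output_dir` is a hypothetical path, and only a small subset of the listed hyperparameters is shown.

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import (
    BatchSamplers,
    MultiDatasetBatchSamplers,
)

args = SentenceTransformerTrainingArguments(
    # Hypothetical output path; the model card does not state one.
    output_dir="output/example-run",
    # Values copied from the hyperparameter list above.
    optim="adamw_torch_fused",
    label_smoothing_factor=0.0,
    gradient_checkpointing=False,
    dataloader_pin_memory=True,
    ddp_timeout=1800,
    torch_compile=False,
    push_to_hub=False,
    # `no_duplicates` avoids duplicate texts within a batch, which matters
    # for losses that use the other samples in the batch as negatives.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.PROPORTIONAL,
)
```

The `no_duplicates` batch sampler is the notable choice here, since `CachedMultipleNegativesRankingLoss` treats the other samples in a batch as negatives.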

### Training Logs

| Epoch  | Step | Training Loss | Validation Loss | dblp-aminer-50k-dev_cosine_accuracy | dblp-aminer-50k-test_cosine_accuracy |
|:------:|:----:|:-------------:|:---------------:|:-----------------------------------:|:------------------------------------:|
| -1     | -1   | -             | -               | 1.0                                 | -                                    |
| 0.2725 | 100  | 0.223         | 0.0166          | 1.0                                 | -                                    |
| 0.5450 | 200  | 0.0699        | 0.0208          | 1.0                                 | -                                    |
| 0.8174 | 300  | 0.0267        | 0.0196          | 1.0                                 | -                                    |
| -1     | -1   | -             | -               | -                                   | 1.0                                  |
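
The Epoch and Step columns can be cross-checked against each other. The sketch below assumes that the Epoch value is the step count divided by the number of optimizer steps per epoch, which is how the Hugging Face `Trainer` usually reports it. The helper name `steps_per_epoch` is illustrative, not part of any library.

```python
# (epoch, step) pairs copied from the Training Logs table. The rows with
# -1 are standalone evaluations without a step count, so they are skipped.
logged = [(0.2725, 100), (0.5450, 200), (0.8174, 300)]


def steps_per_epoch(rows):
    """Estimate steps per epoch, assuming epoch == step / steps_per_epoch."""
    estimates = {round(step / epoch) for epoch, step in rows}
    if len(estimates) != 1:
        raise ValueError(f"inconsistent estimates: {sorted(estimates)}")
    return estimates.pop()


spe = steps_per_epoch(logged)
print(spe)  # 367

# Each logged epoch value is reproduced to four decimals by step / spe.
for epoch, step in logged:
    assert round(step / spe, 4) == epoch
```

Under this assumption one epoch is about 367 steps, so the last logged row (step 300) is roughly 82% of the way through the first epoch.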

### Framework Versions

- Python: 3.11.4
- Sentence Transformers: 5.1.1
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```