---
base_model: sentence-transformers/all-mpnet-base-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5579240
- loss:CachedMultipleNegativesRankingLoss
widget:
- source_sentence: Program Coordinator RN
  sentences:
  - >-
    discuss the medical history of the healthcare user, evidence-based approach
    in general practice, apply various lifting techniques, establish daily
    priorities, manage time, demonstrate disciplinary expertise, tolerate
    sitting for long periods, think critically, provide professional care in
    nursing, attend meetings, represent union members, nursing science, manage a
    multidisciplinary team involved in patient care, implement nursing care,
    customer service, work under supervision in care, keep up-to-date with
    training subjects, evidence-based nursing care, operate lifting equipment,
    follow code of ethics for biomedical practices, coordinate care, provide
    learning support in healthcare
  - >-
    provide written content, prepare visual data, design computer network,
    deliver visual presentation of data, communication, operate relational
    database management system, ICT communications protocols, document
    management, use threading techniques, search engines, computer science,
    analyse network bandwidth requirements, analyse network configuration and
    performance, develop architectural plans, conduct ICT code review, hardware
    architectures, computer engineering, video-games functionalities, conduct
    web searches, use databases, use online tools to collaborate
  - >-
    nursing science, administer appointments, administrative tasks in a medical
    environment, intravenous infusion, plan nursing care, prepare intravenous
    packs, work with nursing staff, supervise nursing staff, clinical perfusion
- source_sentence: Director of Federal Business Development and Capture Mgmt
  sentences:
  - >-
    develop business plans, strive for company growth, develop personal skills,
    channel marketing, prepare financial projections, perform market research,
    identify new business opportunities, market research, maintain relationship
    with customers, manage government funding, achieve sales targets, build
    business relationships, expand the network of providers, make decisions,
    guarantee customer satisfaction, collaborate in the development of marketing
    strategies, analyse business plans, think analytically, develop revenue
    generation strategies, health care legislation, align efforts towards
    business development, assume responsibility, solve problems, deliver
    business research proposals, identify potential markets for companies
  - >-
    operate warehouse materials, goods transported from warehouse facilities,
    organise social work packages, coordinate orders from various suppliers,
    warehouse operations, work in assembly line teams, work in a logistics team,
    footwear materials
  - >-
    manufacturing plant equipment, use hand tools, assemble hardware components,
    use traditional toolbox tools, perform product testing, control panel
    components, perform pre-assembly quality checks, oversee equipment
    operation, assemble mechatronic units, arrange equipment repairs, assemble
    machines, build machines, resolve equipment malfunctions, electromechanics,
    develop assembly instructions, install hydraulic systems, revise quality
    control systems documentation, detect product defects, operate hydraulic
    machinery controls, show an exemplary leading role in an organisation,
    assemble manufactured pipeline parts, types of pallets, perform office
    routine activities, conform with production requirements, comply with
    quality standards related to healthcare practice
- source_sentence: director of production
  sentences:
  - >-
    use customer relationship management software, sales strategies, create
    project specifications, document project progress, attend trade fairs,
    building automation, sales department processes, work independently, develop
    account strategy, build business relationships, facilitate the bidding
    process, close sales at auction, satisfy technical requirements,
    results-based management, achieve sales targets, manage sales teams, liaise
    with specialist contractors for well operations, sales activities, use sales
    forecasting softwares, guarantee customer satisfaction, integrate building
    requirements in the architectural design, participate actively in civic
    life, customer relationship management, implement sales strategies
  - >-
    translate strategy into operation, lead the brand strategic planning
    process, assist in developing marketing campaigns, implement sales
    strategies, sales promotion techniques, negotiate with employment agencies,
    perform market research, communicate with customers, develop media strategy,
    change power distribution systems, beverage products, project management,
    provide advertisement samples, devise military tactics, use microsoft
    office, market analysis, manage sales teams, create brand guidelines, brand
    marketing techniques, use sales forecasting softwares, supervise brand
    management, analyse packaging requirements, provide written content, hand
    out product samples, channel marketing
  - >-
    use microsoft office, use scripting programming, build team spirit, operate
    games, production processes, create project specifications, analyse
    production processes for improvement, manage production enterprise, Agile
    development, apply basic programming skills, document project progress,
    supervise game operations, work to develop physical ability to perform at
    the highest level in sport, fix meetings, office software, enhance
    production workflow, manage a team, set production KPI, manage commercial
    risks, work in teams, teamwork principles, address identified risks, meet
    deadlines, consult with production director
- source_sentence: Nursing Assistant
  sentences:
  - >-
    supervise medical residents, observe healthcare users, provide domestic
    care, prepare health documentation, position patients undergoing
    interventions, work with broad variety of personalities, supervise food in
    healthcare, tend to elderly people, monitor patient's vital signs, transfer
    patients, show empathy, provide in-home support for disabled individuals,
    hygiene in a health care setting, supervise housekeeping operations, perform
    cleaning duties, monitor patient's health condition, provide basic support
    to patients, work with nursing staff, involve service users and carers in
    care planning, use electronic health records management system, arrange
    in-home services for patients, provide nursing care in community settings,
    work in shifts, supervise nursing staff
  - >-
    manage relationships with stakeholders, use microsoft office, maintain
    records of financial transactions, software components suppliers, tools for
    software configuration management, attend to detail, keep track of expenses,
    build business relationships, issue sales invoices, financial department
    processes, supplier management, process payments, perform records
    management, manage standard enterprise resource planning system
  - >-
    inspect quality of products, apply HACCP, test package, follow verbal
    instructions, laboratory equipment, assist in the production of laboratory
    documentation, ensure quality control in packaging, develop food safety
    programmes, packaging engineering, appropriate packaging of dangerous goods,
    maintain laboratory equipment, SAP Data Services, calibrate laboratory
    equipment, analyse packaging requirements, write English
- source_sentence: Branch Manager
  sentences:
  - >-
    support employability of people with disabilities, schedule shifts, issue
    licences, funding methods, maintain correspondence records, computer
    equipment, decide on providing funds, tend filing machine, use microsoft
    office, lift stacks of paper, transport office equipment, tend to guests
    with special needs, provide written content, foreign affairs policy
    development, provide charity services, philanthropy, maintain financial
    records, meet deadlines, manage fundraising activities, assist individuals
    with disabilities in community activities, report on grants, prepare
    compliance documents, manage grant applications, tolerate sitting for long
    periods, follow work schedule
  - >-
    cook pastry products, create new recipes, food service operations, assess
    shelf life of food products, apply requirements concerning manufacturing of
    food and beverages, food waste monitoring systems, maintain work area
    cleanliness, comply with food safety and hygiene, coordinate catering,
    maintain store cleanliness, work according to recipe, health, safety and
    hygiene legislation, install refrigeration equipment, prepare desserts,
    measure precise food processing operations, conform with production
    requirements, work in an organised manner, demand excellence from
    performers, refrigerants, attend to detail, ensure food quality, manufacture
    prepared meals
  - >-
    teamwork principles, office administration, delegate responsibilities,
    create banking accounts, manage alarm system, make independent operating
    decisions, use microsoft office, offer financial services, ensure proper
    document management, own management skills, use spreadsheets software,
    manage cash flow, integrate community outreach, manage time, perform
    multiple tasks at the same time, carry out calculations, assess customer
    credibility, maintain customer service, team building, digitise documents,
    promote financial products, communication, assist customers, follow
    procedures in the event of an alarm, office equipment
license: mit
language:
- en
---

# SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model trained specifically for job title matching and similarity. It is fine-tuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on a large dataset of job titles paired with their associated skills and requirements. The model maps job titles and descriptions to a 1024-dimensional dense vector space and can be used for semantic job title matching, job similarity search, and related HR and recruitment tasks.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Maximum Sequence Length:** 64 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:** 5.5M+ pairs of job titles and associated skills
- **Primary Use Case:** Job title matching and similarity
- **Performance:** Achieves 0.6457 MAP on the TalentCLEF benchmark

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 64, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Asym(
    (anchor-0): Dense({'in_features': 768, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
    (positive-0): Dense({'in_features': 768, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  )
)
```
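
To make the last two stages of this architecture concrete, the following is a minimal NumPy sketch of mean pooling followed by a 768-to-1024 dense projection with Tanh. It is an illustration only: the token embeddings and the weights are random stand-ins (assumptions, not the trained parameters), and the helper names `mean_pool` and `dense_tanh` are hypothetical.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions (mask == 0)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

def dense_tanh(x, weight, bias):
    """Linear projection followed by Tanh, mirroring the Dense layer configuration."""
    return np.tanh(x @ weight.T + bias)

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 2 sequences, 5 tokens each, 768-dimensional token vectors
token_embeddings = rng.normal(size=(2, 5, 768))
attention_mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
# Random weights in place of the trained "anchor" head (768 -> 1024)
weight = rng.normal(scale=0.02, size=(1024, 768))
bias = np.zeros(1024)

pooled = mean_pool(token_embeddings, attention_mask)
sentence_embedding = dense_tanh(pooled, weight, bias)
print(pooled.shape, sentence_embedding.shape)
```

Because the final activation is Tanh, every component of the resulting embedding lies in the interval [-1, 1].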

## Usage

### Direct Usage (Sentence Transformers)

First install the required packages:

```bash
pip install -U sentence-transformers
```

Then you can load and use the model with the following code:

```python
import torch
import numpy as np
from tqdm.auto import tqdm
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import batch_to_device, cos_sim

# Load the model
model = SentenceTransformer("TechWolf/JobBERT-v2")

def encode_batch(jobbert_model, texts):
    # Tokenize the batch and move the tensors to the model's device
    features = jobbert_model.tokenize(texts)
    features = batch_to_device(features, jobbert_model.device)
    # Select the "anchor" head of the Asym layer
    features["text_keys"] = ["anchor"]
    with torch.no_grad():
        out_features = jobbert_model.forward(features)
    return out_features["sentence_embedding"].cpu().numpy()

def encode(jobbert_model, texts, batch_size: int = 8):
    # Sort texts by length and keep track of original indices
    sorted_indices = np.argsort([len(text) for text in texts])
    sorted_texts = [texts[i] for i in sorted_indices]

    embeddings = []

    # Encode in batches
    for i in tqdm(range(0, len(sorted_texts), batch_size)):
        batch = sorted_texts[i:i+batch_size]
        embeddings.append(encode_batch(jobbert_model, batch))

    # Concatenate embeddings and reorder to original indices
    sorted_embeddings = np.concatenate(embeddings)
    original_order = np.argsort(sorted_indices)
    return sorted_embeddings[original_order]

# Example usage
job_titles = [
    'Software Engineer',
    'Senior Software Developer',
    'Product Manager',
    'Data Scientist'
]

# Get embeddings
embeddings = encode(model, job_titles)

# Calculate cosine similarity matrix
similarities = cos_sim(embeddings, embeddings)
print(similarities)
```

The output is a similarity matrix in which each value is the cosine similarity between two job titles:

```
tensor([[1.0000, 0.8723, 0.4821, 0.5447],
        [0.8723, 1.0000, 0.4822, 0.5019],
        [0.4821, 0.4822, 1.0000, 0.4328],
        [0.5447, 0.5019, 0.4328, 1.0000]])
```

In this example:
- The diagonal values are 1.0000, since each title is perfectly similar to itself
- 'Software Engineer' and 'Senior Software Developer' have high similarity (0.8723)
- 'Product Manager' and 'Data Scientist' are less similar to the other roles (0.4328 to 0.5447)
- Cosine similarity can range from -1 to 1; all values here fall between 0 and 1, and higher values indicate greater similarity
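
As a small illustration of how such a matrix can be read programmatically, the sketch below finds the closest other title for each row. It hard-codes the example values printed above rather than calling the model, so it runs without downloading anything; the diagonal masking is one possible approach, not something the model requires.

```python
import numpy as np

job_titles = [
    "Software Engineer",
    "Senior Software Developer",
    "Product Manager",
    "Data Scientist",
]

# Values copied from the example output above
similarities = np.array([
    [1.0000, 0.8723, 0.4821, 0.5447],
    [0.8723, 1.0000, 0.4822, 0.5019],
    [0.4821, 0.4822, 1.0000, 0.4328],
    [0.5447, 0.5019, 0.4328, 1.0000],
])

# Mask the diagonal so a title is never matched with itself
masked = similarities.copy()
np.fill_diagonal(masked, -np.inf)
nearest = masked.argmax(axis=1)

for i, j in enumerate(nearest):
    print(f"{job_titles[i]} -> {job_titles[j]} ({similarities[i, j]:.4f})")
```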

### Example Use Cases

1. **Job Title Matching**: Find similar job titles for standardization or matching
2. **Job Search**: Match job seekers with relevant positions based on title similarity
3. **HR Analytics**: Analyze job title patterns and similarities across organizations
4. **Talent Management**: Identify similar roles for career development and succession planning
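
The first use case, title standardization, can be sketched as a top-k lookup of raw titles against a list of canonical titles. The example below is a hedged illustration: the 3-dimensional vectors are toy stand-ins for real embeddings (in practice they would come from the model), and `top_k_matches` is a hypothetical helper, not part of any library.

```python
import numpy as np

def top_k_matches(query_vecs, candidate_vecs, k=2):
    """Return indices and cosine scores of the k most similar candidates per query."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = q @ c.T
    order = np.argsort(-scores, axis=1)[:, :k]
    return order, np.take_along_axis(scores, order, axis=1)

canonical_titles = ["Software Engineer", "Product Manager", "Registered Nurse"]
raw_titles = ["Sr. Software Dev", "Nursing Assistant"]

# Toy vectors standing in for model embeddings (assumption for illustration)
canonical_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
raw_vecs = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]])

indices, scores = top_k_matches(raw_vecs, canonical_vecs, k=2)
for title, idx, sc in zip(raw_titles, indices, scores):
    ranked = ", ".join(f"{canonical_titles[j]} ({s:.2f})" for j, s in zip(idx, sc))
    print(f"{title}: {ranked}")
```

With real embeddings, a similarity threshold could be added so that raw titles with no sufficiently close canonical title are left unmapped.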

## Training Details

### Training Dataset

#### generator
- Dataset: 5.5M+ pairs of job titles and associated skills
- Format: Anchor job titles paired with related skills/requirements
- Training objective: Learn semantic similarity between job titles and their associated skills
- Loss: CachedMultipleNegativesRankingLoss with cosine similarity
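
The objective behind this loss can be sketched in a few lines. The example below shows the plain (non-cached) multiple negatives ranking objective in NumPy: each anchor is scored against every positive in the batch, and the matching positive is treated as the correct class. The cached variant named above optimizes the same objective while using gradient caching to fit large batches in memory. The scale of 20 and the toy vectors are assumptions for illustration, not values reported for this model.

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives.

    Row i of `anchors` should match row i of `positives`; every other row of
    `positives` serves as a negative for anchor i.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # shape (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
positives = rng.normal(size=(4, 8))                       # toy "skills" embeddings
aligned = positives + 0.01 * rng.normal(size=(4, 8))      # anchors close to their positives
shuffled = positives[[1, 2, 3, 0]]                        # anchors paired with the wrong positives

loss_aligned = multiple_negatives_ranking_loss(aligned, positives)
loss_shuffled = multiple_negatives_ranking_loss(shuffled, positives)
print(f"aligned: {loss_aligned:.4f}, shuffled: {loss_shuffled:.4f}")
```

The loss is low when each anchor is most similar to its own positive and high when the pairing is wrong, which is why larger batches (more in-batch negatives) make the task harder and the training signal richer.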

### Training Hyperparameters
- Batch Size: 2048
- Learning Rate: 5e-05
- Epochs: 1
- FP16 Training: Enabled
- Optimizer: AdamW
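
As a rough back-of-the-envelope check, these settings imply the number of optimizer steps in the single epoch. The calculation below assumes that 2048 is the effective (global) batch size, that the final partial batch is kept, and that the dataset size is the `dataset_size` value from the model card tags; none of these is confirmed by the card itself.

```python
import math

dataset_size = 5_579_240  # from the dataset_size tag in the model card metadata
batch_size = 2048         # assumed to be the effective global batch size
epochs = 1

steps_per_epoch = math.ceil(dataset_size / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 2725 2725
```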

### Framework Versions
- Python: 3.9.19
- Sentence Transformers: 3.1.0
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu118
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```