SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
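
The three modules above can be read as a pipeline: the Transformer produces one vector per token, Pooling keeps only the [CLS] (first) token vector, and Normalize rescales it to unit length. The following is a minimal pure-Python sketch of the last two steps on toy 4-dimensional vectors; the helper names (cls_pool, l2_normalize, dot) and the toy numbers are illustrative assumptions, not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector ([CLS]) as the sentence embedding."""
    return token_embeddings[0]

def l2_normalize(vector):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Toy token embeddings (3 tokens, 4 dimensions) standing in for the
# Transformer module's output; the real model uses 768 dimensions.
tokens_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 0.0, 0.0, 0.0]]
tokens_b = [[4.0, 3.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# For unit-length vectors the dot product equals the cosine similarity.
print(round(dot(emb_a, emb_b), 4))  # 0.96
```

Because the embeddings are normalized, a plain dot product and cosine similarity give the same ranking, which is why either can be used for retrieval with this kind of model.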

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("khacdiep2208/bge-base-en-v1.5-shopify-finetuned")
# Run inference
sentences = [
    'How do I set a password for my Shopify development store?',
    "**Viewing or setting the password**\n\n1.  From your Shopify admin, go to **Online Store > Preferences**.\n2.  In the **Password protection > Password** field, enter a password. This is the password you'll give to the visitors who you want to access your online store. Don't use the same password you use to log in to your admin.\n3.  Click **Save**.",
    "There are many ways to enhance a store’s functionality. Depending on your client’s needs, you can choose and customize a theme, migrate information from another platform, add products to the store, organize products into collections, set up payment and shipping information, among other options.\n\nFollow this [general checklist for starting a new Shopify store](https://help.shopify.com/en/manual/intro-to-shopify/initial-setup/new-to-shopify-checklists/general-checklist) to cover the basics.\n\nHere are some ways to further enhance a store:\n\n*   Migrate your client's product, customers, and orders data from another platform using a Store Importer app.\n*   Add multiple products at one time by [importing them with a CSV file](https://help.shopify.com/en/manual/products/import-export/import-products#importing-products-with-a-csv-file).\n*   [Customize a theme](https://help.shopify.com/en/manual/online-store/themes/theme-structure/extend/edit-theme-code#before-you-customize-your-theme) to better suit a client’s needs.\n*   [Buy a domain from Shopify](https://help.shopify.com/en/manual/domains/add-a-domain/buying-domains). You can help create a store's brand by using a custom domain.\n*   Create [pages](https://help.shopify.com/en/manual/online-store/themes/theme-structure/pages#add-a-new-webpage-to-your-online-store) that describe the store. Common page types include About Us, contact information, FAQs, and [policy pages](https://help.shopify.com/en/manual/checkout-settings/refund-privacy-tos) (returns, shipping, and privacy).\n*   Work with your client to develop high-quality images for their products. Review Shopify's tips for creating and uploading images.\n*   Review your client's [tax settings](https://help.shopify.com/en/manual/taxes) or help your client to contact an appropriate tax expert. Many merchants need to charge taxes on their sales and report and remit those taxes to their government.\n*   Add apps from the Shopify App Store. Apps let you quickly extend the functionality of the client's store.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8044, 0.4979],
#         [0.8044, 1.0000, 0.4777],
#         [0.4979, 0.4777, 1.0000]])
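
For semantic search, only the row for the query matters. As a small illustration, the sketch below takes the first row of the similarity matrix printed above (the question compared with all three texts) and picks the best-matching passage. The variable names are illustrative, and the scores are copied from the example output rather than recomputed.

```python
# Similarity row for sentence 0 (the question), copied from the output above.
query_scores = [1.0000, 0.8044, 0.4979]

# Skip index 0 (the query compared with itself) and rank the other texts
# from most to least similar.
candidates = [(idx, score) for idx, score in enumerate(query_scores) if idx != 0]
ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)

best_idx, best_score = ranked[0]
print(best_idx, best_score)  # 1 0.8044
```

Here the password instructions (index 1) rank above the general store-setup passage (index 2), which matches what the question asks for.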

Evaluation

Metrics

Triplet

Metric           Value
cosine_accuracy  0.9012
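
For a triplet evaluation, cosine accuracy is commonly understood as the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below illustrates that definition on toy 2-dimensional vectors, assuming a margin of zero; the function names and data are hypothetical, and this is not the evaluator's actual implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)

# Toy (anchor, positive, negative) embeddings; illustrative values only.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer: correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # negative is closer: incorrect
    ([1.0, 0.0], [1.0, 0.2], [-1.0, 0.0]),  # positive is closer: correct
]

print(triplet_cosine_accuracy(toy_triplets))  # 0.75
```

Under this reading, the reported value of 0.9012 means that roughly nine in ten evaluation triplets were ranked in the right order.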

Training Details

Training Dataset

Unnamed Dataset

  • Size: 729 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 729 samples:
              sentence_0    sentence_1      sentence_2
    type      string        string          string
    min       9 tokens      12 tokens       13 tokens
    mean      15.9 tokens   183.15 tokens   176.28 tokens
    max       32 tokens     512 tokens      512 tokens
  • Samples:
    sentence_0 sentence_1 sentence_2
    Which theme is available for users on the Shopify Starter plan? * This page specifies that users on the Shopify Starter plan can only utilize the "Spotlight theme."
    * It provides a step-by-step guide on how to add a theme from the Shopify Theme Store.
    * Step 1 instructs users to navigate to "Online Store" and then "Themes" within their Shopify admin dashboard.
    Where can I find free training courses for my Shopify business? Looking for free online training with industry experts? Visit Shopify Academy to get access to courses and workshops designed to help you build your business. The Partner Directory connects you with Shopify Partners who you can hire for complicated tasks related to building your business. When you contact a Partner, you're not committed to hiring them. You decide whether you want to work with them. Planning to contact a Partner: Before you engage with a Partner, it's important to prepare by clearly outlining the requirements for the work needed. Consider the following best practices when planning to hire a Partner: familiarize yourself with any business or technical terms relevant to the work that you need help with; conduct preliminary research to gather insights and examples from similar businesses; define your budget, timeline, and expected outcomes in detail to allow the Partner to provide an accurate quote. Finding Partners: To find a Partner, browse the Partner Directory and explore the available services. You can use filtering options such as acceptance of new clients, price range, services offered, location, industry, and languages s...
    How do expert Shopify sellers manage finances and control costs? As you grow, maintaining healthy profit margins is crucial. A trap some entrepreneurs fall into is overspending on shiny new tools or large ad campaigns without a clear return. It's important to regularly review your expenses – app subscriptions, advertising spend, shipping costs, packaging, etc. – and ensure you're getting value from each. For example, you might be paying for several apps that have overlapping functions; consider consolidating or removing ones you don't truly need. Negotiate with suppliers or service providers for better rates once your volume increases (this can apply to product suppliers, shipping carriers, or even app providers if you move to a higher tier). When it comes to ads, closely monitor performance and pause campaigns that are consistently unprofitable. Sometimes, taking a lean approach (almost a bootstrap mindset) even as you scale can save you from cash flow crunches. Avoid tying up too much cash in inventory – yes, you want to avoid stockouts, but overs... One common trait among successful store owners is the ability to focus on what truly moves the needle. It's easy to get sidetracked by chasing every new trend or spreading yourself thin across too many tasks. Instead, step back and evaluate which marketing channels or product lines are most profitable, and concentrate on optimizing and expanding those. For example, if you notice that most of your traffic and sales are coming from Instagram and email campaigns, make those channels a priority and perhaps scale back efforts on a less responsive channel. Periodically conduct a Pareto analysis (80/20 rule) for your business: identify the roughly 20% of efforts that drive 80% of your results, and ensure those are well-resourced.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.5
    }
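
With these parameters, the loss for a single triplet can be sketched as max(d(a, p) - d(a, n) + 0.5, 0), where d is the cosine distance (1 minus cosine similarity). The pure-Python example below is a minimal illustration under that assumption; the function names and toy vectors are hypothetical, and the library itself computes this over batches of tensors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Cosine distance: 0 for identical directions, 1 for orthogonal vectors."""
    return 1.0 - cosine_similarity(u, v)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on the gap between the positive and negative distances."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(gap + margin, 0.0)

# Well-separated triplet: the positive matches the anchor and the negative
# is orthogonal, so the margin is already satisfied and the loss is zero.
easy_loss = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])

# Inverted triplet: the negative matches the anchor instead of the positive.
hard_loss = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])

print(easy_loss, hard_loss)  # 0.0 1.5
```

The margin of 0.5 means a triplet stops contributing to the loss once the negative is at least 0.5 farther from the anchor (in cosine distance) than the positive.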
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
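
As a rough sketch, the non-default values above could be passed to the Sentence Transformers trainer configuration as follows. The output_dir path is a hypothetical placeholder (the card does not state one), and fp16=True assumes a CUDA GPU is available; everything not listed keeps its library default.

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/bge-shopify",  # hypothetical path, not from the card
    eval_strategy="steps",            # evaluate every N steps rather than per epoch
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    fp16=True,                        # mixed precision; assumes a CUDA GPU
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```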

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch  Step  bge-shopify-eval_cosine_accuracy
1.0    46    0.8395
2.0    92    0.9012
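
The step counts in the log are consistent with the training setup reported in this card: 729 training samples at a per-device batch size of 16 give 46 optimizer steps per epoch. The arithmetic below is a sketch that assumes a single device, no gradient accumulation (gradient_accumulation_steps: 1), and that the last partial batch is kept (dataloader_drop_last: False).

```python
import math

train_samples = 729   # training set size reported in this card
batch_size = 16       # per_device_train_batch_size
num_epochs = 5        # num_train_epochs

# The last partial batch is kept, so round up.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)  # 46 230
```

This matches the logged step 46 at epoch 1.0 and step 92 at epoch 2.0.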

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.2
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model ID: khacdiep2208/bge-base-en-v1.5-shopify-finetuned (fine-tuned from BAAI/bge-base-en-v1.5)