embeddinggemma-300m fine-tuned on German Search Categories

This is a sentence-transformers model fine-tuned from google/embeddinggemma-300m on the search-categories-german-triplets dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google/embeddinggemma-300m
  • Maximum Sequence Length: 2048 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: search-categories-german-triplets
  • Language: German queries paired with English category descriptions

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/ktcapraz/embeddinggemma-300m-german-search-categories

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
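
The trailing Normalize() module L2-normalizes every output, so cosine similarity and dot product produce identical rankings. A minimal sketch to confirm the dimensions above (assuming network access to the Hub):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ktcapraz/embeddinggemma-300m-german-search-categories")
print(model.get_sentence_embedding_dimension())  # 768
print(model.max_seq_length)                      # 2048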

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ktcapraz/embeddinggemma-300m-german-search-categories")
# Run inference
queries = [
    # "When does the negotiation period for 2024 player lists end?"
    "Wann endet die Verhandlungsfrist für Spielerlisten 2024?",
]
documents = [
    'A broad, general question, related to a specific time.',
    'A specific question about a person or entity, with no time constraint.',
    'A keyword-based search for a broad topic, with no time constraint.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.7721, 0.3525, 0.1172]])
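
Because the documents here are the category descriptions themselves, taking the argmax over a similarity row classifies the query. A minimal continuation of the snippet above:

# Pick the best-matching category description for the query
best_idx = similarities.argmax(dim=1)
print(documents[best_idx[0]])
# 'A broad, general question, related to a specific time.'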

Evaluation

Metrics

Information Retrieval

Metric               german-search-cats-eval  german-search-cats-test
cosine_accuracy@1    0.1989                   0.7507
cosine_accuracy@3    0.5265                   0.9884
cosine_accuracy@5    0.7575                   0.9996
cosine_accuracy@10   1.0                      1.0
cosine_precision@1   0.1989                   0.7507
cosine_precision@3   0.1755                   0.3295
cosine_precision@5   0.1515                   0.1999
cosine_precision@10  0.1                      0.1
cosine_recall@1      0.1989                   0.7507
cosine_recall@3      0.5265                   0.9884
cosine_recall@5      0.7575                   0.9996
cosine_recall@10     1.0                      1.0
cosine_ndcg@10       0.5624                   0.9023
cosine_mrr@10        0.4255                   0.8681
cosine_map@100       0.4255                   0.8681

Information Retrieval

Metric               Value
cosine_accuracy@1    0.7544
cosine_accuracy@3    0.988
cosine_accuracy@5    0.9997
cosine_accuracy@10   1.0
cosine_precision@1   0.7544
cosine_precision@3   0.3293
cosine_precision@5   0.1999
cosine_precision@10  0.1
cosine_recall@1      0.7544
cosine_recall@3      0.988
cosine_recall@5      0.9997
cosine_recall@10     1.0
cosine_ndcg@10       0.9035
cosine_mrr@10        0.8698
cosine_map@100       0.8698
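
Metrics of this shape (cosine_accuracy@k, cosine_ndcg@10, cosine_mrr@10, ...) are what sentence-transformers' InformationRetrievalEvaluator reports. A minimal sketch of such an evaluation; the toy queries, corpus, and relevance judgments below are hypothetical placeholders, not the card's actual evaluation data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("ktcapraz/embeddinggemma-300m-german-search-categories")

# Hypothetical toy data: query id -> text, doc id -> text, query id -> relevant doc ids
queries = {"q1": "Wann endet die Verhandlungsfrist für Spielerlisten 2024?"}
corpus = {
    "d1": "A broad, general question, related to a specific time.",
    "d2": "A keyword-based search for a broad topic, with no time constraint.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="toy-eval")
print(evaluator(model))  # dict of accuracy@k, precision@k, recall@k, ndcg@k, mrr@k, map@100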

Training Details

Training Dataset

search-categories-german-triplets

  • Dataset: search-categories-german-triplets at bbf15f6
  • Size: 73,141 training samples
  • Columns: anchor, positive, and negatives
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 4 tokens, mean: 12.97 tokens, max: 25 tokens
    • positive: string; min: 13 tokens, mean: 16.08 tokens, max: 18 tokens
    • negatives: list; size: 3 elements
  • Samples:
    • anchor: Fleischerzeugung Umweltkosten Deutschland ("meat production environmental costs Germany")
      positive: A keyword-based search for a broad topic, with no time constraint.
      negatives: ['A keyword-based search for a broad topic, related to a specific time.', 'A broad, general question, with no time constraint.', 'A specific keyword search for a person or entity, with no time constraint.']
    • anchor: Schule Rösrath Mpox Schließung ("school Rösrath mpox closure")
      positive: A specific keyword search for a person or entity, with no time constraint.
      negatives: ['A specific keyword search for a person or entity, related to a specific time.', 'A specific question about a person or entity, with no time constraint.', 'A keyword-based search for a broad topic, with no time constraint.']
    • anchor: Was ändert sich bei der Passbeantragung ab Mai 2024? ("What changes for passport applications from May 2024?")
      positive: A broad, general question, related to a specific time.
      negatives: ['A keyword-based search for a broad topic, related to a specific time.', 'A broad, general question, with no time constraint.', 'A specific question about a person or entity, related to a specific time.']
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 8,
        "gather_across_devices": false
    }
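
A minimal sketch of how this loss is instantiated in sentence-transformers with the parameters above (cos_sim is the default similarity_fct, so only scale and mini_batch_size need passing):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss

model = SentenceTransformer("google/embeddinggemma-300m")
# GradCache-style loss: large effective batches, gradients computed in mini-batches of 8
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, mini_batch_size=8)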
    

Evaluation Dataset

search-categories-german-triplets

  • Dataset: search-categories-german-triplets at bbf15f6
  • Size: 15,674 evaluation samples
  • Columns: anchor, positive, and negatives
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 4 tokens, mean: 12.59 tokens, max: 26 tokens
    • positive: string; min: 13 tokens, mean: 16.21 tokens, max: 18 tokens
    • negatives: list; size: 3 elements
  • Samples:
    • anchor: Weihnachtsgeschäft Einzelhandel November Dezember 2024 ("Christmas retail season November December 2024")
      positive: A specific keyword search for a person or entity, related to a specific time.
      negatives: ['A keyword-based search for a broad topic, related to a specific time.', 'A specific keyword search for a person or entity, with no time constraint.', 'A specific question about a person or entity, related to a specific time.']
    • anchor: Wie reagiert die Union auf Mützenichs Vorstoß? ("How is the Union responding to Mützenich's proposal?")
      positive: A specific question about a person or entity, with no time constraint.
      negatives: ['A specific keyword search for a person or entity, with no time constraint.', 'A specific question about a person or entity, related to a specific time.', 'A broad, general question, with no time constraint.']
    • anchor: Al-Manar TV Sperrverfügung Telekom ("Al-Manar TV blocking order Telekom")
      positive: A specific keyword search for a person or entity, with no time constraint.
      negatives: ['A specific keyword search for a person or entity, related to a specific time.', 'A specific question about a person or entity, with no time constraint.', 'A keyword-based search for a broad topic, with no time constraint.']
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 8,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 1e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • prompts: {'anchor': 'task: search result | query: ', 'positive': 'task: classification | query: '}
  • batch_sampler: no_duplicates
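
These settings map directly onto the library's trainer API. A hedged sketch of the training setup, assuming the default SentenceTransformerTrainer recipe; the dataset id, split names, and output path are illustrative assumptions:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers, SentenceTransformerTrainingArguments

model = SentenceTransformer("google/embeddinggemma-300m")
dataset = load_dataset("search-categories-german-triplets")  # illustrative id; the card omits the full Hub path
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, mini_batch_size=8)

args = SentenceTransformerTrainingArguments(
    output_dir="outputs/embeddinggemma-german-search-categories",  # illustrative
    num_train_epochs=1,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=1e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    prompts={"anchor": "task: search result | query: ", "positive": "task: classification | query: "},
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],       # split names are assumptions
    eval_dataset=dataset["validation"],
    loss=loss,
)
trainer.train()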

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: {'anchor': 'task: search result | query: ', 'positive': 'task: classification | query: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch    Step  Training Loss  Validation Loss  german-search-cats-eval_cosine_ndcg@10  german-search-cats-test_cosine_ndcg@10
-1       -1    -              -                0.5624                                   -
0.0350   20    1.8489         -                -                                        -
0.0699   40    0.9438         -                -                                        -
-1       -1    -              -                0.8633                                   -
0.0350   20    0.6601         -                -                                        -
0.0699   40    0.4921         -                -                                        -
0.1049   60    0.6323         -                -                                        -
0.1399   80    0.8749         -                -                                        -
0.1748   100   0.755          0.6962           0.8705                                   -
0.2098   120   0.6572         -                -                                        -
0.2448   140   0.7488         -                -                                        -
0.2797   160   0.7288         -                -                                        -
0.3147   180   0.9539         -                -                                        -
0.3497   200   0.7782         0.6118           0.8888                                   -
0.3846   220   0.5965         -                -                                        -
0.4196   240   0.6789         -                -                                        -
0.4545   260   0.6537         -                -                                        -
0.4895   280   0.6572         -                -                                        -
0.5245   300   0.6221         0.5587           0.8966                                   -
0.5594   320   0.5079         -                -                                        -
0.5944   340   0.5949         -                -                                        -
0.6294   360   0.6828         -                -                                        -
0.6643   380   0.6628         -                -                                        -
0.6993   400   0.6654         0.5398           0.8986                                   -
0.7343   420   0.5939         -                -                                        -
0.7692   440   0.6976         -                -                                        -
0.8042   460   0.6453         -                -                                        -
0.8392   480   0.5204         -                -                                        -
0.8741   500   0.5709         0.5383           0.9035                                   -
0.9091   520   0.5452         -                -                                        -
0.9441   540   0.635          -                -                                        -
0.9790   560   0.8884         -                -                                        -
-1       -1    -              -                -                                        0.9023

Framework Versions

  • Python: 3.10.18
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.1
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.1.0
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}