πŸ¦‰ CodeSearch-ModernBERT-Owl-Plus: High-Performance Sentence-BERT for Code Search

CodeSearch-ModernBERT-Owl-Plus is a high-performance code search model fine-tuned with a Sentence-BERT (bi-encoder) architecture on top of the pretrained CodeModernBERT-Owl v1.0.

This model is optimized for function-level code search, retrieving functions from a codebase given a natural-language query, and achieves strong results on the MTEB code-retrieval tasks reported below.


πŸ›  Features

  • βœ… Fine-tuned in Sentence-BERT format from CodeModernBERT-Owl
  • βœ… Supports multiple programming languages (Python, Java, JavaScript, etc.)
  • βœ… Specialized encoder for high-accuracy code search
  • βœ… Well suited as the first-stage (dual-encoder) retriever in multi-stage retrieval pipelines (see the sketch below)
  • βœ… Generates rich semantic embeddings for both code and natural-language queries
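
In a dual-encoder setup, the function corpus is embedded once, each query is embedded at search time, and candidates are ranked by cosine similarity. The following is a minimal sketch using sentence_transformers.util.semantic_search with a made-up three-function corpus and query; only the model name comes from this card.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-Plus")

# Hypothetical mini-corpus of functions to search over.
corpus = [
    "def binary_search(arr, target): ...",
    "def quicksort(arr): ...",
    "public static int indexOf(int[] xs, int x) { ... }",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Embed the natural-language query into the same vector space and rank by cosine similarity.
query_embedding = model.encode("binary search over a sorted array", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")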

πŸ“Š Evaluation on MTEB Benchmark

πŸ† Main Scores in MTEB

This model achieved the following main scores (main_score = NDCG@10); a reproduction sketch with the mteb package follows the list:

  • CodeSearchNetRetrieval: main_score = 0.8918
  • COIR-CodeSearchNetRetrieval: main_score = 0.8013
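
These scores can, in principle, be reproduced with the mteb evaluation package. The sketch below is an assumption about the exact task identifiers (CodeSearchNetRetrieval, COIRCodeSearchNetRetrieval) registered in a recent mteb release, not a verified command from this model's evaluation run.

import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-Plus")

# Task names are assumed to match the identifiers registered in mteb.
tasks = mteb.get_tasks(tasks=["CodeSearchNetRetrieval", "COIRCodeSearchNetRetrieval"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results/CodeSearch-ModernBERT-Owl-Plus")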

πŸ§ͺ CodeSearchNetRetrieval (MTEB)

Metric         Score
MRR@10         0.8704
NDCG@10        0.8918
MAP@10         0.8704
Recall@10      0.9563
Precision@10   0.0956

This model achieves strong performance across all ranking metrics and demonstrates balanced retrieval capability. Note that Precision@10 is capped at 0.1 on this benchmark, since each query has exactly one relevant function, so 0.0956 simply mirrors the 0.9563 Recall@10.
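
For readers unfamiliar with the metrics, here is a minimal sketch of how MRR@10 and Recall@10 are computed per query, assuming (as in CodeSearchNet) a single relevant function per query; the reported numbers are the means over all queries.

def mrr_at_10(ranked_ids, relevant_id):
    # Reciprocal rank of the relevant function if it appears in the top 10, else 0.
    top10 = list(ranked_ids[:10])
    return 1.0 / (top10.index(relevant_id) + 1) if relevant_id in top10 else 0.0

def recall_at_10(ranked_ids, relevant_id):
    # 1.0 if the single relevant function is retrieved within the top 10, else 0.0.
    return 1.0 if relevant_id in ranked_ids[:10] else 0.0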


πŸ§ͺ COIR-CodeSearchNetRetrieval (MTEB)

Metric         Score
MRR@10         0.7751
NDCG@10        0.8013
MAP@10         0.7751
Recall@10      0.8826
Precision@10   0.0883

Robust and consistent performance is also maintained on the COIR dataset, demonstrating strong generalization.


πŸ“₯ Usage Example

from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub.
model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-Plus")

# Encode a natural-language query and a code snippet into the same embedding space.
embeddings = model.encode(["binary search function", "def binary_search(arr, target): ..."])
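
To turn the two embeddings from the snippet above into a relevance score, compare them with cosine similarity; the lines below simply continue from the same variables.

from sentence_transformers import util

# embeddings[0] is the query, embeddings[1] the code snippet.
score = util.cos_sim(embeddings[0], embeddings[1])
print(f"cosine similarity: {score.item():.4f}")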

πŸ“ Conclusion

  • βœ… An optimized Sentence-BERT model based on CodeModernBERT-Owl
  • βœ… Achieves MRR@10 > 0.87 on MTEB CodeSearchNetRetrieval
  • βœ… Ready for integration in production-level code search systems

πŸ“œ License

πŸ“„ Apache-2.0

πŸ“§ Contact

For questions or inquiries, feel free to reach out: πŸ“§ shun0212114@outlook.jp
