SentenceTransformer based on microsoft/unixcoder-base

This is a sentence-transformers model finetuned from microsoft/unixcoder-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/unixcoder-base
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
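
The stack above is a Transformer module followed by CLS-token pooling. For reference, the same two-module model can be assembled by hand with the sentence_transformers building blocks; this is only a sketch (loading the published checkpoint as shown under Usage is the normal route):

from sentence_transformers import SentenceTransformer, models

transformer = models.Transformer(
    "microsoft/unixcoder-base",
    max_seq_length=256,  # Maximum Sequence Length from Model Details
)
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),  # 768
    pooling_mode="cls",  # pooling_mode_cls_token=True in the module config above
)
model = SentenceTransformer(modules=[transformer, pooling])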

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Sid-the-sloth/leetcode_unixcoder_final")
# Run inference
sentences = [
    'Name: Arithmetic Slices | Code: // Time:  O(n)\n// Space: O(1)\n\nclass Solution {\npublic:\n    int numberOfArithmeticSlices(vector<int>& A) {\n        int res = 0, i = 0;\n        for (int i = 0; i + 2 < A.size(); ++i) {\n            const auto start = i;\n            while (i + 2 < A.size() && A[i + 2] + A[i] == 2 * A[i + 1]) {\n                res += (i++) - start + 1;\n            }\n        }\n        return res;\n    }\n};\n | Tags: Array,Dynamic Programming,Sliding Window',
    'Name: Arithmetic Subarrays | Code: // Time:  O(n * q)\n// Space: O(n)\n\nclass Solution {\npublic:\n    vector<bool> checkArithmeticSubarrays(vector<int>& nums, vector<int>& l, vector<int>& r) {\n        vector<bool> result(size(l));\n        for (int i = 0; i < size(l); ++i) {\n            result[i] = isArith(vector<int>(cbegin(nums) + l[i], cbegin(nums) + r[i] + 1));\n        }\n        return result;\n    }\n\nprivate:\n    bool isArith(const vector<int>& nums) {\n        unordered_set<int> lookup(cbegin(nums), cend(nums));\n        int mn = *min_element(cbegin(nums), cend(nums));\n        int mx = *max_element(cbegin(nums), cend(nums));\n        if (mx == mn) {\n            return true;\n        }\n        if ((mx - mn) % (size(nums) - 1)) {\n            return false;\n        }\n        int d = (mx - mn) / (size(nums) - 1);\n        for (int i = mn; i <= mx; i += d) {\n            if (!lookup.count(i)) {\n                return false;\n            }\n        }\n        return true;\n    }\n};\n | Tags: Array,Hash Table,Sorting',
    'Name: Reverse Odd Levels of Binary Tree | Code: // Time:  O(n)\n// Space: O(n)\n\n// bfs\nclass Solution {\npublic:\n    TreeNode* reverseOddLevels(TreeNode* root) {\n        vector<TreeNode*> q = {root};\n        for (int parity = 0; !empty(q); parity ^= 1) {\n            if (parity) {\n                for (int left = 0, right = size(q) - 1; left < right; ++left, --right) {\n                    swap(q[left]->val, q[right]->val);\n                }\n            }\n            if (!q[0]->left) {\n                break;\n            }\n            vector<TreeNode*> new_q;\n            for (const auto& node : q) {\n                new_q.emplace_back(node->left);\n                new_q.emplace_back(node->right);\n            }\n            q = move(new_q);\n        }\n        return root;\n    }\n};\n | Tags: Binary Tree,Breadth-First Search,Depth-First Search,Tree',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7905, 0.5824],
#         [0.7905, 1.0000, 0.6028],
#         [0.5824, 0.6028, 1.0000]])
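
Because the embeddings live in a shared vector space, the same model supports retrieval over a snippet corpus. A minimal sketch, continuing from the variables above (the query string here is made up for illustration):

# Rank the three snippets above against a new (hypothetical) query.
query = "Name: Arithmetic Slices II - Subsequence | Code: ..."

query_embedding = model.encode([query])
scores = model.similarity(query_embedding, embeddings)  # shape [1, 3]
best = int(scores.argmax())
print(sentences[best][:50], float(scores[0, best]))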

Evaluation

Metrics

Semantic Similarity

Metric            Value
pearson_cosine    0.8845
spearman_cosine   0.8840
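
These correlations are what sentence_transformers' EmbeddingSimilarityEvaluator reports, and the column name in the training logs below suggests the evaluator was named unixcoder_leetcode_eval. A minimal sketch, with hypothetical held-out pairs since the eval split itself is not published:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Hypothetical held-out pairs; the actual eval split is not part of this card.
texts1 = ["Name: Two Sum | Code: ...", "Name: 3Sum | Code: ...", "Name: 4Sum | Code: ..."]
texts2 = ["Name: 3Sum | Code: ...", "Name: 4Sum | Code: ...", "Name: Two Sum | Code: ..."]
gold = [0.7, 0.8, 0.6]

evaluator = EmbeddingSimilarityEvaluator(
    texts1, texts2, gold,
    name="unixcoder_leetcode_eval",  # name inferred from the training logs
)
model = SentenceTransformer("Sid-the-sloth/leetcode_unixcoder_final")
print(evaluator(model))  # includes pearson_cosine and spearman_cosine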

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,088 training samples
  • Columns: text1, text2, and score
  • Approximate statistics based on the first 1000 samples:

             text1            text2            score
    type     string           string           float
    min      51 tokens        77 tokens        0.14
    mean     224.9 tokens     226.23 tokens    0.43
    max      256 tokens       256 tokens       1.0
  • Samples:
    Sample 1 (score: 0.3427242051079267)

    text1:
    Name: As Far from Land as Possible | Code:
    // Time:  O(m * n)
    // Space: O(m * n)

    class Solution {
    public:
        int maxDistance(vector<vector<int>>& grid) {
            static const vector<pair<int, int>> directions{{0, 1}, {1, 0}, {0, -1}, {-1, 0}};
            queue<pair<int, int>> q;
            for (int i = 0; i < grid.size(); ++i) {
                for (int j = 0; j < grid[i].size(); ++j) {
                    if (grid[i][j]) {
                        q.emplace(i, j);
                    }
                }
            }
            if (q.size() == grid.size() * grid[0].size()) {
                return -1;
            }
            int level = -1;
            while (!q.empty()) {
                queue<pair<int, int>> next_q;
                while (!q.empty()) {
                    const auto [x, y] = q.front(); q.pop();
                    for (const auto& [dx, dy] : directions) {
                        const auto& nx = x + dx;
                        const auto& ny = y + dy;
                        if (!(0 <= nx && nx < grid.size() &&
    ...

    text2:
    Name: Maximum Manhattan Distance After K Changes | Code:
    // Time:  O(n)
    // Space: O(1)

    // greedy
    class Solution {
    public:
        int maxDistance(string s, int k) {
            int result = 0;
            for (int i = 0, x = 0, y = 0; i < size(s); ++i) {
                if (s[i] == 'E') {
                    ++x;
                } else if (s[i] == 'W') {
                    --x;
                } else if (s[i] == 'N') {
                    ++y;
                } else if (s[i] == 'S') {
                    --y;
                }
                result = max(result, min(abs(x) + abs(y) + 2 * k, i + 1));
            }
            return result;
        }
    };
    | Tags: Counting,Hash Table,Math,String

    Sample 2 (score: 0.7248856548541956)

    text1:
    Name: Wiggle Sort II | Code:
    // Time:  O(n) ~ O(n^2), O(n) on average.
    // Space: O(1)

    // Tri Partition (aka Dutch National Flag Problem) with virtual index solution. (44ms)
    class Solution {
    public:
        void wiggleSort(vector<int>& nums) {
            int mid = (nums.size() - 1) / 2;
            nth_element(nums.begin(), nums.begin() + mid, nums.end());  // O(n) ~ O(n^2) time
            reversedTriPartitionWithVI(nums, nums[mid]);  // O(n) time, O(1) space
        }

        void reversedTriPartitionWithVI(vector<int>& nums, int val) {
            const int N = nums.size() / 2 * 2 + 1;
            #define Nums(i) nums[(1 + 2 * (i)) % N]
            for (int i = 0, j = 0, n = nums.size() - 1; j <= n;) {
                if (Nums(j) > val) {
                    swap(Nums(i++), Nums(j++));
                } else if (Nums(j) < val) {
                    swap(Nums(j), Nums(n--));
                } else {
                    ++j;
                }
            }
        }
    };

    // Time:  O(n) ~ O(n^2)
    // Space: O(n)
    // Tri Partition (aka Dutch National Flag Pro...

    text2:
    Name: Array With Elements Not Equal to Average of Neighbors | Code:
    // Time:  O(n) ~ O(n^2), O(n) on average
    // Space: O(1)

    // Tri Partition (aka Dutch National Flag Problem) with virtual index solution
    class Solution {
    public:
        vector<int> rearrangeArray(vector<int>& nums) {
            int mid = (size(nums) - 1) / 2;
            nth_element(begin(nums), begin(nums) + mid, end(nums));  // O(n) ~ O(n^2) time
            reversedTriPartitionWithVI(nums, nums[mid]);  // O(n) time, O(1) space
            return nums;
        }

    private:
        void reversedTriPartitionWithVI(vector<int>& nums, int val) {
            const int N = size(nums) / 2 * 2 + 1;
            #define Nums(i) nums[(1 + 2 * (i)) % N]
            for (int i = 0, j = 0, n = size(nums) - 1; j <= n;) {
                if (Nums(j) > val) {
                    swap(Nums(i++), Nums(j++));
                } else if (Nums(j) < val) {
                    swap(Nums(j), Nums(n--));
                } else {
                    ++j;
                }
            }
        }
    };

    // Time:  O(nlogn)
    ...

    Sample 3 (score: 0.2964020586887101)

    text1:
    Name: Minimum Time to Visit a Cell In a Grid | Code:
    // Time:  O(m * n * log(m * n))
    // Space: O(m * n)

    // dijkstra's algorithm
    class Solution {
    public:
        int minimumTime(vector<vector<int>>& grid) {
            static const vector<pair<int, int>> DIRECTIONS = {{1, 0}, {0, 1}, {-1, 0}, {0, -1}};
            if (min(grid[0][1], grid[1][0]) > 1) {
                return -1;
            }
            const auto& dijkstra = [&](const pair<int, int>& start, const pair<int, int>& target) {
                vector<vector<int>> best(size(grid), vector<int>(size(grid[0]), numeric_limits<int>::max()));
                best[start.first][start.second] = 0;
                using Data = tuple<int, int, int>;
                priority_queue<Data, vector<Data>, greater<Data>> min_heap;
                min_heap.emplace(0, start.first, start.second);
                while (!empty(min_heap)) {
                    const auto [curr, i, j] = min_heap.top(); min_heap.pop();
                    if (best[i][j] < curr) {
                        continue;
    ...

    text2:
    Name: Sentence Similarity III | Code:
    // Time:  O(n)
    // Space: O(1)

    class Solution {
    public:
        bool areSentencesSimilar(string sentence1, string sentence2) {
            if (size(sentence1) > size(sentence2)) {
                swap(sentence1, sentence2);
            }
            int count = 0;
            for (int step = 0; step < 2; ++step) {
                for (int i = 0; i <= size(sentence1); ++i) {
                    char c1 = i != size(sentence1) ? sentence1[step == 0 ? i : size(sentence1) - 1 - i] : ' ';
                    char c2 = i != size(sentence2) ? sentence2[step == 0 ? i : size(sentence2) - 1 - i] : ' ';
                    if (c1 != c2) {
                        break;
                    }
                    if (c1 == ' ') {
                        ++count;
                    }
                }
            }
            return count >= count_if(cbegin(sentence1), cend(sentence1),
                                     [](char x) { return x == ' '; }) + 1;
        }
    };
    | Tags: Array,String,Two Pointers
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
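
Putting the dataset columns, loss, and hyperparameters together, the run could be reproduced along these lines. This is a sketch under stated assumptions: the dataset rows below are stand-ins (the real 4,088-pair dataset is unnamed and not published) and the output path is hypothetical; the loss scale, epochs, learning rate, warmup steps, batch size, and seeds come from this card.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
    models,
)

# Base model with CLS pooling, matching Full Model Architecture above.
transformer = models.Transformer("microsoft/unixcoder-base", max_seq_length=256)
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
model = SentenceTransformer(modules=[transformer, pooling])

# Stand-in rows with the card's column names: text1, text2, score.
train_dataset = Dataset.from_dict({
    "text1": ["Name: Wiggle Sort II | Code: ..."],
    "text2": ["Name: Array With Elements Not Equal to Average of Neighbors | Code: ..."],
    "score": [0.72],
})

loss = losses.CoSENTLoss(model, scale=20.0)  # scale from the loss parameters above

args = SentenceTransformerTrainingArguments(
    output_dir="unixcoder-leetcode",  # hypothetical output path
    num_train_epochs=2,
    learning_rate=2e-5,
    warmup_steps=102,
    per_device_train_batch_size=8,
    seed=42,
    data_seed=42,
    # The run also used eval_strategy="steps", which additionally
    # requires an eval_dataset or evaluator; omitted in this sketch.
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()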
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_steps: 102
  • use_cpu: True
  • data_seed: 42
  • remove_unused_columns: False
  • load_best_model_at_end: True
  • dataloader_pin_memory: False
  • gradient_checkpointing: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 102
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: True
  • use_mps_device: False
  • seed: 42
  • data_seed: 42
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: False
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  unixcoder_leetcode_eval_spearman_cosine
1.6634  850   2.6601         -
1.7613  900   2.5066         0.8875
1.8591  950   2.3788         -
1.9569  1000  2.3400         0.8840
  • The step-1000 row denotes the saved checkpoint; its Spearman cosine of 0.8840 matches the metric reported under Evaluation.

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.55.2
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4
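
To approximate this environment, the versions above translate into pins like the following (assuming the standard PyPI package names; the +cu126 PyTorch build comes from the CUDA 12.6 wheel index):

pip install "sentence-transformers==5.1.0" "transformers==4.55.2" "accelerate==1.10.0" "datasets==4.0.0" "tokenizers==0.21.4"
pip install "torch==2.8.0" --index-url https://download.pytorch.org/whl/cu126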

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}